Function max_subtree_len

Source
pub fn max_subtree_len(input_offset: u64) -> Option<u64>
Expand description

The maximum length of a subtree in bytes, given its starting offset in bytes

If you try to hash more than this many bytes as one subtree, you’ll end up merging parent nodes that shouldn’t be merged, and your output will be garbage. Hasher::update will currently panic in this case, but this is not guaranteed.

For input offset zero (the default), there is no maximum length, and this function returns None. For all other offsets it returns Some. Note that valid offsets must be a multiple of CHUNK_LEN (1024); it’s not possible to start hashing a chunk in the middle.

In the example tree below, chunks are numbered by their 0-based index. The subtree that starts with chunk 3, i.e. input_offset = 3 * CHUNK_LEN, includes only that one chunk, so its max length is Some(CHUNK_LEN). The subtree that starts with chunk 6 includes chunk 7 but not chunk 8, so its max length is Some(2 * CHUNK_LEN). The subtree that starts with chunk 12 includes chunks 13, 14, and 15, but if the tree were bigger it would not include chunk 16, so its max length is Some(4 * CHUNK_LEN). One way to think about the rule here is that, if you go beyond the max subtree length from a given starting offset, you start dealing with subtrees that include chunks to the left of where you started.

                          root
                /                       \
             .                             .
       /           \                 /           \
      .             .               .             .
   /    \         /    \         /    \         /    \
  .      .       .      .       .      .       .      .
 / \    / \     / \    / \     / \    / \     / \    / \
0  1   2  3    4  5   6  7    8  9   10 11   12 13  14 15

The general rule turns out to be that for a subtree starting at a 0-based chunk index N greater than zero, the maximum number of chunks in that subtree is the largest power-of-two that divides N, which is given by 1 << N.trailing_zeros().

This function can be useful for writing tests or debug assertions, but it’s actually rare to use this for real control flow. Callers who split their input recursively using left_subtree_len will automatically satisfy the max_subtree_len bound and don’t necessarily need to check. It’s also common to choose some fixed power-of-two subtree size, say 64 chunks, and divide your input up into slices of that fixed length (with the final slice possibly short). This approach also automatically satisfies the max_subtree_len bound and doesn’t need to check. Proving that this is true can be an interesting exercise. Note that chunks 0, 4, 8, and 12 all begin subtrees of at least 4 chunks in the example tree above.

§Panics

This function currently panics if input_offset is not a multiple of CHUNK_LEN. This is not guaranteed.