Module hazmat

Low-level tree manipulations and other sharp tools

The target audience for this module is projects like Bao, which work directly with the interior hashes (“chaining values”) of BLAKE3 chunks and subtrees. For example, you could use these functions to implement a BitTorrent-like protocol using the BLAKE3 tree structure, or to hash an input that’s distributed across different machines. These use cases are advanced, and most applications don’t need this module. Also:

Warning: This module is hazardous material. If you’ve heard folks say don’t roll your own crypto, this is the sort of thing they’re talking about. These functions have complicated requirements, and any mistakes will give you garbage output and/or break the security properties that BLAKE3 is supposed to have. Read section 2.1 of the BLAKE3 paper to understand the tree structure you need to maintain. Test your code against blake3::hash and make sure you can get the same outputs for lots of different inputs.

On the other hand:

Encouragement: Playing with these functions is a great way to learn how BLAKE3 works on the inside. Have fun!

The main entrypoint for this module is the HasherExt trait, particularly the set_input_offset and finalize_non_root methods. These let you compute the chaining values of individual chunks or subtrees. You then combine these chaining values into larger subtrees using merge_subtrees_non_root and finally (once at the very top) merge_subtrees_root or merge_subtrees_root_xof.

Examples

Here’s an example of computing all the interior hashes in a 3-chunk tree:

           root
         /      \
     parent      \
   /       \      \
chunk0  chunk1  chunk2

use blake3::{Hasher, CHUNK_LEN};
use blake3::hazmat::{merge_subtrees_non_root, merge_subtrees_root, Mode};
use blake3::hazmat::HasherExt; // an extension trait for Hasher

let chunk0 = [b'a'; CHUNK_LEN];
let chunk1 = [b'b'; CHUNK_LEN];
let chunk2 = [b'c'; 42]; // The final chunk can be short.

// Compute the non-root hashes ("chaining values") of all three chunks. Chunks or subtrees
// that don't begin at the start of the input use `set_input_offset` to say where they begin.
let chunk0_cv = Hasher::new()
    // .set_input_offset(0) is the default.
    .update(&chunk0)
    .finalize_non_root();
let chunk1_cv = Hasher::new()
    .set_input_offset(CHUNK_LEN as u64)
    .update(&chunk1)
    .finalize_non_root();
let chunk2_cv = Hasher::new()
    .set_input_offset(2 * CHUNK_LEN as u64)
    .update(&chunk2)
    .finalize_non_root();

// Join the first two chunks with a non-root parent node and compute its chaining value.
let parent_cv = merge_subtrees_non_root(&chunk0_cv, &chunk1_cv, Mode::Hash);

// Join that parent node and the third chunk with a root parent node and compute the hash.
let root_hash = merge_subtrees_root(&parent_cv, &chunk2_cv, Mode::Hash);

// Double check that we got the right answer.
let mut combined_input = Vec::new();
combined_input.extend_from_slice(&chunk0);
combined_input.extend_from_slice(&chunk1);
combined_input.extend_from_slice(&chunk2);
assert_eq!(root_hash, blake3::hash(&combined_input));

Hashing many chunks together is important for performance, because it allows the implementation to use SIMD parallelism internally. (AVX-512 for example needs 16 chunks to really get going.) We can reproduce parent_cv by hashing chunk0 and chunk1 at the same time:

let left_subtree_cv = Hasher::new()
    // .set_input_offset(0) is the default.
    .update(&combined_input[..2 * CHUNK_LEN])
    .finalize_non_root();
assert_eq!(left_subtree_cv, parent_cv);

// Using multiple updates gives the same answer, though it's not as efficient.
let mut subtree_hasher = Hasher::new();
// Again, .set_input_offset(0) is the default.
subtree_hasher.update(&chunk0);
subtree_hasher.update(&chunk1);
assert_eq!(left_subtree_cv, subtree_hasher.finalize_non_root());

However, hashing multiple chunks together must respect the overall tree structure. Hashing chunk0 and chunk1 together is valid, but hashing chunk1 and chunk2 together is incorrect and gives a garbage result that will never match a standard BLAKE3 hash. The implementation includes a few best-effort asserts to catch some of these mistakes, but these checks aren’t guaranteed. For example, this second call to update currently panics:

let oops = Hasher::new()
    .set_input_offset(CHUNK_LEN as u64)
    .update(&chunk1)
    // PANIC: "the subtree starting at 1024 contains at most 1024 bytes"
    .update(&chunk2)
    .finalize_non_root();

For more on valid tree structures, see the docs for left_subtree_len and max_subtree_len, and see section 2.1 of the BLAKE3 paper. Note that the merging functions (merge_subtrees_root and friends) don’t know the shape of the left and right subtrees you’re giving them, and they can’t help you catch mistakes. The best way to catch mistakes with these is to compare your root output to the blake3::hash of the same input.
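The two subtree-boundary rules can be sketched in plain arithmetic. The stand-alone functions below are illustrative re-implementations, not the crate's code: their names mirror left_subtree_len and max_subtree_len, but the exact signatures here are assumptions for demonstration, and you should use the real blake3::hazmat functions in practice.

```rust
const CHUNK_LEN: u64 = 1024;

/// How many bytes of a subtree input (longer than one chunk) belong to its
/// left child: the largest power-of-two number of full chunks that still
/// leaves at least one byte for the right child.
fn left_subtree_len(input_len: u64) -> u64 {
    assert!(input_len > CHUNK_LEN);
    // Subtract 1 so the right child keeps at least one byte.
    let full_chunks = (input_len - 1) / CHUNK_LEN;
    // Round the chunk count down to a power of two.
    (1u64 << (63 - full_chunks.leading_zeros())) * CHUNK_LEN
}

/// The maximum size in bytes of a subtree starting at a given nonzero,
/// chunk-aligned offset: the lowest set bit of the starting chunk index,
/// in chunks. (A subtree starting at offset 0 can be arbitrarily long.)
fn max_subtree_len(input_offset: u64) -> u64 {
    assert!(input_offset != 0 && input_offset % CHUNK_LEN == 0);
    let chunk_index = input_offset / CHUNK_LEN;
    // chunk_index & chunk_index.wrapping_neg() isolates the lowest set bit.
    (chunk_index & chunk_index.wrapping_neg()) * CHUNK_LEN
}

fn main() {
    // A 3-chunk input splits 2 chunks left / 1 chunk right, matching the
    // diagram above (chunk0 and chunk1 under one parent, chunk2 alone).
    assert_eq!(left_subtree_len(3 * CHUNK_LEN), 2 * CHUNK_LEN);
    // 1025 bytes split as 1024 + 1.
    assert_eq!(left_subtree_len(1025), 1024);
    // The subtree starting at offset 1024 holds at most one chunk, which is
    // why hashing chunk1 and chunk2 together panics.
    assert_eq!(max_subtree_len(CHUNK_LEN), CHUNK_LEN);
    // A subtree starting at chunk 6 can span chunks 6 and 7, but no more.
    assert_eq!(max_subtree_len(6 * CHUNK_LEN), 2 * CHUNK_LEN);
    println!("ok");
}
```

Together these rules pin down the unique valid tree shape for any input length, which is why a wrong split can never collide with a standard BLAKE3 hash.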

Enums

Mode
The mode argument to merge_subtrees_root and friends.

Traits

HasherExt
Extension methods for Hasher. This is the main entrypoint to the hazmat module.

Functions

hash_derive_key_context
Hash a derive_key context string and return a ContextKey.
left_subtree_len
Given the length in bytes of either a complete input or a subtree input, return the number of bytes that belong to its left child subtree. The rest belong to its right child subtree.
max_subtree_len
The maximum length of a subtree in bytes, given its starting offset in bytes.
merge_subtrees_non_root
Compute a non-root parent node chaining value from two child chaining values.
merge_subtrees_root
Compute a root hash from two child chaining values.
merge_subtrees_root_xof
Build a root OutputReader from two child chaining values.

Type Aliases

ChainingValue
“Chaining value” is the academic term for a non-root or non-final hash.
ContextKey
An alias to distinguish hash_derive_key_context outputs from other keys.