pub fn chunked_mixed_dot_product<const CHUNK: usize, A: Algebra<F> + Dup, F: Dup, const N: usize>(
    values: &[A; N],
    coeffs: &[F; N],
) -> A
Compute Σ values[i] * coeffs[i] over N pairs.
A single long sum forces every add to wait for the previous one. Instead,
we split the pairs into groups of CHUNK, sum each group on its own, and
add up the group totals. The group sums do not depend on one another, so the
CPU can overlap their execution, and the longest dependency chain is shorter
than in one straight sum.
Regrouping the additions does not change the sum, so the result is the same
for every valid CHUNK; only the speed changes.
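The grouping can be sketched as follows. This is a minimal illustration, not the crate's implementation: the hypothetical `chunked_dot_u64` and `naive_dot_u64` use `u64` with wrapping arithmetic in place of the generic `A: Algebra<F>`, sum each group sequentially rather than with a tree, and rely on `chunks_exact`, which panics at run time (not compile time) when `CHUNK` is zero.

```rust
/// Reference: one straight dependency chain over all N pairs.
/// Hypothetical helper, used only to compare results.
fn naive_dot_u64<const N: usize>(values: &[u64; N], coeffs: &[u64; N]) -> u64 {
    let mut acc = 0u64;
    for i in 0..N {
        acc = acc.wrapping_add(values[i].wrapping_mul(coeffs[i]));
    }
    acc
}

/// Hypothetical stand-in for the documented function, specialised to u64.
/// Assumes CHUNK > 0 (chunks_exact panics at run time otherwise).
fn chunked_dot_u64<const CHUNK: usize, const N: usize>(
    values: &[u64; N],
    coeffs: &[u64; N],
) -> u64 {
    let v_chunks = values.chunks_exact(CHUNK);
    let c_chunks = coeffs.chunks_exact(CHUNK);
    // The tail holds the r = N % CHUNK pairs that do not fill a group.
    let (v_tail, c_tail) = (v_chunks.remainder(), c_chunks.remainder());

    let mut acc = 0u64;
    for (vs, cs) in v_chunks.zip(c_chunks) {
        // Each group total starts from zero, so it does not depend on `acc`
        // or on any other group.
        let mut group = 0u64;
        for (v, c) in vs.iter().zip(cs) {
            group = group.wrapping_add(v.wrapping_mul(*c));
        }
        acc = acc.wrapping_add(group);
    }
    // Leftover pairs are folded into the accumulator one at a time.
    for (v, c) in v_tail.iter().zip(c_tail) {
        acc = acc.wrapping_add(v.wrapping_mul(*c));
    }
    acc
}
```

For example, `chunked_dot_u64::<4, 10>(&values, &coeffs)` processes two groups of four pairs and a tail of two, and returns the same value as `naive_dot_u64(&values, &coeffs)`.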
§Layout
For N = q * CHUNK + r with 0 <= r < CHUNK:
┌── group 0 ──┬── group 1 ──┬─ ... ─┬── tail (r) ──┐
│    CHUNK    │    CHUNK    │       │   r pairs    │
└──────┬──────┴──────┬──────┴───────┴──────┬───────┘
       ▼             ▼                     ▼
   tree-sum      tree-sum             scalar adds
       └──► acc ◄────┴──────► acc ◄────────┘
§Panics
Compile-time panic if CHUNK is zero.
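One common way to get such a compile-time failure is an assertion in an associated constant, evaluated when the function is instantiated. The source does not show how this function performs its check, so the sketch below is only an assumption about the general pattern; `AssertNonZero` and `group_count` are hypothetical names.

```rust
/// Hypothetical helper: carries a const assertion on CHUNK.
struct AssertNonZero<const CHUNK: usize>;

impl<const CHUNK: usize> AssertNonZero<CHUNK> {
    // Evaluated at compile time for each concrete CHUNK that is used.
    const OK: () = assert!(CHUNK != 0, "CHUNK must be non-zero");
}

/// Hypothetical function: number of full groups of CHUNK among n pairs.
fn group_count<const CHUNK: usize>(n: usize) -> usize {
    // Referencing the constant forces its evaluation; building code that
    // instantiates group_count::<0> fails with the message above.
    let () = AssertNonZero::<CHUNK>::OK;
    n / CHUNK
}
```

With this pattern, `group_count::<4>(10)` compiles and returns 2, while any use of `group_count::<0>` is rejected when the crate is built.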