Function chunked_mixed_dot_product 

pub fn chunked_mixed_dot_product<const CHUNK: usize, A: Algebra<F> + Dup, F: Dup, const N: usize>(
    values: &[A; N],
    coeffs: &[F; N],
) -> A

Compute Σ values[i] * coeffs[i] over N pairs.

A single long sum forms one dependency chain: every add has to wait for the previous one. Instead, we split the pairs into groups of CHUNK, sum each group on its own, and add the group totals to the accumulator. Because the groups do not depend on one another, the CPU can overlap their additions (instruction-level parallelism), so the critical path is shorter than one straight chain of N adds.

The result is the same for every valid CHUNK, since regrouping a sum does not change its value when addition is associative; only the speed changes.
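
The grouping idea can be illustrated with a minimal sketch. It is not the crate's implementation: the names `tree_sum` and `chunked_dot_sketch` are hypothetical, and `u64` with wrapping arithmetic stands in for the generic `A: Algebra<F>` and `F` types, whose traits are not shown here.

```rust
/// Pairwise (tree) sum of a slice; hypothetical helper for this sketch.
fn tree_sum(terms: &[u64]) -> u64 {
    match terms.len() {
        0 => 0,
        1 => terms[0],
        n => {
            // Split in half and add the two independent halves.
            let (lo, hi) = terms.split_at(n / 2);
            tree_sum(lo).wrapping_add(tree_sum(hi))
        }
    }
}

/// Hypothetical stand-in for the documented function, specialised to `u64`
/// with wrapping arithmetic (an assumption; the real function is generic).
pub fn chunked_dot_sketch<const CHUNK: usize, const N: usize>(
    values: &[u64; N],
    coeffs: &[u64; N],
) -> u64 {
    // Run-time check for the sketch only; the documented function checks at compile time.
    assert!(CHUNK > 0, "CHUNK must be non-zero");

    let mut acc = 0u64;
    let mut v_chunks = values.chunks_exact(CHUNK);
    let mut c_chunks = coeffs.chunks_exact(CHUNK);

    // Full groups: the products of a group are tree-summed, then added to `acc`.
    for (v, c) in v_chunks.by_ref().zip(c_chunks.by_ref()) {
        let mut prods = [0u64; CHUNK];
        for i in 0..CHUNK {
            prods[i] = v[i].wrapping_mul(c[i]);
        }
        acc = acc.wrapping_add(tree_sum(&prods));
    }

    // Tail: the remaining r < CHUNK pairs are added one by one.
    for (v, c) in v_chunks.remainder().iter().zip(c_chunks.remainder()) {
        acc = acc.wrapping_add(v.wrapping_mul(*c));
    }
    acc
}
```

Calling `chunked_dot_sketch::<3, 7>(&values, &coeffs)` and `chunked_dot_sketch::<4, 7>(&values, &coeffs)` on the same inputs returns the same value; only the grouping of the additions differs.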

§Layout

For N = q * CHUNK + r with 0 <= r < CHUNK:

    ┌── group 0 ──┬── group 1 ──┬─ ... ─┬── tail (r) ──┐
    │   CHUNK     │   CHUNK     │       │   r pairs    │
    └──────┬──────┴──────┬──────┴───────┴──────┬───────┘
           ▼             ▼                     ▼
      tree-sum      tree-sum             scalar adds
           └──► acc ◄────┴──────► acc ◄────────┘
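
As a small worked example of the decomposition above, the following sketch computes q and r for a given N and CHUNK. The helper name `layout_sketch` is hypothetical and is not part of the crate.

```rust
/// Hypothetical helper: split `n` pairs into `q` full groups of `chunk`
/// pairs plus a tail of `r` pairs, so that n = q * chunk + r and r < chunk.
pub fn layout_sketch(n: usize, chunk: usize) -> (usize, usize) {
    assert!(chunk > 0, "chunk must be non-zero");
    (n / chunk, n % chunk)
}
```

For N = 10 and CHUNK = 4 this gives q = 2 and r = 2: two tree-summed groups of four pairs, followed by a two-pair tail handled with scalar adds.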

§Panics

Compile-time panic if CHUNK is zero.
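
The source does not say how the check is implemented. One common way to reject a zero const parameter at build time, shown here as a sketch with hypothetical names (`AssertNonZero`, `requires_nonzero_chunk`), is an associated constant whose initializer asserts on the parameter:

```rust
/// Hypothetical carrier type for the compile-time check.
struct AssertNonZero<const CHUNK: usize>;

impl<const CHUNK: usize> AssertNonZero<CHUNK> {
    /// Evaluated during compilation for each CHUNK that is actually used.
    const OK: () = assert!(CHUNK > 0, "CHUNK must be non-zero");
}

/// Hypothetical function that refuses to build for CHUNK = 0.
pub fn requires_nonzero_chunk<const CHUNK: usize>() -> usize {
    // Referencing the constant forces the assertion to be evaluated
    // when this function is instantiated.
    let () = AssertNonZero::<CHUNK>::OK;
    CHUNK
}
```

With this pattern, `requires_nonzero_chunk::<4>()` builds and returns 4, while an instantiation such as `requires_nonzero_chunk::<0>()` makes the build fail with the assertion message instead of panicking at run time.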