Metrics

How a run's events become telemetry: per-iteration sinks, the accumulator, cross-chunk merge, and the protobuf encoding

June 9, 2026

A single run produces a DPS number, but a useful sim produces a distribution: a mean and its error, per-spell breakdowns, buff uptimes, resource accounting, and a representative timeline to look at. The engine collects all of that incrementally as it runs, never holding more than one iteration's worth of fine-grained events in memory at a time.

The shape is two layers. Each iteration fills a TelemetrySink with raw events; after the iteration, those events are folded into a TelemetryAccumulator that carries running aggregates across iterations. The sink is cleared and reused; the accumulator grows. At the end, the accumulator encodes itself into protobuf bytes: the ChunkTelemetry that travels back to the rest of the platform.

The two layers

TelemetrySink is the per-iteration collector. It is a bundle of pre-allocated vectors, one per event category, that the combat functions emit into during a run:

crates/engine-ports/src/telemetry_sink.rs L108-L114 (5 fields)

rust

pub struct TelemetrySink {
    pub damage: Vec<DamageEvent>,
    pub auras: Vec<AuraEvent>,
    pub resources: Vec<ResourceEvent>,
    pub cooldowns: Vec<CooldownEvent>,
    pub casts: Vec<CastEvent>,
}

It is allocated once per chunk and cleared at the start of each iteration, so the hot path appends without allocating.
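A minimal sketch of the clear-and-reuse pattern, assuming a cut-down sink with a single event category. MiniSink, the DamageEvent fields, and the emit_damage signature are illustrative stand-ins rather than the engine's definitions. The point is only that Vec::clear drops the elements while keeping the allocation:

```rust
/// Hypothetical, simplified damage event. The real DamageEvent fields
/// are not shown in this document.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct DamageEvent {
    pub time_ms: u32,
    pub spell_id: u32,
    pub amount: f64,
}

/// Single-category stand-in for TelemetrySink.
pub struct MiniSink {
    pub damage: Vec<DamageEvent>,
}

impl MiniSink {
    /// Allocate the buffer once, up front (the capacity is an assumed hint).
    pub fn with_capacity(cap: usize) -> Self {
        Self { damage: Vec::with_capacity(cap) }
    }

    /// Drop the previous iteration's events but keep the heap buffer.
    pub fn clear(&mut self) {
        self.damage.clear();
    }

    /// Append one event. No allocation happens while len < capacity.
    pub fn emit_damage(&mut self, ev: DamageEvent) {
        self.damage.push(ev);
    }
}
```

If the capacity covers a typical iteration, every push after the first iteration lands in memory that is already allocated.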

TelemetryAccumulator is the cross-iteration aggregate. It holds the DPS running sums (count, sum, sum-of-squares, min, max, and the full value vector for the histogram), per-spell aggregates, aura uptimes, resource totals, per-second damage buckets, the direct/periodic/pet damage split, and the representative iteration's captured timeline:

crates/engine-sim/src/telemetry.rs L181-L202 (20 fields)

rust

pub struct TelemetryAccumulator {
    iteration_count: u32,
    dps_sum: f64,
    dps_sum_sq: f64,
    dps_min: f64,
    dps_max: f64,
    spell_totals: IntMap<u32, SpellAggregate>,
    dps_values: Vec<f64>,
    representative_iteration: Option<(usize, f64)>,
    aura_uptimes: IntMap<u32, f64>,
    resource_totals: IntMap<u8, ResourceAggregate>,
    cooldown_totals: IntMap<u32, CooldownAggregate>,
    total_duration_ms: f64,
    gcd_locked_ms_total: f64,
    representative_sink: Option<RepresentativeTimeline>,
    pending_capture: bool,
    direct_damage_total: f64,
    periodic_damage_total: f64,
    pet_damage_total: f64,
    bucket_sums: Vec<f64>,
    trace_extras_enabled: bool,
}

The handoff happens in the loop's post-iteration tail. After a run finishes, the loop computes the iteration's DPS and calls three accumulator methods in order: record_sink_events to fold this iteration's sink into the running totals, record_iteration to update the DPS statistics, and maybe_capture_representative to possibly snapshot this iteration's timeline.

Figure 19: Telemetry and Metrics Pipeline. Expands the telemetry box of the simulation pipeline: per-iteration TelemetrySink events fold into the TelemetryAccumulator (per-second buckets, representative iteration, running stats), merge across chunks, and encode to a protobuf ChunkTelemetry.

The flow:

During an iteration the combat functions push into the sink (emit_damage, emit_aura, emit_resource, and so on).
record_sink_events folds the iteration's events into the accumulator's running totals, and buckets damage into per-second slots. The timeline's resolution is one second (TIMELINE_BUCKET_MS).
record_iteration updates the DPS sums and re-selects the representative.
merge combines two accumulators when chunks run in parallel.
encode produces the protobuf, building the HDR histogram and packing every aggregate.
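The per-second bucketing in the second step can be sketched as follows. This is an illustration under assumptions: events are reduced to (time_ms, amount) pairs, the constant's type is a guess, and bucket_damage is a hypothetical helper, not the body of record_sink_events:

```rust
/// One-second timeline resolution, matching the bucket size named above.
/// The constant's actual type in the engine is an assumption.
pub const TIMELINE_BUCKET_MS: u32 = 1_000;

/// Add each (time_ms, amount) damage event to its per-second slot,
/// growing the bucket vector when an event lands past the current end.
pub fn bucket_damage(bucket_sums: &mut Vec<f64>, events: &[(u32, f64)]) {
    for &(time_ms, amount) in events {
        // Integer division maps 0..=999 ms to slot 0, 1000..=1999 ms to slot 1, and so on.
        let idx = (time_ms / TIMELINE_BUCKET_MS) as usize;
        if idx >= bucket_sums.len() {
            bucket_sums.resize(idx + 1, 0.0);
        }
        bucket_sums[idx] += amount;
    }
}
```

Because the buckets are plain sums, calling the helper once per iteration on the same vector accumulates a per-second damage profile across the whole chunk.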

The representative iteration

A sim does thousands of iterations, but you can only show one timeline. Which one? The accumulator picks the iteration whose DPS is closest to the running mean. After each iteration it compares that iteration's distance from the current mean against the stored representative's distance, and keeps whichever is closer:

crates/engine-sim/src/telemetry.rs L402-L411

rust

let running_mean = self.dps_sum / self.iteration_count as f64;
let distance = (dps - running_mean).abs();
let is_new_representative = match &self.representative_iteration {
    Some((_, rep_dps)) => distance < (*rep_dps - running_mean).abs(),
    None => true,
};
if is_new_representative {
    self.representative_iteration = Some((iter_idx, dps));
    self.pending_capture = true;
}

When a new representative is chosen, the next maybe_capture_representative snapshots its timeline (cast markers, damage markers, aura windows, cooldown windows) into the accumulator.

Picking the iteration closest to the mean rather than the best or worst is a deliberate choice: a timeline you show a user should be typical, not a lucky outlier. The cost is that the captured timeline is whichever iteration was closest at the moment it was chosen, and that choice can shift as more iterations arrive. Because selection tracks the running mean, which steadies as iterations accumulate, the choice settles on a run close to the final mean.
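To see the selection shift as iterations arrive, here is a small self-contained sketch. RepresentativePicker is a hypothetical cut-down accumulator. It assumes the sums are updated before the comparison (the excerpt above does not show that step), and it returns a flag purely for illustration:

```rust
/// Cut-down accumulator holding only the fields the selection step touches.
#[derive(Default)]
pub struct RepresentativePicker {
    iteration_count: u32,
    dps_sum: f64,
    pub representative_iteration: Option<(usize, f64)>,
    pub pending_capture: bool,
}

impl RepresentativePicker {
    /// Update the running mean, then keep whichever iteration is closer to it.
    /// Returns true when this iteration became the representative.
    pub fn record_iteration(&mut self, iter_idx: usize, dps: f64) -> bool {
        // Assumed ordering: fold this iteration into the sums first.
        self.iteration_count += 1;
        self.dps_sum += dps;

        let running_mean = self.dps_sum / self.iteration_count as f64;
        let distance = (dps - running_mean).abs();
        let is_new_representative = match &self.representative_iteration {
            Some((_, rep_dps)) => distance < (*rep_dps - running_mean).abs(),
            None => true,
        };
        if is_new_representative {
            self.representative_iteration = Some((iter_idx, dps));
            // Signals that the timeline should be snapshotted next.
            self.pending_capture = true;
        }
        is_new_representative
    }
}
```

Feeding it DPS values 100, 140, 118, 121 selects iteration 0, keeps it through iteration 1 (both sit 20 away from the mean of 120, and ties keep the incumbent), switches to iteration 2, then switches again to iteration 3 once the mean reaches 119.75. Only the current iteration is compared against the stored one, so an earlier, discarded iteration is never reconsidered.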

Statistics and convergence

The DPS statistics accumulate incrementally: each iteration adds to dps_sum and dps_sum_sq, from which the mean and standard deviation fall out cheaply without a second pass. The same step tracks a running mean to pick the representative iteration. The standard deviation feeds the adaptive early-exit in the chunk loop: the loop periodically computes the relative standard error of the mean (the standard error divided by the mean) and stops once it drops below the requested target_error. Because the check is periodic, a chunk runs roughly as many iterations as the requested precision needs rather than a fixed count.
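A sketch of how the sums yield the statistics and the stopping test. DpsStats and its method names are hypothetical, and so are the n - 1 (sample) variance denominator and the min_iterations guard, since the text does not say which form the engine uses:

```rust
/// Running DPS sums, mirroring three of the accumulator fields shown earlier.
#[derive(Default)]
pub struct DpsStats {
    pub iteration_count: u32,
    pub dps_sum: f64,
    pub dps_sum_sq: f64,
}

impl DpsStats {
    /// Fold one iteration's DPS into the running sums.
    pub fn record(&mut self, dps: f64) {
        self.iteration_count += 1;
        self.dps_sum += dps;
        self.dps_sum_sq += dps * dps;
    }

    pub fn mean(&self) -> f64 {
        self.dps_sum / self.iteration_count as f64
    }

    /// Sample standard deviation recovered from the sums alone.
    pub fn std_dev(&self) -> f64 {
        let n = self.iteration_count as f64;
        if n < 2.0 {
            return 0.0;
        }
        let variance = (self.dps_sum_sq - self.dps_sum * self.dps_sum / n) / (n - 1.0);
        // Rounding can push a near-zero variance slightly negative.
        variance.max(0.0).sqrt()
    }

    /// Standard error of the mean, expressed as a fraction of the mean.
    pub fn relative_error(&self) -> f64 {
        let n = self.iteration_count as f64;
        self.std_dev() / n.sqrt() / self.mean()
    }

    /// Early-exit test: true once the relative error is below the target.
    pub fn converged(&self, target_error: f64, min_iterations: u32) -> bool {
        self.iteration_count >= min_iterations && self.relative_error() < target_error
    }
}
```

For DPS values 90, 100, 110 this gives a mean of 100, a standard deviation of 10, and a relative error of about 0.058, so a target_error of 0.01 would keep the loop running while 0.1 would stop it.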

When chunks run in parallel (the CLI's multi-threaded runner), each thread builds its own accumulator, and the accumulators are combined with merge. Merging adds the counts, the DPS sums, and the sums of squares directly. Because the mean and standard deviation are recovered from those three numbers, this combines the two distributions exactly rather than approximately. Merge also sums the per-spell and resource aggregates, adds the per-second bucket sums with a SIMD helper, and re-selects the representative against the combined mean. This is what lets a sim split across cores and still report one coherent distribution.
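The exactness of the merge can be illustrated with a reduced accumulator. ChunkSums and from_dps are hypothetical names, the bucket addition is a plain loop standing in for the SIMD helper, and the per-spell aggregates and representative re-selection are left out:

```rust
/// Reduced accumulator: the DPS sums plus the per-second buckets.
#[derive(Debug, Clone, PartialEq)]
pub struct ChunkSums {
    pub iteration_count: u32,
    pub dps_sum: f64,
    pub dps_sum_sq: f64,
    pub bucket_sums: Vec<f64>,
}

impl ChunkSums {
    /// Build the sums for one chunk from its per-iteration DPS values.
    pub fn from_dps(values: &[f64], bucket_sums: Vec<f64>) -> Self {
        let mut out = Self { iteration_count: 0, dps_sum: 0.0, dps_sum_sq: 0.0, bucket_sums };
        for &v in values {
            out.iteration_count += 1;
            out.dps_sum += v;
            out.dps_sum_sq += v * v;
        }
        out
    }

    /// Combine another chunk into this one by adding the sums field by field.
    pub fn merge(&mut self, other: &ChunkSums) {
        self.iteration_count += other.iteration_count;
        self.dps_sum += other.dps_sum;
        self.dps_sum_sq += other.dps_sum_sq;
        // Chunks may cover different fight lengths, so pad to the longer one.
        if other.bucket_sums.len() > self.bucket_sums.len() {
            self.bucket_sums.resize(other.bucket_sums.len(), 0.0);
        }
        for (mine, theirs) in self.bucket_sums.iter_mut().zip(&other.bucket_sums) {
            *mine += *theirs;
        }
    }
}
```

Merging a chunk built from DPS values 100 and 110 with one built from 90 yields the same count, sum, and sum of squares as a single chunk that saw all three values, so any statistic derived from those sums matches too.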

The protobuf encoding

The final step is encode, which consumes the accumulator and produces the ChunkTelemetry protobuf bytes. It builds an HDR histogram of the DPS values via the hdrhistogram crate and emits the per-spell action rows, the aura, resource, and cooldown rows, the execution and damage-profile data, the per-second bucket sums, and the representative timeline snapshot. It then serialises the whole message with prost's encode_to_vec.

Two encoding details matter for fidelity. First, DPS and damage values are scaled by ten (PROTO_DPS_SCALE) and resource values by a hundred (PROTO_RESOURCE_SCALE) before being rounded into integers, so one decimal place of DPS precision and two of resource precision survive the integer wire format. Second, the histogram is HDR rather than fixed-bin, so it records the DPS distribution across its full range at consistent relative precision without committing to bucket boundaries up front.
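The scale-then-round step can be sketched as below. The constant names come from the text; their f64 type, the i64 wire type, and the four helper functions are assumptions made for the example:

```rust
/// Scale factors named in the text. The f64 type is an assumption.
pub const PROTO_DPS_SCALE: f64 = 10.0;
pub const PROTO_RESOURCE_SCALE: f64 = 100.0;

/// Encode a DPS or damage value as an integer with one decimal place kept.
pub fn scale_dps(value: f64) -> i64 {
    (value * PROTO_DPS_SCALE).round() as i64
}

/// Decode a scaled DPS integer back to a float.
pub fn unscale_dps(wire: i64) -> f64 {
    wire as f64 / PROTO_DPS_SCALE
}

/// Encode a resource value as an integer with two decimal places kept.
pub fn scale_resource(value: f64) -> i64 {
    (value * PROTO_RESOURCE_SCALE).round() as i64
}

/// Decode a scaled resource integer back to a float.
pub fn unscale_resource(wire: i64) -> f64 {
    wire as f64 / PROTO_RESOURCE_SCALE
}
```

A DPS of 12345.67 goes over the wire as 123457 and decodes to 12345.7, and a resource value of 42.128 goes as 4213 and decodes to 42.13. The round trip loses at most half a unit in the last kept decimal place.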

Those bytes are the telemetry_bytes of a ChunkReport. From here the data leaves the engine entirely: it is either decoded in the portal for charts or merged across chunks by the orchestration and hosted-compute layers into a full-job result.
