evaluation/rollout

Rollout orchestration module.

This file hosts the internal rollout orchestration entry point, while the public evaluation-level service remains a stable compatibility facade.

Educational note: A rollout is one deterministic episode for one policy under one seed. This module keeps that lifecycle readable: normalize inputs, create runtime state, simulate until termination, then fold the result into a public episode report.

That lifecycle matters because the trainer depends on rollouts being both repeatable and interpretable. A rollout is not only "did the bird crash?" It is the bridge between one seeded control problem and one scored episode that can be compared fairly with other genomes.

Rollout pipeline:

flowchart LR
    Options["network + rollout options"] --> Context["normalize context"]
    Context --> Runtime["create runtime state"]
    Runtime --> Loop["observe -> act -> step -> shape"]
    Loop --> EarlyStop{"done or\nbudget exhausted?"}
    EarlyStop -->|No| Loop
    EarlyStop -->|Yes| Finalize["finalize timeout state"]
    Finalize --> Result["compose FlappyEpisodeResult"]

evaluation/rollout/evaluation.rollout.service.ts

rolloutEpisode

rolloutEpisode(
  network: FlappyNetworkLike,
  rolloutOptions: FlappyRolloutOptions,
): FlappyEpisodeResult

Rolls out one seeded episode for the given network and returns the scored episode result.

Parameters:

Returns: Episode result details.

Example:

const result = rolloutEpisode(network, {
  seed: 123,
  normalizeFitness: true,
  maxFrames: 2_000,
});

console.log(result.fitness, result.doneReason);

rolloutEpisodeWithPredictor

rolloutEpisodeWithPredictor(
  options: { predict: (observationVector: number[]) => Promise<unknown>; rolloutOptions?: FlappyRolloutOptions | undefined; networkId?: number | undefined; },
): Promise<FlappyEpisodeResult>

Roll out an episode against one async predictor callback.

This browser-worker-oriented variant preserves the same seeded rollout and shaping semantics as rolloutEpisode(...) while sourcing control decisions from an async inference boundary such as InferenceChannel.predict(...).

Parameters:

Returns: Episode result details.

evaluation/rollout/evaluation.rollout.services.ts

Rollout runtime services.

This file owns the mechanics of running an episode once a caller has decided to do a rollout: normalize the options, create the seeded runtime, loop over frames, and stop early when continued simulation is no longer informative.

The companion utils file owns reward shaping and result composition. This file owns the episode heartbeat itself.

Minimal usage sketch:

const rolloutEpisodeContext = resolveRolloutEpisodeContext(network, {
  seed: 123,
  enableEarlyTermination: true,
});
const rolloutEpisodeRuntimeState = createRolloutEpisodeRuntimeState(
  rolloutEpisodeContext,
);
runRolloutEpisodeLoop(
  network,
  rolloutEpisodeContext,
  rolloutEpisodeRuntimeState,
);
finalizeRolloutEpisodeState(
  rolloutEpisodeContext,
  rolloutEpisodeRuntimeState,
);

applyRolloutEarlyTerminationIfNeeded

applyRolloutEarlyTerminationIfNeeded(
  rolloutEpisodeContext: RolloutEpisodeContext,
  rolloutEpisodeRuntimeState: RolloutEpisodeRuntimeState,
  currentObservationFeatures: SharedObservationFeatures,
): void

Applies the optional early-termination heuristic for unrecoverable starts.

Educational note: Early termination is an evaluation-speed heuristic, not a gameplay rule. It exists to stop obviously doomed warmup trajectories from consuming excessive rollout budget.

Parameters:

Returns: Nothing.
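
Illustrative sketch (not the module's actual implementation): one way such a heuristic could be gated is with a grace period followed by a required streak of consecutive "unrecoverable" frames, which is what the FLAPPY_ROLLOUT_MIN_EARLY_TERMINATION_* constants suggest. Every type, field name, and the done-reason string below is a hypothetical stand-in, since the real RolloutEpisodeContext and RolloutEpisodeRuntimeState shapes are not shown in this document.

```typescript
// Hypothetical, simplified shapes; all field names are assumptions.
interface EarlyStopConfigSketch {
  enableEarlyTermination: boolean;
  graceFrames: number; // frames during which pruning never applies
  consecutiveFrames: number; // unrecoverable streak required to prune
}

interface EarlyStopStateSketch {
  frameIndex: number;
  unrecoverableStreak: number;
  done: boolean;
  doneReason: string | undefined;
}

// Placeholder for FLAPPY_ROLLOUT_DONE_REASON_COLLISION; the real value is not shown.
const EARLY_STOP_REASON_SKETCH = "collision";

function applyEarlyTerminationSketch(
  config: EarlyStopConfigSketch,
  state: EarlyStopStateSketch,
  looksUnrecoverable: boolean,
): void {
  if (!config.enableEarlyTermination || state.done) return;
  // Never prune inside the grace period.
  if (state.frameIndex < config.graceFrames) return;
  // A single recoverable frame resets the streak.
  state.unrecoverableStreak = looksUnrecoverable
    ? state.unrecoverableStreak + 1
    : 0;
  if (state.unrecoverableStreak >= config.consecutiveFrames) {
    state.done = true;
    state.doneReason = EARLY_STOP_REASON_SKETCH;
  }
}
```

Requiring a streak rather than a single bad frame keeps one noisy observation from ending an otherwise viable episode.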

createRolloutEpisodeRuntimeState

createRolloutEpisodeRuntimeState(
  rolloutEpisodeContext: RolloutEpisodeContext,
): RolloutEpisodeRuntimeState

Creates mutable runtime state for one rollout episode.

The runtime state carries the seeded RNG, the mutable environment, the shared observation-memory compatibility state, and the shaping counters accumulated during the episode.

Parameters:

Returns: Mutable runtime state.
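
Illustrative sketch: the seeded RNG is what makes a rollout repeatable. The module's actual generator is not shown in this document, so the example below assumes a small mulberry32-style generator purely to demonstrate the property that matters: two runtimes created from the same seed draw the same random stream. The function name is hypothetical.

```typescript
// Hypothetical seeded generator (mulberry32-style); the module's real RNG may differ.
function createSeededRandomSketch(seed: number): () => number {
  let state = seed >>> 0;
  return (): number => {
    state = (state + 0x6d2b79f5) | 0;
    let mixed = Math.imul(state ^ (state >>> 15), state | 1);
    mixed ^= mixed + Math.imul(mixed ^ (mixed >>> 7), mixed | 61);
    // Map the 32-bit result into [0, 1).
    return ((mixed ^ (mixed >>> 14)) >>> 0) / 4294967296;
  };
}

// Two runtimes created from the same seed see the same random stream.
const firstStream = createSeededRandomSketch(123);
const secondStream = createSeededRandomSketch(123);
const firstDraws = [firstStream(), firstStream(), firstStream()];
const secondDraws = [secondStream(), secondStream(), secondStream()];
console.log(firstDraws.every((value, index) => value === secondDraws[index]));
```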

finalizeRolloutEpisodeState

finalizeRolloutEpisodeState(
  rolloutEpisodeContext: RolloutEpisodeContext,
  rolloutEpisodeRuntimeState: RolloutEpisodeRuntimeState,
): void

Finalizes episode state after the main rollout loop exits.

Timeouts are applied here instead of inside the loop body so natural episode endings stay distinct from budget exhaustion.

Parameters:

Returns: Nothing.
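
Illustrative sketch of the idea above, assuming a simplified runtime state with hypothetical field names and placeholder reason strings: because finalization runs only after the loop exits, an episode that is still not marked done can only have run out of budget.

```typescript
// Hypothetical, simplified runtime state; field names are assumptions.
interface FinalizeStateSketch {
  done: boolean;
  doneReason: string | undefined;
}

// Placeholder for FLAPPY_ROLLOUT_DONE_REASON_TIMEOUT; the real value is not shown.
const TIMEOUT_REASON_SKETCH = "timeout";

function finalizeEpisodeStateSketch(state: FinalizeStateSketch): void {
  // A collision or early stop already marked the episode done; keep its reason.
  if (state.done) return;
  // Otherwise the loop exited because the frame budget ran out.
  state.done = true;
  state.doneReason = TIMEOUT_REASON_SKETCH;
}
```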

resolveRolloutEpisodeContext

resolveRolloutEpisodeContext(
  network: Pick<FlappyNetworkLike, "_id">,
  rolloutOptions: FlappyRolloutOptions,
): RolloutEpisodeContext

Resolves normalized rollout configuration from user options.

This is the rollout safety boundary: caller-provided values are clamped into deterministic, execution-safe ranges before the main loop touches them.

Parameters:

Returns: Normalized rollout configuration.
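
Illustrative sketch of a clamping boundary of this kind. The option names, defaults, and minimums below are assumptions chosen for the example (the real FlappyRolloutOptions fields and constant values are not listed in this document); the point is only that non-finite, fractional, or out-of-range inputs are resolved once, before the loop runs.

```typescript
// Hypothetical option and context shapes; real fields may differ.
interface RolloutOptionsSketch {
  seed?: number;
  maxFrames?: number;
}

interface RolloutContextSketch {
  readonly seed: number;
  readonly maxFrames: number;
}

// Assumed values, standing in for constants such as FLAPPY_ROLLOUT_MIN_MAX_FRAMES.
const MIN_MAX_FRAMES_SKETCH = 1;
const DEFAULT_MAX_FRAMES_SKETCH = 2000;
const DEFAULT_SEED_SKETCH = 0;

function resolveBoundedIntegerSketch(
  value: number | undefined,
  fallback: number,
  minimum: number,
): number {
  // Missing, NaN, and infinite inputs fall back to the default.
  const candidate =
    value !== undefined && Number.isFinite(value) ? Math.floor(value) : fallback;
  return Math.max(minimum, candidate);
}

function resolveContextSketch(options: RolloutOptionsSketch): RolloutContextSketch {
  // Freezing mirrors the "immutable context" contract described for the real type.
  return Object.freeze({
    seed: resolveBoundedIntegerSketch(options.seed, DEFAULT_SEED_SKETCH, 0),
    maxFrames: resolveBoundedIntegerSketch(
      options.maxFrames,
      DEFAULT_MAX_FRAMES_SKETCH,
      MIN_MAX_FRAMES_SKETCH,
    ),
  });
}
```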

resolveRolloutFrameFlapDecision

resolveRolloutFrameFlapDecision(
  network: FlappyNetworkLike,
  rolloutEpisodeContext: RolloutEpisodeContext,
  rolloutEpisodeRuntimeState: RolloutEpisodeRuntimeState,
): boolean

Resolves the flap decision for one control substep and commits memory state.

The shared memory surface is updated at the same post-decision boundary used by browser playback and worker simulation. The current controller input still reads only the current normalized frame, but keeping the bookkeeping point stable avoids runtime drift if an opt-in external-history experiment returns.

Parameters:

Returns: Whether the bird should flap.

resolveRolloutFrameFlapDecisionWithPredictor

resolveRolloutFrameFlapDecisionWithPredictor(
  predictOutputs: (observationVector: number[]) => Promise<unknown>,
  rolloutEpisodeContext: RolloutEpisodeContext,
  rolloutEpisodeRuntimeState: RolloutEpisodeRuntimeState,
): Promise<boolean>

Resolves one flap decision from an asynchronous predictor and commits memory.

The predictor path shares the same observation-vector construction and memory commit point as the direct network path, so recurrent evaluation stays stable across browser-worker and synchronous rollout surfaces.

Parameters:

Returns: Promise resolving to whether the bird should flap.

runRolloutEpisodeFrame

runRolloutEpisodeFrame(
  network: FlappyNetworkLike,
  rolloutEpisodeContext: RolloutEpisodeContext,
  rolloutEpisodeRuntimeState: RolloutEpisodeRuntimeState,
): void

Runs one rollout frame including control, shaping, and early termination.

Educational note: Each frame follows a compact pipeline: observe, act, step the environment, accumulate shaping reward, then optionally prune the trajectory.

Parameters:

Returns: Nothing.
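
Illustrative sketch of the observe, act, step, shape order using a toy one-dimensional environment. Nothing here reflects the real Flappy physics, observation vector, or shaping terms: the gap position, gravity, flap impulse, and the 0.5 output threshold are all assumptions made up for the example, and the optional pruning step is omitted.

```typescript
// Toy state for a hypothetical 1D bird; not the module's real runtime state.
interface ToyFrameStateSketch {
  altitude: number; // 0 is the floor, 1 is the ceiling
  velocity: number;
  done: boolean;
  framesSurvived: number;
  denseShaping: number;
}

type ToyPolicySketch = (observation: number[]) => number;

const TOY_GAP_CENTER = 0.5; // assumed
const TOY_GRAVITY = 0.01; // assumed
const TOY_FLAP_VELOCITY = 0.05; // assumed

function runToyFrameSketch(policy: ToyPolicySketch, state: ToyFrameStateSketch): void {
  // Observe: build the input vector from the current state.
  const observation = [state.altitude, state.velocity];
  const previousOffset = Math.abs(state.altitude - TOY_GAP_CENTER);

  // Act: threshold the policy output into a flap decision.
  const shouldFlap = policy(observation) > 0.5;

  // Step: advance the toy physics by one frame.
  state.velocity = shouldFlap ? TOY_FLAP_VELOCITY : state.velocity - TOY_GRAVITY;
  state.altitude += state.velocity;
  state.framesSurvived += 1;
  if (state.altitude < 0 || state.altitude > 1) {
    state.done = true;
    return;
  }

  // Shape: reward moving closer to the gap center than on the previous frame.
  state.denseShaping += previousOffset - Math.abs(state.altitude - TOY_GAP_CENTER);
}
```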

runRolloutEpisodeFrameWithPredictor

runRolloutEpisodeFrameWithPredictor(
  predictOutputs: (observationVector: number[]) => Promise<unknown>,
  rolloutEpisodeContext: RolloutEpisodeContext,
  rolloutEpisodeRuntimeState: RolloutEpisodeRuntimeState,
): Promise<void>

Runs one rollout frame through an asynchronous predictor boundary.

This mirrors runRolloutEpisodeFrame(...) but awaits control output from a worker-hosted predictor before advancing the environment, which keeps channel and direct evaluation semantics aligned.

Parameters:

Returns: Promise resolved after the frame has advanced and shaping is updated.

runRolloutEpisodeLoop

runRolloutEpisodeLoop(
  network: FlappyNetworkLike,
  rolloutEpisodeContext: RolloutEpisodeContext,
  rolloutEpisodeRuntimeState: RolloutEpisodeRuntimeState,
): void

Runs the main rollout loop until termination or frame-budget exhaustion.

This is the episode heartbeat: keep stepping while the bird is alive and the rollout still has budget left.

Parameters:

Returns: Nothing.
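
Illustrative sketch of the heartbeat described above, with a hypothetical state shape and a caller-supplied frame function standing in for runRolloutEpisodeFrame(...). The cooperative abort hook is left out to keep the two core guards visible.

```typescript
// Hypothetical, simplified runtime state; field names are assumptions.
interface LoopStateSketch {
  done: boolean;
  framesSurvived: number;
}

function runLoopSketch(
  maxFrames: number,
  state: LoopStateSketch,
  runFrame: (state: LoopStateSketch) => void,
): void {
  // Keep stepping while the bird is alive and budget remains.
  while (!state.done && state.framesSurvived < maxFrames) {
    runFrame(state);
  }
}

// Usage: a stand-in frame function that crashes on the fifth frame.
const crashingState: LoopStateSketch = { done: false, framesSurvived: 0 };
runLoopSketch(10, crashingState, (state) => {
  state.framesSurvived += 1;
  if (state.framesSurvived === 5) state.done = true;
});
```

Note that the loop leaves `done` untouched when the budget runs out, which is what lets a later finalize step tell a timeout apart from a natural ending.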

runRolloutEpisodeLoopWithPredictor

runRolloutEpisodeLoopWithPredictor(
  predictOutputs: (observationVector: number[]) => Promise<unknown>,
  rolloutEpisodeContext: RolloutEpisodeContext,
  rolloutEpisodeRuntimeState: RolloutEpisodeRuntimeState,
): Promise<void>

Runs the main rollout loop against one async predictor callback.

This keeps the rollout semantics aligned with the synchronous evaluation surface while allowing the control decision itself to come from a persistent worker-hosted predictor.

Parameters:

Returns: Promise resolved after the rollout loop exits.

shouldContinueRolloutEpisode

shouldContinueRolloutEpisode(
  rolloutEpisodeContext: RolloutEpisodeContext,
  rolloutEpisodeRuntimeState: RolloutEpisodeRuntimeState,
): boolean

Resolves whether the episode loop should advance another frame.

Besides the ordinary done and frame-budget guards, this helper owns the cooperative caller abort hook used by recurrent warm-start deadlines. When the hook fires, the rollout is marked as a timeout so callers can distinguish budget exhaustion from a gameplay collision.

Parameters:

Returns: True when the rollout should process another frame.

shouldStopRolloutEpisode

shouldStopRolloutEpisode(
  rolloutEpisodeContext: RolloutEpisodeContext,
): boolean

Invokes the optional cooperative abort hook for the current rollout.

Hook failures are treated as a stop request because the hook is a guardrail around optional warm-start work; a broken guard should yield back to normal NEAT evolution instead of trapping the worker in refinement.

Parameters:

Returns: True when the caller asks the rollout to stop.
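
Illustrative sketch of the fail-closed behavior described above. The hook signature is an assumption (a zero-argument callback returning a boolean); the real hook lives on RolloutEpisodeContext, whose fields are not shown in this document.

```typescript
// Hypothetical hook shape: an optional zero-argument callback.
function shouldStopSketch(shouldStopHook?: () => boolean): boolean {
  // No hook means the caller never asks to stop.
  if (shouldStopHook === undefined) return false;
  try {
    return shouldStopHook() === true;
  } catch {
    // A throwing guard is treated as a stop request rather than ignored.
    return true;
  }
}
```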

evaluation/rollout/evaluation.rollout.utils.ts

Rollout shaping and result helpers.

This file interprets an episode after the runtime services have determined what happened. In other words: services produce the trajectory, utils assign meaning to that trajectory.

Educational note: The rollout subsystem separates simulation from scoring on purpose. The services file determines what happened; this file determines how that episode should be interpreted as fitness.

composeNormalizedFitness

composeNormalizedFitness(
  framesValue: number,
  pipesPassedValue: number,
  denseShapingValue: number,
  terminalShapingValue: number,
  maxFramesValue: number,
  pipeProgressTarget: number | undefined,
): number

Normalize and cap fitness channels so no single reward term dominates.

Educational note: Channel normalization is a pragmatic way to keep the objective balanced across episodes of different lengths and levels of progress.

Parameters:

Returns: Normalized composite fitness.
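
Illustrative sketch of normalize-then-cap composition. The weights, the clamp ranges, and the default pipe target below are invented for the example and are not the module's values; only the parameter order follows the signature above. The property being shown is that each channel is scaled into a bounded range before weighting, so an extreme value in one channel cannot swamp the others.

```typescript
function clampSketch(value: number, minimum: number, maximum: number): number {
  return Math.min(maximum, Math.max(minimum, value));
}

// Assumed weights and default target; the real values are not shown in this document.
const SURVIVAL_WEIGHT_SKETCH = 0.4;
const PROGRESS_WEIGHT_SKETCH = 0.4;
const DENSE_WEIGHT_SKETCH = 0.15;
const TERMINAL_WEIGHT_SKETCH = 0.05;
const DEFAULT_PIPE_TARGET_SKETCH = 10;

function composeNormalizedFitnessSketch(
  framesValue: number,
  pipesPassedValue: number,
  denseShapingValue: number,
  terminalShapingValue: number,
  maxFramesValue: number,
  pipeProgressTarget: number | undefined,
): number {
  const safeMaxFrames = Math.max(1, maxFramesValue);
  const safePipeTarget = Math.max(1, pipeProgressTarget ?? DEFAULT_PIPE_TARGET_SKETCH);

  // Each channel is scaled by its natural budget, then capped.
  const survival = clampSketch(framesValue / safeMaxFrames, 0, 1);
  const progress = clampSketch(pipesPassedValue / safePipeTarget, 0, 1);
  const dense = clampSketch(denseShapingValue / safeMaxFrames, -1, 1);
  const terminal = clampSketch(terminalShapingValue, -1, 1);

  return (
    SURVIVAL_WEIGHT_SKETCH * survival +
    PROGRESS_WEIGHT_SKETCH * progress +
    DENSE_WEIGHT_SKETCH * dense +
    TERMINAL_WEIGHT_SKETCH * terminal
  );
}
```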

composeRolloutEpisodeResult

composeRolloutEpisodeResult(
  rolloutEpisodeContext: RolloutEpisodeContext,
  rolloutEpisodeRuntimeState: RolloutEpisodeRuntimeState,
): FlappyEpisodeResult

Composes the final rollout result from the terminal game state.

This is the final fold step for rollout execution: internal counters and shaping channels become the public FlappyEpisodeResult consumed by training and reporting.

Parameters:

Returns: Episode result details.

computeDenseShapingReward

computeDenseShapingReward(
  previousFeatures: SharedObservationFeatures,
  currentFeatures: SharedObservationFeatures,
): number

Computes dense reward shaping from consecutive observations.

Dense shaping rewards incremental improvement throughout an episode instead of paying out only at the end, which gives evolution a more informative signal.

Parameters:

Returns: Per-step shaped reward.
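
Illustrative sketch of shaping from consecutive observations. The real SharedObservationFeatures fields and shaping terms are not shown in this document, so the two features, the two components, and the weights below are hypothetical; the example only demonstrates the pattern of comparing the previous and current frame and paying out for improvement.

```typescript
// Hypothetical feature and component shapes; real fields may differ.
interface ObservationFeaturesSketch {
  gapCenterOffset: number; // signed vertical distance to the gap center
  verticalVelocity: number;
}

interface DenseComponentsSketch {
  centering: number;
  stability: number;
}

// Assumed weights, chosen only for the example.
const CENTERING_WEIGHT_SKETCH = 1;
const STABILITY_WEIGHT_SKETCH = 0.1;

function resolveDenseComponentsSketch(
  previousFeatures: ObservationFeaturesSketch,
  currentFeatures: ObservationFeaturesSketch,
): DenseComponentsSketch {
  return {
    // Positive when the bird moved closer to the gap center.
    centering:
      CENTERING_WEIGHT_SKETCH *
      (Math.abs(previousFeatures.gapCenterOffset) -
        Math.abs(currentFeatures.gapCenterOffset)),
    // Non-positive: abrupt velocity changes are penalized.
    stability:
      -STABILITY_WEIGHT_SKETCH *
      Math.abs(currentFeatures.verticalVelocity - previousFeatures.verticalVelocity),
  };
}

function computeDenseShapingRewardSketch(
  previousFeatures: ObservationFeaturesSketch,
  currentFeatures: ObservationFeaturesSketch,
): number {
  const components = resolveDenseComponentsSketch(previousFeatures, currentFeatures);
  return components.centering + components.stability;
}
```

Keeping the components as a named record before summing is what makes each term individually inspectable when rebalancing rewards.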

computeTerminalShapingFitness

computeTerminalShapingFitness(
  episodeState: FlappyGameState,
  difficultyScale: number,
): number

Adds small terminal bonuses from final progress/alignment signals.

Terminal bonuses refine the final ranking, but they are intentionally smaller than the main survival and pipe-progress channels.

Parameters:

Returns: Terminal shaping reward.

isBirdLikelyUnrecoverable

isBirdLikelyUnrecoverable(
  observationFeatures: SharedObservationFeatures,
): boolean

Detects trajectories that are usually unrecoverable in early warmup.

The heuristic focuses on obvious early failures, where spending more rollout budget is least informative.

Parameters:

Returns: Whether the current trajectory appears unrecoverable.

resolveDenseShapingRewardComponents

resolveDenseShapingRewardComponents(
  previousFeatures: SharedObservationFeatures,
  currentFeatures: SharedObservationFeatures,
): DenseShapingRewardComponents

Resolves every dense-shaping reward component from consecutive observations.

If you want background reading, the Wikipedia article on "reward shaping" is a good high-level companion concept for why these components exist.

Parameters:

Returns: Dense-shaping reward components.

resolveRolloutFitnessBreakdown

resolveRolloutFitnessBreakdown(
  rolloutEpisodeContext: RolloutEpisodeContext,
  rolloutEpisodeRuntimeState: RolloutEpisodeRuntimeState,
  framesSurvived: number,
  pipesPassed: number,
): RolloutFitnessBreakdown

Resolves the raw fitness channels from the final episode state.

Separating raw channels from final composition makes reward rebalancing much easier to reason about.

Parameters:

Returns: Fitness-channel breakdown.

resolveUnnormalizedRolloutFitness

resolveUnnormalizedRolloutFitness(
  rolloutFitnessBreakdown: RolloutFitnessBreakdown,
): number

Resolves raw fitness by summing every fitness channel.

This is the legacy unnormalized objective. The normalized path, composeNormalizedFitness(...), instead caps each channel so no single term dominates the whole score.

Parameters:

Returns: Raw unnormalized fitness.
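
Illustrative sketch of the raw sum, assuming a breakdown with four hypothetical channel names (the real RolloutFitnessBreakdown fields are not listed in this document).

```typescript
// Hypothetical channel names; the real breakdown may differ.
interface FitnessBreakdownSketch {
  survival: number;
  pipeProgress: number;
  denseShaping: number;
  terminalShaping: number;
}

function sumFitnessChannelsSketch(breakdown: FitnessBreakdownSketch): number {
  // No scaling or capping: a long episode can dominate through survival alone.
  return (
    breakdown.survival +
    breakdown.pipeProgress +
    breakdown.denseShaping +
    breakdown.terminalShaping
  );
}
```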

evaluation/rollout/evaluation.rollout.types.ts

Rollout-internal type contracts.

These runtime-only types are the private vocabulary of one rollout episode. They keep the public evaluation API compact while still giving the rollout loop explicit names for the data it carries between phases.

Read them as three layers:

DenseShapingRewardComponents

Per-frame dense shaping channels resolved from consecutive observations.

The shaping system rewards more than survival: it also tracks approach, centering, clearance, and stable motion.

RolloutEpisodeContext

Immutable rollout options normalized into execution-safe ranges.

Every field here is ready for direct use inside the episode loop.

RolloutEpisodeRuntimeState

Mutable runtime state accumulated while one rollout episode executes.

This is the mutable side of the rollout: world state, RNG, shared observation-memory compatibility state, and the counters accumulated during execution.

RolloutFitnessBreakdown

Fitness-channel breakdown used to compose the public episode result.

Named channels make reward design easier to audit than a single opaque number.

evaluation/rollout/evaluation.rollout.constants.ts

Rollout-local constants.

These constants are the small semantic anchors that keep the rollout code readable: default ids, minimum clamps, zero baselines, and explicit done reasons.

Naming these sentinels explicitly keeps rollout code easier to read than a sea of raw 0, 1, and string literals.

FLAPPY_ROLLOUT_DEFAULT_GENOME_ID

Default genome id used when a network does not expose one.

FLAPPY_ROLLOUT_DONE_REASON_COLLISION

Rollout done reason used by heuristic early termination.

FLAPPY_ROLLOUT_DONE_REASON_TIMEOUT

Rollout done reason used when the episode exhausts its frame budget.

FLAPPY_ROLLOUT_MIN_EARLY_TERMINATION_CONSECUTIVE_FRAMES

Minimum unrecoverable-frame streak required for early termination.

FLAPPY_ROLLOUT_MIN_EARLY_TERMINATION_GRACE_FRAMES

Minimum grace period allowed before early termination can activate.

FLAPPY_ROLLOUT_MIN_MAX_FRAMES

Minimum positive frame-like scalar used by rollout normalization.

FLAPPY_ROLLOUT_ZERO_FITNESS

Shared zero baseline used across rollout fitness and counters.

This acts as the semantic baseline for both shaping accumulation and several rollout guard conditions.

Generated from source JSDoc