evaluation/rollout

Rollout orchestration module.

This file hosts the internal rollout orchestration entry point, while the public evaluation-level service remains a stable compatibility facade.

Educational note: A rollout is one deterministic episode for one policy under one seed. This module keeps that lifecycle readable: normalize inputs, create runtime state, simulate until termination, then fold the result into a public episode report.

That lifecycle matters because the trainer depends on rollouts being both repeatable and interpretable. A rollout is not only "did the bird crash?" It is the bridge between one seeded control problem and one scored episode that can be compared fairly with other genomes.

Rollout pipeline:

flowchart LR
    Options["network + rollout options"] --> Context["normalize context"]
    Context --> Runtime["create runtime state"]
    Runtime --> Loop["observe -> act -> step -> shape"]
    Loop --> EarlyStop{"done or\nbudget exhausted?"}
    EarlyStop -->|No| Loop
    EarlyStop -->|Yes| Finalize["finalize timeout state"]
    Finalize --> Result["compose FlappyEpisodeResult"]
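
The pipeline above can be sketched end to end with a toy environment. Everything here is a hypothetical stand-in (ToyPolicy, ToyResult, the physics constants, and the mulberry32 generator are assumptions, not the module's real types or dynamics). The sketch only illustrates the shape of the lifecycle: normalize, create seeded state, loop, then finalize and compose.

```typescript
// Hypothetical stand-ins; the real module uses FlappyNetworkLike and FlappyEpisodeResult.
type ToyPolicy = (heightAboveGround: number, verticalVelocity: number) => boolean;
interface ToyResult {
  frames: number;
  doneReason: "collision" | "timeout";
}

// Small seeded PRNG (mulberry32) so the same seed always yields the same episode.
function createSeededRandom(seed: number): () => number {
  let state = seed >>> 0;
  return () => {
    state = (state + 0x6d2b79f5) >>> 0;
    let mixed = state;
    mixed = Math.imul(mixed ^ (mixed >>> 15), mixed | 1);
    mixed ^= mixed + Math.imul(mixed ^ (mixed >>> 7), mixed | 61);
    return ((mixed ^ (mixed >>> 14)) >>> 0) / 4294967296;
  };
}

function toyRollout(
  policy: ToyPolicy,
  options: { seed?: number; maxFrames?: number },
): ToyResult {
  // 1. Normalize context: clamp caller input into safe ranges.
  const seed = Math.trunc(options.seed ?? 0);
  const maxFrames = Math.max(1, Math.trunc(options.maxFrames ?? 1000));

  // 2. Create runtime state from the seed.
  const random = createSeededRandom(seed);
  let height = 0.5 + (random() - 0.5) * 0.2; // start near mid-screen
  let velocity = 0;
  let frames = 0;
  let done = false;

  // 3. Loop: observe -> act -> step, until done or budget exhausted.
  while (!done && frames < maxFrames) {
    const shouldFlap = policy(height, velocity);
    velocity = shouldFlap ? 0.03 : velocity - 0.004;
    height += velocity;
    frames += 1;
    done = height <= 0 || height >= 1; // hit the floor or the ceiling
  }

  // 4. Finalize and compose: a loop that ended without a crash is a timeout.
  return { frames, doneReason: done ? "collision" : "timeout" };
}
```

Because the only source of randomness is the seeded generator, calling toyRollout twice with the same policy and seed yields the same result, which is the property the trainer relies on when comparing genomes.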

evaluation/rollout/evaluation.rollout.service.ts

rolloutEpisode

rolloutEpisode(
  network: FlappyNetworkLike,
  rolloutOptions: FlappyRolloutOptions,
): FlappyEpisodeResult

Runs one seeded rollout episode for the given network and returns the episode result.

Parameters:

Returns: Episode result details.

Example:

const result = rolloutEpisode(network, {
  seed: 123,
  normalizeFitness: true,
  maxFrames: 2_000,
});

console.log(result.fitness, result.doneReason);

evaluation/rollout/evaluation.rollout.services.ts

Rollout runtime services.

This file owns the mechanics of running an episode once a caller has decided to do a rollout: normalize the options, create the seeded runtime, loop over frames, and stop early when continued simulation is no longer informative.

The companion utils file owns reward shaping and result composition. This file owns the episode heartbeat itself.

Minimal usage sketch:

const rolloutEpisodeContext = resolveRolloutEpisodeContext(network, {
  seed: 123,
  enableEarlyTermination: true,
});
const rolloutEpisodeRuntimeState = createRolloutEpisodeRuntimeState(
  rolloutEpisodeContext,
);
runRolloutEpisodeLoop(
  network,
  rolloutEpisodeContext,
  rolloutEpisodeRuntimeState,
);
finalizeRolloutEpisodeState(
  rolloutEpisodeContext,
  rolloutEpisodeRuntimeState,
);

applyRolloutEarlyTerminationIfNeeded

applyRolloutEarlyTerminationIfNeeded(
  rolloutEpisodeContext: RolloutEpisodeContext,
  rolloutEpisodeRuntimeState: RolloutEpisodeRuntimeState,
  currentObservationFeatures: SharedObservationFeatures,
): void

Applies the optional early-termination heuristic for unrecoverable starts.

Educational note: Early termination is an evaluation-speed heuristic, not a gameplay rule. It exists to stop obviously doomed warmup trajectories from consuming excessive rollout budget.

Parameters:

Returns: Nothing.
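
A minimal sketch of how such a heuristic could be gated, assuming a grace period and a required streak of unrecoverable frames (both inferred from the constant names in this module, not from its implementation). The EarlyStopContext and EarlyStopState shapes and the applyEarlyStopSketch name are hypothetical.

```typescript
// Hypothetical shapes; the real context and runtime state carry more fields.
interface EarlyStopContext {
  enableEarlyTermination: boolean;
  graceFrames: number;
  requiredConsecutiveFrames: number;
}
interface EarlyStopState {
  frameIndex: number;
  unrecoverableStreak: number;
  done: boolean;
  doneReason?: string;
}

function applyEarlyStopSketch(
  context: EarlyStopContext,
  state: EarlyStopState,
  looksUnrecoverable: boolean,
): void {
  if (!context.enableEarlyTermination || state.done) return;

  // Assumption: inside the grace period the heuristic never fires.
  if (state.frameIndex < context.graceFrames) return;

  // Require a streak so one bad-looking frame cannot end the episode.
  state.unrecoverableStreak = looksUnrecoverable
    ? state.unrecoverableStreak + 1
    : 0;

  if (state.unrecoverableStreak >= context.requiredConsecutiveFrames) {
    state.done = true;
    // The constants section documents the collision reason as the one
    // used by heuristic early termination.
    state.doneReason = "collision";
  }
}
```

The streak counter resets on any recoverable frame, so the heuristic only prunes trajectories that look doomed for several frames in a row.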

createRolloutEpisodeRuntimeState

createRolloutEpisodeRuntimeState(
  rolloutEpisodeContext: RolloutEpisodeContext,
): RolloutEpisodeRuntimeState

Creates mutable runtime state for one rollout episode.

The runtime state carries the seeded RNG, the mutable environment, the temporal observation memory, and the shaping counters accumulated during the episode.

Parameters:

Returns: Mutable runtime state.

finalizeRolloutEpisodeState

finalizeRolloutEpisodeState(
  rolloutEpisodeContext: RolloutEpisodeContext,
  rolloutEpisodeRuntimeState: RolloutEpisodeRuntimeState,
): void

Finalizes episode state after the main rollout loop exits.

Timeouts are applied here instead of inside the loop body so natural episode endings stay distinct from budget exhaustion.

Parameters:

Returns: Nothing.
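
A small sketch of that separation, assuming a hypothetical FinalizeState shape: the finalizer only labels an episode as a timeout when the loop exited without a natural ending, so an existing done reason is never overwritten.

```typescript
// Hypothetical runtime-state slice used only for this illustration.
interface FinalizeState {
  framesElapsed: number;
  done: boolean;
  doneReason?: string;
}

function finalizeTimeoutSketch(maxFrames: number, state: FinalizeState): void {
  // A natural ending already set done and doneReason inside the loop.
  if (state.done) return;

  // The loop ran out of budget while the bird was still alive.
  if (state.framesElapsed >= maxFrames) {
    state.done = true;
    state.doneReason = "timeout";
  }
}
```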

resolveRolloutEpisodeContext

resolveRolloutEpisodeContext(
  network: FlappyNetworkLike,
  rolloutOptions: FlappyRolloutOptions,
): RolloutEpisodeContext

Resolves normalized rollout configuration from user options.

This is the rollout safety boundary: caller-provided values are clamped into deterministic, execution-safe ranges before the main loop touches them.

Parameters:

Returns: Normalized rollout configuration.
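
The clamping idea can be sketched as follows. The option names other than seed and maxFrames, the default values, and the minimums are assumptions for illustration; the real FlappyRolloutOptions and RolloutEpisodeContext may differ.

```typescript
// Hypothetical option and context shapes.
interface RawOptionsSketch {
  seed?: number;
  maxFrames?: number;
  earlyTerminationGraceFrames?: number;
}
interface ContextSketch {
  readonly seed: number;
  readonly maxFrames: number;
  readonly earlyTerminationGraceFrames: number;
}

// Assumed values, standing in for the module's real constants.
const SKETCH_MIN_MAX_FRAMES = 1;
const SKETCH_DEFAULT_MAX_FRAMES = 2000;
const SKETCH_MIN_GRACE_FRAMES = 0;
const SKETCH_DEFAULT_GRACE_FRAMES = 30;

function clampToInteger(
  value: number | undefined,
  fallback: number,
  minimum: number,
): number {
  // undefined, NaN, and Infinity all fall back to the default.
  const candidate =
    typeof value === "number" && Number.isFinite(value)
      ? Math.floor(value)
      : fallback;
  return Math.max(minimum, candidate);
}

function resolveContextSketch(options: RawOptionsSketch): ContextSketch {
  // Freeze the result so the loop cannot mutate its own configuration.
  return Object.freeze({
    seed: clampToInteger(options.seed, 0, 0),
    maxFrames: clampToInteger(
      options.maxFrames,
      SKETCH_DEFAULT_MAX_FRAMES,
      SKETCH_MIN_MAX_FRAMES,
    ),
    earlyTerminationGraceFrames: clampToInteger(
      options.earlyTerminationGraceFrames,
      SKETCH_DEFAULT_GRACE_FRAMES,
      SKETCH_MIN_GRACE_FRAMES,
    ),
  });
}
```

Doing all validation once, at the boundary, means the frame loop can read every field without re-checking it.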

resolveRolloutFrameFlapDecision

resolveRolloutFrameFlapDecision(
  network: FlappyNetworkLike,
  rolloutEpisodeContext: RolloutEpisodeContext,
  rolloutEpisodeRuntimeState: RolloutEpisodeRuntimeState,
): boolean

Resolves the flap decision for one control substep and commits memory state.

The temporal memory is updated immediately after the decision so subsequent substeps can see short-term action history without needing recurrent state.

Parameters:

Returns: Whether the bird should flap.
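
One way to picture this is a fixed-length action history that is appended to the policy inputs. This is a sketch under assumptions: the PolicySketch type, the 0.5 threshold, and the memory layout are hypothetical and not taken from the module.

```typescript
// Hypothetical: a policy maps an input vector to one output activation.
type PolicySketch = (inputs: number[]) => number;

// Fixed-length action history, most recent action first.
interface ActionMemorySketch {
  recentFlaps: number[];
}

function decideFlapSketch(
  policy: PolicySketch,
  observation: number[],
  memory: ActionMemorySketch,
): boolean {
  // The policy sees the current observation plus its own recent actions,
  // which gives a feed-forward network a short view of the past.
  const output = policy([...observation, ...memory.recentFlaps]);
  const shouldFlap = output > 0.5; // assumed decision threshold

  // Commit immediately: push the new action, drop the oldest one.
  memory.recentFlaps.unshift(shouldFlap ? 1 : 0);
  memory.recentFlaps.pop();

  return shouldFlap;
}
```

Since the memory is committed before the function returns, the next substep's inputs already include this decision.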

runRolloutEpisodeFrame

runRolloutEpisodeFrame(
  network: FlappyNetworkLike,
  rolloutEpisodeContext: RolloutEpisodeContext,
  rolloutEpisodeRuntimeState: RolloutEpisodeRuntimeState,
): void

Runs one rollout frame including control, shaping, and early termination.

Educational note: Each frame follows a compact pipeline: observe, act, step the environment, accumulate shaping reward, then optionally prune the trajectory.

Parameters:

Returns: Nothing.

runRolloutEpisodeLoop

runRolloutEpisodeLoop(
  network: FlappyNetworkLike,
  rolloutEpisodeContext: RolloutEpisodeContext,
  rolloutEpisodeRuntimeState: RolloutEpisodeRuntimeState,
): void

Runs the main rollout loop until termination or frame-budget exhaustion.

This is the episode heartbeat: keep stepping while the bird is alive and the rollout still has budget left.

Parameters:

Returns: Nothing.

evaluation/rollout/evaluation.rollout.utils.ts

Rollout shaping and result helpers.

This file interprets an episode after the runtime services have determined what happened. In other words: services produce the trajectory, utils assign meaning to that trajectory.

Educational note: The rollout subsystem separates simulation from scoring on purpose. The services file determines what happened; this file determines how that episode should be interpreted as fitness.

composeNormalizedFitness

composeNormalizedFitness(
  framesValue: number,
  pipesPassedValue: number,
  denseShapingValue: number,
  terminalShapingValue: number,
  maxFramesValue: number,
  pipeProgressTarget: number | undefined,
): number

Normalize and cap fitness channels so no single reward term dominates.

Educational note: Channel normalization is a pragmatic way to keep the objective balanced across episodes of different lengths and levels of progress.

Parameters:

Returns: Normalized composite fitness.
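
A hedged sketch of channel normalization with the same parameter order as the signature above. The weights, the default pipe target, and the per-frame averaging of dense shaping are assumptions chosen for illustration, not the module's actual formula.

```typescript
// Clamp a value into the unit interval [0, 1].
const clampUnit = (value: number): number => Math.min(1, Math.max(0, value));

function composeNormalizedFitnessSketch(
  framesValue: number,
  pipesPassedValue: number,
  denseShapingValue: number,
  terminalShapingValue: number,
  maxFramesValue: number,
  pipeProgressTarget: number | undefined,
): number {
  const safeMaxFrames = Math.max(1, maxFramesValue);
  const safePipeTarget = Math.max(1, pipeProgressTarget ?? 10); // assumed default

  // Each channel is scaled by its own budget, then capped at 1.
  const survival = clampUnit(framesValue / safeMaxFrames);
  const progress = clampUnit(pipesPassedValue / safePipeTarget);
  const dense = clampUnit(denseShapingValue / safeMaxFrames);
  const terminal = clampUnit(terminalShapingValue);

  // Hypothetical weights; they sum to 1 so the result stays in [0, 1].
  return 0.4 * survival + 0.4 * progress + 0.15 * dense + 0.05 * terminal;
}
```

With caps in place, an episode with an enormous dense-shaping total still cannot outscore one that survives longer and passes more pipes, because the dense channel can contribute at most its own weight.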

composeRolloutEpisodeResult

composeRolloutEpisodeResult(
  rolloutEpisodeContext: RolloutEpisodeContext,
  rolloutEpisodeRuntimeState: RolloutEpisodeRuntimeState,
): FlappyEpisodeResult

Composes the final rollout result from the terminal game state.

This is the final fold step for rollout execution: internal counters and shaping channels become the public FlappyEpisodeResult consumed by training and reporting.

Parameters:

Returns: Episode result details.

computeDenseShapingReward

computeDenseShapingReward(
  previousFeatures: SharedObservationFeatures,
  currentFeatures: SharedObservationFeatures,
): number

Computes dense reward shaping from consecutive observations.

Dense shaping rewards incremental improvement throughout an episode instead of paying out only at the end, which gives evolution a more informative signal.

Parameters:

Returns: Per-step shaped reward.
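
To make "consecutive observations" concrete, here is a sketch that rewards frame-to-frame improvement in two signals. The FeaturesSketch fields and the equal weights are hypothetical; the real SharedObservationFeatures and reward components are not shown in this document.

```typescript
// Hypothetical observation slice; not the real SharedObservationFeatures.
interface FeaturesSketch {
  horizontalDistanceToGap: number;
  verticalOffsetFromGapCenter: number;
}

function computeDenseShapingSketch(
  previous: FeaturesSketch,
  current: FeaturesSketch,
): number {
  // Approach: positive when the bird got closer to the next gap.
  const approach =
    previous.horizontalDistanceToGap - current.horizontalDistanceToGap;

  // Centering: positive when the bird moved toward the gap center line.
  const centering =
    Math.abs(previous.verticalOffsetFromGapCenter) -
    Math.abs(current.verticalOffsetFromGapCenter);

  // Assumed equal weighting of the two components.
  return 0.5 * approach + 0.5 * centering;
}
```

The reward is a difference between two frames, so a bird that holds a good position earns nothing extra for standing still, while one that drifts away from the gap is penalized on that frame.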

computeTerminalShapingFitness

computeTerminalShapingFitness(
  episodeState: FlappyGameState,
  difficultyScale: number,
): number

Adds small terminal bonuses from final progress/alignment signals.

Terminal bonuses refine the final ranking, but they are intentionally smaller than the main survival and pipe-progress channels.

Parameters:

Returns: Terminal shaping reward.

isBirdLikelyUnrecoverable

isBirdLikelyUnrecoverable(
  observationFeatures: SharedObservationFeatures,
): boolean

Detects trajectories that are usually unrecoverable in early warmup.

The heuristic focuses on obvious early failures, where spending more rollout budget is least informative.

Parameters:

Returns: Whether the current trajectory appears unrecoverable.

resolveDenseShapingRewardComponents

resolveDenseShapingRewardComponents(
  previousFeatures: SharedObservationFeatures,
  currentFeatures: SharedObservationFeatures,
): DenseShapingRewardComponents

Resolves every dense-shaping reward component from consecutive observations.

If you want background reading, the Wikipedia article on "reward shaping" is a good high-level companion concept for why these components exist.

Parameters:

Returns: Dense-shaping reward components.

resolveRolloutFitnessBreakdown

resolveRolloutFitnessBreakdown(
  rolloutEpisodeContext: RolloutEpisodeContext,
  rolloutEpisodeRuntimeState: RolloutEpisodeRuntimeState,
  framesSurvived: number,
  pipesPassed: number,
): RolloutFitnessBreakdown

Resolves the raw fitness channels from the final episode state.

Separating raw channels from final composition makes reward rebalancing much easier to reason about.

Parameters:

Returns: Fitness-channel breakdown.

resolveUnnormalizedRolloutFitness

resolveUnnormalizedRolloutFitness(
  rolloutFitnessBreakdown: RolloutFitnessBreakdown,
): number

Resolves raw fitness by summing every fitness channel.

This is the legacy unnormalized objective. The normalized path, composeNormalizedFitness, caps channels so no single term dominates the whole score.

Parameters:

Returns: Raw unnormalized fitness.
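
A minimal sketch of the raw sum, assuming a hypothetical BreakdownSketch with four channels (the real RolloutFitnessBreakdown fields are not listed in this document).

```typescript
// Hypothetical channel names for illustration only.
interface BreakdownSketch {
  survival: number;
  pipeProgress: number;
  denseShaping: number;
  terminalShaping: number;
}

function sumChannelsSketch(breakdown: BreakdownSketch): number {
  // No scaling and no caps: every channel counts at face value.
  return (
    breakdown.survival +
    breakdown.pipeProgress +
    breakdown.denseShaping +
    breakdown.terminalShaping
  );
}

// A long episode with few pipes: survival swamps every other channel.
const longEpisode: BreakdownSketch = {
  survival: 2000,
  pipeProgress: 3,
  denseShaping: 12.5,
  terminalShaping: 0.5,
};
const rawFitness = sumChannelsSketch(longEpisode);
```

In this made-up example the survival channel contributes 2000 of the 2016 total, which illustrates why an uncapped sum lets a single term dominate.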

evaluation/rollout/evaluation.rollout.types.ts

Rollout-internal type contracts.

These runtime-only types are the private vocabulary of one rollout episode. They keep the public evaluation API compact while still giving the rollout loop explicit names for the data it carries between phases.

Read them as three layers:

DenseShapingRewardComponents

Per-frame dense shaping channels resolved from consecutive observations.

The shaping system rewards more than survival: it also tracks approach, centering, clearance, and stable motion.

RolloutEpisodeContext

Immutable rollout options normalized into execution-safe ranges.

Every field here is ready for direct use inside the episode loop.

RolloutEpisodeRuntimeState

Mutable runtime state accumulated while one rollout episode executes.

This is the mutable side of the rollout: world state, RNG, temporal memory, and the counters accumulated during execution.

RolloutFitnessBreakdown

Fitness-channel breakdown used to compose the public episode result.

Named channels make reward design easier to audit than a single opaque number.

evaluation/rollout/evaluation.rollout.constants.ts

Rollout-local constants.

These constants are the small semantic anchors that keep the rollout code readable: default ids, minimum clamps, zero baselines, and explicit done reasons.

Naming these sentinels explicitly keeps rollout code easier to read than a sea of raw 0, 1, and string literals.

FLAPPY_ROLLOUT_DEFAULT_GENOME_ID

Default genome id used when a network does not expose one.

FLAPPY_ROLLOUT_DONE_REASON_COLLISION

Rollout done reason used by heuristic early termination.

FLAPPY_ROLLOUT_DONE_REASON_TIMEOUT

Rollout done reason used when the episode exhausts its frame budget.

FLAPPY_ROLLOUT_MIN_EARLY_TERMINATION_CONSECUTIVE_FRAMES

Minimum unrecoverable-frame streak required for early termination.

FLAPPY_ROLLOUT_MIN_EARLY_TERMINATION_GRACE_FRAMES

Minimum grace period allowed before early termination can activate.

FLAPPY_ROLLOUT_MIN_MAX_FRAMES

Minimum positive frame-like scalar used by rollout normalization.

FLAPPY_ROLLOUT_ZERO_FITNESS

Shared zero baseline used across rollout fitness and counters.

This acts as the semantic baseline for both shaping accumulation and several rollout guard conditions.

Generated from source JSDoc.