trainer
Node-facing entry shelf for the Flappy Bird trainer.
This file is the chapter opening for the trainer as a whole. If you want the fastest mental model for how the Flappy Bird training stack behaves, start here before reading the narrower type, planning, fitness, or reporting helpers.
The trainer exists to turn a generic NEAT controller into a fair,
repeatable, Flappy-specific training program. That means the entry boundary
has to do more than just call evolve(): it restores deterministic random
state, installs staged population scoring, keeps shutdown cooperative, and
hands each generation to a compact reporting pipeline that makes progress
easy to inspect from the terminal.
This file does not own the scoring math, report formatting, or rollout mechanics directly. Its job is orchestration. Keeping that policy wiring in one place makes the trainer easier to reason about because the reader can see which responsibilities are static setup, which belong to the runtime loop, and which are delegated to specialized helpers.
A practical reading order is:
- read this file to understand the runtime spine,
- continue with trainer.types.ts to learn the shared nouns,
- move to trainer.evaluation-plan.utils.ts for the staged curriculum and mutation schedule,
- finish with trainer.fitness.service.ts and trainer.loop.service.ts to see how each generation is actually evaluated and advanced.
Read the rest of the trainer folder as supporting shelves beneath this entrypoint: types define the vocabulary, evaluation planning defines the budget and curriculum, fitness helpers define how populations are scored, and the loop turns all of that into a long-running evolutionary session.
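The wiring those shelves describe can be sketched as one tiny orchestration function. Everything below is a hypothetical stand-in (`RuntimeSketch`, `runTrainerSketch`, and the dependency names are illustrative, not the real helpers):

```typescript
// Hedged sketch of the entry wiring; the real runTrainer() calls
// createTrainerSetup(), createNeatController(), restoreRNGState(), and friends.
interface RuntimeSketch {
  stopRequested: boolean;
}

async function runTrainerSketch(deps: {
  restoreRngState: () => void;
  attachFitnessEvaluator: () => void;
  registerStopSignals: (runtime: RuntimeSketch) => void;
  runEvolutionLoop: (runtime: RuntimeSketch) => Promise<void>;
}): Promise<void> {
  const runtime: RuntimeSketch = { stopRequested: false };
  deps.restoreRngState();               // deterministic random state first
  deps.attachFitnessEvaluator();        // staged population scoring
  deps.registerStopSignals(runtime);    // cooperative shutdown intent
  await deps.runEvolutionLoop(runtime); // generation heartbeat until stopped
}
```

The order matters: random state and the fitness evaluator are installed before the loop ever runs, so generation 0 already behaves like every later generation.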
Trainer startup map:
flowchart LR
Entry["runTrainer()"] --> Setup["createTrainerSetup()\nstatic training shape"]
Entry --> Runtime["createTrainerRuntimeState()\nmutable stop + latest report"]
Entry --> Controller["createNeatController()\nbase NEAT runtime"]
Controller --> Fitness["attachPopulationFitnessEvaluator()\nstaged population scoring"]
Entry --> RNG["restoreRNGState()\ndeterministic run"]
Entry --> Signals["registerTrainerStopSignals()\ncooperative shutdown"]
Fitness --> Loop["runTrainerEvolutionLoop()\ngeneration heartbeat"]
Signals --> Loop
Loop --> Summary["logGenerationSummary()\ncompact terminal output"]trainer/trainer.types.ts
Public type map for the Flappy Bird trainer boundary.
The trainer coordinates several moving pieces at once: a generic NEAT controller, staged rollout plans, compact longitudinal reporting, and a very small amount of mutable runtime state. Keeping those contracts together makes the trainer easier to read because the important nouns live in one place.
A useful reading order is:
- FlappyTrainerSetup for static configuration,
- FlappyTrainerNeatController for the runtime API shape,
- FlappyGenerationEvaluationPlan for staged evaluation policy,
- FlappyGenerationReport for what the loop emits back out.
Type relationship map:
flowchart TB
Setup["FlappyTrainerSetup\nstatic sizes and counts"] --> Controller["FlappyTrainerNeatController\nmutable NEAT runtime"]
Controller --> Plan["FlappyGenerationEvaluationPlan\nseeds, rollout budgets, annealing"]
Plan --> Report["FlappyGenerationReport\ngeneration-level summary"]
Runtime["FlappyTrainerRuntimeState\nstop intent + latest report"] --> Report
Network["FlappyTrainerNetwork\nscore-carrying genome"] --> Controller
Network --> Report
ScoreEntry["ScoredGenomeEntry\nranking helper"] --> Network
FlappyGenerationEvaluationPlan
Generation-level rollout plans for staged evaluation.
Each generation resolves one plan that answers three questions: how strong is the current mutation schedule, which shared seeds belong to each stage, and what rollout budget each stage is allowed to spend. This is the contract that makes quick screening, full passes, and reevaluation feel like one coherent policy instead of three unrelated helper calls.
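As a rough illustration, such a plan could be shaped like the sketch below. All field names, seed values, and budgets here are hypothetical; the real FlappyGenerationEvaluationPlan may differ:

```typescript
// Hypothetical plan shape answering the three questions above: schedule
// strength, shared seeds per stage, and per-stage rollout budget.
interface MutationScheduleSketch {
  mutationRate: number;
  mutationAmount: number;
}

interface GenerationEvaluationPlanSketch {
  mutationSchedule: MutationScheduleSketch; // current annealed search pressure
  quickStageSeeds: readonly number[];       // shared seeds for the cheap screen
  fullStageSeeds: readonly number[];        // shared seeds for surviving candidates
  quickStageMaxFrames: number;              // rollout budget for the screen
  fullStageMaxFrames: number;               // rollout budget for deeper passes
}

function createPlanSketch(generation: number): GenerationEvaluationPlanSketch {
  const baseSeed = generation * 1_000; // illustrative seed derivation only
  return {
    mutationSchedule: { mutationRate: 0.5, mutationAmount: 3 },
    quickStageSeeds: [baseSeed + 1, baseSeed + 2],
    fullStageSeeds: [baseSeed + 1, baseSeed + 2, baseSeed + 3],
    quickStageMaxFrames: 600,
    fullStageMaxFrames: 3_000,
  };
}
```

Because every stage reads from one resolved plan, the quick screen, full pass, and reevaluation cannot drift apart within a generation.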
FlappyGenerationReport
Compact generation report used for training logs.
The report is shaped for longitudinal monitoring rather than raw storage. It collects the distribution and best-run details needed to judge whether a generation improved robustly instead of producing one lucky outlier.
FlappyTrainerNeatController
Local typed view for population-level fitness mode used by this trainer.
This is intentionally narrower than the full Neat runtime API. The trainer
documents only the methods and mutable options it actually depends on, which
makes the orchestration code read more like policy and less like framework
plumbing.
FlappyTrainerNetwork
Network shape expected by the Flappy trainer.
The trainer only needs the evaluation-facing subset of a full network plus an optional score field used by staged ranking helpers. That narrow shape keeps the trainer decoupled from most of the broader network implementation.
FlappyTrainerRuntimeState
Trainer runtime state shared by orchestration helpers.
Only mutable cross-step values live here: stop intent and the most recent generation report. That tiny scope is deliberate because it keeps the rest of the trainer easy to reason about during long-running sessions.
FlappyTrainerSetup
Immutable trainer setup values.
These values define the static training shape before runtime state and staged evaluation are attached. Once created, the rest of the trainer can treat this as a stable configuration shelf rather than scattered ad hoc constants.
ScoredGenomeEntry
Score carrier used for deterministic ordering helpers.
Wrapping a genome together with its score makes ranking utilities easier to write and keeps score extraction explicit at sort time instead of letting it leak across multiple helpers.
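A minimal sketch of that pattern, using hypothetical names (`ScoredEntrySketch`, `rankByScoreDescending`) rather than the real helpers:

```typescript
// Sketch of a score-carrying wrapper plus a deterministic ranking helper; the
// real ScoredGenomeEntry may carry additional fields.
interface ScoredEntrySketch<G> {
  genome: G;
  score: number;
}

function rankByScoreDescending<G>(
  entries: readonly ScoredEntrySketch<G>[],
): ScoredEntrySketch<G>[] {
  // Copy before sorting so callers keep their original ordering; Array.sort is
  // stable in modern runtimes, so equal scores preserve insertion order.
  return [...entries].sort((a, b) => b.score - a.score);
}
```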
trainer/trainer.ts
handleTrainerMainError
handleTrainerMainError(
error: unknown,
): void
Handles fatal main rejection path.
The trainer keeps this boundary small so unexpected failures are formatted in one consistent place before reaching the CLI. That keeps shutdown behavior and terminal messaging consistent whether the failure came from setup, evaluation, or the loop itself.
Parameters:
error - Unknown rejection reason from trainer execution.
Returns: Nothing.
isDirectTrainerExecution
isDirectTrainerExecution(): boolean
Resolves whether this module is the direct Node entrypoint.
This lets the file behave as both a reusable module and a runnable script without duplicating the startup boundary in a second wrapper file.
Returns: true when Node launched this file directly.
runTrainer
runTrainer(): Promise<void>
Flappy Bird neuroevolution demo.
This script runs a small NEAT population where each genome controls a bird.
The network sees a temporal observation (38 floats) and outputs two competing
action scores (no flap vs flap).
The function is intentionally orchestration-first. It answers one practical question: what has to be connected so a generic NEAT controller turns into a fair, repeatable, Flappy-specific trainer?
Educational note: The trainer is intentionally orchestration-first. It wires together setup, staged population evaluation, the outer evolution loop, graceful shutdown, and compact generation logging without burying those responsibilities inside a single monolithic file.
The mutation schedule gradually cools over early generations. If you want a conceptual parallel, the Wikipedia article on simulated annealing is a useful mental model for why early exploration is broader and later updates are more conservative.
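A minimal sketch of that cooling schedule, assuming hypothetical start/end/window values in place of the real FLAPPY_TRAINER_MUTATION_* constants:

```typescript
// Linear annealing sketch with assumed values; the real constants may differ.
const MUTATION_RATE_START = 0.6; // assumed exploratory starting rate
const MUTATION_RATE_END = 0.1;   // assumed settled late-stage rate
const ANNEAL_GENERATIONS = 50;   // assumed cooling window

function resolveMutationRateSketch(generation: number): number {
  // Progress through the annealing window, clamped to [0, 1] so the schedule
  // holds steady after the window completes.
  const progress = Math.min(generation / ANNEAL_GENERATIONS, 1);
  return MUTATION_RATE_START + (MUTATION_RATE_END - MUTATION_RATE_START) * progress;
}
```

Generation 0 gets the full exploratory rate, the midpoint of the window sits halfway between the two endpoints, and everything past the window stays at the settled rate.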
Run (from repo root):
npx ts-node examples/flappy_bird/trainFlappyBird.ts
Example:
await runTrainer();
trainer/trainer.errors.ts
Small CLI-facing error-rendering boundary for the trainer.
Keeping fatal-error formatting in one file prevents setup, evaluation, and shutdown paths from inventing slightly different terminal messages. That is a small detail, but it makes long-running scripts and quick debugging sessions easier to scan.
FLAPPY_TRAINER_UNEXPECTED_ERROR_PREFIX
Prefix used when rendering unexpected trainer failures to stderr.
formatTrainerErrorMessage
formatTrainerErrorMessage(
error: unknown,
): string
Formats unknown trainer failures into a stable human-readable message.
Errors can arrive here as real Error objects or as arbitrary rejected
values. Normalizing both cases into one predictable string keeps the CLI
surface boring in the good way.
Parameters:
error - Unknown rejection reason from trainer execution.
Returns: Formatted error string for CLI logging.
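A hedged sketch of that normalization, using an assumed prefix value rather than the real FLAPPY_TRAINER_UNEXPECTED_ERROR_PREFIX:

```typescript
// Sketch of collapsing unknown rejection values into one stable string; the
// prefix below is an assumed stand-in, not the real constant.
const UNEXPECTED_ERROR_PREFIX = "Unexpected trainer error:";

function formatTrainerErrorMessageSketch(error: unknown): string {
  // Real Error objects contribute their message; everything else is stringified.
  const detail = error instanceof Error ? error.message : String(error);
  return `${UNEXPECTED_ERROR_PREFIX} ${detail}`;
}
```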
trainer/trainer.constants.ts
Named knobs for trainer cadence, fairness, fallback behavior, and terminal reporting.
This file exists so the trainer reads like policy instead of a wall of magic numbers. When you want to tune how quickly evolution cools down, how much of the population survives into deeper stages, or how progress is summarized in logs, this is the first place to inspect.
Use the grouped reference below alongside the alphabetical symbol list that follows in the generated README.
Available trainer constant groups:
| Group | What it controls | Representative constants |
|---|---|---|
| Population shape | Demo-scale NEAT size and elitism | FLAPPY_TRAINER_DEFAULT_POPULATION_SIZE, FLAPPY_TRAINER_DEFAULT_ELITISM_COUNT |
| Mutation cooling | How aggressive structural search stays over time | FLAPPY_TRAINER_MUTATION_RATE_START, FLAPPY_TRAINER_MUTATION_RATE_END |
| Stage budgets | How much rollout work each evaluation phase spends | FLAPPY_TRAINER_QUICK_ROLLOUT_MAX_FRAMES, FLAPPY_TRAINER_FULL_ROLLOUT_PIPE_PROGRESS_TARGET |
| Ranking heuristics | How provisional and robust scores are composed | FLAPPY_TRAINER_FRAME_PRIMARY_BASE_SCORE, FLAPPY_TRAINER_PIPE_FALLBACK_PIPE_WEIGHT |
| Reporting and fallback | Log formatting and defensive dummy-network paths | FLAPPY_TRAINER_LOG_PARTS_DELIMITER, FLAPPY_TRAINER_DUMMY_NETWORK_ID |
Population and reproducibility:
| Constant | Why it matters |
|---|---|
| FLAPPY_TRAINER_DEFAULT_POPULATION_SIZE | Sets the demo-scale population size for each evolutionary round. |
| FLAPPY_TRAINER_DEFAULT_ELITISM_COUNT | Preserves a stable top slice of genomes between generations. |
| FLAPPY_TRAINER_DEFAULT_RNG_SEED | Makes local runs reproducible across tuning sessions. |
Mutation cooling:
| Constant | Why it matters |
|---|---|
| FLAPPY_TRAINER_MUTATION_ANNEAL_GENERATIONS | Defines how long the trainer keeps cooling mutation pressure. |
| FLAPPY_TRAINER_MUTATION_RATE_START | Starting probability of mutation while search is still broad. |
| FLAPPY_TRAINER_MUTATION_RATE_END | Late-stage mutation probability after the trainer settles down. |
| FLAPPY_TRAINER_MUTATION_AMOUNT_START | Starting mutation count budget for exploratory generations. |
| FLAPPY_TRAINER_MUTATION_AMOUNT_END | Smaller late-stage mutation count for refinement. |
| FLAPPY_TRAINER_NEAT_INITIAL_MUTATION_RATE | Bootstrap controller mutation rate before schedule updates take over. |
| FLAPPY_TRAINER_NEAT_INITIAL_MUTATION_AMOUNT | Bootstrap controller mutation amount before schedule updates take over. |
Stage budgets and selection depth:
| Constant | Why it matters |
|---|---|
| FLAPPY_TRAINER_QUICK_ROLLOUT_MAX_FRAMES | Caps the cheap first-pass screen. |
| FLAPPY_TRAINER_QUICK_ROLLOUT_EARLY_TERMINATION_GRACE_FRAMES | Delays quick-stage early termination until a short grace window passes. |
| FLAPPY_TRAINER_QUICK_ROLLOUT_EARLY_TERMINATION_CONSECUTIVE_FRAMES | Requires a streak of bad frames before a quick-stage rollout is cut short. |
| FLAPPY_TRAINER_QUICK_ROLLOUT_PIPE_PROGRESS_TARGET | Normalizes quick-stage progress against a modest target. |
| FLAPPY_TRAINER_FULL_ROLLOUT_EARLY_TERMINATION_GRACE_FRAMES | Gives stronger candidates more recovery time in the deeper stage. |
| FLAPPY_TRAINER_FULL_ROLLOUT_EARLY_TERMINATION_CONSECUTIVE_FRAMES | Uses a stricter streak threshold before full-stage early termination. |
| FLAPPY_TRAINER_FULL_ROLLOUT_PIPE_PROGRESS_TARGET | Normalizes deeper-stage progress against a tougher target. |
| FLAPPY_TRAINER_FULL_PASS_ELITISM_MULTIPLIER | Sizes the full-pass candidate pool relative to elitism. |
| FLAPPY_TRAINER_FULL_PASS_POPULATION_FRACTION | Sizes the full-pass candidate pool relative to total population. |
| FLAPPY_TRAINER_REEVALUATION_MIN_CANDIDATE_COUNT | Guarantees that anti-luck reevaluation still compares a useful set of genomes. |
Ranking, robustness, and fallback behavior:
| Constant | Why it matters |
|---|---|
| FLAPPY_TRAINER_FRAME_STABILITY_STDDEV_WEIGHT | Penalizes unstable shared-seed performance. |
| FLAPPY_TRAINER_PIPE_FILTER_TOLERANCE | Gates frame-primary scoring to genomes close enough in pipe progress. |
| FLAPPY_TRAINER_FRAME_PRIMARY_BASE_SCORE | Creates a large score offset once a genome passes the gate. |
| FLAPPY_TRAINER_FRAME_PRIMARY_SURVIVAL_WEIGHT | Rewards longer stable survival inside the gated branch. |
| FLAPPY_TRAINER_FRAME_PRIMARY_PIPE_WEIGHT | Keeps pipe progress visible inside the gated branch. |
| FLAPPY_TRAINER_PIPE_FALLBACK_PIPE_WEIGHT | Emphasizes pipe progress when the primary gate is not met. |
| FLAPPY_TRAINER_DUMMY_NETWORK_ID | Provides a stable fallback network identifier for defensive reporting paths. |
| FLAPPY_TRAINER_DUMMY_NO_FLAP_OUTPUT | Encodes the dummy network's preferred passive action score. |
| FLAPPY_TRAINER_DUMMY_FLAP_OUTPUT | Encodes the dummy network's lower flap score for deterministic fallback behavior. |
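Taken together, the ranking constants above suggest a gated scoring shape roughly like the sketch below. All weights, the tolerance, and the concrete numbers are assumed stand-ins, not the trainer's real values:

```typescript
// Hedged sketch of gated frame-primary vs pipe-fallback scoring composition.
const BASE_SCORE = 100_000;          // stand-in for FRAME_PRIMARY_BASE_SCORE
const SURVIVAL_WEIGHT = 1;           // stand-in for FRAME_PRIMARY_SURVIVAL_WEIGHT
const PIPE_WEIGHT = 50;              // stand-in for FRAME_PRIMARY_PIPE_WEIGHT
const STDDEV_WEIGHT = 0.5;           // stand-in for FRAME_STABILITY_STDDEV_WEIGHT
const FALLBACK_PIPE_WEIGHT = 1_000;  // stand-in for PIPE_FALLBACK_PIPE_WEIGHT
const PIPE_TOLERANCE = 0.5;          // stand-in for PIPE_FILTER_TOLERANCE

function scoreGenomeSketch(
  meanPipes: number,
  bestMeanPipes: number,
  meanFrames: number,
  fitnessStddev: number,
): number {
  const passesGate = bestMeanPipes - meanPipes <= PIPE_TOLERANCE;
  if (passesGate) {
    // Frame-primary branch: large offset, survival and pipes as secondary
    // terms, instability penalized via the stddev weight.
    return (
      BASE_SCORE +
      SURVIVAL_WEIGHT * meanFrames +
      PIPE_WEIGHT * meanPipes -
      STDDEV_WEIGHT * fitnessStddev
    );
  }
  // Fallback branch leans on pipe progress alone.
  return FALLBACK_PIPE_WEIGHT * meanPipes;
}
```

The large base offset guarantees that any genome passing the gate outranks every genome scored on the fallback path, which is exactly the "surviving the gate matters more than secondary terms" behavior described above.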
Reporting and terminal output:
| Constant | Why it matters |
|---|---|
| FLAPPY_TRAINER_SCORE_MEDIAN_PERCENTILE | Names the percentile used for the reported median. |
| FLAPPY_TRAINER_SCORE_P90_PERCENTILE | Names the percentile used for the reported upper-tail score. |
| FLAPPY_TRAINER_LOG_PARTS_DELIMITER | Keeps compact generation logs consistently tokenized. |
| FLAPPY_TRAINER_STOPPED_MESSAGE | Gives graceful shutdown a stable terminal message. |
FLAPPY_TRAINER_DEFAULT_ELITISM_COUNT
Number of elite genomes preserved unchanged each generation.
Preserving a small elite keeps the trainer from discarding clearly strong genomes while the rest of the population continues exploring.
FLAPPY_TRAINER_DEFAULT_POPULATION_SIZE
Default population size used by the Flappy trainer NEAT run.
The demo keeps this large enough for staged selection to matter while still remaining practical for local experimentation.
FLAPPY_TRAINER_DEFAULT_RNG_SEED
Deterministic trainer RNG seed used for reproducible training runs.
Reusing the shared Flappy example seed makes trainer behavior easier to compare across doc examples, tests, and manual tuning sessions.
FLAPPY_TRAINER_DUMMY_FLAP_OUTPUT
Dummy output channel value for the "flap" action score.
Keeping the flap score lower than the no-flap score produces a predictable never-flap dummy network for defensive report code paths.
FLAPPY_TRAINER_DUMMY_NETWORK_ID
ID used by dummy fallback network for defensive reporting paths.
The report helpers occasionally need a safe stand-in network so logging can stay total even when no real population data is available.
FLAPPY_TRAINER_DUMMY_NO_FLAP_OUTPUT
Dummy output channel value for the "no flap" action score.
The dummy network intentionally prefers the passive action so fallback report generation remains deterministic and simple.
FLAPPY_TRAINER_FRAME_PRIMARY_BASE_SCORE
Base offset awarded to genomes that satisfy the mean-pipe progress filter.
The large offset makes it obvious that surviving the gate is more important than tiny differences in the secondary frame-oriented terms.
FLAPPY_TRAINER_FRAME_PRIMARY_PIPE_WEIGHT
Pipe-progress contribution weight for frame-primary scoring.
This keeps pipe progress visible even inside the gated scoring branch so the ranking still prefers genuinely advancing policies.
FLAPPY_TRAINER_FRAME_PRIMARY_SURVIVAL_WEIGHT
Survival contribution weight for frame-primary scoring.
Once a genome passes the pipe-progress gate, extra survival time still matters because it often signals more stable control.
FLAPPY_TRAINER_FRAME_STABILITY_STDDEV_WEIGHT
Penalty multiplier applied to fitness standard deviation in frame-primary scoring.
Higher instability lowers the provisional score so a lucky but erratic genome is less likely to outrank a steadier competitor.
FLAPPY_TRAINER_FULL_PASS_ELITISM_MULTIPLIER
Multiplier over elitism used to size full-pass candidate pool.
This ties the full-pass budget to a familiar population concept so the deeper stage scales alongside the preserved elite.
FLAPPY_TRAINER_FULL_PASS_POPULATION_FRACTION
Population fraction used to size full-pass candidate pool.
The full stage uses the larger of this fraction and the elitism-based floor so promising mid-pack genomes are not excluded too aggressively.
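Combining that fraction with the elitism-based floor described under FLAPPY_TRAINER_FULL_PASS_ELITISM_MULTIPLIER, the pool sizing could be sketched as follows (the default values are assumptions, not the real constants):

```typescript
// Sketch of full-pass pool sizing: the larger of an elitism-scaled floor and
// a population fraction, clamped to the population size.
function resolveFullPassCandidateCountSketch(
  populationSize: number,
  elitismCount: number,
  elitismMultiplier = 3,     // assumed stand-in for FULL_PASS_ELITISM_MULTIPLIER
  populationFraction = 0.25, // assumed stand-in for FULL_PASS_POPULATION_FRACTION
): number {
  const elitismFloor = elitismCount * elitismMultiplier;
  const fractionPool = Math.ceil(populationSize * populationFraction);
  // Take the larger pool so mid-pack genomes are not excluded too aggressively.
  return Math.min(populationSize, Math.max(elitismFloor, fractionPool));
}
```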
FLAPPY_TRAINER_FULL_ROLLOUT_EARLY_TERMINATION_CONSECUTIVE_FRAMES
Consecutive unrecoverable frames needed to stop full rollout early.
This longer streak makes the full stage less trigger-happy than the quick screen while still avoiding wasted rollout budget.
FLAPPY_TRAINER_FULL_ROLLOUT_EARLY_TERMINATION_GRACE_FRAMES
Early-termination grace frames used during full rollout stage.
The deeper stage allows more time before judging a trajectory unrecoverable because the trainer is now evaluating stronger candidates more carefully.
FLAPPY_TRAINER_FULL_ROLLOUT_PIPE_PROGRESS_TARGET
Pipe-progress target used to normalize full and reevaluation rollout fitness.
The deeper stages share a tougher target because they are used for robust ranking rather than first-pass elimination.
FLAPPY_TRAINER_LOG_PARTS_DELIMITER
Delimiter used when composing compact generation log lines.
A single-space delimiter keeps the log dense, stable, and easy to parse by eye during long-running terminal sessions.
FLAPPY_TRAINER_MUTATION_AMOUNT_END
Final mutation amount reached after annealing window completes.
Cooling the mutation amount along with the rate reduces late-generation noise without fully freezing structural search.
FLAPPY_TRAINER_MUTATION_AMOUNT_START
Initial mutation amount at generation 0 before annealing.
This controls how many mutation operations can be applied while the trainer is still in its exploratory phase.
FLAPPY_TRAINER_MUTATION_ANNEAL_GENERATIONS
Generation count used to fully anneal mutation schedule from start to end values.
Within this window the trainer gradually cools from more exploratory updates toward smaller, steadier changes.
FLAPPY_TRAINER_MUTATION_RATE_END
Final mutation rate reached after annealing window completes.
Lower late-stage mutation pressure helps good policies stabilize instead of being reshuffled as aggressively as the opening generations.
FLAPPY_TRAINER_MUTATION_RATE_START
Initial mutation rate at generation 0 before annealing.
The starting rate is intentionally aggressive so the early population can discover useful topologies quickly.
FLAPPY_TRAINER_NEAT_INITIAL_MUTATION_AMOUNT
Initial NEAT mutation amount before generation schedule annealing is applied.
Matching the controller bootstrap to the trainer policy avoids a confusing
mismatch between generation 0 and later loop behavior.
FLAPPY_TRAINER_NEAT_INITIAL_MUTATION_RATE
Initial NEAT mutation rate before generation schedule annealing is applied.
This seeds the controller with a sensible baseline before the per-generation planner starts taking over.
FLAPPY_TRAINER_PIPE_FALLBACK_PIPE_WEIGHT
Pipe-progress contribution weight for fallback scoring path.
The fallback path leans heavily on pipe progress because it is the clearest robust signal available before the primary gate is satisfied.
FLAPPY_TRAINER_PIPE_FILTER_TOLERANCE
Allowed mean-pipes delta from the current best before frame-primary scoring applies.
This acts like a gating tolerance: only genomes close enough in pipe progress get the more generous frame-primary score treatment.
FLAPPY_TRAINER_QUICK_ROLLOUT_EARLY_TERMINATION_CONSECUTIVE_FRAMES
Consecutive unrecoverable frames needed to stop quick screening rollout early.
Requiring a streak prevents one noisy frame from ending the screen too early while still saving time on clearly doomed trajectories.
FLAPPY_TRAINER_QUICK_ROLLOUT_EARLY_TERMINATION_GRACE_FRAMES
Early-termination grace frames used during quick screening rollout stage.
This short grace period gives a policy a brief chance to stabilize before the unrecoverable-flight heuristic is allowed to stop the rollout.
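The grace-plus-streak heuristic described by the grace-frame and consecutive-frame constants can be sketched as follows, with assumed threshold values:

```typescript
// Sketch of the early-termination check: never cut a rollout inside the grace
// window, and afterwards require a sustained streak of unrecoverable frames.
function shouldTerminateEarlySketch(
  frameIndex: number,
  consecutiveBadFrames: number,
  graceFrames = 30,    // assumed grace window
  requiredStreak = 10, // assumed bad-frame streak threshold
): boolean {
  if (frameIndex < graceFrames) {
    return false; // the policy still gets its brief chance to stabilize
  }
  return consecutiveBadFrames >= requiredStreak; // one noisy frame is not enough
}
```

The full stage would use the same shape with a longer grace window and a stricter streak, which is why it is less trigger-happy than the quick screen.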
FLAPPY_TRAINER_QUICK_ROLLOUT_MAX_FRAMES
Frame cap used during quick screening rollout stage.
The quick stage is supposed to eliminate obviously weak genomes cheaply, so its horizon is intentionally shorter than the full evaluation horizon.
FLAPPY_TRAINER_QUICK_ROLLOUT_PIPE_PROGRESS_TARGET
Pipe-progress target used to normalize quick screening rollout fitness.
The lower quick-stage target reflects the fact that this pass is a screen, not the trainer's final statement of policy quality.
FLAPPY_TRAINER_REEVALUATION_MIN_CANDIDATE_COUNT
Minimum candidate count for reevaluation stage, regardless of elitism.
This prevents small elite settings from starving the anti-luck pass of enough genomes to produce a meaningful final comparison.
FLAPPY_TRAINER_SCORE_MEDIAN_PERCENTILE
Percentile used when reporting median population score.
Keeping the percentile explicit makes the report math self-documenting even for readers who skim the log formatter before the statistics helpers.
FLAPPY_TRAINER_SCORE_P90_PERCENTILE
Percentile used when reporting high-end population score (P90).
The trainer uses p90 as a quick "is the upper tail getting healthier?"
signal without over-focusing on only the single best genome.
FLAPPY_TRAINER_STOPPED_MESSAGE
Log message emitted when trainer loop exits cleanly.
A dedicated constant keeps the shutdown path stable for humans and for any scripts that watch trainer output.
trainer/trainer.loop.service.ts
Outer generation heartbeat for the Flappy trainer.
This file owns the cadence of one generation after another. It deliberately avoids score math and rollout-planning detail so the top-level loop stays readable as: resolve schedule, evolve once, run a representative rollout, and emit a summary.
Loop sketch:
flowchart LR
Resolve["resolveMutationSchedule()"] --> Apply["applyMutationSchedule()"]
Apply --> Evolve["neatController.evolve()"]
Evolve --> Fallback["rolloutEpisode()\nrepresentative fallback run"]
Fallback --> Log["logGenerationSummary()"]
Log --> Resolve
applyMutationSchedule
applyMutationSchedule(
neatController: FlappyTrainerNeatController,
mutationSchedule: FlappyMutationSchedule,
): void
Applies mutation schedule values to the NEAT controller options.
The schedule is resolved outside this helper so the loop can read as a clean "resolve -> apply -> evolve -> report" flow. That separation also makes it easier to inspect the active schedule in logs or tests.
Parameters:
neatController - Trainer NEAT controller.
mutationSchedule - Mutation schedule for current generation.
Returns: Nothing.
LogGenerationSummaryCallback
LogGenerationSummaryCallback(
generationLabel: number,
mutationSchedule: FlappyMutationSchedule,
report: FlappyGenerationReport | undefined,
fittestGenome: FlappyTrainerNetwork,
fallbackEpisode: FlappyEpisodeResult,
): void
Callback signature for one-line generation logging.
The loop owns evolution cadence, while the callback owns presentation. Keeping those concerns separate makes it easy to reuse the loop with richer reporting later.
runTrainerEvolutionLoop
runTrainerEvolutionLoop(
neatController: FlappyTrainerNeatController,
trainerRuntimeState: FlappyTrainerRuntimeState,
logGenerationSummary: LogGenerationSummaryCallback,
): Promise<void>
Runs the outer evolution loop until runtime stop is requested.
Educational note: This is the trainer's main heartbeat: resolve the current mutation schedule, evolve one generation, run a representative fallback rollout for logging, and emit a compact summary.
Parameters:
neatController - Trainer NEAT controller.
trainerRuntimeState - Mutable trainer runtime state.
logGenerationSummary - Callback that emits compact generation logs.
Returns: Promise resolved when the trainer has been stopped.
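Under simplified assumptions (a single `evolveOnce` callback stands in for schedule resolution, evolution, and the fallback rollout), the heartbeat reads roughly like this sketch:

```typescript
// Minimal heartbeat sketch; names here are hypothetical stand-ins for the
// real loop service.
interface LoopRuntimeSketch {
  stopRequested: boolean;
}

async function runEvolutionLoopSketch(
  runtime: LoopRuntimeSketch,
  evolveOnce: (generation: number) => Promise<void>,
  logSummary: (generation: number) => void,
): Promise<void> {
  // Stop intent is checked between generations, never mid-evaluation.
  for (let generation = 0; !runtime.stopRequested; generation += 1) {
    await evolveOnce(generation); // resolve schedule, apply it, evolve once
    logSummary(generation);       // compact one-line report
  }
}
```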
trainer/trainer.setup.service.ts
Bootstrap helpers for the trainer's static shape and initial NEAT runtime.
The trainer keeps setup separate from the main entry file so configuration and controller construction can be read, tested, and tuned without also reading the outer loop. This boundary answers a simple question: what does the demo need before the first generation can run?
createNeatController
createNeatController(
trainerSetup: FlappyTrainerSetup,
): FlappyTrainerNeatController
Builds the NEAT controller with baseline options.
Educational note: The trainer enables population-level fitness mode because the quality of a Flappy policy depends on fair comparison across shared seed batches, not on a one-network-at-a-time scoring callback.
Parameters:
trainerSetup - Immutable trainer setup values.
Returns: Typed NEAT controller used by the trainer loop.
Example:
const trainerSetup = createTrainerSetup();
const neatController = createNeatController(trainerSetup);
createTrainerRuntimeState
createTrainerRuntimeState(): FlappyTrainerRuntimeState
Creates mutable runtime state container.
The runtime state is intentionally tiny. It only tracks stop intent and the latest report so the outer loop can remain easy to reason about.
Returns: Fresh runtime state used by loop orchestration.
createTrainerSetup
createTrainerSetup(): FlappyTrainerSetup
Creates immutable setup values for the trainer.
Educational note: The setup object freezes the core training shape up front: input width, output width, population size, and elitism count. Centralizing those values makes the rest of the trainer read as policy rather than configuration noise.
Returns: Default trainer setup values used for NEAT configuration.
resolveNoopFitness
resolveNoopFitness(): number
Trivial baseline fitness used before attaching population evaluator.
This placeholder keeps controller construction simple. The real staged evaluator is attached immediately afterward by the fitness service, so this function exists only to satisfy the generic controller's constructor contract.
Returns: Constant zero fitness.
trainer/trainer.report.service.ts
Generation-reporting facade for the Flappy trainer.
The trainer deliberately logs more than one champion score. This file turns a finished generation into a compact distribution summary so humans can tell the difference between broad improvement and a single lucky genome.
buildGenerationReport
buildGenerationReport(
population: readonly FlappyTrainerNetwork[],
aggregateByGenome: ReadonlyMap<FlappyTrainerNetwork, FlappySeedBatchEvaluation>,
generationEvaluationPlan: FlappyGenerationEvaluationPlan,
): FlappyGenerationReport
Builds a compact report for the current generation.
Educational note: The trainer logs more than a single best score because single-number progress can hide instability. Mean, median, p90, and standard deviation reveal whether a generation is broadly improving or whether one lucky genome is masking a weak population.
Parameters:
population - Current population.
aggregateByGenome - Aggregate evaluation results keyed by genome.
generationEvaluationPlan - Per-generation staged evaluation plan.
Returns: Aggregated generation report.
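The distribution summary described above can be sketched with a simple nearest-rank percentile; the real report helpers may compute percentiles and variance differently:

```typescript
// Sketch of the mean / median / p90 / stddev summary used for generation
// reports, assuming a nearest-rank percentile over sorted scores.
function percentileSketch(sortedScores: readonly number[], fraction: number): number {
  const index = Math.min(
    sortedScores.length - 1,
    Math.floor(fraction * (sortedScores.length - 1)),
  );
  return sortedScores[index];
}

function summarizeScoresSketch(scores: readonly number[]) {
  const sorted = [...scores].sort((a, b) => a - b);
  const mean = sorted.reduce((sum, s) => sum + s, 0) / sorted.length;
  // Population variance: average squared deviation from the mean.
  const variance =
    sorted.reduce((sum, s) => sum + (s - mean) ** 2, 0) / sorted.length;
  return {
    mean,
    median: percentileSketch(sorted, 0.5),
    p90: percentileSketch(sorted, 0.9),
    stddev: Math.sqrt(variance),
  };
}
```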
logGenerationSummary
logGenerationSummary(
generationLabel: number,
mutationSchedule: FlappyMutationSchedule,
report: FlappyGenerationReport | undefined,
fittestGenome: FlappyTrainerNetwork,
fallbackEpisode: FlappyEpisodeResult,
): void
Emits one compact generation log line.
The emitted line is designed for long-running terminal sessions: dense enough to be useful, but stable enough that humans can visually scan progress over hundreds of generations. The goal is not pretty output. The goal is a line that lets you spot drift, plateaus, and sudden regressions at a glance.
Parameters:
generationLabel - Current generation label.
mutationSchedule - Active mutation schedule.
report - Optional aggregated generation report.
fittestGenome - Fittest genome returned by the NEAT controller.
fallbackEpisode - Fallback representative rollout episode.
Returns: Nothing.
trainer/trainer.fitness.service.ts
Adapter layer that teaches a generic NEAT controller how to score an entire Flappy population fairly.
A plain controller only knows that it needs a fitness callback. This service turns that loose contract into the trainer's staged policy: screen everybody cheaply, spend more budget on the survivors, then reevaluate the finalists so ranking is less sensitive to luck.
Fitness orchestration map:
flowchart LR
Population["population"] --> Plan["resolveGenerationEvaluationPlan()"]
Plan --> Quick["evaluatePopulationQuickStage()"]
Quick --> Full["evaluatePopulationFullStage()"]
Full --> Reeval["evaluatePopulationReevaluationStage()"]
Reeval --> Commit["commitPopulationScores()"]
Commit --> Report["buildGenerationReport()"]
attachPopulationFitnessEvaluator
attachPopulationFitnessEvaluator(
neatController: FlappyTrainerNeatController,
trainerRuntimeState: FlappyTrainerRuntimeState,
elitismCount: number,
dependencies: TrainerFitnessServiceDependencies,
): void
Attaches population-level staged evaluator to the NEAT controller.
This is the moment where the generic NEAT controller becomes a Flappy-specific trainer: a plain controller receives the staged population evaluator that understands shared-seed screening, full-pass scoring, and reevaluation.
Parameters:
neatController - Trainer NEAT controller.
trainerRuntimeState - Mutable trainer runtime state.
elitismCount - Number of elite genomes preserved each generation.
dependencies - Pure/impure helper callbacks used by the evaluator.
Returns: Nothing.
createPopulationFitnessEvaluator
createPopulationFitnessEvaluator(
neatController: FlappyTrainerNeatController,
trainerRuntimeState: FlappyTrainerRuntimeState,
elitismCount: number,
dependencies: TrainerFitnessServiceDependencies,
): (population: FlappyTrainerNetwork[]) => Promise<void>
Creates the asynchronous population fitness evaluator.
Educational note: The trainer uses staged evaluation to reduce luck. Genomes are first screened quickly, then the most promising ones receive more expensive evaluation, and the best candidates are reevaluated again for robustness.
That strategy is closer to tournament design than to naive one-shot scoring: the same generation budget is spent unevenly so weak genomes are filtered out early and strong genomes are compared more carefully.
Parameters:
neatController - Trainer NEAT controller.
trainerRuntimeState - Mutable trainer runtime state.
elitismCount - Number of elite genomes preserved each generation.
dependencies - Pure/impure helper callbacks used by the evaluator.
Returns: Evaluator callback assigned to neatController.fitness.
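A compressed sketch of the staged ladder, with hypothetical stage scorers standing in for the shared-seed evaluation helpers:

```typescript
// Staged-ladder sketch: cheap screen over everyone, deeper budget for the top
// slice, then a final write-back mirroring commitPopulationScores(). The real
// evaluator also reuses shared seed batches and adds a reevaluation pass.
interface GenomeSketch {
  id: number;
  score?: number;
}

function stagedEvaluateSketch(
  population: GenomeSketch[],
  quickScore: (genome: GenomeSketch) => number,
  fullScore: (genome: GenomeSketch) => number,
  fullPassCount: number,
): void {
  // Stage 1: cheap screen over every genome.
  const scored = population.map((genome) => ({
    genome,
    score: quickScore(genome),
  }));
  scored.sort((a, b) => b.score - a.score);
  // Stage 2: spend the deeper budget only on the most promising candidates.
  for (const entry of scored.slice(0, fullPassCount)) {
    entry.score = fullScore(entry.genome);
  }
  // Final write-back once staging is complete.
  for (const entry of scored) {
    entry.genome.score = entry.score;
  }
}
```

Scores live in the intermediate array until the end, so each stage can refresh them without mutating genomes too early.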
TrainerFitnessServiceDependencies
Callback dependencies required by the trainer fitness orchestration service.
Educational note: The trainer evaluates whole populations in staged passes. This dependency bag keeps the top-level service declarative and makes each stage independently replaceable without rewriting the orchestration logic.
trainer/trainer.signals.service.ts
Process-signal bridge for cooperative trainer shutdown.
The trainer should stop between generations, not by tearing the process down in the middle of evaluation. This file converts OS-level stop signals into one shared runtime intent flag that the main loop can observe safely.
handleTrainerStopSignal
handleTrainerStopSignal(
trainerRuntimeState: FlappyTrainerRuntimeState,
): void
Handles one stop signal update.
The handler does the minimum possible work because signal paths should stay predictable and side-effect light.
Parameters:
trainerRuntimeState - Mutable trainer runtime state.
Returns: Nothing.
registerTrainerStopSignals
registerTrainerStopSignals(
trainerRuntimeState: FlappyTrainerRuntimeState,
): void
Registers graceful stop signal handlers.
Educational note:
Long-running evolutionary runs should stop cleanly when the user presses
Ctrl+C. This service flips runtime intent instead of abruptly tearing down
the process mid-generation.
Parameters:
trainerRuntimeState - Mutable trainer runtime state.
Returns: Nothing.
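A minimal sketch of that intent-flipping pattern, assuming the runtime state is a plain object with a stopRequested flag:

```typescript
// Cooperative shutdown sketch: signals only flip intent, and the loop checks
// the flag between generations. The real service may listen to more signals.
interface TrainerRuntimeStateSketch {
  stopRequested: boolean;
}

function registerStopSignalsSketch(runtime: TrainerRuntimeStateSketch): void {
  const requestStop = (): void => {
    runtime.stopRequested = true; // no teardown here; signal paths stay light
  };
  process.once("SIGINT", requestStop);
  process.once("SIGTERM", requestStop);
}
```

Using `once` keeps a second Ctrl+C from re-registering work, and leaving the flag as the only side effect means evaluation in flight is never interrupted mid-generation.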
trainer/trainer.evaluation.service.ts
Trainer evaluation compatibility facade.
The staged population-evaluation implementation now lives in the dedicated trainer/evaluation/ submodule so orchestration, scoring helpers, internal contracts, and sub-services can evolve behind a focused boundary.
Use this file when you want the public trainer-level shelf for staged evaluation without learning the internal subfolder layout first.
Staged evaluation ladder:
flowchart LR
Population["population"] --> Quick["quick stage\ncheap shared-seed screen"]
Quick --> Full["full stage\nlonger comparison for survivors"]
Full --> Reeval["reevaluation stage\nlarger anti-luck batch"]
Reeval --> Commit["commitPopulationScores()\nwrite final provisional scores"]
commitPopulationScores
commitPopulationScores(
population: readonly FlappyTrainerNetwork[],
provisionalScoresByGenome: ReadonlyMap<FlappyTrainerNetwork, number>,
): void
Commits provisional scores to genome score fields.
Provisional scores are kept in a map during staging so each phase can refresh them without mutating the genomes too early. This helper performs the final write-back once staged evaluation is complete.
Parameters:
population - Current population.
provisionalScoresByGenome - Final provisional score map.
Returns: Nothing.
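A minimal sketch of the write-back step, assuming a mutable genome `score` field and a zero fallback for genomes that never received a provisional score (both are assumptions of this sketch):

```typescript
// Stand-in for FlappyTrainerNetwork; only the mutable score field matters here.
interface ScoredGenomeSketch {
  score: number;
}

// Copy staged provisional scores onto genome score fields in one final pass.
// The zero fallback for unstaged genomes is an assumption of this sketch.
function commitScoresSketch(
  population: readonly ScoredGenomeSketch[],
  provisionalScoresByGenome: ReadonlyMap<ScoredGenomeSketch, number>,
): void {
  for (const genome of population) {
    genome.score = provisionalScoresByGenome.get(genome) ?? 0;
  }
}
```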
evaluatePopulationFullStage
evaluatePopulationFullStage(
population: readonly FlappyTrainerNetwork[],
generationEvaluationPlan: FlappyGenerationEvaluationPlan,
aggregateByGenome: Map<FlappyTrainerNetwork, FlappySeedBatchEvaluation>,
provisionalScoresByGenome: Map<FlappyTrainerNetwork, number>,
elitismCount: number,
): void
Executes the full evaluation stage over the top provisional candidates.
This is the middle-cost stage in the ranking ladder: not every genome survives into it, but the survivors receive a more trustworthy estimate than the quick screen alone can provide.
Parameters:
population - Current population.
generationEvaluationPlan - Per-generation staged evaluation plan.
aggregateByGenome - Mutable aggregate cache keyed by genome.
provisionalScoresByGenome - Mutable provisional score map.
elitismCount - Configured elitism count.
Returns: Nothing.
evaluatePopulationQuickStage
evaluatePopulationQuickStage(
population: readonly FlappyTrainerNetwork[],
generationEvaluationPlan: FlappyGenerationEvaluationPlan,
aggregateByGenome: Map<FlappyTrainerNetwork, FlappySeedBatchEvaluation>,
provisionalScoresByGenome: Map<FlappyTrainerNetwork, number>,
): void
Executes the quick evaluation stage over the full population.
Educational note: The quick stage is a cheap screening pass. Every genome is tested on the same small shared seed batch so the trainer can discard obviously weak candidates before spending more rollout budget on them.
Parameters:
population - Current population.
generationEvaluationPlan - Per-generation staged evaluation plan.
aggregateByGenome - Mutable aggregate cache keyed by genome.
provisionalScoresByGenome - Mutable provisional score map.
Returns: Nothing.
Example:
evaluatePopulationQuickStage(
population,
generationEvaluationPlan,
aggregateByGenome,
provisionalScoresByGenome,
);
evaluatePopulationReevaluationStage
evaluatePopulationReevaluationStage(
population: readonly FlappyTrainerNetwork[],
generationEvaluationPlan: FlappyGenerationEvaluationPlan,
aggregateByGenome: Map<FlappyTrainerNetwork, FlappySeedBatchEvaluation>,
provisionalScoresByGenome: Map<FlappyTrainerNetwork, number>,
elitismCount: number,
): void
Executes the large-seed reevaluation stage over top candidates.
Educational note: Reevaluation is the trainer's anti-luck pass. The best provisional genomes are tested again on a larger shared seed batch so leaderboard positions are less sensitive to a fortunate early sample.
Parameters:
population - Current population.
generationEvaluationPlan - Per-generation staged evaluation plan.
aggregateByGenome - Mutable aggregate cache keyed by genome.
provisionalScoresByGenome - Mutable provisional score map.
elitismCount - Configured elitism count.
Returns: Nothing.
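The three stages above share the same funnel shape: each stage narrows the candidate set before spending more rollout budget. A compact sketch of that control flow, with stage internals elided behind a hypothetical `evaluateStage` callback and purely illustrative seed counts:

```typescript
// Sketch of the staged ranking ladder. evaluateStage stands in for a real
// stage that runs rollouts and updates the provisional score map.
function runStagedLadderSketch<G>(
  population: readonly G[],
  provisionalScoresByGenome: ReadonlyMap<G, number>,
  evaluateStage: (genomes: readonly G[], seedCount: number) => void,
  survivorCount: number,
): void {
  // Quick stage: everyone, few shared seeds — a cheap screening pass.
  evaluateStage(population, 3);
  // Rank by provisional score and keep only the survivors.
  const ranked = [...population].sort(
    (a, b) =>
      (provisionalScoresByGenome.get(b) ?? 0) -
      (provisionalScoresByGenome.get(a) ?? 0),
  );
  const survivors = ranked.slice(0, survivorCount);
  // Full stage: survivors only, with a longer comparison.
  evaluateStage(survivors, 8);
  // Reevaluation stage: a larger anti-luck batch for the same survivors.
  evaluateStage(survivors, 16);
}
```

Only the quick stage sees the whole population; the later, more expensive stages operate on the ranked survivors.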
trainer/trainer.report.service.services.ts
Small report-side helpers that keep the report facade orchestration-first.
These helpers do the awkward work that a summary builder should not have to read inline: filtering out non-finite scores, reusing cached aggregates when they exist, and falling back to a deterministic dummy network so a report can still be produced in edge cases.
collectFiniteGenomeScores
collectFiniteGenomeScores(
population: readonly FlappyTrainerNetwork[],
): number[]
Collects only finite scores from the current population.
Unevaluated or invalid scores are intentionally skipped so percentile and standard deviation calculations operate on stable numeric inputs only.
Parameters:
population - Current population.
Returns: Finite scores in population order.
Example:
const scores = collectFiniteGenomeScores(population);
resolveBestGenerationDetails
resolveBestGenerationDetails(
population: readonly FlappyTrainerNetwork[],
bestGenome: FlappyTrainerNetwork | undefined,
aggregateByGenome: ReadonlyMap<FlappyTrainerNetwork, FlappySeedBatchEvaluation>,
fallbackSeeds: readonly number[],
fallbackRolloutOptions: FlappyRolloutOptions,
): ResolvedBestGenerationDetails
Resolves cached or fallback best-of-generation details for reporting.
The report layer needs both aggregate seed statistics and one representative episode. This helper centralizes the fallback rules so the service facade can remain a thin orchestration layer.
Parameters:
population - Current population.
bestGenome - Genome selected as generation best.
aggregateByGenome - Cached aggregate evaluations keyed by genome.
fallbackSeeds - Seeds used when the aggregate must be recomputed.
fallbackRolloutOptions - Rollout options for fallback evaluation.
Returns: Aggregate metrics and a representative best-genome episode.
Example:
const { bestAggregate, bestEpisode } = resolveBestGenerationDetails(
population,
bestGenome,
aggregateByGenome,
reevaluationSeeds,
reevaluationRolloutOptions,
);
ResolvedBestGenerationDetails
Aggregate and representative rollout resolved for the best genome.
Keeping these values together lets the report facade stay focused on orchestration while this helper module owns cache fallback behavior and the "best summary plus one representative episode" pairing.
trainer/trainer.reporting.utils.ts
Formatting helpers for compact generation-log output.
The trainer's console line is intentionally tokenized rather than narrated. This file keeps that token order stable so humans can build scanning habits across long runs and tools can parse the same line shape later if needed.
buildGenerationLogParts
buildGenerationLogParts(
generationLabel: number,
bestFitness: number,
bestPipesPassed: number,
bestFramesSurvived: number,
report: FlappyGenerationReport | undefined,
mutationSchedule: FlappyMutationSchedule,
): string[]
Builds one-line generation log tokens.
The chosen order moves from identity (gen) to quality (best, mean, median, p90, std) and then into operational context (difficulty, mutation, seed counts).
Parameters:
generationLabel - Generation label shown in logs.
bestFitness - Best resolved fitness value for this generation.
bestPipesPassed - Best resolved pipes passed value.
bestFramesSurvived - Best resolved frames survived value.
report - Optional aggregated generation report.
mutationSchedule - Active mutation schedule for this generation.
Returns: Ordered log tokens for compact console output.
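A sketch of that ordering with a reduced token set; the real helper also emits median, p90, std, and seed counts, and the exact labels and precision below are assumptions:

```typescript
// Sketch: identity token first, quality metrics next, operational context
// last, so long runs build a stable left-to-right scanning habit.
function buildLogPartsSketch(
  generation: number,
  best: number,
  mean: number,
  difficulty: number,
  mutationRate: number,
): string[] {
  return [
    `gen=${generation}`,
    `best=${best.toFixed(2)}`,
    `mean=${mean.toFixed(2)}`,
    `diff=${difficulty.toFixed(2)}`,
    `mut=${mutationRate.toFixed(2)}`,
  ];
}
```

Joining the tokens with a single space yields one compact console line per generation.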
trainer/trainer.selection.utils.ts
Deterministic ranking helpers for trainer populations.
These utilities keep score extraction and descending-order selection in one place so the staged evaluation services do not each reinvent the same sorting logic with slightly different fallback rules.
resolveBestGenomeByScore
resolveBestGenomeByScore(
population: readonly FlappyTrainerNetwork[],
): FlappyTrainerNetwork | undefined
Resolves the best genome by current score.
This helper is intentionally tiny, but it gives the rest of the trainer a single vocabulary term for "the current best genome under whatever score shelf is currently populated."
Parameters:
population - Current trainer population.
Returns: Highest-scoring genome or undefined when population is empty.
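A sketch of the linear scan, assuming non-finite scores are skipped (a plausible fallback rule for this helper, not necessarily the documented one):

```typescript
// Sketch: single pass over the population, ignoring non-finite scores
// (an assumption of this sketch) and returning undefined when empty.
function resolveBestSketch<G extends { score: number }>(
  population: readonly G[],
): G | undefined {
  let best: G | undefined;
  for (const genome of population) {
    if (!Number.isFinite(genome.score)) continue;
    if (best === undefined || genome.score > best.score) best = genome;
  }
  return best;
}
```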
selectTopGenomesByScore
selectTopGenomesByScore(
population: readonly FlappyTrainerNetwork[],
provisionalScoresByGenome: ReadonlyMap<FlappyTrainerNetwork, number>,
targetCount: number,
): FlappyTrainerNetwork[]
Returns top genomes ordered by current provisional score.
Parameters:
population - Current trainer population.
provisionalScoresByGenome - Optional map of staged provisional scores.
targetCount - Maximum number of genomes to return.
Returns: Highest-scoring genomes in descending score order.
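A sketch of the descending-order selection, assuming unscored genomes fall back to zero (an assumption of this sketch):

```typescript
// Sketch: copy so the caller's array is untouched, sort descending by
// provisional score with a zero fallback, then truncate to targetCount.
function selectTopSketch<G>(
  population: readonly G[],
  provisionalScoresByGenome: ReadonlyMap<G, number>,
  targetCount: number,
): G[] {
  return [...population]
    .sort(
      (a, b) =>
        (provisionalScoresByGenome.get(b) ?? 0) -
        (provisionalScoresByGenome.get(a) ?? 0),
    )
    .slice(0, targetCount);
}
```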
trainer/trainer.evaluation-plan.utils.ts
Generation-planning helpers for staged evaluation, curriculum difficulty, and mutation cooling.
The trainer does not decide stage budgets ad hoc inside the evolution loop. Instead, each generation resolves one explicit plan that says which shared seeds to use, how hard the environment should currently be, and how much mutation pressure should remain.
Generation planning map:
flowchart TB
Generation["generationIndex"] --> Mutation["resolveMutationSchedule()\nrate + amount"]
Generation --> Difficulty["resolveCurriculumDifficultyScale()\ncourse difficulty"]
Generation --> QuickSeeds["quick shared seeds"]
Generation --> FullSeeds["full-stage shared seeds"]
Generation --> ReevalSeeds["reevaluation shared seeds"]
Difficulty --> QuickOptions["createQuickRolloutOptions()"]
Difficulty --> FullOptions["createFullRolloutOptions()"]
Difficulty --> ReevalOptions["createReevaluationRolloutOptions()"]
Mutation --> Plan["FlappyGenerationEvaluationPlan"]
QuickSeeds --> Plan
FullSeeds --> Plan
ReevalSeeds --> Plan
QuickOptions --> Plan
FullOptions --> Plan
ReevalOptions --> Plan
buildSharedSeedBatch
buildSharedSeedBatch(
generationIndex: number,
stageSalt: number,
seedCount: number,
): number[]
Builds deterministic shared seeds for one generation stage.
Shared seeds are what make same-generation comparisons fair: genomes face the same sampled worlds instead of winning because they happened to get a kinder random rollout.
Parameters:
generationIndex - Zero-based generation index.
stageSalt - Constant stage-specific salt.
seedCount - Number of seeds to produce.
Returns: Deterministic shared seed list.
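A sketch of deterministic seed derivation: each seed is a hash of the generation index, the stage salt, and the seed's position, so re-running a generation reproduces the same sampled worlds. The mixing constants below are illustrative, not the trainer's actual ones:

```typescript
// Sketch: derive seedCount uint32 seeds from (generationIndex, stageSalt).
// Same inputs always yield the same batch, which is what makes
// same-generation comparisons fair.
function buildSeedBatchSketch(
  generationIndex: number,
  stageSalt: number,
  seedCount: number,
): number[] {
  const seeds: number[] = [];
  for (let i = 0; i < seedCount; i += 1) {
    // Illustrative integer mix; | and Math.imul keep arithmetic in 32 bits.
    let h = (generationIndex * 0x9e3779b1) ^ (stageSalt + i * 0x85ebca6b);
    h = Math.imul(h ^ (h >>> 15), 0x2c1b3c6d);
    h = Math.imul(h ^ (h >>> 12), 0x297a2d39);
    seeds.push((h ^ (h >>> 15)) >>> 0); // fold to unsigned 32-bit
  }
  return seeds;
}
```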
createFullRolloutOptions
createFullRolloutOptions(
difficultyScale: number,
): FlappyRolloutOptions
Builds full-stage rollout options.
This stage gives stronger candidates a longer, stricter test so the trainer can refine the leaderboard before committing to expensive reevaluation.
Parameters:
difficultyScale - Difficulty scale for this generation.
Returns: Full stage rollout options.
createQuickRolloutOptions
createQuickRolloutOptions(
difficultyScale: number,
): FlappyRolloutOptions
Builds quick-screen rollout options.
The quick stage is a cheap gate. It favors speed and comparability over fully trusted estimates because weak genomes only need enough evidence to be ruled out early.
Parameters:
difficultyScale - Difficulty scale for this generation.
Returns: Quick stage rollout options.
createReevaluationRolloutOptions
createReevaluationRolloutOptions(
difficultyScale: number,
): FlappyRolloutOptions
Builds high-confidence reevaluation rollout options.
Reevaluation deliberately disables early termination so the strongest candidates are judged on a more faithful, less shortcut-heavy comparison.
Parameters:
difficultyScale - Difficulty scale for this generation.
Returns: Reevaluation stage rollout options.
FlappyMutationSchedule
Mutation schedule used by generation planning and outer loop logging.
These two numbers are treated as a single policy decision because the trainer cools both the frequency and the size of mutations together.
mixSeed
mixSeed(
generationIndex: number,
stageSalt: number,
): number
Mixes generation and stage salts into a deterministic uint32 RNG seed.
The small mixing pipeline spreads nearby generation numbers apart so adjacent stages and generations do not accidentally reuse overly correlated seed sets.
Parameters:
generationIndex - Current generation index.
stageSalt - Stage-specific salt.
Returns: Mixed uint32 seed.
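A sketch of such a mixer in the MurmurHash3-finalizer style: multiply/xor/shift rounds give nearby inputs very different outputs. The constants below are the well-known murmur3 finalizer constants, used here to illustrate the technique rather than to reproduce the trainer's exact code:

```typescript
// Sketch: avalanche-mix two small integers into one uint32 seed so that
// adjacent generations and stages do not produce correlated seed sets.
function mixSeedSketch(generationIndex: number, stageSalt: number): number {
  let h = (generationIndex ^ Math.imul(stageSalt, 0x9e3779b1)) >>> 0;
  h = Math.imul(h ^ (h >>> 16), 0x85ebca6b); // murmur3 finalizer round 1
  h = Math.imul(h ^ (h >>> 13), 0xc2b2ae35); // murmur3 finalizer round 2
  return (h ^ (h >>> 16)) >>> 0; // final fold to an unsigned 32-bit value
}
```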
resolveCurriculumDifficultyScale
resolveCurriculumDifficultyScale(
generationIndex: number,
): number
Resolves the curriculum difficulty scale for the current generation.
The course starts gentle, ramps through the middle generations, and then caps at full difficulty once the population has had time to discover viable flight.
Parameters:
generationIndex - Zero-based generation index.
Returns: Difficulty scale in [0, 1].
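A sketch of a gentle-ramp-then-cap curve. The warmup and cap generations are not documented here, so the 10/60 window below is purely illustrative:

```typescript
// Sketch: hold difficulty at 0 through an assumed warmup, ramp linearly
// through the middle generations, then cap at full difficulty.
function curriculumDifficultySketch(generationIndex: number): number {
  const rampStart = 10; // assumed: last full-ease warmup generation
  const rampEnd = 60; // assumed: generation at which difficulty caps
  const t = (generationIndex - rampStart) / (rampEnd - rampStart);
  return Math.min(1, Math.max(0, t)); // clamp into [0, 1]
}
```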
resolveGenerationEvaluationPlan
resolveGenerationEvaluationPlan(
generationIndex: number,
): FlappyGenerationEvaluationPlan
Resolves all per-generation evaluation controls.
Think of this as the trainer's "generation contract": the rest of the system can ask for one object and receive a fully prepared set of seeds, rollout options, and annealed mutation values.
Parameters:
generationIndex - Zero-based generation index.
Returns: Full staged evaluation plan for the generation.
resolveMutationSchedule
resolveMutationSchedule(
generationIndex: number,
): FlappyMutationSchedule
Resolves a smooth mutation annealing schedule.
Early generations mutate more aggressively so the population can search the space broadly. Later generations cool down so the trainer can refine useful structures rather than constantly replacing them.
Parameters:
generationIndex - Zero-based generation index.
Returns: Mutation rate and mutation amount for this generation.
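A sketch of cooling both knobs from one shared decay term, which keeps the frequency and size of mutations moving together as the text describes. The time constant, starting values, and floors below are all assumptions:

```typescript
// Sketch mirror of FlappyMutationSchedule: both knobs cool together.
interface MutationScheduleSketch {
  mutationRate: number;
  mutationAmount: number;
}

// Sketch: one exponential cooling term drives both values, so early
// generations search broadly and later generations refine structure.
function resolveMutationScheduleSketch(
  generationIndex: number,
): MutationScheduleSketch {
  const cooling = Math.exp(-generationIndex / 40); // assumed time constant
  return {
    mutationRate: 0.1 + 0.4 * cooling, // assumed: 0.5 early, 0.1 floor
    mutationAmount: 0.05 + 0.25 * cooling, // assumed: 0.3 early, 0.05 floor
  };
}
```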