environment

Canonical world-state contract for one Flappy Bird episode.

This file is the environment chapter's conceptual starting point because it names the pieces that every other environment helper manipulates: the bird, the pipe field, the episode clock, and the termination reason.

Treat this boundary as the simulation's source of truth. Evaluation uses it to score policies, the trainer uses it to compare genomes fairly, and browser playback tooling depends on it staying compact and deterministic.

The important design choice is restraint. The environment keeps only the state needed to advance one episode correctly. It does not store DOM-facing data, worker transport payloads, or trainer policy metadata. That smaller contract is what makes deterministic stepping and reward debugging practical.

environment/environment.types.ts

FlappyBird

Bird kinematic state for one simulation frame.

The environment keeps only the minimum physics state needed to advance the episode: vertical position and vertical velocity.

FlappyDifficultyScale

Difficulty scale used by the curriculum scheduler.

A value of 0 disables the difficulty ramp and 1 applies it fully; values in between interpolate continuously, which lets the trainer or an environment caller dial curriculum strength.
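A continuous scale like this typically interpolates each difficulty parameter linearly between its easy and hard settings. A minimal sketch, assuming hypothetical gap heights (the real constants are not part of this reference):

```typescript
// Sketch: interpolating one difficulty parameter along the [0, 1] scale.
// EASY_GAP and HARD_GAP are illustrative assumptions, not real constants.
type FlappyDifficultyScale = number; // expected in [0, 1]

const EASY_GAP = 180; // hypothetical pipe-gap height at scale 0
const HARD_GAP = 100; // hypothetical pipe-gap height at scale 1

function interpolatedGapHeight(scale: FlappyDifficultyScale): number {
  // Clamp so out-of-range scales degrade gracefully to the extremes.
  const clamped = Math.min(1, Math.max(0, scale));
  return EASY_GAP + (HARD_GAP - EASY_GAP) * clamped;
}
```

Clamping before interpolating keeps a miscalibrated scheduler from producing gaps outside the intended range.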

FlappyGameState

Full simulation state for one Flappy episode.

Educational note: This is the canonical single-episode world state used by evaluation and some trainer-facing helpers. It is intentionally compact so stepping the world is deterministic and easy to inspect.

FlappyObservationFeatures

Structured observation features used to build the neural-network input vector.

Re-exported from shared simulation utilities so trainer and browser paths stay synchronized as the observation schema evolves.

FlappyPipe

Pipe obstacle definition.

Pipes move from right to left. The bird scores once per pipe, when the pipe has completely crossed the bird's x-position.

This is the environment-owned pipe state, distinct from the packed snapshot transport shapes used by the browser worker.
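The pass-credit rule above can be sketched as a predicate. The field names (`x`, `width`) and the bird x-position argument are illustrative assumptions about the environment-owned pipe shape:

```typescript
// Sketch of the scoring rule: a pipe grants pass credit only once its
// right edge is strictly behind the bird's x-position.
interface PipeLike {
  x: number;     // hypothetical: left edge of the pipe
  width: number; // hypothetical: horizontal extent of the pipe
}

function hasPipePassed(pipe: PipeLike, birdX: number): boolean {
  return pipe.x + pipe.width < birdX;
}
```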

environment/environment.constants.ts

FLAPPY_ENVIRONMENT_DEFAULT_CONTROL_SUBSTEPS_PER_FRAME

Default number of control/physics substeps executed per simulation frame.

Reusing the shared control-substep count keeps the environment and browser playback aligned on the same stepping granularity.

FLAPPY_ENVIRONMENT_DEFAULT_DIFFICULTY_SCALE

Default curriculum difficulty scale used by environment stepping.

A value of 1 means the environment uses the full adaptive difficulty ramp.

FLAPPY_ENVIRONMENT_MAX_FRAMES_PER_EPISODE

Maximum frame budget before the environment forces timeout termination.

Timeouts stop extremely long survival loops from dominating evaluation cost.

environment/environment.state.service.ts

createInitialFlappyState

createInitialFlappyState(
  rng: FlappyRng,
): FlappyGameState

Create a fresh Flappy Bird episode state.

Educational note: A new episode starts with one initial pipe already materialized so the first observation is meaningful immediately. That avoids a cold-start phase where a policy would receive mostly empty-space inputs.

Parameters:

  rng: Deterministic random source for the episode rollout.

Returns: Initial state for one deterministic rollout.

Example:

const state = createInitialFlappyState(rng);

environment/environment.step.service.ts

stepFlappyState

stepFlappyState(
  state: FlappyGameState,
  rng: FlappyRng,
  flap: boolean,
  difficultyScale: number,
): void

Advance the simulation by one frame.

This is the simplest stepping surface: one logical frame and one flap choice. More advanced callers can use the control-substep variant below.

Parameters:

  state: Episode state, mutated in place.
  rng: Deterministic random source for the episode.
  flap: Whether the bird flaps this frame.
  difficultyScale: Curriculum difficulty scale in [0, 1].

Returns: Nothing.
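A single-episode rollout over this one-frame surface can be sketched generically. The hooks (`step`, `isTerminated`, `policy`) are illustrative stand-ins for stepFlappyState, the state's termination condition, and a flap-deciding policy, none of which this sketch implements for real:

```typescript
// Sketch: drive one episode frame-by-frame until termination or a frame
// budget (mirroring FLAPPY_ENVIRONMENT_MAX_FRAMES_PER_EPISODE) is spent.
interface RolloutHooks<S> {
  step: (state: S, flap: boolean) => void; // mutates state in place
  isTerminated: (state: S) => boolean;     // episode-over predicate
  policy: (state: S) => boolean;           // flap decision per frame
}

function runEpisode<S>(state: S, hooks: RolloutHooks<S>, maxFrames: number): number {
  let frames = 0;
  while (frames < maxFrames && !hooks.isTerminated(state)) {
    hooks.step(state, hooks.policy(state));
    frames += 1;
  }
  return frames; // frames survived, capped by the timeout budget
}
```

The frame cap is what turns the timeout constant into a hard bound on evaluation cost per genome.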

stepFlappyStateWithControlSubsteps

stepFlappyStateWithControlSubsteps(
  state: FlappyGameState,
  rng: FlappyRng,
  shouldFlapForSubstep: () => boolean,
  difficultyScale: number,
  controlSubstepsPerFrame: number,
): void

Advance one logical frame using multiple control/physics substeps.

This allows policies to react multiple times before frameIndex advances, improving responsiveness in high-difficulty scenarios.

Educational note: Splitting a logical frame into smaller control steps is a simple numerical stability trick. It reduces the chance that fast pipes or large velocity updates make the environment feel artificially coarse.

For background reading, the Wikipedia article on "numerical integration" provides the general idea behind updating continuous motion in small steps.

Parameters:

  state: Episode state, mutated in place.
  rng: Deterministic random source for the episode.
  shouldFlapForSubstep: Callback queried once per substep for the flap decision.
  difficultyScale: Curriculum difficulty scale in [0, 1].
  controlSubstepsPerFrame: Number of control/physics substeps per logical frame.

Returns: Nothing.
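The substep pattern can be sketched as a loop that re-queries the policy before each smaller integration step. Dividing the frame's timestep evenly across substeps is an assumption about how the environment keeps total simulated time constant:

```typescript
// Sketch: split one logical frame into control substeps so the policy
// can react several times before the frame index advances.
function stepFrameWithSubsteps(
  advancePhysics: (dt: number, flap: boolean) => void, // stand-in integrator
  shouldFlapForSubstep: () => boolean,
  substepsPerFrame: number,
  frameDt: number,
): void {
  const substepDt = frameDt / substepsPerFrame;
  for (let i = 0; i < substepsPerFrame; i += 1) {
    // Each substep asks the policy again, then integrates a smaller dt.
    advancePhysics(substepDt, shouldFlapForSubstep());
  }
}
```

Smaller steps with the same total dt is the standard numerical-integration trick the educational note refers to: motion stays the same length in simulated time but is resolved more finely.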

environment/environment.collision.utils.ts

updateCollisionAndProgressState

updateCollisionAndProgressState(
  state: FlappyGameState,
): void

Apply out-of-bounds, pipe-collision, and pass-credit rules for one substep.

Educational note: Collision resolution and progress credit live together because both depend on the same bird-vs-pipe geometry for the current substep. Keeping them in one place helps the environment avoid inconsistent "passed but also collided" edge cases.

Parameters:

  state: Episode state, mutated in place.

Returns: Nothing.
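The "one geometry, one verdict" idea in the educational note can be sketched as a single resolver that checks collision before granting pass credit. All field names, the y-down orientation, and the circle-vs-gap test are illustrative assumptions:

```typescript
// Sketch: derive collision and pass credit from the same bird-vs-pipe
// geometry so one substep can never report "passed" and "collided" at once.
interface BirdLike { x: number; y: number; radius: number }
interface GapPipe { x: number; width: number; gapTop: number; gapBottom: number }

function resolveSubstep(bird: BirdLike, pipe: GapPipe): "collided" | "passed" | "none" {
  const overlapsX =
    bird.x + bird.radius > pipe.x && bird.x - bird.radius < pipe.x + pipe.width;
  const insideGap =
    bird.y - bird.radius > pipe.gapTop && bird.y + bird.radius < pipe.gapBottom;
  if (overlapsX && !insideGap) return "collided"; // collision wins; no credit
  if (pipe.x + pipe.width < bird.x - bird.radius) return "passed";
  return "none";
}
```

Because both outcomes read the same geometry in a fixed order, the inconsistent edge case mentioned above cannot arise within a substep.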

environment/environment.observation.utils.ts

getFlappyObservation

getFlappyObservation(
  state: FlappyGameState,
  difficultyScale: number,
): number[]

Generate the network observation vector for the current state.

Educational note: This helper is the environment-facing bridge into the shared observation system. It keeps the environment API simple while ensuring evaluation, training, and browser playback all derive their policy inputs from the same feature definitions.

Observation (12 numbers):

  1. bird y position normalized to [0, 1]
  2. bird vertical velocity normalized to [-1, 1]
  3. distance to next pipe normalized to [0, 1]
  4. delta (bird y - gap center y) normalized to [-1, 1]
  5. next pipe gap top normalized to [0, 1]
  6. next pipe gap bottom normalized to [0, 1]
  7. distance to second pipe normalized to [0, 1]
  8. delta to second gap center normalized to [-1, 1]
  9. time-to-next-pipe closeness normalized to [0, 1]
  10. signed clearance relative to next gap normalized to [-1, 1]
  11. required vertical velocity toward next gap center normalized to [-1, 1]
  12. gap-center transition (next to second) normalized to [-1, 1]

Parameters:

  state: Current episode state.
  difficultyScale: Curriculum difficulty scale in [0, 1].

Returns: Input vector for the neural network.
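The feature list above implies two normalization conventions: unit-range features clamp into [0, 1] and signed features into [-1, 1]. A minimal sketch of those conventions; the helper names and the bounds passed in are assumptions, not the environment's actual normalizers:

```typescript
// Sketch: the two clamped normalizations the observation entries imply.
function clamp(value: number, lo: number, hi: number): number {
  return Math.min(hi, Math.max(lo, value));
}

// e.g. feature 1: bird y position over a hypothetical playfield height
function normalizeUnit(value: number, max: number): number {
  return clamp(value / max, 0, 1);
}

// e.g. feature 2: vertical velocity over a hypothetical max-speed bound
function normalizeSigned(value: number, maxAbs: number): number {
  return clamp(value / maxAbs, -1, 1);
}
```

Clamping keeps out-of-range physics values (for example a burst of velocity at high difficulty) from pushing network inputs outside the range the policy was trained on.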

getFlappyObservationFeatures

getFlappyObservationFeatures(
  state: FlappyGameState,
  difficultyScale: number,
): SharedObservationFeatures

Resolve structured observation features for policy input and reward shaping.

Parameters:

  state: Current episode state.
  difficultyScale: Curriculum difficulty scale in [0, 1].

Returns: Named feature object.

Example:

const features = getFlappyObservationFeatures(state, 1);
Generated from source JSDoc