mazeMovement

Episode-simulation boundary for one maze-controlled agent.

This folder is where a policy stops being abstract logits and starts paying for local decisions. One run moves through perception, direction selection, collision-aware movement, shaping, and finalization until the agent reaches the exit or exhausts its step budget.

The boundary exists because single-episode logic has to be both honest and inspectable. If movement rules, reward shaping, and stop conditions were scattered across fitness or evolution helpers, it would be much harder to tell whether poor results came from a weak policy, thin observations, harsh shaping, or simple runtime edge cases.

Read the folder as four cooperating shelves. runtime/ builds perception, visit bookkeeping, and low-level state transitions. policy/ converts raw outputs into concrete directional choices and exploration nudges. shaping/ applies the score semantics that make sparse-goal navigation learnable. finalization/ folds the finished path into one result the engine and fitness layers can compare.

The public API stays class-based on purpose. Existing imports remain stable, while the real teaching value now lives in the split helpers. The facade tells the reader what the episode boundary promises; the subfolders explain how that promise is kept.

Read this chapter in three passes. Start with simulateAgent(...) when you want the whole episode loop. Continue to selectDirection(...) and moveAgent(...) when you want the policy-to-action seam. Finish in the runtime, policy, shaping, and finalization folders when you need the exact bookkeeping or reward logic behind one run.

flowchart LR
  classDef base fill:#08131f,stroke:#1ea7ff,color:#dff6ff,stroke-width:1px;
  classDef accent fill:#0f2233,stroke:#ffd166,color:#fff4cc,stroke-width:1.5px;

  Vision["vision and visit state"]:::base --> Policy["direction selection\nand exploration nudges"]:::accent
  Policy --> Move["collision-aware movement"]:::base
  Move --> Shaping["progress and penalty shaping"]:::base
  Shaping --> Finalize["episode result\nfitness path progress"]:::base

flowchart TD
  classDef base fill:#08131f,stroke:#1ea7ff,color:#dff6ff,stroke-width:1px;
  classDef accent fill:#0f2233,stroke:#ffd166,color:#fff4cc,stroke-width:1.5px;

  MazeMovement["mazeMovement/"]:::accent --> Facade["mazeMovement.ts\npublic episode facade"]:::base
  MazeMovement --> Runtime["runtime/\nvision and state"]:::base
  MazeMovement --> Policy["policy/\naction choice"]:::base
  MazeMovement --> Shaping["shaping/\nreward and penalties"]:::base
  MazeMovement --> Finalization["finalization/\nresult folding"]:::base

For background on why direction selection reports probabilities and entropy instead of only the raw logits, see the Wikipedia article "Softmax function" (Wikipedia contributors). Softmax is the probability transform used by the action-diagnostics helper layer.

Example: inspect the direction choice implied by one network output vector.

const directionStats = MazeMovement.selectDirection([0.4, 1.2, -0.3, 0.1]);

console.log(directionStats.selectedDirection);
console.log(directionStats.entropy);

Example: simulate one complete maze episode for a candidate network.

const result = MazeMovement.simulateAgent(
  network,
  encodedMaze,
  startPos,
  exitPos,
  distanceMap,
  160,
);

console.log(result.fitness, result.reachedExit);
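The example above passes a `distanceMap` without showing where one comes from. One common way to precompute such a map is a breadth-first search outward from the exit over open cells. This is a minimal sketch under stated assumptions. The grid is row-major and indexed as `maze[y][x]`. Walkability is decided by a hypothetical `isOpen` predicate, because this section does not state the module's cell encoding. Unreachable cells get `Infinity`. The real map may use a different sentinel.

```typescript
/** Hypothetical predicate: which encoded cell values count as walkable. */
type IsOpen = (cell: number) => boolean;

/**
 * Breadth-first distance map measured from the exit (sketch).
 * Assumes maze[y][x] row-major indexing and 4-neighbour movement.
 */
function buildDistanceMapSketch(
  maze: number[][],
  exit: readonly [number, number],
  isOpen: IsOpen,
): number[][] {
  const height = maze.length;
  const width = height > 0 ? maze[0].length : 0;
  // Every cell starts unreachable until the search reaches it.
  const dist: number[][] = maze.map((row) => row.map(() => Infinity));
  const [ex, ey] = exit;
  if (ey < 0 || ey >= height || ex < 0 || ex >= width || !isOpen(maze[ey][ex])) {
    return dist;
  }
  const queue: Array<[number, number]> = [[ex, ey]];
  dist[ey][ex] = 0;
  const deltas: Array<[number, number]> = [[0, -1], [1, 0], [0, 1], [-1, 0]];
  // Index-based queue: `head` advances instead of shifting the array.
  for (let head = 0; head < queue.length; head++) {
    const [x, y] = queue[head];
    for (const [dx, dy] of deltas) {
      const nx = x + dx;
      const ny = y + dy;
      if (nx < 0 || ny < 0 || nx >= width || ny >= height) continue;
      if (!isOpen(maze[ny][nx]) || dist[ny][nx] !== Infinity) continue;
      dist[ny][nx] = dist[y][x] + 1;
      queue.push([nx, ny]);
    }
  }
  return dist;
}
```

For example, `buildDistanceMapSketch(maze, [0, 2], (c) => c === 0)` treats `0` as open under that assumed encoding.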

mazeMovement/mazeMovement.types.ts

Shared type surface for the dedicated mazeMovement module.

During step 2 of the module extraction, internal simulation contracts move here first so that later helper files can depend on one narrow typed surface.

DirectionSelectionStats

Diagnostic telemetry produced when selecting a direction from network logits.

Encapsulates the chosen direction along with entropy and probability data so downstream helpers can apply shaping rewards and penalties without rederiving softmax statistics on hot paths.

MazeMovementBufferPools

Initialized pooled buffers shared across maze movement simulations.

These pools are reused between runs to keep the hot path allocation-light while preserving a narrow typed seam for service helpers.

MazeMovementRunServiceState

Mutable run-scoped service state shared by the mazeMovement facade.

The dedicated services module owns these counters so later runtime, policy, and shaping helpers can depend on one explicit mutable surface instead of directly reaching into class-private state.

MazeMovementSimulationResult

Result shape returned by MazeMovement.simulateAgent.

This contract matches the legacy inline return annotation so callers can keep depending on the current fields while the dedicated module boundary is being extracted.

SimulationState

Internal aggregate state used during a single agent simulation run.

Purpose:

Hold all derived runtime values, counters and diagnostic stats used by the MazeMovement simulation helpers. This shape is intentionally rich so tests and visualisers can inspect intermediate state when debugging.

Notes:

This interface remains internal to the mazeMovement module boundary.
Property descriptions are explicit to surface helpful tooltips in editors.

mazeMovement/mazeMovement.ts

MazeMovement

Maze movement entry surface used by fitness evaluation and evolution runs.

The public API intentionally remains class-based so existing example imports do not change while the implementation lives under the dedicated module.

#COORDINATE_SCRATCH

Reused integer coordinate scratch for hot-path movement helpers.

#hasReachedExit

#hasReachedExit(
  simulationState: SimulationState,
  exitPos: readonly [number, number],
): boolean

Determine whether the current state has reached the maze exit.

Parameters:

simulationState - Mutable run state for the active episode.
exitPos - Exit coordinate for the run.

Returns: True when the agent position matches the exit coordinate.

#processMovementAndShaping

#processMovementAndShaping(
  simulationState: SimulationState,
  encodedMaze: number[][],
  distanceMap: number[][] | undefined,
): boolean

Execute the selected move, apply post-action shaping, and evaluate stop rules.

Parameters:

simulationState - Mutable run state for the active episode.
encodedMaze - Maze grid used for movement and distance lookup.
distanceMap - Optional precomputed distance map.

Returns: True when the episode should stop after this step.

#processPerceptionAndPolicy

#processPerceptionAndPolicy(
  simulationState: SimulationState,
  network: INetwork,
  encodedMaze: number[][],
  exitPos: readonly [number, number],
  distanceMap: number[][] | undefined,
): void

Refresh visit bookkeeping, perception state, and direction policy.

Parameters:

simulationState - Mutable run state for the active episode.
network - Policy network used for action selection.
encodedMaze - Maze grid used for the run.
exitPos - Exit coordinate for the run.
distanceMap - Optional precomputed distance map.

moveAgent

moveAgent(
  encodedMaze: readonly (readonly number[])[],
  position: readonly [number, number],
  direction: number,
): [number, number]

Move the agent one step in the requested direction when the target cell is open.

Parameters:

encodedMaze - Maze grid used for collision checks.
position - Current agent position.
direction - Direction index in the action space.

Returns: New position when the move is valid, otherwise the original position.

Example:

const moved = MazeMovement.moveAgent(encodedMaze, [3, 2], 1);
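moveAgent(...) is described here only as a collision-aware single step that returns the original position when the target is blocked. This is a minimal sketch of that contract. It assumes direction indices 0 to 3 mean north, east, south, and west. It also assumes a hypothetical wall value of `1`. This section states neither mapping, so check the real constants before relying on them.

```typescript
// Assumed direction order: 0 = north, 1 = east, 2 = south, 3 = west.
const DIRECTION_DELTAS: ReadonlyArray<readonly [number, number]> = [
  [0, -1],
  [1, 0],
  [0, 1],
  [-1, 0],
];

// Hypothetical wall encoding; the real maze encoding may differ.
const WALL = 1;

function moveAgentSketch(
  maze: readonly (readonly number[])[],
  position: readonly [number, number],
  direction: number,
): [number, number] {
  const [x, y] = position;
  const delta = DIRECTION_DELTAS[direction];
  // Unknown direction indices leave the agent in place.
  if (delta === undefined) return [x, y];
  const nx = x + delta[0];
  const ny = y + delta[1];
  const row = maze[ny];
  // Out-of-bounds targets and walls are both treated as collisions.
  if (row === undefined || nx < 0 || nx >= row.length || row[nx] === WALL) {
    return [x, y];
  }
  return [nx, ny];
}
```

This sketch allocates a fresh tuple on every call. The real facade mentions a reused coordinate scratch (#COORDINATE_SCRATCH) for hot paths.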

selectDirection

selectDirection(
  outputs: number[],
): DirectionSelectionStats

Convert raw network outputs into a chosen action plus diagnostics.

Parameters:

outputs - Raw action logits for the four maze directions.

Returns: Chosen direction plus softmax and entropy diagnostics.

Example:

const stats = MazeMovement.selectDirection([0.2, 1.4, -0.1, 0]);
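To make the logits-to-diagnostics step concrete, here is a sketch of one plausible selectDirection shape. It applies a numerically stable softmax, picks the most probable direction greedily, and computes Shannon entropy normalized by ln(n). The `selectedDirection` and `entropy` fields match the earlier example. The `softmax` field name, the greedy choice, and the normalization are assumptions. The real helper may add exploration nudges or sampling.

```typescript
interface DirectionStatsSketch {
  selectedDirection: number;
  softmax: number[]; // hypothetical field name
  entropy: number;
}

function selectDirectionSketch(logits: readonly number[]): DirectionStatsSketch {
  if (logits.length === 0) {
    return { selectedDirection: -1, softmax: [], entropy: 0 };
  }
  // Subtract the max logit before exponentiating for numerical stability.
  const maxLogit = Math.max(...logits);
  const exps = logits.map((z) => Math.exp(z - maxLogit));
  const total = exps.reduce((sum, e) => sum + e, 0);
  const softmax = exps.map((e) => e / total);

  // Greedy choice: the first index holding the highest probability.
  let selectedDirection = 0;
  for (let i = 1; i < softmax.length; i++) {
    if (softmax[i] > softmax[selectedDirection]) selectedDirection = i;
  }

  // Shannon entropy in nats, normalized by ln(n) so it lies in [0, 1].
  let entropy = 0;
  for (const p of softmax) {
    if (p > 0) entropy -= p * Math.log(p);
  }
  const normalized = softmax.length > 1 ? entropy / Math.log(softmax.length) : 0;
  return { selectedDirection, softmax, entropy: normalized };
}
```

Near-equal logits give entropy close to 1, which signals an undecided policy. One dominant logit drives entropy toward 0.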

simulateAgent

simulateAgent(
  network: INetwork,
  encodedMaze: number[][],
  startPos: readonly [number, number],
  exitPos: readonly [number, number],
  distanceMap: number[][] | undefined,
  maxSteps: number,
): MazeMovementSimulationResult

Simulate one full maze episode for a network-controlled agent.

Parameters:

network - Policy network used to choose actions.
encodedMaze - Maze grid for the active episode.
startPos - Start coordinate.
exitPos - Exit coordinate.
distanceMap - Optional precomputed distance map.
maxSteps - Maximum allowed step count before termination.

Returns: Final simulation result including path, fitness, and progress.
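The private helpers above suggest this loop order: perceive and choose, move and shape, then check stop rules until the exit is reached or maxSteps runs out. The sketch below is heavily simplified and uses several hypothetical stand-ins:

- a `Policy` callback instead of INetwork;
- a toy observation (the signed offset to the exit);
- Manhattan distance instead of the distance map;
- invented reward values (+1 for progress, -0.5 otherwise, +10 on exit) in place of the real shaping constants.

It illustrates ordering only, not the module's scoring.

```typescript
type Position = [number, number];
// Hypothetical stand-in for INetwork: maps an observation to a direction index.
type Policy = (observation: number[]) => number;

interface EpisodeSketchResult {
  reachedExit: boolean;
  steps: number;
  path: Position[];
  fitness: number;
}

// Assumed order: north, east, south, west.
const DELTAS: ReadonlyArray<readonly [number, number]> = [[0, -1], [1, 0], [0, 1], [-1, 0]];

function simulateEpisodeSketch(
  maze: number[][],
  start: Position,
  exit: Position,
  policy: Policy,
  maxSteps: number,
): EpisodeSketchResult {
  const manhattan = (p: Position) => Math.abs(p[0] - exit[0]) + Math.abs(p[1] - exit[1]);
  let pos: Position = [start[0], start[1]];
  const path: Position[] = [[pos[0], pos[1]]];
  let fitness = 0;
  let steps = 0;
  let reachedExit = pos[0] === exit[0] && pos[1] === exit[1];
  while (!reachedExit && steps < maxSteps) {
    // 1. Perception: toy observation of the signed offset to the exit.
    const observation = [exit[0] - pos[0], exit[1] - pos[1]];
    // 2. Policy: choose a direction index.
    const delta = DELTAS[policy(observation)];
    const before = manhattan(pos);
    // 3. Collision-aware movement (1 = wall in this toy encoding).
    if (delta !== undefined) {
      const nx = pos[0] + delta[0];
      const ny = pos[1] + delta[1];
      const row = maze[ny];
      if (row !== undefined && nx >= 0 && nx < row.length && row[nx] !== 1) {
        pos = [nx, ny];
      }
    }
    steps++;
    path.push([pos[0], pos[1]]);
    // 4. Shaping: reward progress, penalize stalls and retreats.
    fitness += manhattan(pos) < before ? 1 : -0.5;
    // 5. Stop rule: exit reached (the step budget is checked by the loop).
    reachedExit = pos[0] === exit[0] && pos[1] === exit[1];
  }
  // 6. Finalization: flat bonus for reaching the exit.
  if (reachedExit) fitness += 10;
  return { reachedExit, steps, path, fitness };
}
```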

mazeMovement/mazeMovement.services.ts

Shared mutable services for the dedicated mazeMovement module.

This module owns the pooled buffers, PRNG state, output-history plumbing, and shared run-scoped counters used by the legacy MazeMovement facade while Step 2 incrementally moves helper categories into the dedicated boundary.

getMazeMovementBufferMetadata

getMazeMovementBufferMetadata(): { cachedWidth: number; cachedHeight: number; }

Read the currently cached maze dimensions for bounds and index helpers.

Returns: Cached width and height for the active pooled buffers.

getMazeMovementRunServiceState

getMazeMovementRunServiceState(): MazeMovementRunServiceState

Expose the shared mutable run-scoped state used across helper categories.

Returns: The singleton mutable run-state object for the current process.

indexMazeMovementCell

indexMazeMovementCell(
  x: number,
  y: number,
): number

Convert a cell coordinate into the pooled linear grid index.

Parameters:

x - Zero-based maze column.
y - Zero-based maze row.

Returns: Linearized index used by pooled grid buffers.
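Because indexMazeMovementCell(...) takes only `x` and `y`, it presumably reads the width cached by the buffer pools (see getMazeMovementBufferMetadata above). This is a sketch assuming row-major layout. `setCachedWidthSketch` is a hypothetical stand-in for whatever updates that cache.

```typescript
// Hypothetical cached width; the real module keeps it with the pooled buffers.
let cachedWidth = 0;

function setCachedWidthSketch(width: number): void {
  cachedWidth = width;
}

/** Row-major linear index: rows of `cachedWidth` cells laid end to end. */
function indexCellSketch(x: number, y: number): number {
  return y * cachedWidth + x;
}
```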

initializeMazeMovementBufferPools

initializeMazeMovementBufferPools(
  width: number,
  height: number,
  maxSteps: number,
): MazeMovementBufferPools

Ensure the pooled grid and path buffers are initialized for a run.

Parameters:

width - Maze width in cells.
height - Maze height in cells.
maxSteps - Maximum path length expected for the run.

Returns: The initialized pooled buffer surface.
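"Ensure ... initialized" suggests a grow-only pool: reuse a buffer when it is already large enough, and reallocate only when a bigger maze or longer run needs more room. This sketch assumes hypothetical fields (`visitCounts`, `pathX`, `pathY`) and a path capacity of `maxSteps + 1` (start cell plus one entry per step). Whether the real module also rounds capacities up is not stated here.

```typescript
interface BufferPoolsSketch {
  visitCounts: Int32Array; // one slot per maze cell
  pathX: Int32Array; // x coordinate of each path entry
  pathY: Int32Array; // y coordinate of each path entry
}

let pools: BufferPoolsSketch | undefined;

function initializePoolsSketch(width: number, height: number, maxSteps: number): BufferPoolsSketch {
  const cellCount = width * height;
  // Assumption: the path stores the start cell plus one entry per step.
  const pathCapacity = maxSteps + 1;
  const current = pools;
  // Grow-only: reuse an existing buffer whenever it is already large enough.
  const visitCounts =
    current !== undefined && current.visitCounts.length >= cellCount
      ? current.visitCounts
      : new Int32Array(cellCount);
  const pathX =
    current !== undefined && current.pathX.length >= pathCapacity
      ? current.pathX
      : new Int32Array(pathCapacity);
  const pathY =
    current !== undefined && current.pathY.length >= pathCapacity
      ? current.pathY
      : new Int32Array(pathCapacity);
  // Reused buffers may hold the previous run's counts, so clear the active region.
  visitCounts.fill(0, 0, cellCount);
  pools = { visitCounts, pathX, pathY };
  return pools;
}
```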

materializeMazeMovementPath

materializeMazeMovementPath(
  length: number,
): [number, number][]

Materialize the active pooled path buffers into a fresh tuple array.

Parameters:

length - Number of active path entries to copy.

Returns: A newly allocated materialized path snapshot.

randomMazeMovementUnit

randomMazeMovementUnit(): number

Generate a pseudo-random number in the range [0, 1).

Returns: A deterministic or host-random unit float for exploration logic.
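The return note mentions both a deterministic and a host-random mode. A common way to get both is a small seeded PRNG with a fallback to `Math.random`. The sketch below uses mulberry32 purely as a stand-in. This section does not say which generator the module uses or how its seed is configured.

```typescript
// mulberry32: a small 32-bit seeded PRNG, used here only as an illustrative stand-in.
function createUnitRandomSketch(seed?: number): () => number {
  // No seed: fall back to host randomness.
  if (seed === undefined) return Math.random;
  let state = seed >>> 0;
  return () => {
    state = (state + 0x6d2b79f5) >>> 0;
    let t = state;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    // Unsigned 32-bit result divided by 2^32 lands in [0, 1).
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}
```

Seeding matters for evolution runs: the same seed replays the same exploration decisions, which makes a surprising fitness result reproducible.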

readMazeMovementOutputHistory

readMazeMovementOutputHistory(
  network: INetwork,
): number[][] | undefined

Read the reflected _lastStepOutputs network history when present.

Parameters:

network - Network that may carry the reflected output history.

Returns: Sanitized output history or undefined when absent or invalid.

requireMazeMovementBufferPools

requireMazeMovementBufferPools(): MazeMovementBufferPools

Return the initialized pooled buffer surface for the current run.

Returns: The shared buffer pools.

resetMazeMovementRunServiceState

resetMazeMovementRunServiceState(): MazeMovementRunServiceState

Reset the shared mutable run-scoped state before a new simulation begins.

Returns: The reused singleton state after reset.
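"Reused singleton state after reset" describes a reset-in-place pattern: one object lives for the whole process and gets zeroed before each run instead of being reallocated. This is a sketch with hypothetical field names. The real MazeMovementRunServiceState fields are not listed in this section.

```typescript
interface RunStateSketch {
  stepsSinceImprovement: number; // hypothetical counter
  lastDistance: number; // hypothetical progress marker
  directionCounts: number[]; // hypothetical per-direction tally
}

// One state object per process, reused across runs to avoid allocation.
const runState: RunStateSketch = {
  stepsSinceImprovement: 0,
  lastDistance: Infinity,
  directionCounts: [0, 0, 0, 0],
};

function resetRunStateSketch(): RunStateSketch {
  runState.stepsSinceImprovement = 0;
  runState.lastDistance = Infinity;
  // Clear in place so callers holding the array see the reset too.
  runState.directionCounts.fill(0);
  return runState;
}
```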

writeMazeMovementOutputHistory

writeMazeMovementOutputHistory(
  network: INetwork,
  history: number[][],
): void

Persist the reflected _lastStepOutputs network history.

Parameters:

network - Network receiving the reflected output history.
history - Bounded output-history payload to persist.

mazeMovement/mazeMovement.constants.ts

Frozen tuning surface for the dedicated mazeMovement module.

This constant table keeps simulation policy, shaping thresholds, and lookup tables in one place so the public facade can stay focused on orchestration.

MAZE_MOVEMENT_CONSTANTS

mazeMovement/mazeMovement.utils.ts

computeActionEntropyFromCounts

computeActionEntropyFromCounts(
  directionCounts: number[],
  logActions: number,
  scratch: Float64Array<ArrayBufferLike>,
): number

Compute normalized action entropy from direction counts.

Parameters:

directionCounts - Number of moves taken in each direction.
logActions - Precomputed normalization factor for the action space.
scratch - Single-value floating-point scratch buffer reused by the caller.

Returns: Normalized entropy in the range [0, 1].
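Normalized action entropy turns the per-direction move counts into probabilities and computes Shannon entropy. It then divides by `logActions` so a perfectly even spread scores 1 and a single repeated direction scores 0. This sketch assumes `logActions` is ln(number of actions), for example `Math.log(4)`. It drops the caller-owned scratch buffer the real signature takes.

```typescript
function actionEntropyFromCountsSketch(
  directionCounts: readonly number[],
  logActions: number,
): number {
  const total = directionCounts.reduce((sum, c) => sum + c, 0);
  if (total <= 0 || logActions <= 0) return 0;
  let entropy = 0;
  for (const count of directionCounts) {
    if (count > 0) {
      const p = count / total;
      entropy -= p * Math.log(p);
    }
  }
  // Assumed normalization: logActions = ln(actionCount), so uniform use maps to 1.
  return entropy / logActions;
}
```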

isFiniteNumberArray

isFiniteNumberArray(
  candidate: unknown,
): boolean

Determine whether the provided value is a finite-number array.

Parameters:

candidate - Value to inspect.

Returns: True when the input is an array of finite numbers.
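A sketch of the check, written as a type predicate so callers get a narrowed `number[]` afterward. The documented signature returns a plain boolean. The predicate form is a suggestion, not the module's confirmed API. Treating the empty array as valid (vacuous truth via `every`) is also an assumption.

```typescript
function isFiniteNumberArraySketch(candidate: unknown): candidate is number[] {
  return (
    Array.isArray(candidate) &&
    // Rejects NaN, ±Infinity, and non-number entries such as numeric strings.
    candidate.every((value) => typeof value === "number" && Number.isFinite(value))
  );
}
```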

materializePath

materializePath(
  length: number,
  pathX: Int32Array<ArrayBufferLike>,
  pathY: Int32Array<ArrayBufferLike>,
): [number, number][]

Materialize the active prefix of pooled path buffers into a fresh array.

Parameters:

length - Number of path entries to materialize.
pathX - Pooled X-coordinate buffer.
pathY - Pooled Y-coordinate buffer.

Returns: A newly allocated array of path tuples.
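Materializing copies the active prefix of the pooled coordinate buffers into fresh tuples, so the returned path stays valid after the pools are reused by the next run. In this sketch, clamping `length` to the buffer sizes is an added defensive assumption.

```typescript
function materializePathSketch(
  length: number,
  pathX: Int32Array,
  pathY: Int32Array,
): [number, number][] {
  // Clamp so a stale length cannot read past the pooled storage.
  const count = Math.max(0, Math.min(length, pathX.length, pathY.length));
  const path: [number, number][] = new Array(count);
  for (let i = 0; i < count; i++) {
    // Copy values out; the tuples do not alias the pooled buffers.
    path[i] = [pathX[i], pathY[i]];
  }
  return path;
}
```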

nextPowerOfTwo

nextPowerOfTwo(
  n: number,
): number

Return the smallest power-of-two integer greater than or equal to n.

Parameters:

n - Target minimum integer capacity.

Returns: The smallest power of two greater than or equal to n.
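A straightforward sketch: double from 1 until the value reaches `n`. Returning 1 for inputs at or below 1 (including 0 and NaN) is an assumption about edge cases the description does not cover. A bit trick such as `Math.clz32` would only be valid up to 2^31, while doubling works for any safe integer.

```typescript
function nextPowerOfTwoSketch(n: number): number {
  // Edge-case assumption: anything at or below 1 (or NaN) maps to 2^0.
  if (!(n > 1)) return 1;
  let power = 1;
  while (power < n) power *= 2;
  return power;
}
```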

readOutputHistory

readOutputHistory(
  network: INetwork,
): number[][] | undefined

Read the optional _lastStepOutputs history stored on a network.

Parameters:

network - Network instance that may expose a reflected outputs history.

Returns: Sanitized history buffer or undefined when absent or invalid.
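Because `_lastStepOutputs` is not part of the declared network type, it has to be read reflectively and validated before use. In this sketch, "sanitized" means rejecting the whole history if any entry is not an array of finite numbers. The real helper might filter bad entries instead. The parameter is typed as `object` to avoid depending on INetwork.

```typescript
function readOutputHistorySketch(network: object): number[][] | undefined {
  // Reflective read: the property is not declared on the network type.
  const raw: unknown = Reflect.get(network, "_lastStepOutputs");
  if (!Array.isArray(raw)) return undefined;
  // Assumed policy: reject the whole history if any entry is malformed.
  for (const entry of raw) {
    if (
      !Array.isArray(entry) ||
      !entry.every((v) => typeof v === "number" && Number.isFinite(v))
    ) {
      return undefined;
    }
  }
  return raw as number[][];
}
```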

sumVisionGroup

sumVisionGroup(
  vision: number[],
  start: number,
  groupLength: number,
  scratch: Float64Array<ArrayBufferLike>,
): number

Sum a contiguous group of entries from a vision vector, copying each entry into a reusable scratch buffer along the way.

Parameters:

vision - Flat perception vector.
start - Start index of the group to sum.
groupLength - Number of entries in the group.
scratch - Reusable scratch buffer populated with copied values.

Returns: Numeric sum of the selected group.
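This sketch follows the documented contract: copy the group into the caller's scratch buffer and return the sum. Counting entries past the end of `vision` as zero, and skipping copies that do not fit in `scratch`, are assumptions about edge cases the description leaves open.

```typescript
function sumVisionGroupSketch(
  vision: readonly number[],
  start: number,
  groupLength: number,
  scratch: Float64Array,
): number {
  let sum = 0;
  for (let i = 0; i < groupLength; i++) {
    const index = start + i;
    // Assumption: entries past the end of the vision vector count as zero.
    const value = index < vision.length ? vision[index] : 0;
    // Copy into scratch only where it has room.
    if (i < scratch.length) scratch[i] = value;
    sum += value;
  }
  return sum;
}
```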

writeOutputHistory

writeOutputHistory(
  network: INetwork,
  history: number[][],
): void

Persist a bounded output history on the network via reflection.

Parameters:

network - Target network to mutate.
history - Updated history buffer.
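The write side mirrors the reflective read. The `history` parameter is described as already bounded. This sketch still trims defensively to the most recent entries, using a hypothetical cap (`MAX_HISTORY_SKETCH = 40`), because this section does not state the real limit.

```typescript
// Hypothetical cap; the real bound is not stated in this section.
const MAX_HISTORY_SKETCH = 40;

function writeOutputHistorySketch(network: object, history: number[][]): void {
  // Defensive trim: keep only the most recent entries.
  const bounded =
    history.length > MAX_HISTORY_SKETCH ? history.slice(-MAX_HISTORY_SKETCH) : history;
  // Reflective write: the property is not declared on the network type.
  Reflect.set(network, "_lastStepOutputs", bounded);
}
```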