mazeMovement/policy
Action policy helpers for the dedicated mazeMovement module.
This file is the decision-making layer inside the mazeMovement module. Once runtime helpers have refreshed the agent's current world view, the policy boundary decides how logits, exploration pressure, and short-horizon heuristics combine into one movement choice.
It owns direction selection, epsilon handling, short-horizon policy overrides, and saturation-driven bias control because those concerns answer a shared question: how should the next action be chosen before reward shaping reacts to the result?
Read this after the runtime helpers if you want the clean handoff from perception into action choice. Then continue into shaping to see how the trainer judges the consequences of that choice.
mazeMovement/policy/mazeMovement.policy.ts
applyMazeMovementEpsilonExploration
applyMazeMovementEpsilonExploration(
state: SimulationState,
encodedMaze: number[][],
coordinateScratch: Int32Array<ArrayBufferLike>,
): void
Apply epsilon-greedy exploration to the current action choice.
Parameters:
state - Mutable simulation state for the active run.
encodedMaze - Maze grid used for move validity checks.
coordinateScratch - Reused coordinate scratch buffer.
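The epsilon-greedy mechanism can be sketched as follows. This is a minimal illustration, not the module's implementation: the `Direction`, `isWalkable`, and `epsilonGreedy` names, the wall encoding of `1`, and the injectable `rng` parameter are all assumptions made for the sketch.

```typescript
// Illustrative sketch of epsilon-greedy action replacement.
type Direction = 0 | 1 | 2 | 3; // up, right, down, left

const DELTAS: ReadonlyArray<readonly [number, number]> = [
  [0, -1], [1, 0], [0, 1], [-1, 0],
];

/** A cell is walkable when it is inside the grid and not a wall (assumed encoding: 1 = wall). */
function isWalkable(encodedMaze: number[][], x: number, y: number): boolean {
  return y >= 0 && y < encodedMaze.length &&
    x >= 0 && x < encodedMaze[y].length &&
    encodedMaze[y][x] !== 1;
}

/**
 * With probability `epsilon`, replace the greedy choice with a uniformly
 * random valid move; otherwise keep the policy's original direction.
 */
function epsilonGreedy(
  greedyDirection: Direction,
  epsilon: number,
  encodedMaze: number[][],
  x: number,
  y: number,
  rng: () => number = Math.random,
): Direction {
  if (rng() >= epsilon) return greedyDirection; // exploit: keep the policy's pick
  const valid = ([0, 1, 2, 3] as Direction[]).filter((d) => {
    const [dx, dy] = DELTAS[d];
    return isWalkable(encodedMaze, x + dx, y + dy);
  });
  if (valid.length === 0) return greedyDirection; // no valid alternative
  return valid[Math.floor(rng() * valid.length)]; // explore: random valid move
}
```

Restricting exploration to moves that pass the maze validity check keeps random steps from wasting a turn on a wall, which is why the real helper takes `encodedMaze`.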
applyMazeMovementForcedExploration
applyMazeMovementForcedExploration(
state: SimulationState,
encodedMaze: number[][],
coordinateScratch: Int32Array<ArrayBufferLike>,
): void
Force a random valid move when the policy has stalled with repeated no-move outputs.
Parameters:
state - Mutable simulation state for the active run.
encodedMaze - Maze grid used for move validity checks.
coordinateScratch - Reused coordinate scratch buffer.
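The stall-detection side of this helper can be sketched as a counter over consecutive no-move outputs. The `StallTracker` class and the threshold of 5 are assumptions for illustration; the real trigger condition lives in the module's shared run state.

```typescript
// Illustrative stall detector. The no-move threshold is an assumption.
const NO_MOVE_LIMIT = 5;

/** Tracks consecutive steps where the policy produced no movement. */
class StallTracker {
  private noMoveStreak = 0;

  /** Record one step; returns true when forced exploration should kick in. */
  record(moved: boolean): boolean {
    this.noMoveStreak = moved ? 0 : this.noMoveStreak + 1;
    return this.noMoveStreak >= NO_MOVE_LIMIT;
  }
}
```

Once the tracker fires, the helper falls through to the same random-valid-move selection used for epsilon exploration, guaranteeing the agent cannot stay pinned in place indefinitely.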
applyMazeMovementProximityGreedy
applyMazeMovementProximityGreedy(
state: SimulationState,
encodedMaze: number[][],
distanceMap: number[][] | undefined,
coordinateScratch: Int32Array<ArrayBufferLike>,
): void
Apply the short-horizon proximity-greedy override near the maze exit.
Parameters:
state - Mutable simulation state for the active run.
encodedMaze - Maze grid used for move validity checks.
distanceMap - Optional precomputed distance map.
coordinateScratch - Reused coordinate scratch buffer.
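The core of a proximity-greedy override is a one-step lookup into the distance map. This sketch assumes `distanceMap[y][x]` holds steps-to-exit and returns `-1` when no neighbor improves; the `proximityGreedy` name and that sentinel are illustrative, not the module's API.

```typescript
// Illustrative short-horizon proximity-greedy step.
const STEPS: ReadonlyArray<readonly [number, number]> = [
  [0, -1], [1, 0], [0, 1], [-1, 0], // up, right, down, left
];

/**
 * Step toward the neighbor with the smallest precomputed distance-to-exit.
 * Returns the direction index, or -1 when no neighbor improves (or no map).
 */
function proximityGreedy(
  distanceMap: number[][] | undefined,
  x: number,
  y: number,
): number {
  if (!distanceMap) return -1;
  let best = -1;
  let bestDist = distanceMap[y][x];
  STEPS.forEach(([dx, dy], direction) => {
    const d = distanceMap[y + dy]?.[x + dx]; // undefined outside the grid
    if (d !== undefined && d < bestDist) {
      bestDist = d;
      best = direction;
    }
  });
  return best;
}
```

Because the override only fires near the exit, it trades the network's learned behavior for a deterministic descent of the distance map exactly where that map is most reliable.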
applyMazeMovementSaturationAndBiasAdjust
applyMazeMovementSaturationAndBiasAdjust(
state: SimulationState,
outputs: number[],
network: INetwork,
coordinateScratch: Int32Array<ArrayBufferLike>,
): void
Detect saturation and optionally damp output-node biases.
Parameters:
state - Mutable simulation state for the active run.
outputs - Raw network logits for the current step.
network - Policy network that produced the logits.
coordinateScratch - Reused scratch buffer for temporary penalties.
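The detect-then-damp pattern can be sketched as below. The saturation threshold, damping factor, and the minimal `OutputNode` shape are all assumptions for illustration; the real helper works against the `INetwork` interface and the shared run state's rolling saturation count.

```typescript
// Illustrative saturation detection and bias damping.
// Threshold and damping factor are assumed values, not the module's constants.
const SATURATION_THRESHOLD = 0.999;
const BIAS_DAMP = 0.9;

interface OutputNode { bias: number }

/** Outputs count as saturated when every logit sits at an activation extreme. */
function isSaturated(outputs: number[]): boolean {
  return outputs.every((o) => Math.abs(o) >= SATURATION_THRESHOLD);
}

/** When saturated, pull output biases toward zero so the logits can differentiate again. */
function dampBiasesIfSaturated(outputs: number[], nodes: OutputNode[]): boolean {
  if (!isSaturated(outputs)) return false;
  for (const node of nodes) node.bias *= BIAS_DAMP;
  return true;
}
```

Damping only the output-node biases is a conservative intervention: it re-opens the action distribution without disturbing the weights the trainer is still learning.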
computeMazeMovementEpsilon
computeMazeMovementEpsilon(
stepNumber: number,
stepsSinceImprovement: number,
distHere: number,
saturations: number,
): number
Compute the adaptive epsilon used for policy exploration.
Parameters:
stepNumber - Global step number inside the active simulation.
stepsSinceImprovement - Number of steps without improvement.
distHere - Current distance to goal for the active position.
saturations - Rolling saturation count from the shared run state.
Returns: Exploration epsilon in the range [0, 1].
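The shape of an adaptive schedule over these four inputs can be sketched as below. Every coefficient here (base rate, decay constant, stall and saturation bonuses, near-goal scaling) is an illustrative assumption; only the clamping to [0, 1] is implied by the documented return range.

```typescript
// Illustrative adaptive-epsilon schedule. All coefficients are assumptions;
// the real schedule lives in computeMazeMovementEpsilon.
function computeEpsilonSketch(
  stepNumber: number,
  stepsSinceImprovement: number,
  distHere: number,
  saturations: number,
): number {
  const base = 0.05 + 0.25 * Math.exp(-stepNumber / 500);     // decays over the run
  const stall = Math.min(0.3, stepsSinceImprovement * 0.01);  // rises while stuck
  const satBonus = Math.min(0.2, saturations * 0.02);         // saturation pressure
  const nearGoalScale = distHere <= 2 ? 0.25 : 1;             // exploit near the exit
  return Math.min(1, Math.max(0, (base + stall + satBonus) * nearGoalScale));
}
```

The intent of such a schedule is that exploration decays as training progresses but is re-injected whenever progress stalls or the network saturates, and is suppressed close to the goal where random moves are most costly.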
decideMazeMovementDirection
decideMazeMovementDirection(
state: SimulationState,
network: INetwork,
coordinateScratch: Int32Array<ArrayBufferLike>,
): void
Activate the network, record output history, and choose the next direction.
Parameters:
state - Mutable simulation state for the active run.
network - Policy network used for the current step.
coordinateScratch - Reused coordinate scratch buffer.
selectMazeMovementDirection
selectMazeMovementDirection(
outputs: number[],
): DirectionSelectionStats
Convert raw network outputs into a chosen direction plus diagnostics.
Parameters:
outputs - Raw action logits for the four maze directions.
Returns: Chosen direction plus softmax and entropy diagnostics.
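The logits-to-direction conversion with softmax and entropy diagnostics can be sketched as follows. The field names on `DirectionSelectionStats` shown here are assumptions about its shape, and `selectDirectionSketch` is an illustrative stand-in, not the exported helper.

```typescript
// Illustrative logits -> direction conversion with diagnostics.
// The stats shape is an assumed approximation of DirectionSelectionStats.
interface DirectionStatsSketch {
  direction: number;       // argmax over the softmax probabilities
  probabilities: number[]; // softmax over the raw logits
  entropy: number;         // Shannon entropy of the softmax, in nats
}

function selectDirectionSketch(outputs: number[]): DirectionStatsSketch {
  const max = Math.max(...outputs);                 // subtract max for numerical stability
  const exps = outputs.map((o) => Math.exp(o - max));
  const sum = exps.reduce((a, b) => a + b, 0);
  const probabilities = exps.map((e) => e / sum);
  let direction = 0;
  probabilities.forEach((p, i) => {
    if (p > probabilities[direction]) direction = i;
  });
  const entropy = -probabilities.reduce(
    (acc, p) => acc + (p > 0 ? p * Math.log(p) : 0), 0);
  return { direction, probabilities, entropy };
}
```

The entropy diagnostic is what the saturation machinery elsewhere in this module consumes: entropy near zero means the softmax has collapsed onto one direction, while entropy near ln 4 means the policy is effectively guessing.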