mazeMovement/policy

Action policy helpers for the dedicated mazeMovement module.

This file is the decision-making shelf inside the mazeMovement module. Once runtime helpers have refreshed the agent's current world view, the policy boundary decides how logits, exploration pressure, and short-horizon heuristics combine into one movement choice.

It owns direction selection, epsilon handling, short-horizon policy overrides, and saturation-driven bias control because those concerns answer a shared question: how should the next action be chosen before reward shaping reacts to the result?

Read this after the runtime helpers if you want the clean handoff from perception into action choice. Then continue into shaping to see how the trainer judges the consequences of that choice.

mazeMovement/policy/mazeMovement.policy.ts

applyMazeMovementEpsilonExploration

applyMazeMovementEpsilonExploration(
  state: SimulationState,
  encodedMaze: number[][],
  coordinateScratch: Int32Array<ArrayBufferLike>,
): void

Apply epsilon-greedy exploration to the current action choice.

Parameters:

- state — simulation state whose pending action choice may be replaced by an exploratory move.
- encodedMaze — numeric grid encoding of the maze, used to validate candidate moves.
- coordinateScratch — reusable Int32Array scratch buffer for coordinate calculations.
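
The epsilon-greedy rule this helper applies can be sketched as a small standalone function. This is illustrative only, not the module's implementation; epsilonGreedy, its parameters, and the injectable rng are hypothetical names.

```typescript
/**
 * Minimal epsilon-greedy sketch: with probability `epsilon`, replace the
 * greedy direction with a uniformly random pick among the valid
 * directions; otherwise keep the greedy choice.
 */
function epsilonGreedy(
  greedyDirection: number,
  validDirections: number[],
  epsilon: number,
  rng: () => number = Math.random,
): number {
  if (validDirections.length > 0 && rng() < epsilon) {
    const index = Math.floor(rng() * validDirections.length);
    return validDirections[index];
  }
  return greedyDirection;
}
```

Injecting rng keeps the sketch deterministic under test.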

applyMazeMovementForcedExploration

applyMazeMovementForcedExploration(
  state: SimulationState,
  encodedMaze: number[][],
  coordinateScratch: Int32Array<ArrayBufferLike>,
): void

Force a random valid move when the policy has stalled with repeated no-move outputs.

Parameters:

- state — simulation state whose stalled action choice is replaced by a random valid move.
- encodedMaze — numeric grid encoding of the maze, used to enumerate valid moves.
- coordinateScratch — reusable Int32Array scratch buffer for coordinate calculations.
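
A stall-recovery rule of this shape can be sketched as a pure helper. The function name, the counter argument, and the threshold value are illustrative assumptions, not the module's actual API.

```typescript
/**
 * Hypothetical stall detector: after `threshold` consecutive no-move
 * decisions, force a uniformly random pick among the valid directions.
 * Returns undefined when no override is needed (or possible).
 */
function maybeForceMove(
  consecutiveNoMoves: number,
  validDirections: number[],
  threshold = 3,
  rng: () => number = Math.random,
): number | undefined {
  if (consecutiveNoMoves < threshold || validDirections.length === 0) {
    return undefined;
  }
  return validDirections[Math.floor(rng() * validDirections.length)];
}
```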

applyMazeMovementProximityGreedy

applyMazeMovementProximityGreedy(
  state: SimulationState,
  encodedMaze: number[][],
  distanceMap: number[][] | undefined,
  coordinateScratch: Int32Array<ArrayBufferLike>,
): void

Apply the short-horizon proximity-greedy override near the maze exit.

Parameters:

- state — simulation state carrying the agent's position and current action choice.
- encodedMaze — numeric grid encoding of the maze.
- distanceMap — optional precomputed distance map used to rank neighbouring cells.
- coordinateScratch — reusable Int32Array scratch buffer for coordinate calculations.
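
The proximity-greedy idea, choosing the neighbouring cell with the smallest distance value, can be sketched as follows. The direction ordering, the function name, and the distance-map shape are assumptions for illustration.

```typescript
// Assumed direction order: 0 = up, 1 = right, 2 = down, 3 = left.
const DELTAS: ReadonlyArray<readonly [number, number]> = [
  [-1, 0],
  [0, 1],
  [1, 0],
  [0, -1],
];

/**
 * Sketch of a proximity-greedy override: among the in-bounds neighbours
 * of (row, col), pick the direction whose target cell has the smallest
 * distance-to-exit value. Returns undefined when no neighbour has a
 * distance entry.
 */
function proximityGreedy(
  row: number,
  col: number,
  distanceMap: number[][],
): number | undefined {
  let best: number | undefined;
  let bestDistance = Infinity;
  DELTAS.forEach(([dr, dc], direction) => {
    const distance = distanceMap[row + dr]?.[col + dc];
    if (distance !== undefined && distance < bestDistance) {
      bestDistance = distance;
      best = direction;
    }
  });
  return best;
}
```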

applyMazeMovementSaturationAndBiasAdjust

applyMazeMovementSaturationAndBiasAdjust(
  state: SimulationState,
  outputs: number[],
  network: INetwork,
  coordinateScratch: Int32Array<ArrayBufferLike>,
): void

Detect saturation and optionally damp output-node biases.

Parameters:

- state — current simulation state.
- outputs — raw activations from the network's output nodes.
- network — network whose output-node biases may be damped.
- coordinateScratch — reusable Int32Array scratch buffer for coordinate calculations.
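
One plausible way to detect saturation and damp biases is sketched below. It assumes sigmoid-style outputs in [0, 1]; the margin, threshold, and damping factor are invented for illustration and are not the module's tuned values.

```typescript
/**
 * Fraction of outputs sitting within `margin` of either rail (0 or 1).
 * A high fraction suggests the output nodes are saturated.
 */
function saturationFraction(outputs: number[], margin = 0.02): number {
  if (outputs.length === 0) return 0;
  const saturated = outputs.filter(
    (value) => value <= margin || value >= 1 - margin,
  ).length;
  return saturated / outputs.length;
}

/**
 * When the saturated fraction crosses `threshold`, shrink a bias by
 * `factor` to pull activations back toward the responsive region.
 */
function dampBias(
  bias: number,
  fraction: number,
  threshold = 0.75,
  factor = 0.9,
): number {
  return fraction >= threshold ? bias * factor : bias;
}
```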

computeMazeMovementEpsilon

computeMazeMovementEpsilon(
  stepNumber: number,
  stepsSinceImprovement: number,
  distHere: number,
  saturations: number,
): number

Compute the adaptive epsilon used for policy exploration.

Parameters:

- stepNumber — current simulation step.
- stepsSinceImprovement — steps elapsed since the agent last improved.
- distHere — distance value at the agent's current cell.
- saturations — count of saturation detections.

Returns: Exploration epsilon in the range [0, 1].
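
A hypothetical schedule with this signature's ingredients is sketched below: a base rate that decays with step count, boosts for stagnation and saturation, and a final clamp to [0, 1]. Every constant here is invented; the module's real formula may differ entirely.

```typescript
/**
 * Illustrative adaptive-epsilon schedule: decaying base exploration,
 * raised when progress stalls or outputs saturate, clamped to [0, 1].
 */
function adaptiveEpsilon(
  stepNumber: number,
  stepsSinceImprovement: number,
  saturations: number,
): number {
  const base = 0.3 / (1 + stepNumber / 200); // slow decay over time
  const stallBoost = Math.min(0.4, stepsSinceImprovement * 0.01);
  const saturationBoost = Math.min(0.2, saturations * 0.05);
  return Math.min(1, Math.max(0, base + stallBoost + saturationBoost));
}
```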

decideMazeMovementDirection

decideMazeMovementDirection(
  state: SimulationState,
  network: INetwork,
  coordinateScratch: Int32Array<ArrayBufferLike>,
): void

Activate the network, record output history, and choose the next direction.

Parameters:

- state — simulation state that receives the chosen direction.
- network — network to activate for the current inputs.
- coordinateScratch — reusable Int32Array scratch buffer for coordinate calculations.

selectMazeMovementDirection

selectMazeMovementDirection(
  outputs: number[],
): DirectionSelectionStats

Convert raw network outputs into a chosen direction plus diagnostics.

Parameters:

- outputs — raw network outputs, one per candidate direction.

Returns: Chosen direction plus softmax and entropy diagnostics.
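
The outputs-to-direction conversion with softmax and entropy diagnostics can be sketched like this. The DirectionStats shape below is illustrative, not the real DirectionSelectionStats type.

```typescript
interface DirectionStats {
  direction: number; // argmax over the softmax distribution
  probabilities: number[];
  entropy: number; // Shannon entropy in nats
}

/**
 * Sketch: softmax the raw outputs (with max-subtraction for numerical
 * stability), pick the most probable direction, and report the
 * distribution's entropy as a confidence diagnostic.
 */
function selectDirection(outputs: number[]): DirectionStats {
  const max = Math.max(...outputs);
  const exps = outputs.map((value) => Math.exp(value - max));
  const sum = exps.reduce((total, value) => total + value, 0);
  const probabilities = exps.map((value) => value / sum);
  const entropy = probabilities.reduce(
    (total, p) => (p > 0 ? total - p * Math.log(p) : total),
    0,
  );
  const direction = probabilities.indexOf(Math.max(...probabilities));
  return { direction, probabilities, entropy };
}
```

Low entropy means the network strongly prefers one direction; entropy near log(n) means the outputs are nearly uniform.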

Generated from source JSDoc