mazeMovement/shaping
Reward-shaping helpers for the dedicated mazeMovement module.
This file is the judgment layer of the mazeMovement pipeline. After the policy boundary chooses a move, the shaping boundary decides how that move should affect learning pressure: which behaviors deserve reward, which should be penalized, and which patterns signal stagnation or useful exploration.
It owns movement reward shaping, stagnation penalties, entropy-guided bonuses, and other post-action fitness adjustments so the main facade does not have to mix motion updates with reward math in one long control-flow block.
Read this file as the bridge between "the agent moved" and "the run's score changed." Runtime supplies the world state, policy chooses the action, and this boundary turns the aftermath into training signal.
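The division of responsibility can be sketched as a pure mapping from a move's aftermath to a fitness delta. The interface and constants below are illustrative assumptions for this sketch, not the module's real API:

```typescript
// Hypothetical shape of a move's aftermath; the real module reads these
// values off SimulationState rather than a standalone struct.
interface ShapingOutcome {
  distanceDelta: number; // positive when the agent moved closer to the goal
  revisited: boolean;    // true when the target cell was already visited
}

function shapeMoveOutcome(outcome: ShapingOutcome, rewardScale: number): number {
  let fitnessDelta = 0;
  // Progress shaping: reward in proportion to distance gained or lost.
  fitnessDelta += outcome.distanceDelta * rewardScale;
  // Exploration adjustment: small bonus for new cells, penalty for revisits.
  fitnessDelta += outcome.revisited ? -0.25 * rewardScale : 0.1 * rewardScale;
  return fitnessDelta;
}
```

The concrete helpers below each own one such term and mutate the run's state in place instead of returning a delta.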
mazeMovement/shaping/mazeMovement.shaping.ts
applyMazeMovementEntropyGuidanceShaping
applyMazeMovementEntropyGuidanceShaping(
state: SimulationState,
rewardScale: number,
coordinateScratch: Int32Array<ArrayBufferLike>,
): void
Apply entropy-guided shaping based on confidence and perceptual guidance.
Parameters:
state - Mutable simulation state for the active run.
rewardScale - Global reward scale used by the penalties and bonuses.
coordinateScratch - Reused coordinate scratch buffer.
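One plausible reading of "confidence" here is the inverse normalized entropy of the policy's action distribution. The sketch below is an assumption about the mechanism, with invented names and constants:

```typescript
// Shannon entropy of the policy's action distribution, in nats.
function actionEntropy(probabilities: number[]): number {
  return -probabilities.reduce(
    (sum, p) => (p > 0 ? sum + p * Math.log(p) : sum),
    0,
  );
}

// Hypothetical guidance bonus: a confident (low-entropy) choice that agrees
// with perceptual guidance earns a bonus; confident but wrong is penalized.
function entropyGuidanceBonus(
  probabilities: number[],
  matchedGuidance: boolean,
  rewardScale: number,
): number {
  const maxEntropy = Math.log(probabilities.length);
  const confidence = 1 - actionEntropy(probabilities) / maxEntropy; // 0..1
  return (matchedGuidance ? 1 : -1) * confidence * 0.2 * rewardScale;
}
```

A uniform distribution has zero confidence and contributes nothing either way, so the term only bites when the policy commits.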
applyMazeMovementExplorationVisitAdjustment
applyMazeMovementExplorationVisitAdjustment(
state: SimulationState,
rewardScale: number,
coordinateScratch: Int32Array<ArrayBufferLike>,
): void
Apply the per-cell exploration bonus or revisit penalty.
Parameters:
state - Mutable simulation state for the active run.
rewardScale - Global reward scale used by the adjustment.
coordinateScratch - Reused coordinate scratch buffer.
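A minimal sketch of per-cell visit accounting, assuming a visit-count map (the real module keeps these counters on SimulationState and the constants here are illustrative):

```typescript
// First visit to a cell earns a bonus; each revisit is penalized a little
// harder, capped so one hotspot cannot dominate the fitness signal.
function explorationVisitAdjustment(
  visitCounts: Map<string, number>,
  x: number,
  y: number,
  rewardScale: number,
): number {
  const key = `${x},${y}`;
  const visits = (visitCounts.get(key) ?? 0) + 1;
  visitCounts.set(key, visits);
  if (visits === 1) return 0.15 * rewardScale;
  return -Math.min(0.05 * (visits - 1), 0.3) * rewardScale;
}
```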
applyMazeMovementGlobalDistanceImprovementBonus
applyMazeMovementGlobalDistanceImprovementBonus(
state: SimulationState,
encodedMaze: number[][],
rewardScale: number,
coordinateScratch: Int32Array<ArrayBufferLike>,
): void
Apply the long-horizon global-distance improvement bonus.
Parameters:
state - Mutable simulation state for the active run.
encodedMaze - Maze grid used for distance lookup.
rewardScale - Global reward scale used by the bonus magnitude.
coordinateScratch - Reused coordinate scratch buffer.
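The "long-horizon" framing suggests tracking the best goal distance achieved so far and paying out only when a new best is set, which rewards genuine progress rather than local back-and-forth. A sketch under that assumption, with illustrative names and constants:

```typescript
// Hypothetical per-run progress record; the real state lives on
// SimulationState.
interface DistanceProgress {
  bestDistance: number; // smallest goal distance seen so far this run
}

function globalDistanceImprovementBonus(
  progress: DistanceProgress,
  currentDistance: number,
  rewardScale: number,
): number {
  // Only a strictly new best pays out; oscillation around an old best earns
  // nothing from this term.
  if (currentDistance >= progress.bestDistance) return 0;
  const improvement = progress.bestDistance - currentDistance;
  progress.bestDistance = currentDistance;
  return improvement * 0.5 * rewardScale;
}
```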
applyMazeMovementLocalAreaPenalty
applyMazeMovementLocalAreaPenalty(
state: SimulationState,
rewardScale: number,
coordinateScratch: Int32Array<ArrayBufferLike>,
): void
Apply a local-area stagnation penalty when the run oscillates in a tight window.
Parameters:
state - Mutable simulation state for the active run.
rewardScale - Global reward scale used for the penalty magnitude.
coordinateScratch - Reused coordinate scratch buffer.
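One common way to detect a "tight window" is the bounding box of the most recent positions. The check below is a sketch of that idea, with an assumed window size and threshold:

```typescript
// If the bounding box of the last `windowSize` positions is at most 2x2,
// the run is oscillating in place and earns a stagnation penalty.
function localAreaPenalty(
  recentPositions: Array<[number, number]>,
  windowSize: number,
  rewardScale: number,
): number {
  if (recentPositions.length < windowSize) return 0;
  const window = recentPositions.slice(-windowSize);
  const xs = window.map(([x]) => x);
  const ys = window.map(([, y]) => y);
  const spanX = Math.max(...xs) - Math.min(...xs);
  const spanY = Math.max(...ys) - Math.min(...ys);
  return spanX <= 1 && spanY <= 1 ? -0.4 * rewardScale : 0;
}
```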
applyMazeMovementPostActionPenalties
applyMazeMovementPostActionPenalties(
state: SimulationState,
coordinateScratch: Int32Array<ArrayBufferLike>,
): void
Apply the post-action shaping and penalty aggregation phase.
Parameters:
state - Mutable simulation state for the active run.
coordinateScratch - Reused coordinate scratch buffer.
applyMazeMovementProgressShaping
applyMazeMovementProgressShaping(
state: SimulationState,
distanceDelta: number,
improved: boolean,
worsened: boolean,
rewardScale: number,
): void
Apply progress and away-from-goal shaping after a move.
Parameters:
state - Mutable simulation state for the active run.
distanceDelta - Positive when the agent moved closer to the goal.
improved - True when the move improved distance to the goal.
worsened - True when the move increased distance to the goal.
rewardScale - Global reward scale used by the shaping terms.
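The parameters suggest a three-way split: proportional reward on improvement, a softer proportional penalty on retreat, and a small step tax otherwise. The sketch below returns a delta rather than mutating state, and its constants are illustrative:

```typescript
// Hedged sketch of progress / away-from-goal shaping. `distanceDelta` is
// positive toward the goal and negative away from it, matching the docs.
function progressShaping(
  distanceDelta: number,
  improved: boolean,
  worsened: boolean,
  rewardScale: number,
): number {
  if (improved) return distanceDelta * rewardScale;
  // delta is negative here, so the product is a penalty, damped to keep
  // retreats survivable while still discouraged.
  if (worsened) return distanceDelta * 0.6 * rewardScale;
  // Neutral moves pay a small step tax to discourage treading water.
  return -0.01 * rewardScale;
}
```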
applyMazeMovementRepetitionAndBacktrackPenalties
applyMazeMovementRepetitionAndBacktrackPenalties(
state: SimulationState,
rewardScale: number,
coordinateScratch: Int32Array<ArrayBufferLike>,
): void
Apply repetition and immediate-backtrack penalties.
Parameters:
state - Mutable simulation state for the active run.
rewardScale - Global reward scale used by the penalties.
coordinateScratch - Reused coordinate scratch buffer.
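An "immediate backtrack" is a move that undoes the previous one; "repetition" is the same move many times in a row. A sketch of both checks against an assumed move-history array, with invented action names and penalty weights:

```typescript
// Each action's inverse, used to detect an immediate reversal.
const OPPOSITE: Record<string, string> = {
  up: "down",
  down: "up",
  left: "right",
  right: "left",
};

function repetitionAndBacktrackPenalty(
  moveHistory: string[],
  rewardScale: number,
): number {
  let penalty = 0;
  const last = moveHistory[moveHistory.length - 1];
  const prev = moveHistory[moveHistory.length - 2];
  // Immediate backtrack: the new move undoes the previous one.
  if (prev !== undefined && OPPOSITE[prev] === last) penalty -= 0.3;
  // Repetition: the same move three or more times in a row.
  const tail = moveHistory.slice(-3);
  if (tail.length === 3 && tail.every((m) => m === last)) penalty -= 0.1;
  return penalty * rewardScale;
}
```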
applyMazeMovementSaturationPenaltyCycle
applyMazeMovementSaturationPenaltyCycle(
state: SimulationState,
rewardScale: number,
coordinateScratch: Int32Array<ArrayBufferLike>,
): void
Apply the periodic saturation penalty cycle.
Parameters:
state - Mutable simulation state for the active run.
rewardScale - Global reward scale used by the penalties.
coordinateScratch - Reused coordinate scratch buffer.
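"Periodic" here plausibly means the penalty fires every N steps without improvement and grows with how long the run has been saturated. A sketch under that assumption, with an illustrative cycle length and cap:

```typescript
// Fires only on cycle boundaries (every `cycleLength` stagnant steps), and
// scales with the number of completed cycles up to a cap.
function saturationPenalty(
  stepsSinceImprovement: number,
  cycleLength: number,
  rewardScale: number,
): number {
  if (stepsSinceImprovement === 0) return 0;
  if (stepsSinceImprovement % cycleLength !== 0) return 0;
  const cycles = stepsSinceImprovement / cycleLength;
  return -Math.min(0.1 * cycles, 0.5) * rewardScale;
}
```

Keeping the per-cycle term small but cumulative lets a briefly stuck run recover while a chronically stuck one bleeds fitness.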
executeMazeMovementAndRewards
executeMazeMovementAndRewards(
state: SimulationState,
encodedMaze: number[][],
distanceMap: number[][] | undefined,
coordinateScratch: Int32Array<ArrayBufferLike>,
): void
Execute the chosen move and apply the shaping terms tied to that move.
Parameters:
state - Mutable simulation state for the active run.
encodedMaze - Maze grid used for move validity and distance lookup.
distanceMap - Optional precomputed distance map.
coordinateScratch - Reused coordinate scratch buffer.
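The validity half of this phase can be sketched as a bounds-and-wall check against the maze grid. The encoding below assumes cell value 1 means "wall"; the name and convention are illustrative:

```typescript
// Attempt a move on the encoded maze grid. Blocked or out-of-bounds moves
// leave the agent in place, so shaping can still penalize the wasted step.
function tryMove(
  encodedMaze: number[][],
  x: number,
  y: number,
  dx: number,
  dy: number,
): [number, number] {
  const nx = x + dx;
  const ny = y + dy;
  const inBounds =
    ny >= 0 && ny < encodedMaze.length && nx >= 0 && nx < encodedMaze[ny].length;
  if (!inBounds || encodedMaze[ny][nx] === 1) return [x, y];
  return [nx, ny];
}
```

When `distanceMap` is supplied, the post-move distance is a constant-time lookup at the new cell; without it the implementation must derive the distance from `encodedMaze` itself.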
maybeTerminateMazeMovementDeepStagnation
maybeTerminateMazeMovementDeepStagnation(
state: SimulationState,
coordinateScratch: Int32Array<ArrayBufferLike>,
): boolean
Apply the deep-stagnation termination penalty when appropriate.
Parameters:
state - Mutable simulation state for the active run.
coordinateScratch - Reused coordinate scratch buffer.
Returns: True when the run should terminate early.
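The gate pattern the description implies can be sketched as a threshold check that applies a final penalty and signals early termination, so a hopeless run stops consuming its evaluation budget. The state shape, limit, and penalty below are assumptions:

```typescript
// Hypothetical deep-stagnation gate: once the run has gone `limit` steps
// without improving, dock the fitness once and tell the caller to stop.
function maybeTerminateDeepStagnation(
  state: { stepsSinceImprovement: number; fitness: number },
  limit: number,
  terminationPenalty: number,
): boolean {
  if (state.stepsSinceImprovement < limit) return false;
  state.fitness -= terminationPenalty;
  return true;
}
```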