methods
Shared method families for signal shaping, search pressure, and structural policy.
This folder is the library's reusable vocabulary shelf. The heavier
controller chapters in neat/ decide when a policy should be applied. The
methods/ folder defines what those policy choices actually are. That split
keeps the rest of the repo readable: examples, architecture helpers, and
evolutionary controllers can reuse the same small method objects without each
subsystem inventing its own private dialect.
The quickest way to understand the chapter is to split it into three reader
questions. How should a unit transform signal? That is Activation. How
should error or learning tempo be interpreted? That is Cost and Rate.
How should evolutionary pressure and wiring structure be adjusted? That is
selection, mutation, crossover, gating, and groupConnection.
That boundary matters because it lets you change policy without changing
orchestration. A Neat controller can switch from gentle to aggressive
selection, or an architecture builder can swap activation families, without
rewriting evaluation loops or graph code. The method objects stay small on
purpose so experiments can compose them instead of hiding them in conditionals.
The structural pair deserves special attention because the names sound close
while the responsibilities are different. groupConnection answers how two
groups should be wired before any runtime signal exists. gating answers how
an already-existing connection should be modulated once the network is
running. One is about topology layout. The other is about runtime control.
Two compact background bridges help here. See Wikipedia contributors, Activation function, for the signal-shaping side of the shelf, and Wikipedia contributors, Selection (genetic algorithm), for the search-pressure side. Together they frame the two big forces this folder keeps in play: how nodes respond to signal and how search decides which traits survive.
Read the chapter in three passes:
- start with Activation, Cost, and Rate when you are thinking like a trainer tuning signal flow, error shape, and optimization tempo,
- continue to selection, mutation, and crossover when you are thinking like an evolutionary controller tuning search pressure,
- finish with gating and groupConnection when you need lower-level structural vocabulary and want to distinguish routing control from raw wiring layout.
flowchart TD
    classDef base fill:#08131f,stroke:#1ea7ff,color:#dff6ff,stroke-width:1px;
    classDef accent fill:#0f2233,stroke:#ffd166,color:#fff4cc,stroke-width:1.5px;
    Methods[methods chapter]:::accent --> Training[Training and optimization vocabulary]:::base
    Methods --> Evolution[Evolutionary search vocabulary]:::base
    Methods --> Structure[Structural control vocabulary]:::base
    Training --> Activation[Activation]:::base
    Training --> Cost[Cost]:::base
    Training --> Rate[Rate]:::base
    Evolution --> Selection[selection]:::base
    Evolution --> Mutation[mutation]:::base
    Evolution --> Crossover[crossover]:::base
    Structure --> Gating[gating]:::base
    Structure --> Connection[groupConnection]:::base
flowchart LR
    classDef base fill:#08131f,stroke:#1ea7ff,color:#dff6ff,stroke-width:1px;
    classDef accent fill:#0f2233,stroke:#ffd166,color:#fff4cc,stroke-width:1.5px;
    Signals["How should signal behave?"]:::accent --> SignalShelf["Activation + Cost + Rate"]:::base
    Search["How should search behave?"]:::accent --> SearchShelf["selection + mutation + crossover"]:::base
    Structure["How should wiring be controlled?"]:::accent --> StructureShelf["gating + groupConnection"]:::base
Example: assemble one compact training vocabulary for signal shape, loss, and tempo.
const trainingPolicy = {
activation: Activation.relu,
loss: Cost.mse,
schedule: Rate.step(0.9, 100),
};
Example: assemble a stronger search-pressure and routing vocabulary without changing the surrounding controller code.
const evolutionaryPolicy = {
parentSelection: { ...selection.POWER, power: 6 },
gatePlacement: gating.SELF,
denseBridge: groupConnection.ALL_TO_ALL,
};
methods/methods.ts
Activation
Runtime registry of built-in and custom activation functions.
Read this surface as a behavior shelf for neurons rather than as a loose bag of math helpers. The chosen activation determines what each node can express: whether it saturates, stays sparse, preserves negative values, or responds smoothly enough for gradient-based updates.
The built-in functions cluster into a few useful families:
- saturating classics such as logistic, sigmoid, and tanh keep outputs bounded and are easy to reason about,
- piecewise linear choices such as relu, hardTanh, and step trade smoothness for cheap evaluation and strong gating behavior,
- localized or shape-heavy transforms such as gaussian, sinusoid, and bentIdentity are useful when you want periodic, radial, or gentler near-linear responses,
- modern smooth hidden-layer options such as softplus, swish, gelu, and mish aim to keep optimization stable without collapsing everything into hard zero-or-one decisions.
flowchart TD
    Shelf[Activation shelf] --> Bounded[Bounded classics]
    Shelf --> Piecewise[Piecewise gates]
    Shelf --> Specialized[Shape-specialized]
    Shelf --> Smooth[Smooth modern]
    Bounded --> BoundedExamples[logistic sigmoid tanh]
    Piecewise --> PiecewiseExamples[relu hardTanh step]
    Specialized --> SpecializedExamples[gaussian sinusoid bentIdentity]
    Smooth --> SmoothExamples[softplus swish gelu mish]
Every activation shares the same calling convention: pass the input value as
the first argument and optionally pass true as the second argument when you
want the local derivative instead of the forward value. That derivative mode
keeps the registry compatible with the classic Neataptic API shape while also
making the individual implementations easy to test in isolation.
Minimal workflow:
const hiddenValue = Activation.relu(weightedSum);
const outputSlope = Activation.logistic(weightedSum, true);
registerCustomActivation(
'cube',
(inputValue, shouldComputeDerivative = false) =>
shouldComputeDerivative ? 3 * inputValue * inputValue : inputValue ** 3,
);
const customValue = Activation.cube(0.5);
A practical chooser for first experiments:
- start with relu when you want a simple, sparse hidden-layer default,
- prefer tanh when zero-centered bounded output helps reasoning or compatibility with older recurrent setups,
- reach for softplus, swish, gelu, or mish when you want a smoother alternative to ReLU,
- keep logistic or sigmoid for bounded probability-like outputs,
- use registerCustomActivation() when the built-ins are close but not quite the transfer curve your experiment needs.
crossover
Crossover methods for genetic algorithms.
These methods implement the crossover strategies described in the Instinct algorithm, enabling the creation of offspring with unique combinations of parent traits.
Read this file as an inheritance-policy shelf: each method answers a different question about how aggressively two parents should be mixed.
- SINGLE_POINT preserves one contiguous prefix from one parent and the remaining suffix from the other,
- TWO_POINT preserves a middle segment boundary instead of only one split,
- UNIFORM treats each gene as an independent coin flip,
- AVERAGE blends compatible numeric genes instead of copying segments.
A practical chooser for first experiments:
- start with UNIFORM when you want broad mixing and do not need contiguous blocks of structure to stay together,
- use SINGLE_POINT or TWO_POINT when adjacency matters and you want to preserve larger parent segments,
- choose AVERAGE when the genome is meaningfully numeric and interpolation is more useful than hard parent switching.
Minimal workflow:
const broadMixing = crossover.UNIFORM;
const oneCut = crossover.SINGLE_POINT;
const twoCut = {
...crossover.TWO_POINT,
config: [0.25, 0.75],
};
const blendedOffspring = crossover.AVERAGE;
flowchart LR
    Parents[Two parent genomes] --> Segment[Segment-preserving crossover]
    Parents --> GeneWise[Gene-wise crossover]
    Parents --> Blend[Numeric blending]
    Segment --> Single[SINGLE_POINT]
    Segment --> Double[TWO_POINT]
    GeneWise --> Uniform[UNIFORM]
    Blend --> Average[AVERAGE]
gating
Defines the small routing shelf that decides where a gater applies control.
Gating is one of the lightest structural policies in the library: the graph stays the same, but another neuron or group gets to modulate how strongly a connection participates in the current computation. That makes gating useful when a network needs context-sensitive routing, soft memory behavior, or a way to expose only part of an otherwise valid intermediate result.
Read this file as an answer to one placement question: which part of the connection should the gater influence?
- INPUT modulates the signal as it enters the target,
- OUTPUT modulates what the target passes onward,
- SELF modulates the connection strength itself.
Those choices matter because they create different control surfaces. Some experiments need a gate that behaves like an evidence filter, some need a gate that behaves like an output valve, and some need the weight itself to become state-dependent instead of fixed.
A practical chooser for first experiments:
- start with INPUT when the main question is how much incoming evidence should reach the target at all,
- use OUTPUT when the target should still integrate normally but reveal only part of its result to the next layer,
- choose SELF when the connection should act more like a dynamic coupling whose strength changes with context.
flowchart LR
    Source[Source neuron] --> Connection[Connection weight]
    Connection --> Target[Target neuron]
    Gater[Gater]
    Gater -. INPUT .-> Target
    Gater -. OUTPUT .-> Target
    Gater -. SELF .-> Connection
Minimal workflow:
const routingShelf = {
incomingGate: gating.INPUT,
outgoingGate: gating.OUTPUT,
adaptiveWeightGate: gating.SELF,
};
groupConnection
Defines the small wiring-policy shelf for connecting one node group to another.
Read this file as a topology chooser rather than a bag of connection names. These policies do not decide weights, learning, or mutation pressure; they answer a narrower structural question first: what edge pattern should exist between the source group and the target group before later optimization details matter?
The three built-ins answer three different wiring intents:
- ALL_TO_ALL asks for the densest possible bridge between the groups,
- ALL_TO_ELSE keeps that dense bridge but avoids trivial self-links when the source and target are the same group,
- ONE_TO_ONE preserves positional pairing instead of creating a dense mesh.
Those choices matter because they create very different starting biases. A dense bridge maximizes routing freedom, a dense-without-self-links bridge is often the cleanest way to describe intra-group recurrence, and one-to-one wiring preserves explicit alignment instead of encouraging cross-talk.
A practical chooser for first experiments:
- start with ALL_TO_ALL when every source feature should be allowed to influence every target unit,
- use ALL_TO_ELSE when you want dense recurrent-style reuse inside one group without creating direct self-connections,
- choose ONE_TO_ONE when index alignment matters and each source unit should feed exactly one partner.
flowchart LR
    Dense[Dense mesh] --> AllToAll[ALL_TO_ALL]
    Dense --> AllToElse[ALL_TO_ELSE]
    Paired[Positional pairing] --> OneToOne[ONE_TO_ONE]
Minimal workflow:
const wiringShelf = {
denseBridge: groupConnection.ALL_TO_ALL,
denseWithoutSelfLoops: groupConnection.ALL_TO_ELSE,
alignedBridge: groupConnection.ONE_TO_ONE,
};
mutation
Defines various mutation methods used in neuroevolution algorithms.
Mutation introduces genetic diversity into the population by randomly altering parts of an individual's genome (the neural network structure or parameters). This is crucial for exploring the search space and escaping local optima.
Common mutation strategies include adding or removing nodes and connections, modifying connection weights and node biases, and changing node activation functions. These operations allow the network topology and parameters to adapt over generations.
The methods listed here are inspired by techniques used in algorithms like NEAT and particularly the Instinct algorithm, providing a comprehensive set of tools for evolving network architectures.
Read this file as a mutation toolbox organized by what kind of change you want evolution to make:
- topology-growth operators such as ADD_NODE, ADD_CONN, ADD_SELF_CONN, and ADD_BACK_CONN make the graph more expressive,
- topology-pruning operators such as SUB_NODE, SUB_CONN, SUB_SELF_CONN, and SUB_BACK_CONN remove structure and can simplify an overgrown search,
- parameter-tuning operators such as MOD_WEIGHT, MOD_BIAS, and REINIT_WEIGHT change numeric behavior without rewriting the graph,
- behavior-shaping operators such as MOD_ACTIVATION, ADD_GATE, SUB_GATE, and SWAP_NODES change how existing structure computes,
- architecture-expansion operators such as ADD_LSTM_NODE and ADD_GRU_NODE introduce memory-oriented building blocks.
A practical reading order is:
- start with MOD_WEIGHT and MOD_BIAS to understand the gentlest search moves,
- then compare ADD_CONN and ADD_NODE to see how structure starts to grow,
- then read the recurrent and gating operators when you want temporal behavior or context-sensitive routing,
- finish with ALL and FFW, which summarize which operators belong in a broad search versus a strictly feedforward one.
A practical chooser for first experiments:
- begin with weight and bias mutations when the topology is already plausible and you mainly want numeric refinement,
- allow ADD_CONN and ADD_NODE when the current architecture feels too rigid or too shallow,
- enable gating or back-connections only when temporal memory or dynamic routing is actually part of the task,
- prefer FFW as the safe shelf when a run must remain strictly feedforward.
flowchart TD
    Mutation[Mutation toolbox] --> Grow[Grow structure]
    Mutation --> Prune[Prune structure]
    Mutation --> Tune[Tune parameters]
    Mutation --> Shape[Reshape behavior]
    Mutation --> Memory[Add memory blocks]
    Grow --> GrowItems[ADD_NODE ADD_CONN ADD_SELF_CONN ADD_BACK_CONN]
    Prune --> PruneItems[SUB_NODE SUB_CONN SUB_SELF_CONN SUB_BACK_CONN]
    Tune --> TuneItems[MOD_WEIGHT MOD_BIAS REINIT_WEIGHT]
    Shape --> ShapeItems[MOD_ACTIVATION ADD_GATE SUB_GATE SWAP_NODES]
    Memory --> MemoryItems[ADD_LSTM_NODE ADD_GRU_NODE]
Minimal workflow:
const safeFeedforwardShelf = mutation.FFW;
const structuralSearchShelf = [
mutation.ADD_CONN,
mutation.ADD_NODE,
mutation.MOD_WEIGHT,
mutation.MOD_BIAS,
];
const recurrentSearchShelf = [
...structuralSearchShelf,
mutation.ADD_GATE,
mutation.ADD_BACK_CONN,
];
Supported mutation families:
- ADD_NODE: Adds a new node by splitting an existing connection.
- SUB_NODE: Removes a hidden node and its connections.
- ADD_CONN: Adds a new connection between two unconnected nodes.
- SUB_CONN: Removes an existing connection.
- MOD_WEIGHT: Modifies the weight of an existing connection.
- MOD_BIAS: Modifies the bias of a node.
- MOD_ACTIVATION: Changes the activation function of a node.
- ADD_SELF_CONN: Adds a self-connection (recurrent loop) to a node.
- SUB_SELF_CONN: Removes a self-connection from a node.
- ADD_GATE: Adds a gating mechanism to a connection.
- SUB_GATE: Removes a gating mechanism from a connection.
- ADD_BACK_CONN: Adds a recurrent (backward) connection between nodes.
- SUB_BACK_CONN: Removes a recurrent (backward) connection.
- SWAP_NODES: Swaps the roles (bias and activation) of two nodes.
- REINIT_WEIGHT: Reinitializes all weights for a node.
- BATCH_NORM: Marks a node for batch normalization (stub).
- ADD_LSTM_NODE: Adds a new LSTM node (memory cell with gates).
- ADD_GRU_NODE: Adds a new GRU node (gated recurrent unit).
Summary shelves:
- ALL: all mutation methods, including recurrent and memory-oriented ones.
- FFW: the feedforward-safe subset that avoids recurrence and gating.
selection
Defines various selection methods used in genetic algorithms to choose individuals for reproduction based on their fitness scores.
Selection is a crucial step that determines which genetic traits are passed on to the next generation. Different methods offer varying balances between exploration (maintaining diversity) and exploitation (favoring high-fitness individuals). The choice of selection method significantly impacts the algorithm's convergence speed and the diversity of the population. High selection pressure (strongly favoring the fittest) can lead to faster convergence but may result in premature stagnation at suboptimal solutions. Conversely, lower pressure maintains diversity but can slow down the search process.
Read this file as a compact pressure ladder:
- FITNESS_PROPORTIONATE says selection chance should scale with score,
- POWER says front-runners should be favored more aggressively than raw proportional scores would imply,
- TOURNAMENT says selection pressure should come from repeated local competitions instead of one global roulette view.
Those strategies are not only different implementations. They encode different ideas about what "deserves another child" means in an evolutionary run.
A practical chooser for first experiments:
- start with FITNESS_PROPORTIONATE when you want the simplest global interpretation of score share,
- move to POWER when the best genomes are emerging but plain roulette pressure is still too gentle,
- prefer TOURNAMENT when score scale is noisy, unstable, or hard to compare across the whole population.
Minimal workflow:
const conservativeSelection = selection.FITNESS_PROPORTIONATE;
const aggressiveSelection = {
...selection.POWER,
power: 6,
};
const bracketSelection = {
...selection.TOURNAMENT,
size: 7,
probability: 0.75,
};
flowchart LR
    Population[Scored population] --> Proportionate[FITNESS_PROPORTIONATE<br/>probability tracks score share]
    Population --> Power[POWER<br/>front of ranking gets extra pressure]
    Population --> Tournament[TOURNAMENT<br/>small local bracket decides]
default
binary
binary(
targets: number[],
outputs: number[],
): number
Calculates the Binary Error rate, often used as a simple accuracy metric for classification.
This function calculates the proportion of misclassifications by comparing the
rounded network outputs (thresholded at 0.5) against the target labels.
It assumes target values are 0 or 1, and outputs are probabilities between 0 and 1.
Note: This is equivalent to 1 - accuracy for binary classification.
Returns: The proportion of misclassified samples (error rate, between 0 and 1).
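The thresholding described above can be sketched as a standalone function. This is an illustrative re-implementation for intuition, not the library's own source:

```typescript
// Sketch of the binary error rate: round each output at 0.5 and report
// the fraction of rounded predictions that disagree with the 0/1 targets.
function binaryErrorRate(targets: number[], outputs: number[]): number {
  let misses = 0;
  for (let i = 0; i < targets.length; i++) {
    if (Math.round(outputs[i]) !== targets[i]) misses++;
  }
  return misses / targets.length;
}

// Rounded outputs are [0, 1, 0, 0]; one of four disagrees with [0, 1, 1, 0].
const errorRate = binaryErrorRate([0, 1, 1, 0], [0.2, 0.8, 0.4, 0.1]);
```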
cosineAnnealing
cosineAnnealing(
period: number,
minimumRate: number,
): (baseRate: number, iteration: number) => number
Implements a Cosine Annealing learning rate schedule.
This schedule varies the learning rate cyclically according to a cosine function.
It starts at the baseRate and smoothly anneals down to minimumRate over a
specified period of iterations, then potentially repeats. This can help
the model escape local minima and explore the loss landscape more effectively.
Often used with "warm restarts" where the cycle repeats. The mental model is
deliberate breathing: ramp down to settle, then restart high enough to
explore again.
Formula: learning_rate = minimumRate + 0.5 * (baseRate - minimumRate) * (1 + cos(pi * current_cycle_iteration / period))
Parameters:
- period - The number of iterations over which the learning rate anneals from baseRate to minimumRate in one cycle. Defaults to 1000.
- minimumRate - The minimum learning rate value at the end of a cycle. Defaults to 0.
- baseRate - The initial (maximum) learning rate for the cycle.
- iteration - The current training iteration.
Returns: A function that calculates the learning rate for a given iteration based on the cosine annealing schedule.
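The formula above can be sketched as a schedule factory. This is an illustrative version that assumes the cycle position is simply the iteration modulo the period:

```typescript
// Sketch of cosine annealing built directly from the formula above.
function cosineAnnealing(period = 1000, minimumRate = 0) {
  return (baseRate: number, iteration: number): number => {
    const cyclePosition = iteration % period; // position inside current cycle
    return (
      minimumRate +
      0.5 * (baseRate - minimumRate) * (1 + Math.cos((Math.PI * cyclePosition) / period))
    );
  };
}

const schedule = cosineAnnealing(100, 0);
// Starts at the base rate, anneals toward the minimum, then restarts.
```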
cosineAnnealingWarmRestarts
cosineAnnealingWarmRestarts(
initialPeriod: number,
minimumRate: number,
periodGrowthMultiplier: number,
): (baseRate: number, iteration: number) => number
Cosine Annealing with Warm Restarts (SGDR style) where the cycle length can grow by a multiplier after each restart.
This variant keeps the exploratory reset behavior of cosine annealing while allowing later cycles to last longer. That makes it useful when early exploration should be frequent but later training should settle for longer stretches between restarts.
Parameters:
- initialPeriod - Length of the first cycle in iterations.
- minimumRate - Minimum learning rate at the valley of each cycle.
- periodGrowthMultiplier - Factor by which the period is multiplied after each restart (>= 1).
Returns: A function that replays cosine cycles whose length can grow after each restart.
crossEntropy
crossEntropy(
targets: number[],
outputs: number[],
): number
Calculates the Cross Entropy error, commonly used for classification tasks.
This function measures the performance of a classification model whose output is a probability value between 0 and 1. Cross-entropy loss increases as the predicted probability diverges from the actual label.
It uses a small epsilon (PROB_EPSILON = 1e-15) to prevent log(0) which would result in NaN.
Output values are clamped to the range [epsilon, 1 - epsilon] for numerical stability.
Returns: The mean cross-entropy error over all samples.
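The clamping behavior described above can be sketched as follows (an illustrative re-implementation, not the library's source):

```typescript
// Sketch of clamped binary cross-entropy; the epsilon guards against log(0).
const PROB_EPSILON = 1e-15;
function crossEntropy(targets: number[], outputs: number[]): number {
  let total = 0;
  for (let i = 0; i < targets.length; i++) {
    // Clamp predictions into [epsilon, 1 - epsilon] for numerical stability.
    const p = Math.min(Math.max(outputs[i], PROB_EPSILON), 1 - PROB_EPSILON);
    total -= targets[i] * Math.log(p) + (1 - targets[i]) * Math.log(1 - p);
  }
  return total / targets.length;
}
```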
exp
exp(
decayFactor: number,
): (baseRate: number, iteration: number) => number
Implements an exponential decay learning rate schedule.
The learning rate decreases exponentially after each iteration, multiplying
by the decay factor decayFactor. This provides a smooth, continuous reduction
in the learning rate over time. Compared with step decay, the policy is less
about distinct phases and more about a steady fade in aggressiveness.
Formula: learning_rate = baseRate * decayFactor ^ iteration
Parameters:
- decayFactor - The decay factor applied at each iteration. Should be less than 1. Defaults to 0.999.
- baseRate - The initial learning rate.
- iteration - The current training iteration.
Returns: A function that calculates the exponentially decayed learning rate for a given iteration.
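The formula above translates into a small schedule factory (illustrative sketch):

```typescript
// Sketch of exponential decay: multiply the base rate by
// decayFactor^iteration, giving a smooth continuous fade.
function expSchedule(decayFactor = 0.999) {
  return (baseRate: number, iteration: number): number =>
    baseRate * Math.pow(decayFactor, iteration);
}
```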
fixed
fixed(): (baseRate: number, iteration: number) => number
Implements a fixed learning rate schedule.
The learning rate remains constant throughout the entire training process. This is the simplest schedule and serves as a baseline, but may not be optimal for complex problems. Use it when you want the rest of the system, not the schedule, to carry the full burden of training stability.
Parameters:
- baseRate - The initial learning rate, which will remain constant.
- iteration - The current training iteration (unused in this method, but included for consistency).
Returns: A function that takes the base learning rate and the current iteration number, and always returns the base learning rate.
focalLoss
focalLoss(
targets: number[],
outputs: number[],
focalGamma: number,
focalAlpha: number,
): number
Calculates the Focal Loss, which is useful for addressing class imbalance in classification tasks. Focal loss down-weights easy examples and focuses training on hard negatives.
Returns: The mean focal loss.
hinge
hinge(
targets: number[],
outputs: number[],
): number
Calculates the Mean Hinge loss, primarily used for "maximum-margin" classification, most notably for Support Vector Machines (SVMs).
Hinge loss is used for training classifiers. It penalizes predictions that are not only incorrect but also those that are correct but not confident (i.e., close to the decision boundary). Assumes target values are encoded as -1 or 1.
Returns: The mean hinge loss.
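With -1/+1 targets, the mean hinge loss described above can be sketched as (illustrative, not the library's source):

```typescript
// Sketch of mean hinge loss: confident correct predictions (t * y >= 1)
// contribute zero; incorrect or under-confident ones are penalized linearly.
function hinge(targets: number[], outputs: number[]): number {
  let total = 0;
  for (let i = 0; i < targets.length; i++) {
    total += Math.max(0, 1 - targets[i] * outputs[i]);
  }
  return total / targets.length;
}
```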
inv
inv(
decayFactor: number,
decayPower: number,
): (baseRate: number, iteration: number) => number
Implements an inverse decay learning rate schedule.
The learning rate decreases as the inverse of the iteration number,
controlled by the decay factor decayFactor and exponent decayPower. The rate
decreases more slowly over time compared to exponential decay. Use it when
you want long training runs to keep some learning energy instead of cooling
too quickly.
Formula: learning_rate = baseRate / (1 + decayFactor * iteration ** decayPower)
Parameters:
- decayFactor - Controls the rate of decay. Higher values lead to faster decay. Defaults to 0.001.
- decayPower - The exponent controlling the shape of the decay curve. Defaults to 2.
- baseRate - The initial learning rate.
- iteration - The current training iteration.
Returns: A function that calculates the inversely decayed learning rate for a given iteration.
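The formula above as a sketch (illustrative factory, not the library's source):

```typescript
// Sketch of inverse decay: the denominator grows polynomially with the
// iteration count, so the rate cools slower than exponential decay.
function invSchedule(decayFactor = 0.001, decayPower = 2) {
  return (baseRate: number, iteration: number): number =>
    baseRate / (1 + decayFactor * Math.pow(iteration, decayPower));
}
```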
labelSmoothing
labelSmoothing(
targets: number[],
outputs: number[],
smoothingFactor: number,
): number
Calculates the Cross Entropy with Label Smoothing. Label smoothing prevents the model from becoming overconfident by softening the targets.
Returns: The mean cross-entropy loss with label smoothing.
linearWarmupDecay
linearWarmupDecay(
totalStepCount: number,
warmupStepCount: number | undefined,
endRate: number,
): (baseRate: number, iteration: number) => number
Linear Warmup followed by Linear Decay to an end rate. Warmup linearly increases LR from near 0 up to baseRate over warmupStepCount, then linearly decays to endRate at totalStepCount. Iterations beyond totalStepCount clamp to endRate.
This schedule is common when the earliest steps are the most unstable: start gentle, reach full speed, then taper predictably.
Parameters:
- totalStepCount - Total steps for the full schedule (must be > 0).
- warmupStepCount - Steps for warmup (must be < totalStepCount). Defaults to 10% of totalStepCount.
- endRate - Final rate at totalStepCount.
Returns: A function that warms the learning rate up, then decays it toward a fixed floor.
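The warmup-then-decay shape can be sketched as below. The exact boundary handling (how the very first warmup step is scaled) is an assumption here and may differ from the library's implementation:

```typescript
// Sketch of linear warmup followed by linear decay to an end rate.
function linearWarmupDecay(totalStepCount: number, warmupStepCount?: number, endRate = 0) {
  // Assumed default: warmup covers 10% of the total schedule.
  const warmup = warmupStepCount ?? Math.max(1, Math.floor(totalStepCount * 0.1));
  return (baseRate: number, iteration: number): number => {
    if (iteration >= totalStepCount) return endRate; // clamp past the schedule
    if (iteration < warmup) return (baseRate * (iteration + 1)) / warmup; // ramp up
    const progress = (iteration - warmup) / (totalStepCount - warmup); // 0 -> 1
    return baseRate + (endRate - baseRate) * progress; // linear taper
  };
}
```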
mae
mae(
targets: number[],
outputs: number[],
): number
Calculates the Mean Absolute Error (MAE), another common loss function for regression tasks.
MAE measures the average of the absolute differences between predictions and actual values. Compared to MSE, it is less sensitive to outliers because errors are not squared.
Returns: The mean absolute error.
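As a sketch (illustrative, not the library's source):

```typescript
// Sketch of mean absolute error: average the unsquared absolute differences.
function mae(targets: number[], outputs: number[]): number {
  let total = 0;
  for (let i = 0; i < targets.length; i++) {
    total += Math.abs(targets[i] - outputs[i]);
  }
  return total / targets.length;
}
```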
mape
mape(
targets: number[],
outputs: number[],
): number
Calculates the Mean Absolute Percentage Error (MAPE).
MAPE expresses the error as a percentage of the actual value. It can be useful for understanding the error relative to the magnitude of the target values. However, it has limitations: it's undefined when the target value is zero and can be skewed by target values close to zero.
Returns: The mean absolute percentage error, expressed as a proportion (e.g., 0.1 for 10%).
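A sketch of the proportion form described above (illustrative; note the zero-target limitation carries over):

```typescript
// Sketch of MAPE as a proportion: each error is scaled by its target.
// Division by zero targets is undefined, as noted above.
function mape(targets: number[], outputs: number[]): number {
  let total = 0;
  for (let i = 0; i < targets.length; i++) {
    total += Math.abs((targets[i] - outputs[i]) / targets[i]);
  }
  return total / targets.length;
}
```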
mse
mse(
targets: number[],
outputs: number[],
): number
Calculates the Mean Squared Error (MSE), a common loss function for regression tasks.
MSE measures the average of the squares of the errors—that is, the average squared difference between the estimated values and the actual value. It is sensitive to outliers due to the squaring of the error terms.
Returns: The mean squared error.
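As a sketch (illustrative re-implementation, not the library's source):

```typescript
// Sketch of mean squared error: average the squared differences, so large
// residuals dominate the total.
function mse(targets: number[], outputs: number[]): number {
  let total = 0;
  for (let i = 0; i < targets.length; i++) {
    const diff = targets[i] - outputs[i];
    total += diff * diff;
  }
  return total / targets.length;
}
```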
msle
msle(
targets: number[],
outputs: number[],
): number
Calculates the Mean Squared Logarithmic Error (MSLE).
MSLE is often used in regression tasks where the target values span a large range
or when penalizing under-predictions more than over-predictions is desired.
It measures the squared difference between the logarithms of the predicted and actual values.
Uses log(1 + x) instead of log(x) for numerical stability and to handle inputs of 0.
Assumes both targets and outputs are non-negative.
Returns: The mean squared logarithmic error.
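The log(1 + x) form described above can be sketched with Math.log1p (illustrative, not the library's source):

```typescript
// Sketch of MSLE: compare log(1 + target) against log(1 + output) so that
// zero inputs are safe and relative scale matters more than absolute scale.
function msle(targets: number[], outputs: number[]): number {
  let total = 0;
  for (let i = 0; i < targets.length; i++) {
    const diff = Math.log1p(targets[i]) - Math.log1p(outputs[i]);
    total += diff * diff;
  }
  return total / targets.length;
}
```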
reduceOnPlateau
reduceOnPlateau(
options: { factor?: number | undefined; patience?: number | undefined; minDelta?: number | undefined; cooldown?: number | undefined; minRate?: number | undefined; verbose?: boolean | undefined; } | undefined,
): (baseRate: number, iteration: number, lastError?: number | undefined) => number
ReduceLROnPlateau style scheduler (stateful closure) that monitors error signal (third argument if provided) and reduces rate by 'factor' if no improvement beyond 'minDelta' for 'patience' iterations. Cooldown prevents immediate successive reductions. NOTE: Requires the training loop to call with signature (baseRate, iteration, lastError).
This is the chapter's reactive option. Instead of following a pre-planned calendar, the schedule listens for stalled improvement and responds only when the run appears to flatten out.
Parameters:
options- Optional reactive-control settings such as patience, cooldown, and minimum rate floor.
Returns: A stateful schedule function that may lower the learning rate when the monitored error stops improving.
softmaxCrossEntropy
softmaxCrossEntropy(
targets: number[],
outputs: number[],
): number
Softmax Cross Entropy for mutually exclusive multi-class outputs given raw (pre-softmax or arbitrary) scores. Applies a numerically stable softmax to the outputs internally then computes -sum(target * log(prob)). Targets may be soft labels and are expected to sum to 1 (will be re-normalized if not).
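The stable-softmax behavior described above can be sketched as follows (illustrative re-implementation, not the library's source):

```typescript
// Sketch of numerically stable softmax cross-entropy over raw scores.
// Subtracting the max score prevents exp overflow; targets are re-normalized
// in case soft labels do not sum to 1.
function softmaxCrossEntropy(targets: number[], outputs: number[]): number {
  const maxScore = Math.max(...outputs);
  const exps = outputs.map((score) => Math.exp(score - maxScore));
  const expSum = exps.reduce((a, b) => a + b, 0);
  const targetSum = targets.reduce((a, b) => a + b, 0) || 1;
  let loss = 0;
  for (let i = 0; i < targets.length; i++) {
    loss -= (targets[i] / targetSum) * Math.log(exps[i] / expSum);
  }
  return loss;
}
```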
step
step(
decayFactor: number,
decayStepSize: number,
): (baseRate: number, iteration: number) => number
Implements a step decay learning rate schedule.
The learning rate is reduced by a multiplicative factor (decayFactor)
at predefined intervals (decayStepSize iterations). This allows for
faster initial learning, followed by finer adjustments as training progresses.
It is a good fit when you want training to move through a few deliberate
phases rather than one perfectly smooth curve.
Formula: learning_rate = baseRate * decayFactor ^ floor(iteration / decayStepSize)
Parameters:
- decayFactor - The factor by which the learning rate is multiplied at each step. Should be less than 1. Defaults to 0.9.
- decayStepSize - The number of iterations after which the learning rate decays. Defaults to 100.
- baseRate - The initial learning rate.
- iteration - The current training iteration.
Returns: A function that calculates the decayed learning rate for a given iteration.
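The formula above as a sketch (illustrative factory, not the library's source):

```typescript
// Sketch of step decay: the floor division counts completed phases, and each
// phase multiplies the base rate by the decay factor once.
function stepSchedule(decayFactor = 0.9, decayStepSize = 100) {
  return (baseRate: number, iteration: number): number =>
    baseRate * Math.pow(decayFactor, Math.floor(iteration / decayStepSize));
}
```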