neat/lineage/core

Core contracts for lineage-analysis mechanics.

This file defines the narrow data shapes behind the controller-facing lineage helpers. The public lineage chapter talks about recent family overlap as a runtime signal; this core contract layer explains which pieces of state the mechanics actually need to produce that signal.

The important constraint is intentional minimalism. Ancestor traversal does not need the full genome implementation, and sampled uniqueness does not need the whole Neat controller. By describing only ids, parent links, population access, and RNG access, this layer keeps the lineage utilities reusable and easy to reason about.

Read the contracts in this order:

GenomeLike describes the smallest ancestry-bearing genome shape,
NeatLineageContext shows what the controller must supply for sampled lineage reads,
AncestorQueueEntry models one step in breadth-first ancestor expansion,
GenomeIndexPair models one sampled comparison between genomes.

flowchart TD
  classDef base fill:#08131f,stroke:#1ea7ff,color:#dff6ff,stroke-width:1px;
  classDef accent fill:#0f2233,stroke:#ffd166,color:#fff4cc,stroke-width:1.5px;

  genome[GenomeLike id plus parent ids]:::base --> queue[AncestorQueueEntry breadth-first work item]:::accent
  context[NeatLineageContext population plus RNG]:::base --> pairs[GenomeIndexPair sampled comparison]:::base
  queue --> ancestors[Ancestor sets]:::base
  pairs --> ancestors
  ancestors --> distance[Jaccard lineage distance]:::accent

neat/lineage/core/lineage.types.ts

AncestorQueueEntry

Queue entry used during breadth-first ancestor traversal.

Each entry records which ancestor id is being explored, how deep that ancestor sits relative to the original genome, and an optional cached genome reference so later traversal steps can enqueue that ancestor's parents.

GenomeIndexPair

Index pair representing one sampled genome comparison.

The lineage uniqueness metric does not compare every possible pair in large populations. Instead it samples a bounded set of pairs, then asks how much recent ancestry overlaps inside each comparison.

GenomeLike

Minimal genome shape used by lineage helpers.

Lineage analysis only needs two structural facts from each genome: a stable identifier and the identifiers of its recorded parents. Everything else is intentionally left open-ended so ancestry helpers can run against richer runtime objects without importing or depending on all of their fields.

In practice this interface is the bridge between reproduction-time lineage bookkeeping and read-side lineage metrics. If those ids are present and stable, the rest of the ancestry pipeline can stay decoupled from mutation, evaluation, telemetry, and speciation internals.

NeatLineageContext

Minimal NEAT context required by lineage helpers.

The lineage boundary only needs the current population and the RNG provider used for sampled ancestor uniqueness. That small host contract makes the ownership model explicit: lineage reporting is a read-side controller concern, not a stateful subsystem with its own storage or mutation rules.

The population supplies the ancestry graph to inspect. The RNG provider keeps sampled uniqueness deterministic so the same run can replay the same sampled comparisons during tests or exported-state debugging.

neat/lineage/core/lineage.core.ts

Lineage-analysis mechanics used by NEAT ancestry helpers.

This chapter is the mechanics layer beneath the controller-facing lineage helpers. Its job is to turn stored parent ids into a bounded ancestry signal that is cheap enough for telemetry and adaptive policy to read during a run.

The pipeline has four stages:

normalize parent links into one consistent ancestry input shape,
walk outward through recent ancestors with a bounded breadth-first queue,
sample a limited set of genome pairs instead of exhaustively comparing the full population,
measure ancestor-set overlap with Jaccard distance and fold the result into one mean uniqueness signal.

That staged split is what keeps the public lineage chapter readable. The root helpers can describe what lineage metrics mean, while this file owns the exact traversal, sampling, and distance math that makes those metrics stable.

Read the chapter in this order: start with normalizeParentIds() and createInitialQueue(), continue through collectAncestorIds() for the breadth-first walk, then sampleGenomePairs() for the comparison budget, and end with computePairDistances() plus computeAverageDistance() for the final Jaccard-based fold.

flowchart TD
  classDef base fill:#08131f,stroke:#1ea7ff,color:#dff6ff,stroke-width:1px;
  classDef accent fill:#0f2233,stroke:#ffd166,color:#fff4cc,stroke-width:1.5px;

  parents[Genome parent ids]:::base --> queue[Initial breadth-first queue]:::accent
  queue --> traversal[Bounded ancestor traversal]:::base
  traversal --> sets[Per-genome ancestor sets]:::base
  population[Population plus deterministic RNG]:::base --> pairs[Sampled genome pairs]:::accent
  pairs --> compare[Jaccard distance per pair]:::base
  sets --> compare
  compare --> mean[Mean ancestor uniqueness]:::accent

calculateMaxSamplePairs

calculateMaxSamplePairs(
  size: number,
): number

Compute the upper bound on sampled genome pairs.

The uniqueness metric is intentionally sampled rather than exhaustive. This helper keeps that budget honest by capping the requested sample count to both the true combinatorial maximum and the chapter's global runtime limit.

Parameters:

size - Population size.

Returns: Sample cap respecting both the combinatorial count and the global limit.

collectAncestorIds

collectAncestorIds(
  queueEntries: AncestorQueueEntry[],
  population: GenomeLike[],
): Set<number>

Collect ancestor IDs encountered within the configured depth window.

This is the core ancestry walk. It uses a queue-backed breadth-first pass so near ancestors are explored before deeper ones, which matches the meaning of the metric: recent family overlap should dominate the signal more than very old shared history.

The traversal stays bounded on purpose. Runtime lineage metrics do not need a full genealogy of the run; they need a stable recent-neighborhood view that remains cheap to recompute.

Parameters:

queueEntries - Breadth-first queue seeded with direct parents.
population - Current population for ID lookups.

Returns: Unique ancestor IDs encountered within the depth window.

computeAverageDistance

computeAverageDistance(
  distances: number[],
): number

Compute the mean ancestor distance across sampled pairs.

This is the final fold from many local comparisons into one controller-facing scalar. Rounding is intentional: the value is meant to be a stable telemetry and adaptive-policy signal rather than a high-precision scientific output.

Parameters:

distances - Pairwise Jaccard distances.

Returns: Mean distance rounded to the configured decimal precision.

computePairDistances

computePairDistances(
  pairs: GenomeIndexPair[],
  population: GenomeLike[],
  buildAncestorSet: (genome: GenomeLike) => Set<number>,
): number[]

Compute Jaccard distances for sampled genome pairs.

This stage turns sampled pair indices into actual lineage evidence. Each pair is resolved to two genomes, each genome is expanded into a shallow ancestor set, and those sets are compared using Jaccard distance so the final metric reflects overlap rather than raw ancestor counts.

Parameters:

pairs - Sampled index pairs.
population - Current population.
buildAncestorSet - Helper that builds ancestor sets for genomes.

Returns: Jaccard distances for valid pairs.

createInitialQueue

createInitialQueue(
  parentIds: number[],
  population: GenomeLike[],
): AncestorQueueEntry[]

Create the initial breadth-first queue from direct parent IDs.

This queue is the bridge between stored lineage metadata and the traversal loop. Depth starts at the direct-parent layer because lineage uniqueness is a recent-family metric first; later queue expansion can then walk outward while still respecting the bounded depth window.

Parameters:

parentIds - Direct parent IDs to seed the queue.
population - Current population for ID lookups.

Returns: Queue entries at depth 1.

hasMinimumPopulation

hasMinimumPopulation(
  size: number,
): boolean

Check whether the population is large enough to form a sampled pair.

This guard exists because ancestor uniqueness is defined over genome comparisons, not individual genomes. A population smaller than two can still have ancestry data, but it cannot produce a meaningful pairwise distance.

Parameters:

size - Population size.

Returns: true when at least two genomes exist.

normalizeParentIds

normalizeParentIds(
  value: GenomeLike,
): number[]

Normalize the parent ID list for a genome.

This helper is the first seam in the pipeline: it turns "maybe has lineage metadata" into a guaranteed array shape so the traversal code never needs to branch on missing parent storage.

Parameters:

value - Genome to read parents from.

Returns: Parent ID list, or an empty array when absent.

sampleGenomePairs

sampleGenomePairs(
  sampleCount: number,
  size: number,
  rngFactory: () => () => number,
): GenomeIndexPair[]

Sample genome index pairs for ancestor uniqueness.

The sampling policy chooses a bounded set of pair comparisons while keeping replay deterministic through the caller-supplied RNG factory. The result is not a canonical population ordering; it is a reproducible comparison budget for the current generation.

Parameters:

sampleCount - Number of pairs to sample.
size - Population size for index bounds.
rngFactory - RNG provider used to obtain a random function.

Returns: Array of sampled index pairs.