Ch 72. All 60 Systems at a Glance - Evolutionary AI Survey


Method-Family Frequency

Of sixty-one systems, fifty-four (88.5%) carry at least one Ch67 formal method ID; the remaining seven are ES-Scale or Harness systems described entirely by auxiliary descriptors (see §72.9.5 for their methodological analysis). The table below shows how many systems use at least one method from each Ch67 family, plus the single most common method within that family.

Family	Systems	Share	Most common method
F1 · Mutation	52	85.2%	`M3` Reflect (40 systems, 65.6%)
F4 · Evaluation	21	34.4%	`E2` Sandbox (14 systems, 23.0%)
F7 · Bandits	9	14.8%	`B1` UCB1 (8 systems, 13.1%)
F5 · Population	7	11.5%	`P5` Skills archive (7 systems)
F3 · Selection	5	8.2%	`S2` Tournament (5 systems)
F6 · Islands	3	4.9%	`I1` Static (3 systems)
F2 · Crossover	0	0.0%	No Ch67 crossover tags in this corpus

Derivation check: F1 = 52 corresponds to the 54 Ch67-tagged systems minus the 2 without any F1 method (ALE-Bench and ALE-Agent AHC058, which carry only F4 tags). F4 = 21 systems carrying at least one of E2 (14 systems) or E3 (9 systems), with AutoHarness and BenchStack carrying both. F2 = 0: no system in Table 72.1 carries a formal Ch67 crossover tag (C1–C4); several FunSearch-lineage systems employ multi-parent LLM prompting, but the Ch67 taxonomy classifies this under M2 Rewrite rather than C-family methods. All family counts are multi-label: a system with both M1 and M3 increments both individual tallies but the F1 family count only once.
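The derivation check can be replayed as a few lines of arithmetic. The sketch below hard-codes the counts quoted in this section rather than reading Table 72.1, and all variable names are illustrative.

```python
# Counts quoted in the text above; hard-coded here, not read from Table 72.1.
CORPUS_SIZE = 61
CH67_TAGGED = 54         # systems with at least one Ch67 method ID
NO_F1_AMONG_TAGGED = 2   # ALE-Bench and ALE-Agent AHC058 (F4 tags only)
E2_SYSTEMS = 14
E3_SYSTEMS = 9
E2_AND_E3 = 2            # AutoHarness and BenchStack carry both


def share(count: int, total: int = CORPUS_SIZE) -> str:
    """Format a count as a one-decimal percentage of the corpus."""
    return f"{100 * count / total:.1f}%"


# F1: every Ch67-tagged system except the two that carry only F4 tags.
f1_family = CH67_TAGGED - NO_F1_AMONG_TAGGED

# F4: inclusion-exclusion over E2 and E3, so dual-tagged systems count once.
f4_family = E2_SYSTEMS + E3_SYSTEMS - E2_AND_E3

# Systems described only by auxiliary descriptors.
auxiliary_only = CORPUS_SIZE - CH67_TAGGED

for label, value in [("F1", f1_family), ("F4", f4_family), ("aux-only", auxiliary_only)]:
    print(label, value, share(value))
```

The F4 line assumes, as the derivation does, that E2 and E3 are the only F4 methods present in the corpus.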

Individual Method Counts (top 8)

Method	Systems	Share	Where concentrated
`M3` Reflect	40	65.6%	All 24 P07 systems plus all 7 in P03 (= 31); the remaining 9 are distributed across P02 (1), P04 (3), P05 (4), and P08 (1).
`E2` Sandbox	14	23.0%	Research-Agent (7), Harness (3: AutoHarness, BenchStack, EvalForge), plus FARS, MetaHarness, AlphaEvolve, and ALE-Bench.
`M2` Rewrite	13	21.3%	FunSearch lineage (all 8) and Tree-Search (all 4); also EvoScale-Bench (ES-Scale).
`M6` Prompt	9	14.8%	Reflection (4: DiscoGen, AutoEvolver, FARS, Reflexion), Harness (3: AutoHarness, BenchStack, EvalForge), Research-Agent (PromptOpt), Self-Modifying (MetaHarness).
`E3` Multi-instance	9	14.8%	FunSearch (3: ShinkaEvolve, SkyDiscover, AlphaEvolve: Research Papers), Harness (3: ALE-Agent AHC058, AutoHarness, BenchStack), plus EvoScale-Bench, DiscoveryBench, SciReflect.
`B1` UCB1	8	13.1%	FunSearch (5: AlphaEvolve, ShinkaEvolve, A-Evolve, SkyDiscover, Darwinian Evolver), Tree-Search (2: AB-MCTS TreeQuest, CodeTree), Research-Agent (MathResearch).
`P5` Skills archive	7	11.5%	Five in P03 (4 Self-Modifying plus AutoEvolver) and two in P07 (AutoAgent, ResearchAgent).
`M1` Diff	6	9.8%	FunSearch-lineage exclusively: AlphaEvolve, OpenEvolve, ShinkaEvolve, SkyDiscover (P02) and Darwinian Evolver, AlphaEvolve: Research Papers (P05).
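
Each "Where concentrated" cell should add up to the count in its row, and each share should equal the count divided by 61. The following sketch transcribes the table by hand to check both; the dictionary layout and the "named individually" label (for systems listed by name after "plus") are illustrative choices, not part of the survey dataset.

```python
CORPUS_SIZE = 61

# method -> (reported count, reported share, breakdown transcribed from the table)
ROWS = {
    "M3": (40, "65.6%", {"P07": 24, "P03": 7, "P02": 1, "P04": 3, "P05": 4, "P08": 1}),
    "E2": (14, "23.0%", {"Research-Agent": 7, "Harness": 3, "named individually": 4}),
    "M2": (13, "21.3%", {"FunSearch": 8, "Tree-Search": 4, "ES-Scale": 1}),
    "M6": (9, "14.8%", {"Reflection": 4, "Harness": 3, "Research-Agent": 1,
                        "Self-Modifying": 1}),
    "E3": (9, "14.8%", {"FunSearch": 3, "Harness": 3, "named individually": 3}),
    "B1": (8, "13.1%", {"FunSearch": 5, "Tree-Search": 2, "Research-Agent": 1}),
    "P5": (7, "11.5%", {"P03": 5, "P07": 2}),
    "M1": (6, "9.8%", {"P02": 4, "P05": 2}),
}


def check_rows(rows: dict) -> list:
    """Return a list of human-readable inconsistencies (empty if none)."""
    problems = []
    for method, (count, reported_share, breakdown) in rows.items():
        total = sum(breakdown.values())
        if total != count:
            problems.append(f"{method}: breakdown sums to {total}, not {count}")
        computed_share = f"{100 * count / CORPUS_SIZE:.1f}%"
        if computed_share != reported_share:
            problems.append(f"{method}: share is {computed_share}, not {reported_share}")
    return problems


problems = check_rows(ROWS)
print(problems if problems else "all rows consistent")
```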

Figure 72.2 — Lineage × Method-Family Heatmap

Figure 72.2. Heatmap of dominant lineage (rows) versus Ch67 method family (columns). Each cell counts the systems in that lineage that use at least one method from the family. Bright green cells mark full lineage coverage, where the cell value equals the lineage size N. Column sums verified: F1 = 26+8+3+5+5+4+1+0 = 52; F3 = 0+5+0+0+0+0+0+0 = 5; F4 = 8+4+4+2+1+0+1+1 = 21; F5 = 2+0+0+1+4+0+0+0 = 7; F7 = 1+5+0+0+0+3+0+0 = 9. The F2 Crossover column is uniformly zero: no system carries a formal Ch67 crossover tag, although several FunSearch-lineage systems use multi-parent LLM prompting, which Ch67 classifies under M2 Rewrite. FunSearch is the lineage with the richest evolutionary machinery, covering F3 Selection, F6 Islands, and F7 Bandits. Tree-Search also shows strong F7 coverage (3 of 4 systems). ES-Scale and Benchmark are nearly invisible because most of their systems use auxiliary descriptors.
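
The column sums quoted in the caption can be re-added mechanically. In the sketch below the per-lineage terms are copied from the caption; the row order is an assumption inferred from the per-lineage breakdowns elsewhere in this chapter, since the caption lists only the numbers. F6 is omitted because the caption gives no terms for it.

```python
# Row order is an assumption inferred from the breakdowns in this chapter;
# the caption itself lists only the numeric terms.
ROWS = ["Research-Agent", "FunSearch", "Harness", "Reflection",
        "Self-Modifying", "Tree-Search", "ES-Scale", "Benchmark"]

# Per-lineage terms exactly as quoted in the caption (F2 is stated to be all zero).
COLUMNS = {
    "F1": [26, 8, 3, 5, 5, 4, 1, 0],
    "F2": [0, 0, 0, 0, 0, 0, 0, 0],
    "F3": [0, 5, 0, 0, 0, 0, 0, 0],
    "F4": [8, 4, 4, 2, 1, 0, 1, 1],
    "F5": [2, 0, 0, 1, 4, 0, 0, 0],
    "F7": [1, 5, 0, 0, 0, 3, 0, 0],
}

# Family totals reported in the Method-Family Frequency table.
REPORTED = {"F1": 52, "F2": 0, "F3": 5, "F4": 21, "F5": 7, "F7": 9}

column_sums = {family: sum(cells) for family, cells in COLUMNS.items()}
mismatches = {
    family: (column_sums[family], REPORTED[family])
    for family in COLUMNS
    if column_sums[family] != REPORTED[family]
}

# Families in which each lineage has at least one system, under the assumed row order.
coverage = {
    lineage: [family for family, cells in COLUMNS.items() if cells[i] > 0]
    for i, lineage in enumerate(ROWS)
}

print(column_sums)
print(mismatches if mismatches else "all column sums match")
print(coverage["FunSearch"])
```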

§72.9.5 — Auxiliary Descriptor Analysis

The Ch67-only statistics above systematically understate the methodological richness of the seven auxiliary-descriptor-only systems: three ES-Scale (EGGROLL, Evolution Strategies at Scale, EvoX) and four Harness (7/24 Office, Hyperagents, Hyperspace, UlamAI). This subsection provides a parallel descriptor-based summary to complement the Ch67 analysis.

System	Lineage	Aux. Tag	Evolutionary / Methodological Approach
ES-Scale systems (3) — classical evolution strategies outside the LLM-evolutionary paradigm
EGGROLL	ES-Scale	`dist-ES`	Distributed gradient-free evolution strategies across GPU clusters; antithetic sampling, fitness shaping, variance reduction. Population size and migration topology are the primary tuning dimensions.
Evolution Strategies at Scale	ES-Scale	`neuro-ES`	Large-scale neuroevolution using CMA-ES variants with adaptive step-size control; natural gradient estimation for neural architecture search and reward shaping.
EvoX	ES-Scale	`meta-ES`	Meta-evolutionary outer loop that evolves the ES algorithm itself; auto-configuring mutation rates, crossover operators, and selection pressures via algorithm portfolio management.
Harness systems (4) — evaluation infrastructure with domain-specific methodology
7/24 Office	Harness	`eval-harness`	Multi-task evaluation framework for office automation agents; human-compatible task suites with temporal evaluation windows and multi-modal scoring rubrics.
Hyperagents	Harness	`multi-agent`	Coordination framework for evaluating emergent multi-agent strategies; agent interaction protocols, coalition metrics, and communication evaluation.
Hyperspace	Harness	`hyperopt`	Hyperparameter optimisation evaluation infrastructure; search space definition DSL, trial management, early stopping criteria, and parallelised evaluation orchestration.
UlamAI	Harness	`bench-infra`	Benchmark infrastructure for self-improving systems; reproducible evaluation pipelines, cross-system comparison metrics, and statistical significance testing.

Key observation. The ES-Scale systems employ population-level mechanisms (distributed gradient estimation, CMA-ES covariance adaptation, meta-evolutionary algorithm selection) that are structurally richer than a bare F1 Mutation count of 1 would suggest. Similarly, the four Harness systems implement sophisticated evaluation methodologies that are invisible to Ch67-only tallies. The descriptors capture genuine methodological choices rather than mere placeholders. A future edition of the Ch67 taxonomy could extend formal coverage to these domains, potentially adding families for classical ES primitives (F8·ES) and infrastructure patterns (F9·Infra).

Synthesis

Three structural patterns emerge from the data above. All claims below describe the sixty-one-system corpus curated for this survey (v2026-01-15) unless an external citation is provided; readers should not generalise directly to the broader field without additional evidence.

1. The Research-Agent concentration. Within this corpus, Research-Agent is the dominant lineage by a wide margin: 26 of 61 systems (42.6%), concentrated in P07 (23 systems) with early instances in P05 (3 systems). This distribution reflects a pronounced shift within the surveyed literature from "LLM evolves a priority function" (the FunSearch pattern) toward "LLM orchestrates an entire research pipeline." The FunSearch architecture was introduced by DeepMind in December 2023 (published online in Nature that month, in print in volume 625, January 2024), and the FunSearch-lineage systems surveyed in P02 and P05 build on that template. The Research-Agent systems surveyed in P07 generally appeared later in the corpus window, with exemplars such as AI Scientist v2, DeepResearch Alibaba, and AIRA2 described in late 2024 through 2025 (see §72.9.6 for specific dates). The FunSearch, Reflection, and Self-Modifying lineages remain architecturally important but are increasingly embedded inside research-agent pipelines as sub-components rather than deployed as standalone frameworks. This pattern is visible in the twelve hybrid systems, six of which are Research-Agent-dominant systems that borrow from FunSearch, Reflection, Self-Modifying, Tree-Search, or ES-Scale lineages (see Table 72.2 below).

2. Method convergence on M3 Reflect. A single mutation operator — M3 Reflect — appears in 40 of 61 systems (65.6%), making it the most prevalent component method by a factor of three over the next most common mutation method (M2 Rewrite at 13 systems) within this corpus. This dominance stems from Research-Agent and Self-Improving architectures, where the system reasons about failures before proposing changes: all 24 P07 systems and all 7 P03 systems use M3. By contrast, M1 Diff and M2 Rewrite — the canonical FunSearch mutation operators — appear in only 6 and 13 systems respectively (confined primarily to FunSearch and Tree-Search lineages in P02, P04, P05, and P06). No system in this corpus carries formal Ch67 crossover operators (C1–C4), though multi-parent prompting in FunSearch systems serves a functionally similar role; the Ch67 taxonomy classifies this under M2 rather than C-family methods. Selection, island, and bandit mechanisms remain concentrated in the FunSearch and Tree-Search lineages that originated them, as visible in the rightward columns of Figure 72.2.

3. Infrastructure divergence. Seven systems (11.5%) carry no Ch67 formal method IDs at all, relying entirely on auxiliary architectural descriptors. These are the three ES-Scale systems (EGGROLL, Evolution Strategies at Scale, EvoX) and four Harness systems (7/24 Office, Hyperagents, Hyperspace, UlamAI) — precisely the lineages least influenced by the LLM-evolutionary paradigm that Ch67 was designed to catalogue. As shown in §72.9.5, these systems employ rich methodological machinery that Ch67 does not capture. This gap highlights a limitation of the Ch67 taxonomy: it models LLM-evolutionary methods well but does not cover classical ES primitives or pure infrastructure patterns. Auxiliary descriptors fill this gap for the present survey; a future edition would benefit from extending the formal taxonomy to these domains.

§72.9.6 — Temporal Evidence

The temporal narrative in Synthesis claim 1 — that FunSearch-lineage systems characterise earlier corpus entries while Research-Agent systems characterise later ones — is supported by the publication dates documented in individual system chapters. The table below extracts approximate publication dates for representative systems from each lineage to make this claim directly auditable. Dates are drawn from the individual chapter introductions; where only a year is known, it is recorded as such.

System	Lineage	Part	Approx. publication	Source
Wave 1: Foundations (late 2023 – mid 2024)
FunSearch (precursor)	—	—	Dec 2023 (online) / Jan 2024 (print)	Romera-Paredes et al., Nature
Reflexion	Reflection	P03	Oct 2023 (NeurIPS)	Shinn et al.
LATS	Tree-Search	P04	2023 / ICML 2024	Zhou et al.
Coscientist	Research-Agent	P07	Dec 2023	Boiko et al., Nature
SWE-Agent	Research-Agent	P07	2024	Yang et al.
MLAgentBench	Research-Agent	P05	2024 (ICML)	Huang et al.
The AI Scientist	Research-Agent	P07	Aug 2024	Lu et al. (Sakana AI)
Wave 2: Diversification (mid 2024 – early 2025)
PaperQA2	Research-Agent	P07	2024	Skarlinski et al. (FutureHouse)
AgentLaboratory	Research-Agent	P07	Late 2024	See Ch38
AI Scientist v2	Research-Agent	P07	Early 2025	Yamada et al. (Sakana AI)
CycleResearcher	Research-Agent	P07	2024–2025	See Ch47
Wave 3: Scale & consolidation (2025)
AlphaEvolve	FunSearch	P02	May 2025	Novikov et al. (Google DeepMind)
DeepResearch Alibaba	Research-Agent	P07	2025	See Ch48
AIRA2	Research-Agent	P07	2025	See Ch31
Bilevel AutoResearch	Research-Agent	P07	2025	See Ch37

This table shows representative systems only; all 61 system chapters document specific publication dates and venues in their introductions. The three-wave grouping is an interpretive overlay based on observed publication clustering; individual systems may straddle wave boundaries. Note that AlphaEvolve (Wave 3, May 2025) is a FunSearch-lineage system published later than many Research-Agent systems, demonstrating that the lineage shift is a statistical trend rather than a strict chronological replacement.

Evidentiary Support for Temporal and Comparative Claims

The table below maps each major temporal or comparative claim in the Synthesis to the specific systems and chapters that support it, enabling independent verification.

Claim	Supporting evidence	System/chapter references
FunSearch pattern characterised early corpus entries	Publication dates of FunSearch-lineage systems; see §72.9.6 Wave 1 and FunSearch precursor (Nature, 2024)	FunSearch precursor; A-Evolve, ShinkaEvolve, Darwinian Evolver (P02/P05, 2024); AlphaEvolve (P02, 2025)
Research-Agent pipeline characterised later corpus entries	Publication dates of P07 Research-Agent systems; see §72.9.6 Waves 2–3	AI Scientist v2, DeepResearch Alibaba, AIRA2, AgentLaboratory, Bilevel AutoResearch (P07, late 2024–2025)
`M3` Reflect is the corpus's most common variation operator	Direct count from Table 72.1: 40 / 61 = 65.6%	All 24 P07 + all 7 P03 + 9 across P02, P04, P05, P08; next most common mutation method is M2 at 13 systems (3.1× gap)
Earlier lineages are increasingly embedded as sub-components	12 hybrid systems carry secondary tags	AI Scientist v2 ^RE, DeepScientist ^RE, CycleResearcher ^RE, AutoAgent ^SM, Bilevel AutoResearch ^FS, DiscoveryBench ^FS, PaperQA2 ^TS, AIDE ^ES, AutoEvolver ^ME, FARS ^ME, AutoHarness ^RA, BenchStack ^RA
7 auxiliary-only systems are methodologically rich despite zero Ch67 IDs	Descriptor-level analysis in §72.9.5	EGGROLL `dist-ES`, Evolution Strategies at Scale `neuro-ES`, EvoX `meta-ES`, 7/24 Office `eval-harness`, Hyperagents `multi-agent`, Hyperspace `hyperopt`, UlamAI `bench-infra`

Limitations of this analysis

Several methodological choices constrain what the statistics above can and cannot show.

Dominant-tag-only counting masks hybridisation. Twelve of sixty-one systems (19.7%) carry a secondary-influence tag, meaning their architectures borrow significantly from a second lineage. Under the single-label counting protocol (§72.10), these hybrids contribute only to their dominant lineage's count. For example, AutoEvolver and FARS are counted as Reflection-dominant but both maintain MAP-Elites-style quality-diversity archives, and Bilevel AutoResearch is counted as Research-Agent-dominant despite incorporating FunSearch-style program synthesis. The dominant-only counts in §72.9 therefore overstate the purity of lineages and understate cross-lineage convergence. Readers interested in hybridisation should consult the secondary-influence column in Table 72.1 and the cross-system analysis in Ch70.

Auxiliary-descriptor-only systems are invisible to method-family statistics. The seven systems with no Ch67 tags contribute to no cell of the heatmap (Figure 72.2); the Harness and ES-Scale rows are non-zero only because those lineages also include Ch67-tagged members. As a result, both lineages appear methodologically impoverished in the formal statistics, when in fact they employ rich infrastructure patterns outside the Ch67 schema. Comparative claims about "method richness" across lineages should be read as claims about "Ch67-method richness" only. §72.9.5 provides a parallel descriptor-level analysis.

Absence of formal crossover. No system in this corpus carries a Ch67 crossover tag (C1–C4), yielding F2 = 0. This does not mean crossover-like operations are entirely absent: multi-parent LLM prompting in FunSearch systems and solution-combining strategies in some Tree-Search systems serve functionally analogous roles. However, the Ch67 taxonomy classifies these under M2 Rewrite rather than C-family methods, so they do not register as formal crossover. This classification decision affects the apparent absence of crossover in the corpus statistics.

Corpus selection effects. The sixty-one-system corpus was curated based on the inclusion criteria stated in the Methodology box above, with a freeze date of 15 January 2026. It is not an exhaustive census of all LLM-evolutionary or self-improving AI systems published in 2024–2025. The corpus over-represents systems with English-language publications, publicly described architectures, and sufficient technical detail for lineage classification. Any extrapolation from corpus-level frequencies to "field-wide" trends requires caution and external validation.

§72.10 — Counting Protocol and Classification Rules

This section formalises the classification and counting procedures used throughout §§72.1–72.9.

Definition 72.1 (Corpus). Let S = {s₁, …, s₆₁} be the set of system chapters surveyed in Parts P02–P08, frozen at corpus version v2026-01-15.

Definition 72.2 (Lineage set). Let L = {Research-Agent, FunSearch, Harness, Reflection, Self-Modifying, Tree-Search, ES-Scale, MAP-Elites, Benchmark} be the set of nine lineage tags. Seven are evolutionary architecture families; two (Harness, Benchmark) are special-purpose categories.

Definition 72.3 (Dominant lineage assignment). Each system s_i ∈ S is assigned exactly one dominant lineage d(s_i) ∈ L. The assignment function d : S → L follows a three-level priority order for hybrid systems:

1. Archive structure: if the system maintains a defining archive type (e.g., MAP-Elites grid with explicit behavioural descriptors, FunSearch-style island model), the lineage associated with that archive takes priority.
2. Search mechanism: if the archive is generic (flat pool) or absent, the primary search strategy (tree search, full-pipeline orchestration, classical ES) determines the lineage.
3. Variation operator: if both archive and search are generic, the dominant variation operator (reflection, diff mutation, self-modification) determines the lineage.

Special-purpose categories override: if a system is primarily evaluation infrastructure, it receives Harness; if it is primarily a benchmark suite, it receives Benchmark.

Definition 72.4 (Secondary influence). A system s_i may additionally carry zero or more secondary-influence tags Sec(s_i) ⊂ L \ {d(s_i)}, indicating architecturally significant borrowing from other lineages that does not override the dominant classification. In this corpus, twelve systems carry exactly one secondary tag; the remaining forty-nine carry none.

Definition 72.5 (Lineage count). For any lineage ℓ ∈ L, the dominant count is Count(ℓ) = |{s_i ∈ S : d(s_i) = ℓ}|. This single-label protocol ensures Σ_{ℓ ∈ L} Count(ℓ) = |S| = 61 with no double-counting. Secondary counts are tallied separately: SecCount(ℓ) = |{s_i ∈ S : ℓ ∈ Sec(s_i)}|.
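
Definitions 72.4 and 72.5 translate directly into two tallies. The sketch below applies them to a seven-system subset (the systems from the worked examples in this section, with the dominant and secondary tags stated there), not to the full corpus; the `Tagged` record and the function names are illustrative.

```python
from collections import Counter
from typing import NamedTuple, Tuple


class Tagged(NamedTuple):
    name: str
    dominant: str                     # d(s_i): exactly one lineage
    secondary: Tuple[str, ...] = ()   # Sec(s_i): zero or more other lineages


# Subset only: the seven systems from the worked classification examples.
SUBSET = [
    Tagged("Bilevel AutoResearch", "Research-Agent", ("FunSearch",)),
    Tagged("AutoEvolver", "Reflection", ("MAP-Elites",)),
    Tagged("FARS", "Reflection", ("MAP-Elites",)),
    Tagged("ALE-Agent AHC058", "Harness"),
    Tagged("MetaHarness", "Self-Modifying"),
    Tagged("EvoScale-Bench", "ES-Scale"),
    Tagged("AutoAgent", "Research-Agent", ("Self-Modifying",)),
]


def dominant_counts(systems) -> Counter:
    """Count(l): each system contributes to exactly one lineage."""
    return Counter(s.dominant for s in systems)


def secondary_counts(systems) -> Counter:
    """SecCount(l): tallied separately from the dominant counts."""
    return Counter(tag for s in systems for tag in s.secondary)


dom = dominant_counts(SUBSET)
sec = secondary_counts(SUBSET)

# Single-label protocol: dominant counts partition the set of systems.
assert sum(dom.values()) == len(SUBSET)
# A secondary tag never repeats the dominant tag (Definition 72.4).
assert all(s.dominant not in s.secondary for s in SUBSET)

print(dict(dom))
print(dict(sec))
```

Keeping the two counters separate is what lets a lineage with few dominant members in this subset, such as MAP-Elites, still register through secondary influence.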

Definition 72.6 (Method count). Let M(s_i) denote the set of Ch67 method IDs assigned to system s_i. Method assignment is multi-label: a system using M1, M3, and B1 increments all three individual counts and two family counts (F1 once for both M1 and M3, F7 once for B1). Family count for family f: FamCount(f) = |{s_i ∈ S : M(s_i) ∩ f ≠ ∅}|. Seven systems have M(s_i) = ∅ (auxiliary-only).
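
The difference between individual method counts and family counts is easiest to see on a toy input. In the sketch below the system names are hypothetical, and the prefix-to-family mapping is an assumption read off the method IDs quoted in this chapter (M3 in F1, C1–C4 in F2, S2 in F3, E2 in F4, P5 in F5, I1 in F6, B1 in F7); Ch67 remains the authoritative definition.

```python
from collections import Counter

# Assumed mapping from method-ID prefix to Ch67 family (see lead-in).
FAMILY_OF_PREFIX = {"M": "F1", "C": "F2", "S": "F3", "E": "F4",
                    "P": "F5", "I": "F6", "B": "F7"}


def family_of(method_id: str) -> str:
    """Map a method ID such as 'M3' to its family label such as 'F1'."""
    return FAMILY_OF_PREFIX[method_id[0]]


def method_counts(assignments: dict) -> Counter:
    """Multi-label: every method ID a system carries is incremented."""
    return Counter(m for methods in assignments.values() for m in methods)


def family_counts(assignments: dict) -> Counter:
    """FamCount(f): a system counts once per family, however many methods it has there."""
    return Counter(
        family
        for methods in assignments.values()
        for family in {family_of(m) for m in methods}
    )


# Hypothetical systems; "system-a" mirrors the example in Definition 72.6.
TOY = {
    "system-a": {"M1", "M3", "B1"},
    "system-b": {"E2", "E3"},
    "system-c": set(),  # auxiliary-only: M(s_i) is empty
}

meth = method_counts(TOY)
fam = family_counts(TOY)
aux_only = [name for name, methods in TOY.items() if not methods]

print(dict(meth))
print(dict(fam))
print(aux_only)
```

Here `system-a` raises three method tallies but only two family tallies, which is the behaviour the definition describes.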

Lineage Classification Decision Procedure

The following pseudocode implements Definition 72.3. This is a formalisation of the curator's decision process; individual chapter authors may note edge cases in their system chapters.

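from dataclasses import dataclass


class ClassificationError(Exception):
    """Raised when no rule of Definition 72.3 matches a system."""


@dataclass
class SystemProfile:
    """Curator-supplied features consulted by the priority rule.

    Illustrative schema: field names mirror the checks below; defaults are assumptions.
    """
    name: str
    archive_type: str = "none"   # e.g. "MAP-Elites", "island-model", "flat-pool"
    primary_variation: str = ""  # e.g. "reflection"
    is_evaluation_infrastructure: bool = False
    is_benchmark_suite: bool = False
    uses_behavioral_descriptors: bool = False
    is_funsearch_variant: bool = False
    orchestrates_full_research_pipeline: bool = False
    uses_tree_search_with_backtracking: bool = False
    uses_classical_evolution_strategies: bool = False
    modifies_own_source_code: bool = False


def classify_dominant_lineage(system: SystemProfile) -> str: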
    """Assign dominant lineage per Definition 72.3.

    Priority: special-purpose > archive > search > variation.
    Returns one of the nine lineage tags in L.
    """
    # Special-purpose overrides
    if system.is_evaluation_infrastructure:
        return "Harness"
    if system.is_benchmark_suite:
        return "Benchmark"

    # Level 1: Archive structure
    if system.archive_type == "MAP-Elites" and system.uses_behavioral_descriptors:
        return "MAP-Elites"
    if system.archive_type == "island-model" or system.is_funsearch_variant:
        return "FunSearch"

    # Level 2: Search mechanism
    if system.orchestrates_full_research_pipeline:
        return "Research-Agent"
    if system.uses_tree_search_with_backtracking:
        return "Tree-Search"
    if system.uses_classical_evolution_strategies:
        return "ES-Scale"

    # Level 3: Variation operator
    if system.modifies_own_source_code:
        return "Self-Modifying"
    if system.primary_variation == "reflection":
        return "Reflection"

    raise ClassificationError(f"No lineage matched for {system.name}")

Worked Classification Examples

The following seven examples demonstrate how the priority rule was applied to ambiguous or hybrid systems. Each example traces the decision path through the three levels, showing exactly why the dominant lineage was chosen and why the secondary influence was assigned.

Example 72.1 — Bilevel AutoResearch (Dominant: Research-Agent; Secondary: FunSearch)

Special-purpose? No — it is a generative research system, not evaluation infrastructure or a benchmark suite.
Level 1 — Archive: The system maintains a candidate pool but does not use a MAP-Elites grid with behavioural descriptors, nor a FunSearch-style island model. The pool is a flat buffer of research hypotheses. → Archive does not determine lineage.
Level 2 — Search: The outer loop orchestrates a full research pipeline — hypothesis generation, experimental design, code execution, result analysis, and iterative refinement. This matches orchestrates_full_research_pipeline. → Research-Agent.
Secondary: The inner loop uses FunSearch-style program synthesis to generate and evolve candidate functions within each experiment. This is architecturally significant but does not override the outer-loop classification. → FunSearch secondary.

Example 72.2 — AutoEvolver (Dominant: Reflection; Secondary: MAP-Elites)

Special-purpose? No.
Level 1 — Archive: The system maintains a quality-diversity archive that resembles a MAP-Elites grid — solutions are stored across a diversity of behavioural niches. However, the archive does not use explicit behavioural descriptors with a discretised grid. Instead, diversity is maintained through a softer mechanism (feature-based clustering without predefined bins). Because uses_behavioral_descriptors is false, the MAP-Elites condition does not fire. → Archive does not determine lineage.
Level 2 — Search: Not a research pipeline, tree search, or classical ES. → Falls to Level 3.
Level 3 — Variation: The primary variation mechanism is reflective self-critique. primary_variation == "reflection" is true. → Reflection.
Secondary: The QD-archive pattern borrows from the MAP-Elites paradigm. → MAP-Elites secondary.

Example 72.3 — FARS (Dominant: Reflection; Secondary: MAP-Elites)

Special-purpose? No.
Level 1 — Archive: Like AutoEvolver, FARS maintains a feature-aware diversity archive with continuous feature vectors rather than a fixed MAP-Elites grid. → Archive does not trigger MAP-Elites.
Level 2 — Search: Not a full research pipeline, tree search, or classical ES. → Falls to Level 3.
Level 3 — Variation: Feature-aware reflective search. → Reflection.
Secondary: The feature-indexed archive is conceptually derived from quality-diversity methods. → MAP-Elites secondary.

Example 72.4 — ALE-Agent AHC058 (Dominant: Harness; No secondary)

Special-purpose? Yes — the system is primarily evaluation infrastructure for AtCoder Heuristic Contest problems. The special-purpose override fires immediately. → Harness.
No further levels are evaluated. The system uses E3 Multi-instance evaluation, consistent with its harness role.
Secondary: None.

Example 72.5 — MetaHarness (Dominant: Self-Modifying; No secondary)

Special-purpose? The system is related to evaluation, but its distinguishing feature is that it adapts its own test suite architecture during operation. The Harness override does not fire because the system's primary contribution is self-modification, not evaluation service.
Levels 1–2: No defining archive, no research pipeline / tree search / classical ES. → Falls to Level 3.
Level 3 — Variation: The system modifies its own source code. → Self-Modifying.
Note: MetaHarness is in P08 because its topic is evaluation infrastructure, but its lineage is Self-Modifying because its architecture is self-modifying. Part assignment reflects topic; lineage reflects architecture.

Example 72.6 — EvoScale-Bench (Dominant: ES-Scale; No secondary)

Special-purpose? The system bridges ES evaluation with benchmark methodology. Neither the Benchmark nor Harness override fires because its primary contribution is ES methodology.
Level 1 — Archive: No MAP-Elites grid or island model. → Falls through.
Level 2 — Search: Uses classical evolution strategies as its core methodology. → ES-Scale.
Secondary: None.

Example 72.7 — AutoAgent (Dominant: Research-Agent; Secondary: Self-Modifying)

Special-purpose? No.
Level 1 — Archive: Maintains a skills archive (P5), but this is a flat skill library, not a MAP-Elites grid or island model. → Does not determine lineage.
Level 2 — Search: Orchestrates a full research pipeline. → Research-Agent.
Secondary: The system can modify its own agent architecture during operation. → Self-Modifying secondary.
Tie-break note: Level 2 (search mechanism) is evaluated before Level 3 (variation operator), so full-pipeline orchestration takes priority over self-modification capability.
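
The seven decision paths above can be replayed in code. The sketch below restates the decision procedure in self-contained form and feeds it one feature profile per example; the profile values (for instance `archive_type="flat-pool"`, or modelling AutoEvolver's archive as `"MAP-Elites"` without behavioural descriptors) are assumptions read off the prose of each example, not fields from the survey dataset.

```python
from dataclasses import dataclass


class ClassificationError(Exception):
    """Raised when no rule of the priority order matches."""


@dataclass
class SystemProfile:
    # Field names follow the checks in the decision procedure; defaults are assumptions.
    name: str
    archive_type: str = "none"
    primary_variation: str = ""
    is_evaluation_infrastructure: bool = False
    is_benchmark_suite: bool = False
    uses_behavioral_descriptors: bool = False
    is_funsearch_variant: bool = False
    orchestrates_full_research_pipeline: bool = False
    uses_tree_search_with_backtracking: bool = False
    uses_classical_evolution_strategies: bool = False
    modifies_own_source_code: bool = False


def classify_dominant_lineage(s: SystemProfile) -> str:
    # Special-purpose overrides come first.
    if s.is_evaluation_infrastructure:
        return "Harness"
    if s.is_benchmark_suite:
        return "Benchmark"
    # Level 1: archive structure.
    if s.archive_type == "MAP-Elites" and s.uses_behavioral_descriptors:
        return "MAP-Elites"
    if s.archive_type == "island-model" or s.is_funsearch_variant:
        return "FunSearch"
    # Level 2: search mechanism.
    if s.orchestrates_full_research_pipeline:
        return "Research-Agent"
    if s.uses_tree_search_with_backtracking:
        return "Tree-Search"
    if s.uses_classical_evolution_strategies:
        return "ES-Scale"
    # Level 3: variation operator.
    if s.modifies_own_source_code:
        return "Self-Modifying"
    if s.primary_variation == "reflection":
        return "Reflection"
    raise ClassificationError(f"No lineage matched for {s.name}")


# (profile, dominant lineage stated in the worked example)
EXAMPLES = [
    (SystemProfile("Bilevel AutoResearch", archive_type="flat-pool",
                   orchestrates_full_research_pipeline=True), "Research-Agent"),
    (SystemProfile("AutoEvolver", archive_type="MAP-Elites",
                   primary_variation="reflection"), "Reflection"),
    (SystemProfile("FARS", archive_type="feature-archive",
                   primary_variation="reflection"), "Reflection"),
    (SystemProfile("ALE-Agent AHC058", is_evaluation_infrastructure=True), "Harness"),
    (SystemProfile("MetaHarness", modifies_own_source_code=True), "Self-Modifying"),
    (SystemProfile("EvoScale-Bench", uses_classical_evolution_strategies=True), "ES-Scale"),
    (SystemProfile("AutoAgent", archive_type="flat-pool",
                   orchestrates_full_research_pipeline=True,
                   modifies_own_source_code=True), "Research-Agent"),
]

results = {profile.name: classify_dominant_lineage(profile) for profile, _ in EXAMPLES}
for profile, expected in EXAMPLES:
    status = "ok" if results[profile.name] == expected else "MISMATCH"
    print(f"{profile.name}: {results[profile.name]} ({status})")
```

The AutoAgent profile sets both the pipeline flag and the self-modification flag, so it exercises the tie-break: the Level 2 check returns before Level 3 is reached.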

Chapter Summary

Key takeaway. Sixty-one system chapters collapse into nine lineage tags across seven evolutionary architecture families plus two special-purpose categories (Harness and Benchmark). Within this corpus (v2026-01-15), Research-Agent is the dominant lineage: 26 of 61 systems (42.6%). FunSearch and Harness each account for 8 (13.1%). The remaining lineages — Reflection, Self-Modifying, Tree-Search, and ES-Scale — collectively represent 18 systems whose architectures are increasingly embedded as sub-components within Research-Agent pipelines, as evidenced by the 12 hybrid systems carrying secondary-influence tags. At the method level, M3 Reflect appears in 40 systems (65.6%), confirming reflection-based mutation as the corpus's most common variation operator. Seven auxiliary-descriptor-only systems (three ES-Scale, four Harness) employ rich methodological machinery outside the Ch67 taxonomy, as detailed in §72.9.5.

How to use this chapter. Use it as a lookup index. Scan Table 72.1 for the compact "at a glance" view. Scan lineages to find architectural peers; scan Ch67 method tags to find systems sharing a specific component technique; scan auxiliary tags for infrastructure choices; scan one-line purposes for the field landscape. Every row links to the full system chapter, and every Ch67 method tag is formally defined in Ch67.

Methodological note. Lineage tags are single-label (dominant only) per the priority-based classification in §72.10; twelve hybrid systems additionally carry one secondary-influence tag. Seven worked classification examples in §72.10 demonstrate how the priority rule handles ambiguous cases including Bilevel AutoResearch, AutoEvolver, FARS, ALE-Agent AHC058, MetaHarness, EvoScale-Bench, and AutoAgent. Method assignments are multi-label, sourced from Ch67 formal IDs. Seven auxiliary-descriptor-only systems are invisible to Ch67 method-family statistics — §72.9.5 provides a parallel descriptor-based analysis. All counts, percentages, and patterns reported in §72.9 are derived from the structured corpus dataset (Table 72.1) and describe this curated sixty-one-system corpus only. The temporal evidence table in §72.9.6 provides directly auditable publication dates supporting the FunSearch-to-Research-Agent shift narrative.

@misc{kinas2026evosurvey,
  author = {Kinas, Remek},
  title  = {Evolutionary AI Survey},
  year   = {2026},
  url    = {https://evo.si5.pl}
}