
SkyDiscover & AdaEvolve

A Modular Framework for AI-Driven Algorithmic Discovery with Hierarchical Adaptive Search

Organization: UC Berkeley Sky Lab
Published: February 2026
Type: Framework + Research Paper (arXiv:2602.20133)
License: Apache 2.0
Report Type: PhD-Level Technical Analysis
Report Date: March 2026


1 Full Title and Attribution

Framework: SkyDiscover: A Modular Framework for AI-Driven Scientific and Algorithmic Discovery

Paper: AdaEvolve: Adaptive LLM-Driven Zeroth-Order Optimization (arXiv:2602.20133)

Project Page: skydiscover-ai.github.io

Repository: github.com/skydiscover-ai/skydiscover

License: Apache License 2.0

Organization: UC Berkeley Sky Lab

Publication Date: February 23, 2026

Lineage: Builds on FunSearch, OpenEvolve, GEPA, ShinkaEvolve; extends with hierarchical adaptive search

2 Authors and Team

SkyDiscover and AdaEvolve were developed at the UC Berkeley Sky Lab, a research group known for systems-level AI infrastructure (Ray, Spark, Alpa, vLLM).

Authors: Mert Cemri, Shubham Agrawal, Akshat Gupta, Shu Liu, Audrey Cheng, Qiuyang Mang, Ashwin Naren, Lutfi Eren Erdogan, Koushik Sen, Matei Zaharia, Alex Dimakis, Ion Stoica

The team brings together expertise in systems optimization (Stoica, Zaharia — creators of Spark and Ray), program analysis (Sen — a pioneer of concolic testing), and machine learning (Dimakis). This systems-first DNA is reflected in the framework's emphasis on modular design, fair benchmarking, and real-world systems optimization tasks.

Notably, several authors overlap with the GEPA team (UC Berkeley / Stanford), and the framework includes GEPA as a first-class backend algorithm. SkyDiscover represents the Berkeley lab's effort to unify the fragmented landscape of LLM-driven evolutionary search into a single, fair evaluation platform.

3 Core Contribution

[!important] Key Novelty SkyDiscover makes two distinct contributions: (1) a modular framework providing a unified interface for implementing, running, and fairly comparing discovery algorithms across 200+ optimization tasks; and (2) AdaEvolve, a novel three-level hierarchical adaptation algorithm that replaces static search schedules with dynamic resource allocation coordinated by an accumulated improvement signal.

What Makes SkyDiscover/AdaEvolve Novel

  1. Hierarchical adaptive search: AdaEvolve is the first evolutionary search algorithm to implement three-level adaptation: local (within-island exploration intensity), global (cross-island resource allocation via bandit), and meta (tactical paradigm shifts when stagnation detected). All three levels are coordinated by a single unified signal.
  2. Accumulated improvement signal: A scale-invariant, exponential moving average of squared improvement magnitudes that serves as a real-time volatility metric. This single signal drives decisions at all three adaptation levels — a mathematically elegant unification.
  3. Globally-normalized bandit rewards: Unlike previous systems that measure island improvement relative to each island's local best, AdaEvolve normalizes rewards against the global best, preventing "poor island bias" where weak islands receive disproportionate resources for trivial improvements.
  4. Meta-guidance via LLM tactical generation: When stagnation is detected across all islands, the system triggers a meta-level LLM analysis that generates high-level algorithmic directives (e.g., "switch from greedy to dynamic programming"), forcing qualitative search redirection.
  5. Unified benchmarking platform: SkyDiscover provides 200+ optimization tasks spanning mathematics, systems optimization, competitive programming, and creative applications, enabling fair head-to-head comparison of different search algorithms.
  6. Multi-algorithm framework: Ships with AdaEvolve, EvoX, and native backends for OpenEvolve, GEPA, and ShinkaEvolve, plus generic strategies (Top-K, Beam Search, Best-of-N).

Relationship to Prior Work

| System | Year | Adaptation | Search Strategy | Benchmark Coverage |
|---|---|---|---|---|
| FunSearch | 2023 | Static | Single population | Math only |
| OpenEvolve | 2025 | Static islands | MAP-Elites + islands | ~10 tasks |
| ShinkaEvolve | 2025 | Bandit LLM selection | Islands + dynamic spawning | ~20 tasks |
| GEPA | 2026 | Pareto-based | Reflection-driven | ~30 tasks |
| SkyDiscover | 2026 | Three-level hierarchical | Multi-algorithm (AdaEvolve, EvoX, + backends) | 200+ tasks |

4 Supported Solutions

4.1 Solution Types

| Solution Type | Description | Supported |
|---|---|---|
| Function-level optimization | Evolve a single function for a given task | ✅ Yes |
| Full program evolution | Evolve complete programs with EVOLVE-BLOCK markers | ✅ Yes |
| Multi-file codebase | Modify multiple files simultaneously | ✅ Yes (agentic mode) |
| Prompt optimization | Evolve NLP prompts for downstream tasks | ✅ Yes (HotPotQA) |
| Image generation | Evolve image generation parameters/code | ✅ Yes (creative tasks) |
| Systems optimization | Cloud scheduling, load balancing, kernel tuning | ✅ Yes (9 tasks) |
| Self-modification | Agent modifying its own code | ❌ No |

4.2 Benchmark Portfolio (200+ Tasks)

| Domain | Tasks | Examples |
|---|---|---|
| Mathematics | 14 | Circle packing, Erdős problems, Heilbronn triangles, geometric optimization |
| Systems | 9 | Cloud scheduling, load balancing, MoE expert placement, GPU kernel optimization |
| Algorithms (Frontier-CS) | 172 | Competitive programming from diverse categories |
| Algorithms (ALE) | 10 | AtCoder Heuristic Contest-derived optimization |
| Creative / NLP | 2+ | Image generation evolution, HotPotQA prompt optimization |

5 LLM Integration

5.1 Multi-Provider Support

SkyDiscover supports all major LLM providers through a unified provider/model format:

| Provider | Models | Format |
|---|---|---|
| OpenAI | GPT-5, GPT-4o, o3-mini | openai/gpt-5 (default provider) |
| Google | Gemini 3 Pro, Gemini 3 Flash | gemini/gemini-3-pro |
| Anthropic | Claude Opus 4.6, Claude Sonnet 4.6 | anthropic/claude-opus-4-6 |
| Local (Ollama) | Any LiteLLM-compatible model | ollama/qwen2.5-coder:32b |

5.2 Weighted Multi-Model Pools

A distinctive feature of SkyDiscover is weighted multi-model pools for distributed sampling. Instead of using a single LLM per mutation, the framework samples from a weighted mixture:

# YAML configuration for weighted model pools
llm:
  models:
    - model: "openai/gpt-5"
      weight: 0.4
    - model: "gemini/gemini-3-pro"
      weight: 0.3
    - model: "anthropic/claude-sonnet-4-6"
      weight: 0.2
    - model: "ollama/qwen2.5-coder:32b"
      weight: 0.1
  system_prompt: "You are an expert algorithm designer..."
  max_iterations: 500
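To illustrate how such a weighted mixture can be sampled per mutation, here is a minimal sketch. The `ModelPool` class and its method names are assumptions for illustration, not the framework's actual API:

```python
import random

class ModelPool:
    """Samples one model identifier per mutation, according to configured weights."""

    def __init__(self, models, seed=None):
        names, weights = zip(*models)
        total = sum(weights)
        self.names = list(names)
        # Normalize so the YAML weights need not sum exactly to 1.
        self.weights = [w / total for w in weights]
        self.rng = random.Random(seed)

    def sample(self) -> str:
        """Draw one model for the next mutation call."""
        return self.rng.choices(self.names, weights=self.weights, k=1)[0]

pool = ModelPool([
    ("openai/gpt-5", 0.4),
    ("gemini/gemini-3-pro", 0.3),
    ("anthropic/claude-sonnet-4-6", 0.2),
    ("ollama/qwen2.5-coder:32b", 0.1),
], seed=0)

counts = {name: 0 for name in pool.names}
for _ in range(1000):
    counts[pool.sample()] += 1
# Over 1000 draws, counts roughly track the 0.4 / 0.3 / 0.2 / 0.1 weights.
```

Distributing mutations across a pool like this hedges against any single model's blind spots while keeping cheaper models in the loop for routine mutations.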

5.3 Agentic Mode

SkyDiscover provides an agentic mode where the LLM has access to the full project file structure during mutation. This enables context-aware mutations that consider imports, dependencies, and cross-file interactions — conceptually similar to Arcgentica's runtime-as-context but at the file system level rather than the REPL level.

5.4 Custom System Prompts

Each task can define custom system prompts that are injected into the LLM context during mutation. Combined with AdaEvolve's meta-guidance system, prompts can be dynamically augmented with tactical directives when the system detects stagnation.

6 Key Results

[!success] Headline Results AdaEvolve achieves ~34% median improvement over OpenEvolve/GEPA/ShinkaEvolve baselines across ~200 benchmarks, matches AlphaEvolve on 6/6 systems tasks and 6/8 math tasks, and demonstrates strong real-world systems impact.

6.1 Mathematical Optimization (6 tasks, 100 iterations)

| Problem | AdaEvolve | OpenEvolve | GEPA | Human SOTA |
|---|---|---|---|---|
| Circle Packing (Square) | 2.636 | 2.590 | 2.610 | 2.634 |
| Heilbronn Triangles | 0.036 | 0.028 | 0.031 | — |
| Signal Processing | 0.718 | 0.619 | 0.682 | — |

AdaEvolve matches or exceeds human SOTA on circle packing (2.636 ≥ 2.634) and achieves best open-source results across all math benchmarks.

6.2 Real-World Systems Optimization (ADRS, 7 tasks)

| Task | AdaEvolve (GPT-5) | AdaEvolve (Gemini-3-Pro) | Best Baseline |
|---|---|---|---|
| Cloud Transfer Cost | 41% lower than baselines | 41% lower than baselines | OpenEvolve |
| GPU Load Balancing | 14% better than baselines | 14% better than baselines | GEPA |
| MoE Expert Placement | Best on all 7 | Best on all 7 | ShinkaEvolve |

AdaEvolve wins on all 7 ADRS benchmarks under both GPT-5 and Gemini-3-Pro, with the largest gains on sparse/bursty improvement tasks (TXN: 4348 vs. baseline 4329), where adaptive resource allocation excels.

6.3 Frontier-CS (172 algorithm design problems, 50 LLM calls)

| Metric | AdaEvolve | OpenEvolve | GEPA | Single-call GPT-5 |
|---|---|---|---|---|
| Mean Score | 61.33 | 50.75 | 54.20 | 20.64 |
| Median Score | 75.15 | 56.37 | 60.12 | 15.30 |

AdaEvolve improves over OpenEvolve by +21% in mean score.

6.4 Ablation Study

| Configuration | Circle Packing | Signal Processing |
|---|---|---|
| Full AdaEvolve | 2.6294 ± 0.003 | 0.7178 ± 0.019 |
| w/o Local Adaptation | 2.5906 ± 0.048 | 0.6807 ± 0.021 |
| w/o Bandit Selection | 2.6180 ± 0.005 | 0.6190 ± 0.054 |
| w/o Meta-Guidance | 2.5213 ± 0.028 | 0.5476 ± 0.011 |

[!note] Key Finding Meta-Guidance removal causes the largest performance degradation across all benchmarks. This validates that adaptive tactical generation — injecting high-level algorithmic directives when search stagnates — is AdaEvolve's most impactful innovation.

7 Reproducibility

| Criterion | Status | Details |
|---|---|---|
| Source Code | ✅ Available | github.com/skydiscover-ai/skydiscover (Apache 2.0) |
| Paper | ✅ Available | arXiv:2602.20133 (CC BY 4.0) |
| Benchmarks | ✅ 200+ included | All benchmarks ship with the framework; evaluator functions provided |
| Configuration | ✅ YAML-based | Exact configs for all experiments are in the repository |
| Checkpoint Resumption | ✅ Yes | Long-running discovery tasks can be interrupted and resumed |
| Determinism | ⚠️ Partial | LLM outputs are stochastic; framework tracks seeds where possible |
| API Key Requirement | ⚠️ Required | Needs at least one LLM provider API key (or local Ollama) |

8 Compute and API Costs

8.1 Minimal Configuration

AdaEvolve is designed for minimal configuration: it requires only a model name and an iteration budget. The three-level adaptation handles all scheduling decisions automatically, eliminating the need to hand-tune island counts, mutation ratios, or exploration-exploitation schedules.

8.2 Cost Efficiency

| Setting | LLM Calls | Estimated Cost | Benchmark Result |
|---|---|---|---|
| Frontier-CS (172 problems) | 50 per problem | $0.10–0.50 per problem | Mean score 61.33 |
| Math optimization (100 iter.) | 100 | $5–30 | Matches human SOTA |
| Systems optimization (ADRS) | 50–200 | $2–20 | 41% cost reduction |

8.3 Cost Advantage from Adaptive Allocation

AdaEvolve's hierarchical adaptation naturally reduces costs by allocating more resources to productive islands and fewer to stagnant ones. The globally-normalized bandit prevents wasting compute on low-performing search fronts. Compared to static schedule systems:

  • ~20–30% fewer wasted LLM calls vs. fixed island allocation (OpenEvolve)
  • Dynamic island spawning/pruning prevents maintaining inactive search fronts
  • Meta-guidance breaks through stagnation plateaus that would otherwise waste remaining budget

9 Architecture

9.1 Framework Architecture

SkyDiscover has a clean modular architecture separating the framework layer (task definitions, evaluation, configuration, monitoring) from the search algorithm layer (AdaEvolve, EvoX, or any backend):

╔════════════════════════════════════════════════════════════════════════════╗
║                            S K Y D I S C O V E R                          ║
║            Modular Framework for AI-Driven Algorithmic Discovery           ║
╚════════════════════════════════════════════════════════════════════════════╝

                          ┌──────────────────────────┐
                          │     USER INTERFACE        │
                          │                          │
                          │  YAML Config             │
                          │  Live Dashboard          │
                          │  Human Feedback Steering │
                          │  Checkpoint Resume       │
                          └────────────┬─────────────┘
                                       │
                  ┌────────────────────┼────────────────────┐
                  │                    │                    │
        ┌─────────▼─────────┐ ┌──────▼──────────┐ ┌──────▼──────────┐
        │  EVALUATOR API    │ │  SEARCH ALGO    │ │  LLM PROVIDER   │
        │                   │ │  ROUTER         │ │  LAYER          │
        │  evaluate(path)   │ │                 │ │                 │
        │  → combined_score │ │  AdaEvolve      │ │  OpenAI         │
        │  → artifacts      │ │  EvoX           │ │  Gemini         │
        │  EVOLVE-BLOCK     │ │  Top-K          │ │  Anthropic      │
        │  markers          │ │  Beam Search    │ │  Ollama/local   │
        └─────────┬─────────┘ │  Best-of-N      │ │  Weighted pools │
                  │           │  GEPA Native    │ └────────┬────────┘
                  │           │  OpenEvolve     │          │
                  │           └──────┬──────────┘          │
                  │                  │                      │
                  └──────────────────┼──────────────────────┘
                                     │
                          ┌──────────▼──────────────┐
                          │    BENCHMARK SUITE       │
                          │                          │
                          │  Math (14 tasks)         │
                          │  Systems (9 tasks)       │
                          │  Frontier-CS (172 tasks) │
                          │  ALE (10 tasks)          │
                          │  Creative/NLP (2+ tasks) │
                          └──────────────────────────┘

9.2 AdaEvolve Algorithm Architecture

AdaEvolve implements a multi-island evolutionary search with three-level hierarchical adaptation:

╔══════════════════════════════════════════════════════════════════════╗
║                         A D A E V O L V E                           ║
║            Three-Level Hierarchical Adaptive Search                  ║
╚══════════════════════════════════════════════════════════════════════╝

  LEVEL 3: META-GUIDANCE (triggered on global stagnation)
  ┌──────────────────────────────────────────────────────────────────┐
  │  IF G_t^(k) ≤ 0.12 for all k:                                   │
  │    → LLM analyzes evaluator code + current best programs         │
  │    → Generates high-level tactical directives                    │
  │    → Injects tactics into mutation prompts                       │
  │    → Forces qualitative search paradigm shift                    │
  │  Examples: "Switch from greedy to DP", "Use Voronoi init"       │
  └──────────────────────────────────┬───────────────────────────────┘
                                     │ tactical injection
  LEVEL 2: GLOBAL ADAPTATION (cross-island resource allocation)
  ┌──────────────────────────────────▼───────────────────────────────┐
  │  UCB Island Selection: k* = argmax[R_k/V_k + C√(ln N/n_k)]     │
  │                                                                   │
  │  Globally-normalized rewards: r_t^(k) = (f' - f_k*) / f_global* │
  │  Decayed tracking: R_t^(k) = ρ·R_{t-1}^(k) + r_t^(k)          │
  │                                                                   │
  │  Dynamic island spawning: when G_t^(k) ≤ 0.02 across all k     │
  └──────────────────────────────────┬───────────────────────────────┘
                                     │ resource allocation
  LEVEL 1: LOCAL ADAPTATION (within-island exploration intensity)
  ┌────────────┐  ┌────────────┐  ┌────────────┐  ┌────────────┐
  │  Island 1  │  │  Island 2  │  │  Island 3  │  │  Island K  │
  │            │  │            │  │            │  │            │
  │  I_t^(1)   │  │  I_t^(2)   │  │  I_t^(3)   │  │  I_t^(K)   │
  │  G_t^(1)   │  │  G_t^(2)   │  │  G_t^(3)   │  │  G_t^(K)   │
  │            │  │            │  │            │  │            │
  │ ┌────────┐ │  │ ┌────────┐ │  │ ┌────────┐ │  │ ┌────────┐ │
  │ │ Accum. │ │  │ │ Accum. │ │  │ │ Accum. │ │  │ │ Accum. │ │
  │ │ Improv.│ │  │ │ Improv.│ │  │ │ Improv.│ │  │ │ Improv.│ │
  │ │ Signal │ │  │ │ Signal │ │  │ │ Signal │ │  │ │ Signal │ │
  │ └────────┘ │  │ └────────┘ │  │ └────────┘ │  │ └────────┘ │
  └────────────┘  └────────────┘  └────────────┘  └────────────┘
       │                │                │                │
       └────────────────┼────────────────┼────────────────┘
                        │    inter-island migration
                        ▼
              ┌──────────────────────┐
              │  POPULATION DATABASE  │
              │  Per-island archives  │
              │  Global best tracking │
              └──────────────────────┘

10 Component Breakdown

10.1 Framework Components (SkyDiscover)

| Component | Module | Responsibility |
|---|---|---|
| Evaluator API | evaluate(program_path) | Returns {combined_score, artifacts}; artifacts feed into next iteration |
| Search Router | skydiscover/search/ | Dispatches to AdaEvolve, EvoX, Top-K, Beam, Best-of-N, or native backends |
| Context Builder | skydiscover/context_builder/ | Assembles LLM prompt from code, artifacts, and search state |
| LLM Provider Layer | YAML config | Weighted multi-model pools, provider routing |
| Benchmark Suite | benchmarks/ | 200+ tasks with evaluators, initial programs, documentation |
| Live Dashboard | Web UI | Real-time scatter plots, code diffs, metric tracking, human intervention |
| Checkpoint System | Built-in | Full state serialization for resuming interrupted runs |

10.2 Algorithm Components (AdaEvolve)

| Component | Role | Key Parameter |
|---|---|---|
| Accumulated Improvement Signal | Per-island volatility metric driving all three adaptation levels | ρ (decay rate, default 0.9) |
| Local Adaptation Engine | Adjusts exploration intensity per island per iteration | I_min = 0.1, I_max = 0.7 |
| Global UCB Bandit | Allocates compute across islands via UCB selection | C = √2 |
| Meta-Guidance Generator | LLM-driven tactical directive generation on stagnation | Trigger: G ≤ 0.12 |
| Island Manager | Dynamic spawning when all islands stagnate | Trigger: G ≤ 0.02 |
| Migration Controller | Inter-island solution transfer | Configurable topology |
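How these components compose can be sketched as a single toy search loop. The `mutate_and_evaluate` stub, the seed scores, and the reward clamp are illustrative assumptions; the real framework wires in LLM calls, task evaluators, migration, and island spawning at the marked points:

```python
import math
import random

rng = random.Random(42)

def mutate_and_evaluate(parent_score: float, intensity: float) -> float:
    # Stand-in for one LLM mutation plus evaluator call: higher intensity
    # means larger, noisier candidate steps. A real run calls the model here.
    return parent_score + rng.gauss(0.0, intensity)

K, RHO, C = 4, 0.9, math.sqrt(2)
I_MIN, I_MAX, EPS = 0.1, 0.7, 1e-8
best = [1.0] * K           # per-island best score f_k* (toy seed scores)
G = [0.0] * K              # accumulated improvement signal (Eq. 2)
R = [0.0] * K              # decayed cumulative reward (Eq. 5)
V = [0.0] * K              # decayed visit count (Eq. 5)
global_best, total = 1.0, 0

for _ in range(200):
    total += 1
    # Level 2: UCB island selection over decayed statistics (Eq. 6).
    ucb = [float("inf") if V[k] < EPS
           else R[k] / V[k] + C * math.sqrt(math.log(total) / V[k])
           for k in range(K)]
    k = ucb.index(max(ucb))
    # Level 1: exploration intensity from the accumulated signal (Eq. 3).
    intensity = I_MIN + (I_MAX - I_MIN) / (1 + math.sqrt(G[k] + EPS))
    score = mutate_and_evaluate(best[k], intensity)
    # Eq. 1-2: normalized improvement magnitude, then its decayed square.
    delta = max((score - best[k]) / (abs(best[k]) + EPS), 0.0)
    G[k] = RHO * G[k] + (1 - RHO) * delta ** 2
    # Eq. 4-5: globally-normalized reward (clamped at 0 for this toy).
    reward = max(score - best[k], 0.0) / (abs(global_best) + EPS)
    R[k] = RHO * R[k] + reward
    V[k] = RHO * V[k] + 1
    best[k] = max(best[k], score)
    global_best = max(global_best, best[k])
    # Level 3 hook: meta-guidance would fire when every island stagnates.
    needs_meta_guidance = all(g <= 0.12 for g in G)
```

The point of the sketch is the data flow: one scalar per island (G) drives the local intensity, the decayed (R, V) pair drives the global bandit, and the same G values gate the meta-level intervention.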

10.3 EvoX Algorithm (Secondary)

SkyDiscover also includes EvoX, a self-evolving optimization strategy where the system co-adapts both solution generation and experience management using LLM-driven strategy evolution during runtime. EvoX provides a complementary approach to AdaEvolve's hierarchical adaptation, focusing on evolving the search strategy itself rather than parameterizing a fixed strategy.

11 Core Mechanisms (Detailed)

11.1 The Accumulated Improvement Signal

The foundation of AdaEvolve is a single, unified signal that measures the volatility of improvement on each island. This signal coordinates all three adaptation levels:

Step 1: Normalized Improvement Magnitude

After each mutation evaluation on island k:

$$ \delta_t^{(k)} = \max\left(\frac{f' - f_k^*}{f_k^*}, 0\right) $$

(Eq. 1 — Normalized Improvement)

Where f' is the new program's score and f_k* is the best score on island k. Normalization makes the signal scale-invariant across different problem types.

Step 2: Exponential Moving Average of Squared Improvements

$$ G_t^{(k)} = \rho \cdot G_{t-1}^{(k)} + (1 - \rho) \cdot (\delta_t^{(k)})^2 $$

(Eq. 2 — Accumulated Improvement Signal)

With ρ = 0.9 (decay rate). High G_t^(k) indicates a productive trajectory (recent large improvements). Low G_t^(k) signals stagnation requiring intervention. The squaring amplifies large breakthroughs and suppresses noise from trivial changes.

class AccumulatedImprovementSignal:
    """Per-island volatility metric that drives all three adaptation levels."""

    def __init__(self, rho: float = 0.9, epsilon: float = 1e-8):
        self.rho = rho
        self.epsilon = epsilon
        self.G = 0.0  # accumulated signal
        self.best_score = float('-inf')

    def update(self, new_score: float) -> float:
        """Update signal after evaluating a new program on this island."""
        if self.best_score > float('-inf'):
            delta = max((new_score - self.best_score) / (abs(self.best_score) + self.epsilon), 0.0)
        else:
            delta = 0.0

        self.G = self.rho * self.G + (1 - self.rho) * delta ** 2

        if new_score > self.best_score:
            self.best_score = new_score

        return self.G

11.2 Level 1: Local Adaptation (Exploration Intensity)

Each island dynamically adjusts its exploration intensity based on its accumulated improvement signal:

$$ I_t^{(k)} = I_{\min} + \frac{I_{\max} - I_{\min}}{1 + \sqrt{G_t^{(k)} + \epsilon}} $$

(Eq. 3 — Dynamic Exploration Intensity)

Where I_min=0.1, I_max=0.7. When G is high (productive trajectory), intensity is low → exploitation-dominant sampling from top-ranked parents. When G is low (stagnation), intensity rises → exploration-dominant sampling, more random parent selection, and broader mutation scope.

[!info] Interpretation Exploration intensity controls the balance between refining known good solutions (low I = focused diff patches from best parents) and trying creative leaps (high I = full rewrites from diverse parents). This is analogous to simulated annealing's temperature, but driven by observed improvement rather than a fixed cooling schedule.

def compute_exploration_intensity(G: float, I_min: float = 0.1, I_max: float = 0.7,
                                   epsilon: float = 1e-8) -> float:
    """Compute exploration intensity from accumulated improvement signal."""
    return I_min + (I_max - I_min) / (1 + (G + epsilon) ** 0.5)

11.3 Level 2: Global Adaptation (Cross-Island Resource Allocation)

Globally-Normalized Bandit Rewards

AdaEvolve's key insight: measuring improvement relative to each island's local best creates "poor island bias." Instead, rewards are normalized against the global best:

$$ r_t^{(k)} = \frac{f' - f_k^*}{f_{\text{global}}^*} $$

(Eq. 4 — Globally-Normalized Reward)

This prevents weak islands from receiving outsized credit for trivial refinements. An island improving from score 10 to 12 no longer earns roughly nine times the reward of one improving from 90 to 92: with f_global* = 95, both gains are worth the same 2/95 under Eq. 4, whereas local normalization would credit the weak island 0.20 against only ~0.022 for the strong one.
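The bias is easy to verify with a few lines of arithmetic. This is a toy comparison, not framework code; the helper names are illustrative:

```python
F_GLOBAL_BEST = 95.0  # current global best score (toy value)

def local_reward(new_score: float, island_best: float) -> float:
    """Reward normalized against the island's own best (prior systems)."""
    return (new_score - island_best) / island_best

def global_reward(new_score: float, island_best: float) -> float:
    """Reward normalized against the global best (Eq. 4)."""
    return (new_score - island_best) / F_GLOBAL_BEST

weak_local = local_reward(12.0, 10.0)      # 0.20
strong_local = local_reward(92.0, 90.0)    # ~0.022: ~9x bias toward the weak island
weak_global = global_reward(12.0, 10.0)    # ~0.021
strong_global = global_reward(92.0, 90.0)  # ~0.021: equal absolute progress, equal reward
```

Under local normalization the bandit would pour compute into the weak island; under global normalization, equal absolute progress earns equal reward, so allocation tracks real movement toward the global frontier.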

Decayed Cumulative Tracking

$$ R_t^{(k)} = \rho \cdot R_{t-1}^{(k)} + r_t^{(k)} $$ $$ V_t^{(k)} = \rho \cdot V_{t-1}^{(k)} + 1 $$

(Eq. 5 — Decayed Cumulative Tracking)

Exponential decay ensures stale early breakthroughs don't dominate future allocation. The visit count V_t^(k) is also decayed, ensuring UCB exploration bonuses reflect recent neglect, not total neglect.

UCB Island Selection

$$ k^* = \arg\max_k \left[\frac{R_k}{V_k} + C\sqrt{\frac{\ln N}{n_k}}\right] $$

(Eq. 6 — UCB Island Selection)

With C = √2. The first term (R_k/V_k) is exploitation (favor recently productive islands), the second is exploration (favor recently neglected islands).

import numpy as np

class GlobalAdaptationBandit:
    """UCB bandit for cross-island resource allocation with global normalization."""

    def __init__(self, n_islands: int, rho: float = 0.9, C: float = 1.414):
        self.n_islands = n_islands
        self.rho = rho
        self.C = C
        self.R = np.zeros(n_islands)  # decayed cumulative reward
        self.V = np.zeros(n_islands)  # decayed visit count
        self.total_visits = 0
        self.global_best = float('-inf')

    def select_island(self) -> int:
        """Select next island to allocate compute to via UCB."""
        self.total_visits += 1
        ucb_values = np.zeros(self.n_islands)
        for k in range(self.n_islands):
            if self.V[k] < 1e-8:
                ucb_values[k] = float('inf')  # force exploration of unvisited
            else:
                exploit = self.R[k] / self.V[k]
                explore = self.C * np.sqrt(np.log(self.total_visits) / self.V[k])
                ucb_values[k] = exploit + explore
        return int(np.argmax(ucb_values))

    def update(self, island_k: int, new_score: float, island_best: float):
        """Update bandit state with globally-normalized reward."""
        if new_score > self.global_best:
            self.global_best = new_score
        reward = (new_score - island_best) / (abs(self.global_best) + 1e-8)
        self.R[island_k] = self.rho * self.R[island_k] + reward
        self.V[island_k] = self.rho * self.V[island_k] + 1

11.4 Level 3: Meta-Guidance (Tactical Generation)

When the accumulated improvement signal drops below a threshold across all islands simultaneously, AdaEvolve triggers meta-level LLM analysis:

[!important] Meta-Guidance Trigger Trigger condition: G_t^(k) ≤ 0.12 for all k Action: Invoke meta-LLM to analyze the evaluator code, identify bottlenecks in the current best program, and propose fundamentally different algorithmic approaches — not incremental improvements, but paradigm shifts.

META_GUIDANCE_PROMPT = """
You are analyzing a stalled optimization process. The evolutionary search has failed
to make progress across all islands for the last several iterations.

## Evaluator Code
{evaluator_code}

## Current Best Program (score: {best_score})
{best_program}

## Recent Failed Approaches (last 10 mutations)
{failed_approaches}

## Task
Propose 2-3 fundamentally different algorithmic approaches. Do NOT suggest
incremental improvements. Instead, suggest:
1. A completely different algorithm class (e.g., switching from greedy to DP,
   from brute force to divide-and-conquer)
2. Concrete techniques with specific library functions
   (e.g., scipy.optimize.linear_sum_assignment)
3. A novel data structure that changes the problem's complexity class

Output each tactic as:
TACTIC: [name] — [concrete description with code hints]
"""

Example tactics generated in practice:

  • "Trust-region root finding for faster convergence on constrained optimization"
  • "Voronoi-based initialization for better spatial coverage in packing problems"
  • "Median filtering + linear sum assignment via scipy for robust matching"
  • "Replace recursive DFS with iterative BFS + priority queue for better cache locality"

These tactics are injected into subsequent mutation prompts on all islands, forcing the entire search to shift direction. The tactic injection persists for a configurable window (default: 20 iterations) before the system re-evaluates stagnation.
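The persistence window can be sketched as a small bookkeeping class. The `TacticInjector` name and its interface are illustrative assumptions, not framework API:

```python
class TacticInjector:
    """Holds meta-guidance tactics and injects them into mutation prompts
    for a fixed window of iterations after a stagnation trigger."""

    def __init__(self, window: int = 20):
        self.window = window
        self.tactics: list[str] = []
        self.remaining = 0

    def trigger(self, tactics: list[str]) -> None:
        """Called when meta-guidance fires; starts a fresh injection window."""
        self.tactics = tactics
        self.remaining = self.window

    def augment(self, prompt: str) -> str:
        """Prepend active tactical directives to a mutation prompt."""
        if self.remaining <= 0:
            return prompt  # window expired: prompt passes through unchanged
        self.remaining -= 1
        header = "\n".join(f"TACTIC: {t}" for t in self.tactics)
        return f"{header}\n\n{prompt}"

injector = TacticInjector(window=2)
injector.trigger(["Switch from greedy to DP"])
first = injector.augment("Improve the solve() function.")
second = injector.augment("Improve the solve() function.")
third = injector.augment("Improve the solve() function.")  # window expired
```

Counting down per augmented prompt (rather than per wall-clock iteration) keeps the sketch simple; a fuller implementation would also re-check the stagnation signal before letting the window lapse.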

11.5 Dynamic Island Spawning

A more extreme intervention than meta-guidance: when G_t^(k) ≤ 0.02 across all islands (severe stagnation), AdaEvolve dynamically spawns new islands to restart exploration from diverse seeds:

def check_spawn_trigger(islands: list[Island]) -> bool:
    """Trigger new island spawning on severe global stagnation."""
    return all(island.accumulated_signal.G <= 0.02 for island in islands)

def spawn_island(population_db: PopulationDB, meta_tactics: list[str]) -> Island:
    """Create new island with diverse seeds and meta-tactic injection."""
    diverse_seeds = population_db.sample_diverse(n=5, method="maximin_distance")
    new_island = Island(
        seeds=diverse_seeds,
        meta_tactics=meta_tactics,  # inject current tactical directives
        exploration_intensity=0.7   # start with high exploration
    )
    return new_island

11.6 EVOLVE-BLOCK Markers

SkyDiscover uses EVOLVE-BLOCK-START / EVOLVE-BLOCK-END markers to designate mutable regions within programs. Code outside these markers is preserved as immutable context. If no markers are present, the entire program becomes mutable:

import numpy as np

# This code is immutable context
def load_data(path: str) -> np.ndarray:
    return np.load(path)

# EVOLVE-BLOCK-START
def solve(data: np.ndarray) -> float:
    """This function will be evolved by the search algorithm."""
    # Initial naive implementation
    return float(np.sum(data))
# EVOLVE-BLOCK-END

# This code is also immutable
if __name__ == "__main__":
    result = solve(load_data("input.npy"))
    print(f"Result: {result}")
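A minimal sketch of how such markers might be located follows; the regex-based `split_evolve_block` helper is an assumption for illustration, not the framework's actual parser:

```python
import re

# Named groups: everything up to and including the START marker, the mutable
# block, and everything from the END marker onward.
EVOLVE_RE = re.compile(
    r"(?P<before>.*# EVOLVE-BLOCK-START\n)"
    r"(?P<block>.*?)"
    r"(?P<after>\n?# EVOLVE-BLOCK-END.*)",
    re.DOTALL,
)

def split_evolve_block(source: str) -> tuple[str, str, str]:
    """Split a program into (immutable prefix, mutable block, immutable suffix).

    If no markers are present, the whole program is treated as mutable."""
    m = EVOLVE_RE.match(source)
    if m is None:
        return "", source, ""
    return m.group("before"), m.group("block"), m.group("after")

program = """x = 1
# EVOLVE-BLOCK-START
def solve():
    return x
# EVOLVE-BLOCK-END
print(solve())
"""
prefix, block, suffix = split_evolve_block(program)
# block now holds only the body of solve(); prefix/suffix are frozen context
# that can be re-attached verbatim around each mutated candidate.
```

Splitting this way lets the mutation prompt show the LLM only the mutable region (plus the frozen parts as read-only context), and lets the framework stitch the candidate back into a runnable file.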

11.7 Artifact Injection

Evaluation feedback is automatically incorporated into subsequent generation prompts through the artifacts return value. This creates a lightweight form of diagnostic feedback (similar to GEPA's ASI, but less structured):

def evaluate(program_path: str) -> dict:
    """Evaluator returns score + artifacts for next iteration's context."""
    score = run_tests(program_path)
    return {
        "combined_score": score,
        "artifacts": {
            "failed_test_cases": get_failures(program_path),
            "runtime_ms": measure_runtime(program_path),
            "memory_mb": measure_memory(program_path),
            "hint": "Consider dynamic programming for overlapping subproblems"
        }
    }

12 Programming Language

| Aspect | Details |
|---|---|
| Framework language | Python |
| Evolved programs | Primarily Python; extensible to any language via custom evaluators |
| Dependencies | Standard scientific Python stack (numpy, scipy) + LLM provider SDKs |
| Configuration | YAML |
| Extensibility | Custom search algorithms via skydiscover/search/ module interface; custom benchmarks via benchmarks/README.md |

13 Memory Management

13.1 Population Database

Each island maintains its own population of candidate programs, with a global best tracker across all islands. The population management is algorithm-dependent:

  • AdaEvolve: Per-island archives with UCB-driven allocation; inter-island migration
  • EvoX: Self-evolving population with experience management
  • Native backends (OpenEvolve, GEPA): Delegated to the respective algorithm's own population management

13.2 Checkpoint System

Full state serialization enables resuming interrupted runs. The checkpoint includes:

  • All island states (programs, scores, accumulated improvement signals)
  • Bandit state (cumulative rewards, visit counts)
  • Meta-guidance tactical history
  • Global best program and score
  • Iteration counter and cost tracking
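The checkpoint contents listed above could be serialized as a single JSON document. This is a hedged sketch: the field names follow the list, but the framework's actual checkpoint schema is not documented here:

```python
import json
import os
import tempfile
from dataclasses import dataclass, field, asdict

@dataclass
class Checkpoint:
    """Illustrative checkpoint schema covering the state listed above."""
    iteration: int
    cost_usd: float
    global_best_score: float
    global_best_program: str
    island_states: list = field(default_factory=list)   # programs, scores, G signals
    bandit_state: dict = field(default_factory=dict)    # decayed R, V per island
    tactic_history: list = field(default_factory=list)  # meta-guidance directives

def save_checkpoint(ckpt: Checkpoint, path: str) -> None:
    with open(path, "w") as f:
        json.dump(asdict(ckpt), f)

def load_checkpoint(path: str) -> Checkpoint:
    with open(path) as f:
        return Checkpoint(**json.load(f))

ckpt = Checkpoint(iteration=120, cost_usd=4.2, global_best_score=2.636,
                  global_best_program="def solve(): ...",
                  bandit_state={"R": [0.3, 0.1], "V": [5.0, 3.0]})
path = os.path.join(tempfile.gettempdir(), "skydiscover_ckpt.json")
save_checkpoint(ckpt, path)
restored = load_checkpoint(path)
```

Because all of AdaEvolve's adaptive state is a handful of scalars per island (G, R, V) plus the archives, a flat JSON round trip like this is enough to resume a run exactly where it stopped.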

13.3 Artifact Injection as Context

Evaluation artifacts are stored alongside programs and injected into LLM context for subsequent mutations. This creates a lightweight "memory" where the search algorithm can learn from previous evaluation outcomes without a formal learning log system.

14 Continued Learning

14.1 Meta-Guidance as Accumulated Knowledge

AdaEvolve's meta-guidance system functions as a form of continued learning: each time stagnation triggers tactical generation, the tactics represent learned knowledge about what hasn't worked and what might work instead. Tactics persist across iterations and accumulate over the run.

14.2 EvoX Self-Evolving Strategy

The EvoX algorithm goes further: it co-adapts the search strategy itself alongside the solutions being evolved. The strategy parameters (parent selection weights, mutation scope, context assembly rules) are evolved using LLM-driven strategy mutation, creating a meta-level optimization loop.

14.3 No Formal Learning Logs

Unlike Darwinian Evolver's explicit learning log system, SkyDiscover does not maintain a structured cross-population mutation history. Knowledge transfer happens implicitly through:

  • Artifact injection (evaluation feedback into next mutation)
  • Meta-guidance tactics (high-level strategy shifts)
  • Inter-island migration (direct solution transfer)

[!warning] Gap SkyDiscover/AdaEvolve does not implement explicit learning logs (Darwinian Evolver), prompt co-evolution (ShinkaEvolve), or structured ASI diagnostics (GEPA). These represent integration opportunities for an OmniEvolve-type system. However, AdaEvolve's meta-guidance and adaptive resource allocation partially compensate by providing implicit learning through behavioral adaptation.

15 Applications and Benchmarks

15.1 Mathematical Optimization

14 tasks including circle packing, Erdős problems, Heilbronn triangles, geometric optimization. AdaEvolve matches or exceeds human SOTA on circle packing and achieves best open-source results.

15.2 Real-World Systems Optimization

9 tasks from the Algorithmic Discovery for Real Systems (ADRS) benchmark: cloud scheduling, load balancing, Mixture-of-Experts expert placement, GPU kernel optimization. The 41% cost reduction on cloud transfer and 14% improvement on GPU load balancing demonstrate practical impact beyond academic benchmarks.

15.3 Competitive Programming (Frontier-CS)

172 algorithm design problems drawn from competitive programming. The 50-LLM-call budget makes this a stringent test of sample efficiency. AdaEvolve's 21% improvement over OpenEvolve on mean score demonstrates that adaptive resource allocation significantly outperforms static scheduling.

15.4 AtCoder Heuristic Contest (ALE)

10 tasks derived from AtCoder Heuristic Contests, providing realistic optimization problems with complex evaluation landscapes.

15.5 Creative and NLP Tasks

Image generation evolution and HotPotQA prompt optimization demonstrate the framework's generality beyond traditional optimization. These tasks validate that the evaluator API (combined_score + artifacts) is sufficiently flexible for non-numeric optimization targets.

16 Comparison: SkyDiscover vs. Other Frameworks

16.1 Feature Matrix

| Feature | SkyDiscover/AdaEvolve | OpenEvolve | ShinkaEvolve | GEPA | LLM4AD |
|---|---|---|---|---|---|
| Adaptation | ✅ Three-level hierarchical | Static | Bandit LLM selection | Pareto-based | Method-specific |
| Island Management | UCB allocation + dynamic spawning | Fixed + ring migration | Dynamic spawning on stagnation | N/A (single population) | Method-specific |
| Resource Allocation | Globally-normalized UCB bandit | Equal across islands | Equal + bandit for LLM selection | N/A | N/A |
| Stagnation Response | Meta-guidance + island spawning | None | Dynamic island spawning | Reflection-driven mutation | None |
| Benchmarks | ✅ 200+ | ~10 | ~20 | ~30 | ~50+ |
| Multi-Algorithm | ✅ Yes (6+ strategies) | 1 (AlphaEvolve-style) | 1 (custom) | 1 (custom) | 7 methods |
| Multi-Provider LLM | ✅ Yes | ✅ Yes | ✅ Yes | ✅ Yes | ✅ Yes |
| Live Dashboard | ✅ Yes | No | No | No | ✅ Yes (GUI) |
| Checkpoint Resume | ✅ Yes | ✅ Yes | ✅ Yes | ⚠️ Partial | ✅ Yes |
| Human Feedback | ✅ Yes (dashboard) | No | No | No | No |
| Prompt Evolution | No | No | ✅ Yes (v1.1) | No | No |
| Learning Logs | No (implicit via artifacts) | No | No | No | No |
| Diagnostic ASI | Partial (artifacts) | No | No | ✅ Yes | No |

16.2 Key Differentiators

What SkyDiscover Does Best

  • Adaptive resource allocation: The only system that dynamically adjusts compute distribution across search fronts based on observed improvement
  • Meta-guidance: Unique ability to generate tactical paradigm shifts on stagnation, preventing long plateau periods
  • Fair benchmarking: 200+ tasks in a unified platform enable rigorous cross-system comparison
  • Minimal configuration: AdaEvolve automates all scheduling decisions that other systems require hand-tuning
  • Systems optimization: Strong results on real-world infrastructure tasks beyond academic benchmarks

What SkyDiscover Lacks

[!tip] Integration Opportunity SkyDiscover's three-level hierarchical adaptation is orthogonal to most innovations in other systems. The accumulated improvement signal could drive ShinkaEvolve's prompt co-evolution (mutate prompts more aggressively when G is low). GEPA's ASI could feed richer information into AdaEvolve's meta-guidance generator. Darwinian Evolver's learning logs could provide historical context for tactical generation. These integrations are a natural fit for the OmniEvolve architecture.


SkyDiscover & AdaEvolve Technical Report — Evolutionary AI Systems Survey 2024–2026 Paper: arXiv:2602.20133 | Repository: github.com/skydiscover-ai/skydiscover | Project: skydiscover-ai.github.io

