SkyDiscover & AdaEvolve
A Modular Framework for AI-Driven Algorithmic Discovery with Hierarchical Adaptive Search
Organization: UC Berkeley Sky Lab
Published: February 2026
Type: Framework + Research Paper (arXiv:2602.20133)
License: Apache 2.0
Report Type: PhD-Level Technical Analysis
Report Date: March 2026
Table of Contents
- Full Title and Attribution
- Authors and Team
- Core Contribution
- Supported Solutions
- LLM Integration
- Key Results
- Reproducibility
- Compute and API Costs
- Architecture Solution
- Component Breakdown
- Core Mechanisms (Detailed)
- Programming Language
- Memory Management
- Continued Learning
- Applications and Benchmarks
- Comparison: SkyDiscover vs Other Frameworks
1 Full Title and Attribution
Framework: SkyDiscover: A Modular Framework for AI-Driven Scientific and Algorithmic Discovery
Paper: AdaEvolve: Adaptive LLM Driven Zeroth-Order Optimization (arXiv:2602.20133)
Project Page: skydiscover-ai.github.io
Repository: github.com/skydiscover-ai/skydiscover
License: Apache License 2.0
Organization: UC Berkeley Sky Lab
Publication Date: February 23, 2026
Lineage: Builds on FunSearch, OpenEvolve, GEPA, ShinkaEvolve; extends with hierarchical adaptive search
2 Authors and Team
SkyDiscover and AdaEvolve were developed at the UC Berkeley Sky Lab, a research group known for systems-level AI infrastructure (Ray, Spark, Alpa, vLLM).
Authors: Mert Cemri, Shubham Agrawal, Akshat Gupta, Shu Liu, Audrey Cheng, Qiuyang Mang, Ashwin Naren, Lutfi Eren Erdogan, Koushik Sen, Matei Zaharia, Alex Dimakis, Ion Stoica
The team brings together expertise in systems optimization (Stoica and Zaharia — creators of Spark and Ray), program analysis (Sen — a pioneer of concolic testing), and machine learning (Dimakis). This systems-first DNA is reflected in the framework's emphasis on modular design, fair benchmarking, and real-world systems optimization tasks.
Notably, several authors overlap with the GEPA team (UC Berkeley / Stanford), and the framework includes GEPA as a first-class backend algorithm. SkyDiscover represents the Berkeley lab's effort to unify the fragmented landscape of LLM-driven evolutionary search into a single, fair evaluation platform.
3 Core Contribution
[!important] Key Novelty SkyDiscover makes two distinct contributions: (1) a modular framework providing a unified interface for implementing, running, and fairly comparing discovery algorithms across 200+ optimization tasks; and (2) AdaEvolve, a novel three-level hierarchical adaptation algorithm that replaces static search schedules with dynamic resource allocation coordinated by an accumulated improvement signal.
What Makes SkyDiscover/AdaEvolve Novel
- Hierarchical adaptive search: AdaEvolve is the first evolutionary search algorithm to implement three-level adaptation: local (within-island exploration intensity), global (cross-island resource allocation via bandit), and meta (tactical paradigm shifts when stagnation detected). All three levels are coordinated by a single unified signal.
- Accumulated improvement signal: A scale-invariant, exponential moving average of squared improvement magnitudes that serves as a real-time volatility metric. This single signal drives decisions at all three adaptation levels — a mathematically elegant unification.
- Globally-normalized bandit rewards: Unlike previous systems that measure island improvement relative to each island's local best, AdaEvolve normalizes rewards against the global best, preventing "poor island bias" where weak islands receive disproportionate resources for trivial improvements.
- Meta-guidance via LLM tactical generation: When stagnation is detected across all islands, the system triggers a meta-level LLM analysis that generates high-level algorithmic directives (e.g., "switch from greedy to dynamic programming"), forcing qualitative search redirection.
- Unified benchmarking platform: SkyDiscover provides 200+ optimization tasks spanning mathematics, systems optimization, competitive programming, and creative applications, enabling fair head-to-head comparison of different search algorithms.
- Multi-algorithm framework: Ships with AdaEvolve, EvoX, and native backends for OpenEvolve, GEPA, and ShinkaEvolve, plus generic strategies (Top-K, Beam Search, Best-of-N).
Relationship to Prior Work
| System | Year | Adaptation | Search Strategy | Benchmark Coverage |
|---|---|---|---|---|
| FunSearch | 2023 | Static | Single population | Math only |
| OpenEvolve | 2025 | Static islands | MAP-Elites + islands | ~10 tasks |
| ShinkaEvolve | 2025 | Bandit LLM selection | Islands + dynamic spawning | ~20 tasks |
| GEPA | 2026 | Pareto-based | Reflection-driven | ~30 tasks |
| SkyDiscover | 2026 | Three-level hierarchical | Multi-algorithm (AdaEvolve, EvoX, + backends) | 200+ tasks |
4 Supported Solutions
4.1 Solution Types
| Solution Type | Description | Supported |
|---|---|---|
| Function-level optimization | Evolve a single function for a given task | ✅ Yes |
| Full program evolution | Evolve complete programs with EVOLVE-BLOCK markers | ✅ Yes |
| Multi-file codebase | Modify multiple files simultaneously | ✅ Yes (agentic mode) |
| Prompt optimization | Evolve NLP prompts for downstream tasks | ✅ Yes (HotPotQA) |
| Image generation | Evolve image generation parameters/code | ✅ Yes (creative tasks) |
| Systems optimization | Cloud scheduling, load balancing, kernel tuning | ✅ Yes (9 tasks) |
| Self-modification | Agent modifying its own code | ❌ No |
4.2 Benchmark Portfolio (200+ Tasks)
| Domain | Tasks | Examples |
|---|---|---|
| Mathematics | 14 | Circle packing, Erdős problems, Heilbronn triangles, geometric optimization |
| Systems | 9 | Cloud scheduling, load balancing, MoE expert placement, GPU kernel optimization |
| Algorithms (Frontier-CS) | 172 | Competitive programming from diverse categories |
| Algorithms (ALE) | 10 | AtCoder Heuristic Contest-derived optimization |
| Creative / NLP | 2+ | Image generation evolution, HotPotQA prompt optimization |
5 LLM Integration
5.1 Multi-Provider Support
SkyDiscover supports all major LLM providers through a unified provider/model format:
| Provider | Models | Format |
|---|---|---|
| OpenAI | GPT-5, GPT-4o, o3-mini | openai/gpt-5 (default provider) |
| Google | Gemini 3 Pro, Gemini 3 Flash | gemini/gemini-3-pro |
| Anthropic | Claude Opus 4.6, Claude Sonnet 4.6 | anthropic/claude-opus-4-6 |
| Local (Ollama) | Any LiteLLM-compatible model | ollama/qwen2.5-coder:32b |
5.2 Weighted Multi-Model Pools
A distinctive feature of SkyDiscover is weighted multi-model pools for distributed sampling. Instead of using a single LLM per mutation, the framework samples from a weighted mixture:
```yaml
# YAML configuration for weighted model pools
llm:
  models:
    - model: "openai/gpt-5"
      weight: 0.4
    - model: "gemini/gemini-3-pro"
      weight: 0.3
    - model: "anthropic/claude-sonnet-4-6"
      weight: 0.2
    - model: "ollama/qwen2.5-coder:32b"
      weight: 0.1
  system_prompt: "You are an expert algorithm designer..."
max_iterations: 500
```
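The weighted mixture can be sketched as a per-mutation sampler. This is a hypothetical helper (`sample_model` is not the framework's actual API), shown only to make the sampling semantics concrete:

```python
import random

def sample_model(pool: list[dict], rng=random) -> str:
    """Pick one model for the next mutation from a weighted pool
    shaped like the YAML config above (sketch, assumed structure)."""
    names = [m["model"] for m in pool]
    weights = [m["weight"] for m in pool]
    return rng.choices(names, weights=weights, k=1)[0]

pool = [
    {"model": "openai/gpt-5", "weight": 0.4},
    {"model": "gemini/gemini-3-pro", "weight": 0.3},
    {"model": "anthropic/claude-sonnet-4-6", "weight": 0.2},
    {"model": "ollama/qwen2.5-coder:32b", "weight": 0.1},
]
```

Because each mutation draws independently, a long run naturally mixes model behaviors in proportion to the configured weights.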
5.3 Agentic Mode
SkyDiscover provides an agentic mode where the LLM has access to the full project file structure during mutation. This enables context-aware mutations that consider imports, dependencies, and cross-file interactions — conceptually similar to Arcgentica's runtime-as-context but at the file system level rather than the REPL level.
5.4 Custom System Prompts
Each task can define custom system prompts that are injected into the LLM context during mutation. Combined with AdaEvolve's meta-guidance system, prompts can be dynamically augmented with tactical directives when the system detects stagnation.
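A minimal sketch of this dynamic augmentation, assuming a hypothetical `build_system_prompt` helper (names are illustrative, not the framework's API): the task's base prompt passes through unchanged until meta-guidance has produced tactics.

```python
def build_system_prompt(task_prompt: str, tactics: list[str]) -> str:
    """Append meta-guidance tactical directives to a task's system prompt.
    When no tactics are active, the base prompt is used verbatim."""
    if not tactics:
        return task_prompt
    directives = "\n".join(f"- {t}" for t in tactics)
    return f"{task_prompt}\n\nTactical directives (from meta-guidance):\n{directives}"
```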
6 Key Results
[!success] Headline Results AdaEvolve achieves ~34% median improvement over OpenEvolve/GEPA/ShinkaEvolve baselines across ~200 benchmarks, matches AlphaEvolve on 6/6 systems tasks and 6/8 math tasks, and demonstrates strong real-world systems impact.
6.1 Mathematical Optimization (6 tasks, 100 iterations)
| Problem | AdaEvolve | OpenEvolve | GEPA | Human SOTA |
|---|---|---|---|---|
| Circle Packing (Square) | 2.636 | 2.590 | 2.610 | 2.634 |
| Heilbronn Triangles | 0.036 | 0.028 | 0.031 | — |
| Signal Processing | 0.718 | 0.619 | 0.682 | — |
AdaEvolve matches or exceeds human SOTA on circle packing (2.636 ≥ 2.634) and achieves best open-source results across all math benchmarks.
6.2 Real-World Systems Optimization (ADRS, 7 tasks)
| Task | AdaEvolve (GPT-5) | AdaEvolve (Gemini-3-Pro) | Best Baseline |
|---|---|---|---|
| Cloud Transfer Cost | 41% lower than baselines | 41% lower than baselines | OpenEvolve |
| GPU Load Balancing | 14% better than baselines | 14% better than baselines | GEPA |
| MoE Expert Placement | Best on all 7 | Best on all 7 | ShinkaEvolve |
Wins on all 7 ADRS benchmarks under both GPT-5 and Gemini-3-Pro. Largest gains on sparse/bursty improvement tasks (TXN: 4348 vs baseline 4329), where adaptive resource allocation excels.
6.3 Frontier-CS (172 algorithm design problems, 50 LLM calls)
| Metric | AdaEvolve | OpenEvolve | GEPA | Single-call GPT-5 |
|---|---|---|---|---|
| Mean Score | 61.33 | 50.75 | 54.20 | 20.64 |
| Median Score | 75.15 | 56.37 | 60.12 | 15.30 |
| Improvement over OpenEvolve | +21% (mean score) | — | — | — |
6.4 Ablation Study
| Configuration | Circle Packing | Signal Processing |
|---|---|---|
| Full AdaEvolve | 2.6294 ± 0.003 | 0.7178 ± 0.019 |
| w/o Local Adaptation | 2.5906 ± 0.048 | 0.6807 ± 0.021 |
| w/o Bandit Selection | 2.6180 ± 0.005 | 0.6190 ± 0.054 |
| w/o Meta-Guidance | 2.5213 ± 0.028 | 0.5476 ± 0.011 |
[!note] Key Finding Meta-Guidance removal causes the largest performance degradation across all benchmarks. This validates that adaptive tactical generation — injecting high-level algorithmic directives when search stagnates — is AdaEvolve's most impactful innovation.
7 Reproducibility
| Criterion | Status | Details |
|---|---|---|
| Source Code | ✅ Available | github.com/skydiscover-ai/skydiscover (Apache 2.0) |
| Paper | ✅ Available | arXiv:2602.20133 (CC BY 4.0) |
| Benchmarks | ✅ 200+ included | All benchmarks ship with the framework; evaluator functions provided |
| Configuration | ✅ YAML-based | Exact configs for all experiments are in the repository |
| Checkpoint Resumption | ✅ Yes | Long-running discovery tasks can be interrupted and resumed |
| Determinism | ⚠️ Partial | LLM outputs are stochastic; framework tracks seeds where possible |
| API Key Requirement | ⚠️ Required | Needs at least one LLM provider API key (or local Ollama) |
8 Compute and API Costs
8.1 Minimal Configuration
AdaEvolve is designed for minimal configuration: it requires only a model name and an iteration budget. The three-level adaptation handles all scheduling decisions automatically, eliminating the need to hand-tune island counts, mutation ratios, or exploration-exploitation schedules.
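Following the YAML conventions shown in section 5.2, a minimal run might look like the fragment below (a hypothetical example; field placement beyond `llm.models` and `max_iterations` is assumed):

```yaml
# Hypothetical minimal AdaEvolve configuration: one model, one budget.
# Island counts, mutation ratios, and exploration schedules are all
# handled by the three-level adaptation — nothing else to tune.
llm:
  models:
    - model: "openai/gpt-5"
      weight: 1.0
max_iterations: 200
```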
8.2 Cost Efficiency
| Setting | LLM Calls | Estimated Cost | Benchmark |
|---|---|---|---|
| Frontier-CS (172 problems) | 50 per problem | $0.10–0.50 per problem | Mean score 61.33 |
| Math optimization (100 iter) | 100 | $5–30 | Matches human SOTA |
| Systems optimization (ADRS) | 50–200 | $2–20 | 41% cost reduction |
8.3 Cost Advantage from Adaptive Allocation
AdaEvolve's hierarchical adaptation naturally reduces costs by allocating more resources to productive islands and fewer to stagnant ones. The globally-normalized bandit prevents wasting compute on low-performing search fronts. Compared to static schedule systems:
- ~20–30% fewer wasted LLM calls vs. fixed island allocation (OpenEvolve)
- Dynamic island spawning/pruning prevents maintaining inactive search fronts
- Meta-guidance breaks through stagnation plateaus that would otherwise waste remaining budget
9 Architecture Solution
9.1 Framework Architecture
SkyDiscover has a clean modular architecture separating the framework layer (task definitions, evaluation, configuration, monitoring) from the search algorithm layer (AdaEvolve, EvoX, or any backend):
╔════════════════════════════════════════════════════════════════════════════╗
║ S K Y D I S C O V E R ║
║ Modular Framework for AI-Driven Algorithmic Discovery ║
╚════════════════════════════════════════════════════════════════════════════╝
┌──────────────────────────┐
│ USER INTERFACE │
│ │
│ YAML Config │
│ Live Dashboard │
│ Human Feedback Steering │
│ Checkpoint Resume │
└────────────┬─────────────┘
│
┌────────────────────┼────────────────────┐
│ │ │
┌─────────▼─────────┐ ┌──────▼──────────┐ ┌──────▼──────────┐
│ EVALUATOR API │ │ SEARCH ALGO │ │ LLM PROVIDER │
│ │ │ ROUTER │ │ LAYER │
│ evaluate(path) │ │ │ │ │
│ → combined_score │ │ AdaEvolve │ │ OpenAI │
│ → artifacts │ │ EvoX │ │ Gemini │
│ EVOLVE-BLOCK │ │ Top-K │ │ Anthropic │
│ markers │ │ Beam Search │ │ Ollama/local │
└─────────┬─────────┘ │ Best-of-N │ │ Weighted pools │
│ │ GEPA Native │ └────────┬────────┘
│ │ OpenEvolve │ │
│ └──────┬──────────┘ │
│ │ │
└──────────────────┼──────────────────────┘
│
┌──────────▼──────────────┐
│ BENCHMARK SUITE │
│ │
│ Math (14 tasks) │
│ Systems (9 tasks) │
│ Frontier-CS (172 tasks) │
│ ALE (10 tasks) │
│ Creative/NLP (2+ tasks) │
└──────────────────────────┘
9.2 AdaEvolve Algorithm Architecture
AdaEvolve implements a multi-island evolutionary search with three-level hierarchical adaptation:
╔══════════════════════════════════════════════════════════════════════╗
║ A D A E V O L V E ║
║ Three-Level Hierarchical Adaptive Search ║
╚══════════════════════════════════════════════════════════════════════╝
LEVEL 3: META-GUIDANCE (triggered on global stagnation)
┌──────────────────────────────────────────────────────────────────┐
│ IF G_t^(k) ≤ 0.12 for all k: │
│ → LLM analyzes evaluator code + current best programs │
│ → Generates high-level tactical directives │
│ → Injects tactics into mutation prompts │
│ → Forces qualitative search paradigm shift │
│ Examples: "Switch from greedy to DP", "Use Voronoi init" │
└──────────────────────────────────┬───────────────────────────────┘
│ tactical injection
LEVEL 2: GLOBAL ADAPTATION (cross-island resource allocation)
┌──────────────────────────────────▼───────────────────────────────┐
│ UCB Island Selection: k* = argmax[R_k/V_k + C√(ln N/n_k)] │
│ │
│ Globally-normalized rewards: r_t^(k) = (f' - f_k*) / f_global* │
│ Decayed tracking: R_t^(k) = ρ·R_{t-1}^(k) + r_t^(k) │
│ │
│ Dynamic island spawning: when G_t^(k) ≤ 0.02 across all k │
└──────────────────────────────────┬───────────────────────────────┘
│ resource allocation
LEVEL 1: LOCAL ADAPTATION (within-island exploration intensity)
┌────────────┐ ┌────────────┐ ┌────────────┐ ┌────────────┐
│ Island 1 │ │ Island 2 │ │ Island 3 │ │ Island K │
│ │ │ │ │ │ │ │
│ I_t^(1) │ │ I_t^(2) │ │ I_t^(3) │ │ I_t^(K) │
│ G_t^(1) │ │ G_t^(2) │ │ G_t^(3) │ │ G_t^(K) │
│ │ │ │ │ │ │ │
│ ┌────────┐ │ │ ┌────────┐ │ │ ┌────────┐ │ │ ┌────────┐ │
│ │ Accum. │ │ │ │ Accum. │ │ │ │ Accum. │ │ │ │ Accum. │ │
│ │ Improv.│ │ │ │ Improv.│ │ │ │ Improv.│ │ │ │ Improv.│ │
│ │ Signal │ │ │ │ Signal │ │ │ │ Signal │ │ │ │ Signal │ │
│ └────────┘ │ │ └────────┘ │ │ └────────┘ │ │ └────────┘ │
└────────────┘ └────────────┘ └────────────┘ └────────────┘
│ │ │ │
└────────────────┼────────────────┼────────────────┘
│ inter-island migration
▼
┌──────────────────────┐
│ POPULATION DATABASE │
│ Per-island archives │
│ Global best tracking │
└──────────────────────┘
10 Component Breakdown
10.1 Framework Components (SkyDiscover)
| Component | Module | Responsibility |
|---|---|---|
| Evaluator API | evaluate(program_path) | Returns {combined_score, artifacts}; artifacts feed into next iteration |
| Search Router | skydiscover/search/ | Dispatches to AdaEvolve, EvoX, Top-K, Beam, Best-of-N, or native backends |
| Context Builder | skydiscover/context_builder/ | Assembles LLM prompt from code, artifacts, and search state |
| LLM Provider Layer | YAML config | Weighted multi-model pools, provider routing |
| Benchmark Suite | benchmarks/ | 200+ tasks with evaluators, initial programs, documentation |
| Live Dashboard | Web UI | Real-time scatter plots, code diffs, metric tracking, human intervention |
| Checkpoint System | Built-in | Full state serialization for resuming interrupted runs |
10.2 Algorithm Components (AdaEvolve)
| Component | Role | Key Parameter |
|---|---|---|
| Accumulated Improvement Signal | Per-island volatility metric driving all three adaptation levels | ρ (decay rate, default 0.9) |
| Local Adaptation Engine | Adjusts exploration intensity per island per iteration | I_min=0.1, I_max=0.7 |
| Global UCB Bandit | Allocates compute across islands via UCB selection | C=√2 |
| Meta-Guidance Generator | LLM-driven tactical directive generation on stagnation | Trigger: G ≤ 0.12 |
| Island Manager | Dynamic spawning when all islands stagnate | Trigger: G ≤ 0.02 |
| Migration Controller | Inter-island solution transfer | Configurable topology |
10.3 EvoX Algorithm (Secondary)
SkyDiscover also includes EvoX, a self-evolving optimization strategy where the system co-adapts both solution generation and experience management using LLM-driven strategy evolution during runtime. EvoX provides a complementary approach to AdaEvolve's hierarchical adaptation, focusing on evolving the search strategy itself rather than parameterizing a fixed strategy.
11 Core Mechanisms (Detailed)
11.1 The Accumulated Improvement Signal
The foundation of AdaEvolve is a single, unified signal that measures the volatility of improvement on each island. This signal coordinates all three adaptation levels:
Step 1: Normalized Improvement Magnitude
After each mutation evaluation on island k:
$$ \delta_t^{(k)} = \max\left(\frac{f' - f_k^*}{f_k^*}, 0\right) $$
(Eq. 1 — Normalized Improvement)
Where f' is the new program's score and f_k* is the best score on island k. Normalization makes the signal scale-invariant across different problem types.
Step 2: Exponential Moving Average of Squared Improvements
$$ G_t^{(k)} = \rho \cdot G_{t-1}^{(k)} + (1 - \rho) \cdot (\delta_t^{(k)})^2 $$
(Eq. 2 — Accumulated Improvement Signal)
With ρ = 0.9 (decay rate). High G_t^(k) indicates a productive trajectory (recent large improvements). Low G_t^(k) signals stagnation requiring intervention. The squaring amplifies large breakthroughs and suppresses noise from trivial changes.
```python
class AccumulatedImprovementSignal:
    """Per-island volatility metric that drives all three adaptation levels."""

    def __init__(self, rho: float = 0.9, epsilon: float = 1e-8):
        self.rho = rho
        self.epsilon = epsilon
        self.G = 0.0                      # accumulated signal (Eq. 2)
        self.best_score = float('-inf')

    def update(self, new_score: float) -> float:
        """Update signal after evaluating a new program on this island."""
        if self.best_score > float('-inf'):
            # Normalized improvement magnitude (Eq. 1)
            delta = max((new_score - self.best_score) / (abs(self.best_score) + self.epsilon), 0.0)
        else:
            delta = 0.0                   # first evaluation seeds the baseline
        self.G = self.rho * self.G + (1 - self.rho) * delta ** 2
        if new_score > self.best_score:
            self.best_score = new_score
        return self.G
```
11.2 Level 1: Local Adaptation (Exploration Intensity)
Each island dynamically adjusts its exploration intensity based on its accumulated improvement signal:
$$ I_t^{(k)} = I_{\min} + \frac{I_{\max} - I_{\min}}{1 + \sqrt{G_t^{(k)} + \epsilon}} $$
(Eq. 3 — Dynamic Exploration Intensity)
Where I_min=0.1, I_max=0.7. When G is high (productive trajectory), intensity is low → exploitation-dominant sampling from top-ranked parents. When G is low (stagnation), intensity rises → exploration-dominant sampling, more random parent selection, and broader mutation scope.
[!info] Interpretation Exploration intensity controls the balance between refining known good solutions (low I = focused diff patches from best parents) and trying creative leaps (high I = full rewrites from diverse parents). This is analogous to simulated annealing's temperature, but driven by observed improvement rather than a fixed cooling schedule.
```python
def compute_exploration_intensity(G: float, I_min: float = 0.1, I_max: float = 0.7,
                                  epsilon: float = 1e-8) -> float:
    """Compute exploration intensity (Eq. 3) from the accumulated improvement signal."""
    return I_min + (I_max - I_min) / (1 + (G + epsilon) ** 0.5)
```
11.3 Level 2: Global Adaptation (Cross-Island Resource Allocation)
Globally-Normalized Bandit Rewards
AdaEvolve's key insight: measuring improvement relative to each island's local best creates "poor island bias." Instead, rewards are normalized against the global best:
$$ r_t^{(k)} = \frac{f' - f_k^*}{f_{\text{global}}^*} $$
(Eq. 4 — Globally-Normalized Reward)
This prevents weak islands from receiving outsized credit for trivial refinements. Under local normalization, an island improving from score 10 to 12 earns reward 0.20 while one improving from 90 to 92 earns only ≈0.022; under global normalization with f_global* = 95, both earn the same 2/95 ≈ 0.021, so credit tracks absolute progress toward the global frontier rather than relative progress on a weak front.
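The contrast between the two reward schemes can be made concrete with a toy calculation (illustrative code, not the framework's implementation):

```python
def global_reward(new_score: float, island_best: float, global_best: float) -> float:
    """Globally-normalized bandit reward, per Eq. 4."""
    return (new_score - island_best) / global_best

def local_reward(new_score: float, island_best: float) -> float:
    """Locally-normalized reward, which produces 'poor island bias'."""
    return (new_score - island_best) / island_best

# Weak island: 10 -> 12; strong island: 90 -> 92; global best = 95.
# Locally, the weak island looks far more productive:
assert local_reward(12, 10) > local_reward(92, 90)      # 0.20 vs ~0.022
# Globally, both improvements earn identical credit:
assert global_reward(12, 10, 95) == global_reward(92, 90, 95)
```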
Decayed Cumulative Tracking
$$ R_t^{(k)} = \rho \cdot R_{t-1}^{(k)} + r_t^{(k)} $$ $$ V_t^{(k)} = \rho \cdot V_{t-1}^{(k)} + 1 $$
(Eq. 5 — Decayed Cumulative Tracking)
Exponential decay ensures stale early breakthroughs don't dominate future allocation. The visit count V_t^(k) is also decayed, ensuring UCB exploration bonuses reflect recent neglect, not total neglect.
UCB Island Selection
$$ k^* = \arg\max_k \left[\frac{R_k}{V_k} + C\sqrt{\frac{\ln N}{n_k}}\right] $$
(Eq. 6 — UCB Island Selection)
With C = √2. The first term (R_k/V_k) is exploitation (favor recently productive islands), the second is exploration (favor recently neglected islands).
```python
import numpy as np

class GlobalAdaptationBandit:
    """UCB bandit for cross-island resource allocation with global normalization."""

    def __init__(self, n_islands: int, rho: float = 0.9, C: float = 1.414):
        self.n_islands = n_islands
        self.rho = rho
        self.C = C
        self.R = np.zeros(n_islands)      # decayed cumulative reward (Eq. 5)
        self.V = np.zeros(n_islands)      # decayed visit count (Eq. 5)
        self.total_visits = 0
        self.global_best = float('-inf')

    def select_island(self) -> int:
        """Select the next island to allocate compute to via UCB (Eq. 6)."""
        self.total_visits += 1
        ucb_values = np.zeros(self.n_islands)
        for k in range(self.n_islands):
            if self.V[k] < 1e-8:
                ucb_values[k] = float('inf')  # force exploration of unvisited islands
            else:
                exploit = self.R[k] / self.V[k]
                explore = self.C * np.sqrt(np.log(self.total_visits) / self.V[k])
                ucb_values[k] = exploit + explore
        return int(np.argmax(ucb_values))

    def update(self, island_k: int, new_score: float, island_best: float):
        """Update bandit state with a globally-normalized reward (Eq. 4)."""
        if new_score > self.global_best:
            self.global_best = new_score
        reward = (new_score - island_best) / (abs(self.global_best) + 1e-8)
        self.R[island_k] = self.rho * self.R[island_k] + reward
        self.V[island_k] = self.rho * self.V[island_k] + 1
```
11.4 Level 3: Meta-Guidance (Tactical Generation)
When the accumulated improvement signal drops below a threshold across all islands simultaneously, AdaEvolve triggers meta-level LLM analysis:
[!important] Meta-Guidance Trigger Trigger condition: G_t^(k) ≤ 0.12 for all k Action: Invoke meta-LLM to analyze the evaluator code, identify bottlenecks in the current best program, and propose fundamentally different algorithmic approaches — not incremental improvements, but paradigm shifts.
````python
META_GUIDANCE_PROMPT = """
You are analyzing a stalled optimization process. The evolutionary search has failed
to make progress across all islands for the last several iterations.

## Evaluator Code
```python
{evaluator_code}
```

## Current Best Program (score: {best_score})
```python
{best_program}
```

## Recent Failed Approaches (last 10 mutations)
{failed_approaches}

## Task
Propose 2-3 fundamentally different algorithmic approaches. Do NOT suggest
incremental improvements. Instead, suggest:
1. A completely different algorithm class (e.g., switching from greedy to DP,
   from brute force to divide-and-conquer)
2. Concrete techniques with specific library functions
   (e.g., scipy.optimize.linear_sum_assignment)
3. A novel data structure that changes the problem's complexity class

Output each tactic as: TACTIC: [name] — [concrete description with code hints]
"""
````
**Example tactics generated in practice:**
- "Trust-region root finding for faster convergence on constrained optimization"
- "Voronoi-based initialization for better spatial coverage in packing problems"
- "Median filtering + linear sum assignment via scipy for robust matching"
- "Replace recursive DFS with iterative BFS + priority queue for better cache locality"
These tactics are **injected into subsequent mutation prompts** on all islands, forcing the entire search to shift direction. The tactic injection persists for a configurable window (default: 20 iterations) before the system re-evaluates stagnation.
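The injection window can be sketched as a small state holder. This is an illustrative class (the name `TacticInjector` and its methods are assumptions), with the 20-iteration default taken from the text:

```python
class TacticInjector:
    """Persist meta-guidance tactics for a fixed iteration window (sketch).

    Tactics injected at iteration t remain active through t + window - 1,
    after which the system re-evaluates stagnation."""

    def __init__(self, window: int = 20):
        self.window = window
        self.tactics: list = []
        self.expires_at = -1

    def inject(self, tactics: list, current_iter: int) -> None:
        """Activate a fresh set of tactical directives."""
        self.tactics = tactics
        self.expires_at = current_iter + self.window

    def active_tactics(self, current_iter: int) -> list:
        """Tactics to append to mutation prompts at this iteration."""
        return self.tactics if current_iter < self.expires_at else []
```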
11.5 Dynamic Island Spawning
A more extreme intervention than meta-guidance: when G_t^(k) ≤ 0.02 across all islands (severe stagnation), AdaEvolve dynamically spawns new islands to restart exploration from diverse seeds:
```python
def check_spawn_trigger(islands: list) -> bool:
    """Trigger new island spawning on severe global stagnation."""
    return all(island.accumulated_signal.G <= 0.02 for island in islands)

def spawn_island(population_db, meta_tactics: list):
    """Create new island with diverse seeds and meta-tactic injection."""
    diverse_seeds = population_db.sample_diverse(n=5, method="maximin_distance")
    new_island = Island(
        seeds=diverse_seeds,
        meta_tactics=meta_tactics,      # inject current tactical directives
        exploration_intensity=0.7,      # start with high exploration
    )
    return new_island
```
11.6 EVOLVE-BLOCK Markers
SkyDiscover uses EVOLVE-BLOCK-START / EVOLVE-BLOCK-END markers to designate mutable regions within programs. Code outside these markers is preserved as immutable context. If no markers are present, the entire program becomes mutable:
```python
import numpy as np

# This code is immutable context
def load_data(path: str) -> np.ndarray:
    return np.load(path)

# EVOLVE-BLOCK-START
def solve(data: np.ndarray) -> float:
    """This function will be evolved by the search algorithm."""
    # Initial naive implementation
    return float(np.sum(data))
# EVOLVE-BLOCK-END

# This code is also immutable
if __name__ == "__main__":
    result = solve(load_data("input.npy"))
    print(f"Result: {result}")
```
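A parser for the marker convention might look like the sketch below (a plausible implementation of the documented behavior, not the framework's actual code): extract everything between the markers, and fall back to the whole program when no markers exist.

```python
import re

EVOLVE_RE = re.compile(r"# EVOLVE-BLOCK-START\n(.*?)# EVOLVE-BLOCK-END", re.DOTALL)

def mutable_regions(source: str) -> list:
    """Return the mutable regions of a program.

    With markers: only the text between EVOLVE-BLOCK-START/END pairs.
    Without markers: the entire program is mutable, per the docs."""
    blocks = EVOLVE_RE.findall(source)
    return blocks if blocks else [source]
```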
11.7 Artifact Injection
Evaluation feedback is automatically incorporated into subsequent generation prompts through the artifacts return value. This creates a lightweight form of diagnostic feedback (similar to GEPA's ASI, but less structured):
```python
def evaluate(program_path: str) -> dict:
    """Evaluator returns score + artifacts for next iteration's context."""
    score = run_tests(program_path)
    return {
        "combined_score": score,
        "artifacts": {
            "failed_test_cases": get_failures(program_path),
            "runtime_ms": measure_runtime(program_path),
            "memory_mb": measure_memory(program_path),
            "hint": "Consider dynamic programming for overlapping subproblems",
        },
    }
```
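On the consuming side, the artifacts dict has to be flattened into prompt text. A minimal rendering sketch (the function name and layout are assumptions, not the framework's context builder):

```python
def render_artifacts(artifacts: dict) -> str:
    """Format evaluator artifacts as a prompt section for the next mutation."""
    lines = ["## Feedback from last evaluation"]
    for key, value in artifacts.items():
        lines.append(f"- {key}: {value}")
    return "\n".join(lines)
```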
12 Programming Language
| Aspect | Details |
|---|---|
| Framework language | Python |
| Evolved programs | Primarily Python; extensible to any language via custom evaluators |
| Dependencies | Standard scientific Python stack (numpy, scipy) + LLM provider SDKs |
| Configuration | YAML |
| Extensibility | Custom search algorithms via skydiscover/search/ module interface; custom benchmarks via benchmarks/README.md |
13 Memory Management
13.1 Population Database
Each island maintains its own population of candidate programs, with a global best tracker across all islands. The population management is algorithm-dependent:
- AdaEvolve: Per-island archives with UCB-driven allocation; inter-island migration
- EvoX: Self-evolving population with experience management
- Native backends (OpenEvolve, GEPA): Delegated to the respective algorithm's own population management
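For AdaEvolve's case, the shape of the database can be sketched in a few lines (a simplified illustration; the real `PopulationDB` surely tracks more state, e.g. artifacts and lineage):

```python
class PopulationDB:
    """Per-island program archives with a global best tracker (sketch)."""

    def __init__(self, n_islands: int):
        # each archive holds (score, program) pairs for one island
        self.archives = [[] for _ in range(n_islands)]
        self.global_best = (float("-inf"), None)

    def add(self, island: int, score: float, program: str) -> None:
        """Record an evaluated program and refresh the global best."""
        self.archives[island].append((score, program))
        if score > self.global_best[0]:
            self.global_best = (score, program)

    def migrate(self, src: int, dst: int) -> None:
        """Copy the source island's best program into the destination archive."""
        if self.archives[src]:
            self.archives[dst].append(max(self.archives[src]))
```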
13.2 Checkpoint System
Full state serialization enables resuming interrupted runs. The checkpoint includes:
- All island states (programs, scores, accumulated improvement signals)
- Bandit state (cumulative rewards, visit counts)
- Meta-guidance tactical history
- Global best program and score
- Iteration counter and cost tracking
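A checkpoint covering those fields could be serialized as plainly as the sketch below (hypothetical field names and JSON encoding; the framework's actual serialization format is not documented here):

```python
import json

def save_checkpoint(path: str, islands, bandit, tactics, global_best,
                    iteration: int, cost_usd: float) -> None:
    """Write full search state to disk for later resumption (sketch)."""
    state = {
        "islands": islands,          # per-island programs, scores, G signals
        "bandit": bandit,            # decayed rewards R and visit counts V
        "tactics": tactics,          # meta-guidance tactical history
        "global_best": global_best,  # best program's score (and identifier)
        "iteration": iteration,
        "cost_usd": cost_usd,
    }
    with open(path, "w") as f:
        json.dump(state, f)

def load_checkpoint(path: str) -> dict:
    """Restore a previously saved search state."""
    with open(path) as f:
        return json.load(f)
```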
13.3 Artifact Injection as Context
Evaluation artifacts are stored alongside programs and injected into LLM context for subsequent mutations. This creates a lightweight "memory" where the search algorithm can learn from previous evaluation outcomes without a formal learning log system.
14 Continued Learning
14.1 Meta-Guidance as Accumulated Knowledge
AdaEvolve's meta-guidance system functions as a form of continued learning: each time stagnation triggers tactical generation, the tactics represent learned knowledge about what hasn't worked and what might work instead. Tactics persist across iterations and accumulate over the run.
14.2 EvoX Self-Evolving Strategy
The EvoX algorithm goes further: it co-adapts the search strategy itself alongside the solutions being evolved. The strategy parameters (parent selection weights, mutation scope, context assembly rules) are evolved using LLM-driven strategy mutation, creating a meta-level optimization loop.
14.3 No Formal Learning Logs
Unlike Darwinian Evolver's explicit learning log system, SkyDiscover does not maintain a structured cross-population mutation history. Knowledge transfer happens implicitly through:
- Artifact injection (evaluation feedback into next mutation)
- Meta-guidance tactics (high-level strategy shifts)
- Inter-island migration (direct solution transfer)
[!warning] Gap SkyDiscover/AdaEvolve does not implement explicit learning logs (Darwinian Evolver), prompt co-evolution (ShinkaEvolve), or structured ASI diagnostics (GEPA). These represent integration opportunities for an OmniEvolve-type system. However, AdaEvolve's meta-guidance and adaptive resource allocation partially compensate by providing implicit learning through behavioral adaptation.
15 Applications and Benchmarks
15.1 Mathematical Optimization
14 tasks including circle packing, Erdős problems, Heilbronn triangles, geometric optimization. AdaEvolve matches or exceeds human SOTA on circle packing and achieves best open-source results.
15.2 Real-World Systems Optimization
9 tasks from the Algorithmic Discovery for Real Systems (ADRS) benchmark: cloud scheduling, load balancing, Mixture-of-Experts expert placement, GPU kernel optimization. The 41% cost reduction on cloud transfer and 14% improvement on GPU load balancing demonstrate practical impact beyond academic benchmarks.
15.3 Competitive Programming (Frontier-CS)
172 algorithm design problems drawn from competitive programming. The 50-LLM-call budget makes this a stringent test of sample efficiency. AdaEvolve's 21% improvement over OpenEvolve on mean score demonstrates that adaptive resource allocation significantly outperforms static scheduling.
15.4 AtCoder Heuristic Contest (ALE)
10 tasks derived from AtCoder Heuristic Contests, providing realistic optimization problems with complex evaluation landscapes.
15.5 Creative and NLP Tasks
Image generation evolution and HotPotQA prompt optimization demonstrate the framework's generality beyond traditional optimization. These tasks validate that the evaluator API (combined_score + artifacts) is sufficiently flexible for non-numeric optimization targets.
16 Comparison: SkyDiscover vs Other Frameworks
16.1 Feature Matrix
| Feature | SkyDiscover/AdaEvolve | OpenEvolve | ShinkaEvolve | GEPA | LLM4AD |
|---|---|---|---|---|---|
| Adaptation | ✅ Three-level hierarchical | Static | Bandit LLM selection | Pareto-based | Method-specific |
| Island Management | UCB allocation + dynamic spawning | Fixed + ring migration | Dynamic spawning on stagnation | N/A (single population) | Method-specific |
| Resource Allocation | Globally-normalized UCB bandit | Equal across islands | Equal + bandit for LLM selection | N/A | N/A |
| Stagnation Response | Meta-guidance + island spawning | None | Dynamic island spawning | Reflection-driven mutation | None |
| Benchmarks | ✅ 200+ | ~10 | ~20 | ~30 | ~50+ |
| Multi-Algorithm | ✅ Yes (6+ strategies) | 1 (AlphaEvolve-style) | 1 (custom) | 1 (custom) | 7 methods |
| Multi-Provider LLM | ✅ Yes | ✅ Yes | ✅ Yes | ✅ Yes | ✅ Yes |
| Live Dashboard | ✅ Yes | No | No | No | ✅ Yes (GUI) |
| Checkpoint Resume | ✅ Yes | ✅ Yes | ✅ Yes | ⚠️ Partial | ✅ Yes |
| Human Feedback | ✅ Yes (dashboard) | No | No | No | No |
| Prompt Evolution | No | No | ✅ Yes (v1.1) | No | No |
| Learning Logs | No (implicit via artifacts) | No | No | No | No |
| Diagnostic ASI | Partial (artifacts) | No | No | ✅ Yes | No |
16.2 Key Differentiators
What SkyDiscover Does Best
- Adaptive resource allocation: The only system that dynamically adjusts compute distribution across search fronts based on observed improvement
- Meta-guidance: Unique ability to generate tactical paradigm shifts on stagnation, preventing long plateau periods
- Fair benchmarking: 200+ tasks in a unified platform enable rigorous cross-system comparison
- Minimal configuration: AdaEvolve automates all scheduling decisions that other systems require hand-tuning
- Systems optimization: Strong results on real-world infrastructure tasks beyond academic benchmarks
What SkyDiscover Lacks
- No prompt co-evolution (ShinkaEvolve has this)
- No formal learning logs (Darwinian Evolver has this)
- No structured ASI diagnostics (GEPA has this — artifacts are informal)
- No MAP-Elites quality-diversity (AlphaEvolve/OpenEvolve have this)
- No self-modification (DGM has this)
- No 2-tier novelty filtering (ShinkaEvolve has this)
- No tree search integration (AB-MCTS has this)
[!tip] Integration Opportunity SkyDiscover's three-level hierarchical adaptation is orthogonal to most innovations in other systems. The accumulated improvement signal could drive ShinkaEvolve's prompt co-evolution (mutate prompts more aggressively when G is low). GEPA's ASI could feed richer information into AdaEvolve's meta-guidance generator. Darwinian Evolver's learning logs could provide historical context for tactical generation. These integrations are a natural fit for the OmniEvolve architecture.
SkyDiscover & AdaEvolve Technical Report — Evolutionary AI Systems Survey 2024–2026 Paper: arXiv:2602.20133 | Repository: github.com/skydiscover-ai/skydiscover | Project: skydiscover-ai.github.io