SkyDiscover & AdaEvolve
A Modular Framework for AI-Driven Algorithmic Discovery with Hierarchical Adaptive Search
Organization: UC Berkeley Sky Lab
Published: February 2026
Type: Framework + Research Paper (arXiv:2602.20133)
License: Apache 2.0
Report Type: PhD-Level Technical Analysis
Report Date: March 2026
Table of Contents
- Full Title and Attribution
- Authors and Team
- Core Contribution
- Supported Solutions
- LLM Integration
- Key Results
- Reproducibility
- Compute and API Costs
- Architecture Solution
- Component Breakdown
- Core Mechanisms (Detailed)
- Programming Language
- Memory Management
- Continued Learning
- Applications and Benchmarks
- Comparison: SkyDiscover vs Other Frameworks
1 Full Title and Attribution
Framework: SkyDiscover: A Modular Framework for AI-Driven Scientific and Algorithmic Discovery
Paper: AdaEvolve: Adaptive LLM Driven Zeroth-Order Optimization (arXiv:2602.20133)
Project Page: skydiscover-ai.github.io
Repository: github.com/skydiscover-ai/skydiscover
License: Apache License 2.0
Organization: UC Berkeley Sky Lab
Publication Date: February 23, 2026
Lineage: Builds on FunSearch, OpenEvolve, GEPA, ShinkaEvolve; extends with hierarchical adaptive search
2 Authors and Team
SkyDiscover and AdaEvolve were developed at the UC Berkeley Sky Lab, a research group known for systems-level AI infrastructure (Ray, Spark, Alpa, vLLM).
Authors: Mert Cemri, Shubham Agrawal, Akshat Gupta, Shu Liu, Audrey Cheng, Qiuyang Mang, Ashwin Naren, Lutfi Eren Erdogan, Koushik Sen, Matei Zaharia, Alex Dimakis, Ion Stoica
The team brings together expertise in systems optimization (Stoica and Zaharia — creators of Spark and Ray), program analysis (Sen — a pioneer of concolic testing), and machine learning (Dimakis). This systems-first DNA is reflected in the framework's emphasis on modular design, fair benchmarking, and real-world systems optimization tasks.
Notably, several authors overlap with the GEPA team (UC Berkeley / Stanford), and the framework includes GEPA as a first-class backend algorithm. SkyDiscover represents the Berkeley lab's effort to unify the fragmented landscape of LLM-driven evolutionary search into a single, fair evaluation platform.
3 Core Contribution
[!important] Key Novelty SkyDiscover makes two distinct contributions: (1) a modular framework providing a unified interface for implementing, running, and fairly comparing discovery algorithms across 200+ optimization tasks; and (2) AdaEvolve, a novel three-level hierarchical adaptation algorithm that replaces static search schedules with dynamic resource allocation coordinated by an accumulated improvement signal.
What Makes SkyDiscover/AdaEvolve Novel
- Hierarchical adaptive search: AdaEvolve is the first evolutionary search algorithm to implement three-level adaptation: local (within-island exploration intensity), global (cross-island resource allocation via bandit), and meta (tactical paradigm shifts when stagnation detected). All three levels are coordinated by a single unified signal.
- Accumulated improvement signal: A scale-invariant, exponential moving average of squared improvement magnitudes that serves as a real-time volatility metric. This single signal drives decisions at all three adaptation levels — a mathematically elegant unification.
- Globally-normalized bandit rewards: Unlike previous systems that measure island improvement relative to each island's local best, AdaEvolve normalizes rewards against the global best, preventing "poor island bias" where weak islands receive disproportionate resources for trivial improvements.
- Meta-guidance via LLM tactical generation: When stagnation is detected across all islands, the system triggers a meta-level LLM analysis that generates high-level algorithmic directives (e.g., "switch from greedy to dynamic programming"), forcing qualitative search redirection.
- Unified benchmarking platform: SkyDiscover provides 200+ optimization tasks spanning mathematics, systems optimization, competitive programming, and creative applications, enabling fair head-to-head comparison of different search algorithms.
- Multi-algorithm framework: Ships with AdaEvolve, EvoX, and native backends for OpenEvolve, GEPA, and ShinkaEvolve, plus generic strategies (Top-K, Beam Search, Best-of-N).
Relationship to Prior Work
| System | Year | Adaptation | Search Strategy | Benchmark Coverage |
|---|---|---|---|---|
| FunSearch | 2023 | Static | Single population | Math only |
| OpenEvolve | 2025 | Static islands | MAP-Elites + islands | ~10 tasks |
| ShinkaEvolve | 2025 | Bandit LLM selection | Islands + dynamic spawning | ~20 tasks |
| GEPA | 2026 | Pareto-based | Reflection-driven | ~30 tasks |
| SkyDiscover | 2026 | Three-level hierarchical | Multi-algorithm (AdaEvolve, EvoX, + backends) | 200+ tasks |
4 Supported Solutions
4.1 Solution Types
| Solution Type | Description | Supported |
|---|---|---|
| Function-level optimization | Evolve a single function for a given task | ✅ Yes |
| Full program evolution | Evolve complete programs with EVOLVE-BLOCK markers | ✅ Yes |
| Multi-file codebase | Modify multiple files simultaneously | ✅ Yes (agentic mode) |
| Prompt optimization | Evolve NLP prompts for downstream tasks | ✅ Yes (HotPotQA) |
| Image generation | Evolve image generation parameters/code | ✅ Yes (creative tasks) |
| Systems optimization | Cloud scheduling, load balancing, kernel tuning | ✅ Yes (9 tasks) |
| Self-modification | Agent modifying its own code | ❌ No |
4.2 Benchmark Portfolio (200+ Tasks)
| Domain | Tasks | Examples |
|---|---|---|
| Mathematics | 14 | Circle packing, Erdős problems, Heilbronn triangles, geometric optimization |
| Systems | 9 | Cloud scheduling, load balancing, MoE expert placement, GPU kernel optimization |
| Algorithms (Frontier-CS) | 172 | Competitive programming from diverse categories |
| Algorithms (ALE) | 10 | AtCoder Heuristic Contest-derived optimization |
| Creative / NLP | 2+ | Image generation evolution, HotPotQA prompt optimization |
5 LLM Integration
5.1 Multi-Provider Support
SkyDiscover supports all major LLM providers through a unified provider/model format:
| Provider | Models | Format |
|---|---|---|
| OpenAI | GPT-5, GPT-4o, o3-mini | openai/gpt-5 (default provider) |
| Google | Gemini 3 Pro, Gemini 3 Flash | gemini/gemini-3-pro |
| Anthropic | Claude Opus 4.6, Claude Sonnet 4.6 | anthropic/claude-opus-4-6 |
| Local (Ollama) | Any LiteLLM-compatible model | ollama/qwen2.5-coder:32b |
5.2 Weighted Multi-Model Pools
A distinctive feature of SkyDiscover is weighted multi-model pools for distributed sampling. Instead of using a single LLM per mutation, the framework samples from a weighted mixture:
```yaml
# YAML configuration for weighted model pools
llm:
  models:
    - model: "openai/gpt-5"
      weight: 0.4
    - model: "gemini/gemini-3-pro"
      weight: 0.3
    - model: "anthropic/claude-sonnet-4-6"
      weight: 0.2
    - model: "ollama/qwen2.5-coder:32b"
      weight: 0.1
  system_prompt: "You are an expert algorithm designer..."
max_iterations: 500
```
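The weighted mixture can be sketched as a per-mutation sampler. This is a hypothetical helper (`sample_model` is not the framework's actual API), shown only to make the sampling semantics concrete:

```python
import random

def sample_model(pool: list[dict], rng=random) -> str:
    """Pick one model for the next mutation from a weighted pool
    shaped like the YAML config above (sketch, assumed structure)."""
    names = [m["model"] for m in pool]
    weights = [m["weight"] for m in pool]
    return rng.choices(names, weights=weights, k=1)[0]

pool = [
    {"model": "openai/gpt-5", "weight": 0.4},
    {"model": "gemini/gemini-3-pro", "weight": 0.3},
    {"model": "anthropic/claude-sonnet-4-6", "weight": 0.2},
    {"model": "ollama/qwen2.5-coder:32b", "weight": 0.1},
]
```

Because each mutation draws independently, a long run naturally mixes model behaviors in proportion to the configured weights.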
5.3 Agentic Mode
SkyDiscover provides an agentic mode where the LLM has access to the full project file structure during mutation. This enables context-aware mutations that consider imports, dependencies, and cross-file interactions — conceptually similar to Arcgentica's runtime-as-context but at the file system level rather than the REPL level.
5.4 Custom System Prompts
Each task can define custom system prompts that are injected into the LLM context during mutation. Combined with AdaEvolve's meta-guidance system, prompts can be dynamically augmented with tactical directives when the system detects stagnation.
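A minimal sketch of this dynamic augmentation, assuming a hypothetical `build_system_prompt` helper (names are illustrative, not the framework's API): the task's base prompt passes through unchanged until meta-guidance has produced tactics.

```python
def build_system_prompt(task_prompt: str, tactics: list[str]) -> str:
    """Append meta-guidance tactical directives to a task's system prompt.
    When no tactics are active, the base prompt is used verbatim."""
    if not tactics:
        return task_prompt
    directives = "\n".join(f"- {t}" for t in tactics)
    return f"{task_prompt}\n\nTactical directives (from meta-guidance):\n{directives}"
```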
6 Key Results
[!success] Headline Results AdaEvolve achieves ~34% median improvement over OpenEvolve/GEPA/ShinkaEvolve baselines across ~200 benchmarks, matches AlphaEvolve on 6/6 systems tasks and 6/8 math tasks, and demonstrates strong real-world systems impact.
6.1 Mathematical Optimization (6 tasks, 100 iterations)
| Problem | AdaEvolve | OpenEvolve | GEPA | Human SOTA |
|---|---|---|---|---|
| Circle Packing (Square) | 2.636 | 2.590 | 2.610 | 2.634 |
| Heilbronn Triangles | 0.036 | 0.028 | 0.031 | — |
| Signal Processing | 0.718 | 0.619 | 0.682 | — |
AdaEvolve matches or exceeds human SOTA on circle packing (2.636 ≥ 2.634) and achieves best open-source results across all math benchmarks.
6.2 Real-World Systems Optimization (ADRS, 7 tasks)
| Task | AdaEvolve (GPT-5) | AdaEvolve (Gemini-3-Pro) | Best Baseline |
|---|---|---|---|
| Cloud Transfer Cost | 41% lower than baselines | 41% lower than baselines | OpenEvolve |
| GPU Load Balancing | 14% better than baselines | 14% better than baselines | GEPA |
| MoE Expert Placement | Best on all 7 | Best on all 7 | ShinkaEvolve |
Wins on all 7 ADRS benchmarks under both GPT-5 and Gemini-3-Pro. Largest gains on sparse/bursty improvement tasks (TXN: 4348 vs baseline 4329), where adaptive resource allocation excels.
6.3 Frontier-CS (172 algorithm design problems, 50 LLM calls)
| Metric | AdaEvolve | OpenEvolve | GEPA | Single-call GPT-5 |
|---|---|---|---|---|
| Mean Score | 61.33 | 50.75 | 54.20 | 20.64 |
| Median Score | 75.15 | 56.37 | 60.12 | 15.30 |
| Improvement over OpenEvolve | +21% (mean score) | — | — | — |
6.4 Ablation Study
| Configuration | Circle Packing | Signal Processing |
|---|---|---|
| Full AdaEvolve | 2.6294 ± 0.003 | 0.7178 ± 0.019 |
| w/o Local Adaptation | 2.5906 ± 0.048 | 0.6807 ± 0.021 |
| w/o Bandit Selection | 2.6180 ± 0.005 | 0.6190 ± 0.054 |
| w/o Meta-Guidance | 2.5213 ± 0.028 | 0.5476 ± 0.011 |
[!note] Key Finding Meta-Guidance removal causes the largest performance degradation across all benchmarks. This validates that adaptive tactical generation — injecting high-level algorithmic directives when search stagnates — is AdaEvolve's most impactful innovation.
7 Reproducibility
| Criterion | Status | Details |
|---|---|---|
| Source Code | ✅ Available | github.com/skydiscover-ai/skydiscover (Apache 2.0) |
| Paper | ✅ Available | arXiv:2602.20133 (CC BY 4.0) |
| Benchmarks | ✅ 200+ included | All benchmarks ship with the framework; evaluator functions provided |
| Configuration | ✅ YAML-based | Exact configs for all experiments are in the repository |
| Checkpoint Resumption | ✅ Yes | Long-running discovery tasks can be interrupted and resumed |
| Determinism | ⚠️ Partial | LLM outputs are stochastic; framework tracks seeds where possible |
| API Key Requirement | ⚠️ Required | Needs at least one LLM provider API key (or local Ollama) |
8 Compute and API Costs
8.1 Minimal Configuration
AdaEvolve is designed for minimal configuration: it requires only a model name and an iteration budget. The three-level adaptation handles all scheduling decisions automatically, eliminating the need to hand-tune island counts, mutation ratios, or exploration-exploitation schedules.
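Following the YAML conventions shown in section 5.2, a minimal run might look like the fragment below (a hypothetical example; field placement beyond `llm.models` and `max_iterations` is assumed):

```yaml
# Hypothetical minimal AdaEvolve configuration: one model, one budget.
# Island counts, mutation ratios, and exploration schedules are all
# handled by the three-level adaptation — nothing else to tune.
llm:
  models:
    - model: "openai/gpt-5"
      weight: 1.0
max_iterations: 200
```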
8.2 Cost Efficiency
| Setting | LLM Calls | Estimated Cost | Benchmark |
|---|---|---|---|
| Frontier-CS (172 problems) | 50 per problem | $0.10–0.50 per problem | Mean score 61.33 |
| Math optimization (100 iter) | 100 | $5–30 | Matches human SOTA |
| Systems optimization (ADRS) | 50–200 | $2–20 | 41% cost reduction |
8.3 Cost Advantage from Adaptive Allocation
AdaEvolve's hierarchical adaptation naturally reduces costs by allocating more resources to productive islands and fewer to stagnant ones. The globally-normalized bandit prevents wasting compute on low-performing search fronts. Compared to static schedule systems:
- ~20–30% fewer wasted LLM calls vs. fixed island allocation (OpenEvolve)
- Dynamic island spawning/pruning prevents maintaining inactive search fronts
- Meta-guidance breaks through stagnation plateaus that would otherwise waste remaining budget
9 Architecture Solution
9.1 Framework Architecture
SkyDiscover has a clean modular architecture separating the framework layer (task definitions, evaluation, configuration, monitoring) from the search algorithm layer (AdaEvolve, EvoX, or any backend):
╔════════════════════════════════════════════════════════════════════════════╗
║ S K Y D I S C O V E R ║
║ Modular Framework for AI-Driven Algorithmic Discovery ║
╚════════════════════════════════════════════════════════════════════════════╝
┌──────────────────────────┐
│ USER INTERFACE │
│ │
│ YAML Config │
│ Live Dashboard │
│ Human Feedback Steering │
│ Checkpoint Resume │
└────────────┬─────────────┘
│
┌────────────────────┼────────────────────┐
│ │ │
┌─────────▼─────────┐ ┌──────▼──────────┐ ┌──────▼──────────┐
│ EVALUATOR API │ │ SEARCH ALGO │ │ LLM PROVIDER │
│ │ │ ROUTER │ │ LAYER │
│ evaluate(path) │ │ │ │ │
│ → combined_score │ │ AdaEvolve │ │ OpenAI │
│ → artifacts │ │ EvoX │ │ Gemini │
│ EVOLVE-BLOCK │ │ Top-K │ │ Anthropic │
│ markers │ │ Beam Search │ │ Ollama/local │
└─────────┬─────────┘ │ Best-of-N │ │ Weighted pools │
│ │ GEPA Native │ └────────┬────────┘
│ │ OpenEvolve │ │
│ └──────┬──────────┘ │
│ │ │
└──────────────────┼──────────────────────┘
│
┌──────────▼──────────────┐
│ BENCHMARK SUITE │
│ │
│ Math (14 tasks) │
│ Systems (9 tasks) │
│ Frontier-CS (172 tasks) │
│ ALE (10 tasks) │
│ Creative/NLP (2+ tasks) │
└──────────────────────────┘
9.2 AdaEvolve Algorithm Architecture
AdaEvolve implements a multi-island evolutionary search with three-level hierarchical adaptation:
╔══════════════════════════════════════════════════════════════════════╗
║ A D A E V O L V E ║
║ Three-Level Hierarchical Adaptive Search ║
╚══════════════════════════════════════════════════════════════════════╝
LEVEL 3: META-GUIDANCE (triggered on global stagnation)
┌──────────────────────────────────────────────────────────────────┐
│ IF G_t^(k) ≤ 0.12 for all k: │
│ → LLM analyzes evaluator code + current best programs │
│ → Generates high-level tactical directives │
│ → Injects tactics into mutation prompts │
│ → Forces qualitative search paradigm shift │
│ Examples: "Switch from greedy to DP", "Use Voronoi init" │
└──────────────────────────────────┬───────────────────────────────┘
│ tactical injection
LEVEL 2: GLOBAL ADAPTATION (cross-island resource allocation)
┌──────────────────────────────────▼───────────────────────────────┐
│ UCB Island Selection: k* = argmax[R_k/V_k + C√(ln N/n_k)] │
│ │
│ Globally-normalized rewards: r_t^(k) = (f' - f_k*) / f_global* │
│ Decayed tracking: R_t^(k) = ρ·R_{t-1}^(k) + r_t^(k) │
│ │
│ Dynamic island spawning: when G_t^(k) ≤ 0.02 across all k │
└──────────────────────────────────┬───────────────────────────────┘
│ resource allocation
LEVEL 1: LOCAL ADAPTATION (within-island exploration intensity)
┌────────────┐ ┌────────────┐ ┌────────────┐ ┌────────────┐
│ Island 1 │ │ Island 2 │ │ Island 3 │ │ Island K │
│ │ │ │ │ │ │ │
│ I_t^(1) │ │ I_t^(2) │ │ I_t^(3) │ │ I_t^(K) │
│ G_t^(1) │ │ G_t^(2) │ │ G_t^(3) │ │ G_t^(K) │
│ │ │ │ │ │ │ │
│ ┌────────┐ │ │ ┌────────┐ │ │ ┌────────┐ │ │ ┌────────┐ │
│ │ Accum. │ │ │ │ Accum. │ │ │ │ Accum. │ │ │ │ Accum. │ │
│ │ Improv.│ │ │ │ Improv.│ │ │ │ Improv.│ │ │ │ Improv.│ │
│ │ Signal │ │ │ │ Signal │ │ │ │ Signal │ │ │ │ Signal │ │
│ └────────┘ │ │ └────────┘ │ │ └────────┘ │ │ └────────┘ │
└────────────┘ └────────────┘ └────────────┘ └────────────┘
│ │ │ │
└────────────────┼────────────────┼────────────────┘
│ inter-island migration
▼
┌──────────────────────┐
│ POPULATION DATABASE │
│ Per-island archives │
│ Global best tracking │
└──────────────────────┘
10 Component Breakdown
10.1 Framework Components (SkyDiscover)
| Component | Module | Responsibility |
|---|---|---|
| Evaluator API | evaluate(program_path) | Returns {combined_score, artifacts}; artifacts feed into next iteration |
| Search Router | skydiscover/search/ | Dispatches to AdaEvolve, EvoX, Top-K, Beam, Best-of-N, or native backends |
| Context Builder | skydiscover/context_builder/ | Assembles LLM prompt from code, artifacts, and search state |
| LLM Provider Layer | YAML config | Weighted multi-model pools, provider routing |
| Benchmark Suite | benchmarks/ | 200+ tasks with evaluators, initial programs, documentation |
| Live Dashboard | Web UI | Real-time scatter plots, code diffs, metric tracking, human intervention |
| Checkpoint System | Built-in | Full state serialization for resuming interrupted runs |
10.2 Algorithm Components (AdaEvolve)
| Component | Role | Key Parameter |
|---|---|---|
| Accumulated Improvement Signal | Per-island volatility metric driving all three adaptation levels | ρ (decay rate, default 0.9) |
| Local Adaptation Engine | Adjusts exploration intensity per island per iteration | I_min=0.1, I_max=0.7 |
| Global UCB Bandit | Allocates compute across islands via UCB selection | C=√2 |
| Meta-Guidance Generator | LLM-driven tactical directive generation on stagnation | Trigger: G ≤ 0.12 |
| Island Manager | Dynamic spawning when all islands stagnate | Trigger: G ≤ 0.02 |
| Migration Controller | Inter-island solution transfer | Configurable topology |
10.3 EvoX Algorithm (Secondary)
SkyDiscover also includes EvoX, a self-evolving optimization strategy where the system co-adapts both solution generation and experience management using LLM-driven strategy evolution during runtime. EvoX provides a complementary approach to AdaEvolve's hierarchical adaptation, focusing on evolving the search strategy itself rather than parameterizing a fixed strategy.
11 Core Mechanisms (Detailed)
11.1 The Accumulated Improvement Signal
The foundation of AdaEvolve is a single, unified signal that measures the volatility of improvement on each island. This signal coordinates all three adaptation levels:
Step 1: Normalized Improvement Magnitude
After each mutation evaluation on island k:
$$ \delta_t^{(k)} = \max\left(\frac{f' - f_k^*}{f_k^*}, 0\right) $$
(Eq. 1 — Normalized Improvement)
Where f' is the new program's score and f_k* is the best score on island k. Normalization makes the signal scale-invariant across different problem types.
Step 2: Exponential Moving Average of Squared Improvements
$$ G_t^{(k)} = \rho \cdot G_{t-1}^{(k)} + (1 - \rho) \cdot (\delta_t^{(k)})^2 $$
(Eq. 2 — Accumulated Improvement Signal)
With ρ = 0.9 (decay rate). High G_t^(k) indicates a productive trajectory (recent large improvements). Low G_t^(k) signals stagnation requiring intervention. The squaring amplifies large breakthroughs and suppresses noise from trivial changes.
```python
class AccumulatedImprovementSignal:
    """Per-island volatility metric that drives all three adaptation levels."""

    def __init__(self, rho: float = 0.9, epsilon: float = 1e-8):
        self.rho = rho
        self.epsilon = epsilon
        self.G = 0.0                      # accumulated signal (Eq. 2)
        self.best_score = float('-inf')

    def update(self, new_score: float) -> float:
        """Update signal after evaluating a new program on this island."""
        if self.best_score > float('-inf'):
            # Normalized improvement magnitude (Eq. 1)
            delta = max((new_score - self.best_score) / (abs(self.best_score) + self.epsilon), 0.0)
        else:
            delta = 0.0                   # first evaluation seeds the baseline
        self.G = self.rho * self.G + (1 - self.rho) * delta ** 2
        if new_score > self.best_score:
            self.best_score = new_score
        return self.G
```
11.2 Level 1: Local Adaptation (Exploration Intensity)
Each island dynamically adjusts its exploration intensity based on its accumulated improvement signal:
$$ I_t^{(k)} = I_{\min} + \frac{I_{\max} - I_{\min}}{1 + \sqrt{G_t^{(k)} + \epsilon}} $$
(Eq. 3 — Dynamic Exploration Intensity)
Where I_min=0.1, I_max=0.7. When G is high (productive trajectory), intensity is low → exploitation-dominant sampling from top-ranked parents. When G is low (stagnation), intensity rises → exploration-dominant sampling, more random parent selection, and broader mutation scope.
[!info] Interpretation Exploration intensity controls the balance between refining known good solutions (low I = focused diff patches from best parents) and trying creative leaps (high I = full rewrites from diverse parents). This is analogous to simulated annealing's temperature, but driven by observed improvement rather than a fixed cooling schedule.
```python
def compute_exploration_intensity(G: float, I_min: float = 0.1, I_max: float = 0.7,
                                  epsilon: float = 1e-8) -> float:
    """Compute exploration intensity (Eq. 3) from the accumulated improvement signal."""
    return I_min + (I_max - I_min) / (1 + (G + epsilon) ** 0.5)
```
11.3 Level 2: Global Adaptation (Cross-Island Resource Allocation)
Globally-Normalized Bandit Rewards
AdaEvolve's key insight: measuring improvement relative to each island's local best creates "poor island bias." Instead, rewards are normalized against the global best:
$$ r_t^{(k)} = \frac{f' - f_k^*}{f_{\text{global}}^*} $$
(Eq. 4 — Globally-Normalized Reward)
This prevents weak islands from receiving outsized credit for trivial refinements. Under local normalization, an island improving from score 10 to 12 earns reward 0.20 while one improving from 90 to 92 earns only ≈0.022; under global normalization with f_global* = 95, both earn the same 2/95 ≈ 0.021, so credit tracks absolute progress toward the global frontier rather than relative progress on a weak front.
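The contrast between the two reward schemes can be made concrete with a toy calculation (illustrative code, not the framework's implementation):

```python
def global_reward(new_score: float, island_best: float, global_best: float) -> float:
    """Globally-normalized bandit reward, per Eq. 4."""
    return (new_score - island_best) / global_best

def local_reward(new_score: float, island_best: float) -> float:
    """Locally-normalized reward, which produces 'poor island bias'."""
    return (new_score - island_best) / island_best

# Weak island: 10 -> 12; strong island: 90 -> 92; global best = 95.
# Locally, the weak island looks far more productive:
assert local_reward(12, 10) > local_reward(92, 90)      # 0.20 vs ~0.022
# Globally, both improvements earn identical credit:
assert global_reward(12, 10, 95) == global_reward(92, 90, 95)
```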
Decayed Cumulative Tracking
$$ R_t^{(k)} = \rho \cdot R_{t-1}^{(k)} + r_t^{(k)} $$ $$ V_t^{(k)} = \rho \cdot V_{t-1}^{(k)} + 1 $$
(Eq. 5 — Decayed Cumulative Tracking)
Exponential decay ensures stale early breakthroughs don't dominate future allocation. The visit count V_t^(k) is also decayed, ensuring UCB exploration bonuses reflect recent neglect, not total neglect.
UCB Island Selection
$$ k^* = \arg\max_k \left[\frac{R_k}{V_k} + C\sqrt{\frac{\ln N}{n_k}}\right] $$
(Eq. 6 — UCB Island Selection)
With C = √2. The first term (R_k/V_k) is exploitation (favor recently productive islands), the second is exploration (favor recently neglected islands).
```python
import numpy as np

class GlobalAdaptationBandit:
    """UCB bandit for cross-island resource allocation with global normalization."""

    def __init__(self, n_islands: int, rho: float = 0.9, C: float = 1.414):
        self.n_islands = n_islands
        self.rho = rho
        self.C = C
        self.R = np.zeros(n_islands)      # decayed cumulative reward (Eq. 5)
        self.V = np.zeros(n_islands)      # decayed visit count (Eq. 5)
        self.total_visits = 0
        self.global_best = float('-inf')

    def select_island(self) -> int:
        """Select the next island to allocate compute to via UCB (Eq. 6)."""
        self.total_visits += 1
        ucb_values = np.zeros(self.n_islands)
        for k in range(self.n_islands):
            if self.V[k] < 1e-8:
                ucb_values[k] = float('inf')  # force exploration of unvisited islands
            else:
                exploit = self.R[k] / self.V[k]
                explore = self.C * np.sqrt(np.log(self.total_visits) / self.V[k])
                ucb_values[k] = exploit + explore
        return int(np.argmax(ucb_values))

    def update(self, island_k: int, new_score: float, island_best: float):
        """Update bandit state with a globally-normalized reward (Eq. 4)."""
        if new_score > self.global_best:
            self.global_best = new_score
        reward = (new_score - island_best) / (abs(self.global_best) + 1e-8)
        self.R[island_k] = self.rho * self.R[island_k] + reward
        self.V[island_k] = self.rho * self.V[island_k] + 1
```
11.4 Level 3: Meta-Guidance (Tactical Generation)
When the accumulated improvement signal drops below a threshold across all islands simultaneously, AdaEvolve triggers meta-level LLM analysis:
[!important] Meta-Guidance Trigger Trigger condition: G_t^(k) ≤ 0.12 for all k Action: Invoke meta-LLM to analyze the evaluator code, identify bottlenecks in the current best program, and propose fundamentally different algorithmic approaches — not incremental improvements, but paradigm shifts.
````python
META_GUIDANCE_PROMPT = """
You are analyzing a stalled optimization process. The evolutionary search has failed
to make progress across all islands for the last several iterations.

## Evaluator Code
```python
{evaluator_code}
```

## Current Best Program (score: {best_score})
```python
{best_program}
```

## Recent Failed Approaches (last 10 mutations)
{failed_approaches}

## Task
Propose 2-3 fundamentally different algorithmic approaches. Do NOT suggest
incremental improvements. Instead, suggest:
1. A completely different algorithm class (e.g., switching from greedy to DP,
   from brute force to divide-and-conquer)
2. Concrete techniques with specific library functions
   (e.g., scipy.optimize.linear_sum_assignment)
3. A novel data structure that changes the problem's complexity class

Output each tactic as: TACTIC: [name] — [concrete description with code hints]
"""
````
**Example tactics generated in practice:**
- "Trust-region root finding for faster convergence on constrained optimization"
- "Voronoi-based initialization for better spatial coverage in packing problems"
- "Median filtering + linear sum assignment via scipy for robust matching"
- "Replace recursive DFS with iterative BFS + priority queue for better cache locality"
These tactics are **injected into subsequent mutation prompts** on all islands, forcing the entire search to shift direction. The tactic injection persists for a configurable window (default: 20 iterations) before the system re-evaluates stagnation.
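The injection window can be sketched as a small state holder. This is an illustrative class (the name `TacticInjector` and its methods are assumptions), with the 20-iteration default taken from the text:

```python
class TacticInjector:
    """Persist meta-guidance tactics for a fixed iteration window (sketch).

    Tactics injected at iteration t remain active through t + window - 1,
    after which the system re-evaluates stagnation."""

    def __init__(self, window: int = 20):
        self.window = window
        self.tactics: list = []
        self.expires_at = -1

    def inject(self, tactics: list, current_iter: int) -> None:
        """Activate a fresh set of tactical directives."""
        self.tactics = tactics
        self.expires_at = current_iter + self.window

    def active_tactics(self, current_iter: int) -> list:
        """Tactics to append to mutation prompts at this iteration."""
        return self.tactics if current_iter < self.expires_at else []
```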
11.5 Dynamic Island Spawning
A more extreme intervention than meta-guidance: when G_t^(k) ≤ 0.02 across all islands (severe stagnation), AdaEvolve dynamically spawns new islands to restart exploration from diverse seeds:
```python
def check_spawn_trigger(islands: list) -> bool:
    """Trigger new island spawning on severe global stagnation."""
    return all(island.accumulated_signal.G <= 0.02 for island in islands)

def spawn_island(population_db, meta_tactics: list):
    """Create new island with diverse seeds and meta-tactic injection."""
    diverse_seeds = population_db.sample_diverse(n=5, method="maximin_distance")
    new_island = Island(
        seeds=diverse_seeds,
        meta_tactics=meta_tactics,      # inject current tactical directives
        exploration_intensity=0.7,      # start with high exploration
    )
    return new_island
```
11.6 EVOLVE-BLOCK Markers
SkyDiscover uses EVOLVE-BLOCK-START / EVOLVE-BLOCK-END markers to designate mutable regions within programs. Code outside these markers is preserved as immutable context. If no markers are present, the entire program becomes mutable:
```python
import numpy as np

# This code is immutable context
def load_data(path: str) -> np.ndarray:
    return np.load(path)

# EVOLVE-BLOCK-START
def solve(data: np.ndarray) -> float:
    """This function will be evolved by the search algorithm."""
    # Initial naive implementation
    return float(np.sum(data))
# EVOLVE-BLOCK-END

# This code is also immutable
if __name__ == "__main__":
    result = solve(load_data("input.npy"))
    print(f"Result: {result}")
```
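A parser for the marker convention might look like the sketch below (a plausible implementation of the documented behavior, not the framework's actual code): extract everything between the markers, and fall back to the whole program when no markers exist.

```python
import re

EVOLVE_RE = re.compile(r"# EVOLVE-BLOCK-START\n(.*?)# EVOLVE-BLOCK-END", re.DOTALL)

def mutable_regions(source: str) -> list:
    """Return the mutable regions of a program.

    With markers: only the text between EVOLVE-BLOCK-START/END pairs.
    Without markers: the entire program is mutable, per the docs."""
    blocks = EVOLVE_RE.findall(source)
    return blocks if blocks else [source]
```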
11.7 Artifact Injection
Evaluation feedback is automatically incorporated into subsequent generation prompts through the artifacts return value. This creates a lightweight form of diagnostic feedback (similar to GEPA's ASI, but less structured):
```python
def evaluate(program_path: str) -> dict:
    """Evaluator returns score + artifacts for next iteration's context."""
    score = run_tests(program_path)
    return {
        "combined_score": score,
        "artifacts": {
            "failed_test_cases": get_failures(program_path),
            "runtime_ms": measure_runtime(program_path),
            "memory_mb": measure_memory(program_path),
            "hint": "Consider dynamic programming for overlapping subproblems",
        },
    }
```
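On the consuming side, the artifacts dict has to be flattened into prompt text. A minimal rendering sketch (the function name and layout are assumptions, not the framework's context builder):

```python
def render_artifacts(artifacts: dict) -> str:
    """Format evaluator artifacts as a prompt section for the next mutation."""
    lines = ["## Feedback from last evaluation"]
    for key, value in artifacts.items():
        lines.append(f"- {key}: {value}")
    return "\n".join(lines)
```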
12 Programming Language
| Aspect | Details |
|---|---|
| Framework language | Python |
| Evolved programs | Primarily Python; extensible to any language via custom evaluators |
| Dependencies | Standard scientific Python stack (numpy, scipy) + LLM provider SDKs |
| Configuration | YAML |
| Extensibility | Custom search algorithms via skydiscover/search/ module interface; custom benchmarks via benchmarks/README.md |
13 Memory Management
13.1 Population Database
Each island maintains its own population of candidate programs, with a global best tracker across all islands. The population management is algorithm-dependent:
- AdaEvolve: Per-island archives with UCB-driven allocation; inter-island migration
- EvoX: Self-evolving population with experience management
- Native backends (OpenEvolve, GEPA): Delegated to the respective algorithm's own population management
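For AdaEvolve's case, the shape of the database can be sketched in a few lines (a simplified illustration; the real `PopulationDB` surely tracks more state, e.g. artifacts and lineage):

```python
class PopulationDB:
    """Per-island program archives with a global best tracker (sketch)."""

    def __init__(self, n_islands: int):
        # each archive holds (score, program) pairs for one island
        self.archives = [[] for _ in range(n_islands)]
        self.global_best = (float("-inf"), None)

    def add(self, island: int, score: float, program: str) -> None:
        """Record an evaluated program and refresh the global best."""
        self.archives[island].append((score, program))
        if score > self.global_best[0]:
            self.global_best = (score, program)

    def migrate(self, src: int, dst: int) -> None:
        """Copy the source island's best program into the destination archive."""
        if self.archives[src]:
            self.archives[dst].append(max(self.archives[src]))
```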
13.2 Checkpoint System
Full state serialization enables resuming interrupted runs. The checkpoint includes:
- All island states (programs, scores, accumulated improvement signals)
- Bandit state (cumulative rewards, visit counts)
- Meta-guidance tactical history
- Global best program and score
- Iteration counter and cost tracking
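A checkpoint covering those fields could be serialized as plainly as the sketch below (hypothetical field names and JSON encoding; the framework's actual serialization format is not documented here):

```python
import json

def save_checkpoint(path: str, islands, bandit, tactics, global_best,
                    iteration: int, cost_usd: float) -> None:
    """Write full search state to disk for later resumption (sketch)."""
    state = {
        "islands": islands,          # per-island programs, scores, G signals
        "bandit": bandit,            # decayed rewards R and visit counts V
        "tactics": tactics,          # meta-guidance tactical history
        "global_best": global_best,  # best program's score (and identifier)
        "iteration": iteration,
        "cost_usd": cost_usd,
    }
    with open(path, "w") as f:
        json.dump(state, f)

def load_checkpoint(path: str) -> dict:
    """Restore a previously saved search state."""
    with open(path) as f:
        return json.load(f)
```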
13.3 Artifact Injection as Context
Evaluation artifacts are stored alongside programs and injected into LLM context for subsequent mutations. This creates a lightweight "memory" where the search algorithm can learn from previous evaluation outcomes without a formal learning log system.
14 Continued Learning
14.1 Meta-Guidance as Accumulated Knowledge
AdaEvolve's meta-guidance system functions as a form of continued learning: each time stagnation triggers tactical generation, the tactics represent learned knowledge about what hasn't worked and what might work instead. Tactics persist across iterations and accumulate over the run.
14.2 EvoX Self-Evolving Strategy
The EvoX algorithm goes further: it co-adapts the search strategy itself alongside the solutions being evolved. The strategy parameters (parent selection weights, mutation scope, context assembly rules) are evolved using LLM-driven strategy mutation, creating a meta-level optimization loop.
14.3 No Formal Learning Logs
Unlike Darwinian Evolver's explicit learning log system, SkyDiscover does not maintain a structured cross-population mutation history. Knowledge transfer happens implicitly through:
- Artifact injection (evaluation feedback into next mutation)
- Meta-guidance tactics (high-level strategy shifts)
- Inter-island migration (direct solution transfer)
[!warning] Gap SkyDiscover/AdaEvolve does not implement explicit learning logs (Darwinian Evolver), prompt co-evolution (ShinkaEvolve), or structured ASI diagnostics (GEPA). These represent integration opportunities for an OmniEvolve-type system. However, AdaEvolve's meta-guidance and adaptive resource allocation partially compensate by providing implicit learning through behavioral adaptation.
15 Applications and Benchmarks
15.1 Mathematical Optimization
14 tasks including circle packing, Erdős problems, Heilbronn triangles, geometric optimization. AdaEvolve matches or exceeds human SOTA on circle packing and achieves best open-source results.
15.2 Real-World Systems Optimization
9 tasks from the Algorithmic Discovery for Real Systems (ADRS) benchmark: cloud scheduling, load balancing, Mixture-of-Experts expert placement, GPU kernel optimization. The 41% cost reduction on cloud transfer and 14% improvement on GPU load balancing demonstrate practical impact beyond academic benchmarks.
15.3 Competitive Programming (Frontier-CS)
172 algorithm design problems drawn from competitive programming. The 50-LLM-call budget makes this a stringent test of sample efficiency. AdaEvolve's 21% improvement over OpenEvolve on mean score demonstrates that adaptive resource allocation significantly outperforms static scheduling.
15.4 AtCoder Heuristic Contest (ALE)
10 tasks derived from AtCoder Heuristic Contests, providing realistic optimization problems with complex evaluation landscapes.
15.5 Creative and NLP Tasks
Image generation evolution and HotPotQA prompt optimization demonstrate the framework's generality beyond traditional optimization. These tasks validate that the evaluator API (combined_score + artifacts) is sufficiently flexible for non-numeric optimization targets.
16 Comparison: SkyDiscover vs Other Frameworks
16.1 Feature Matrix
| Feature | SkyDiscover/AdaEvolve | OpenEvolve | ShinkaEvolve | GEPA | LLM4AD |
|---|---|---|---|---|---|
| Adaptation | ✅ Three-level hierarchical | Static | Bandit LLM selection | Pareto-based | Method-specific |
| Island Management | UCB allocation + dynamic spawning | Fixed + ring migration | Dynamic spawning on stagnation | N/A (single population) | Method-specific |
| Resource Allocation | Globally-normalized UCB bandit | Equal across islands | Equal + bandit for LLM selection | N/A | N/A |
| Stagnation Response | Meta-guidance + island spawning | None | Dynamic island spawning | Reflection-driven mutation | None |
| Benchmarks | ✅ 200+ | ~10 | ~20 | ~30 | ~50+ |
| Multi-Algorithm | ✅ Yes (6+ strategies) | 1 (AlphaEvolve-style) | 1 (custom) | 1 (custom) | 7 methods |
| Multi-Provider LLM | ✅ Yes | ✅ Yes | ✅ Yes | ✅ Yes | ✅ Yes |
| Live Dashboard | ✅ Yes | No | No | No | ✅ Yes (GUI) |
| Checkpoint Resume | ✅ Yes | ✅ Yes | ✅ Yes | ⚠️ Partial | ✅ Yes |
| Human Feedback | ✅ Yes (dashboard) | No | No | No | No |
| Prompt Evolution | No | No | ✅ Yes (v1.1) | No | No |
| Learning Logs | No (implicit via artifacts) | No | No | No | No |
| Diagnostic ASI | Partial (artifacts) | No | No | ✅ Yes | No |
16.2 Key Differentiators
What SkyDiscover Does Best
- Adaptive resource allocation: The only system that dynamically adjusts compute distribution across search fronts based on observed improvement
- Meta-guidance: Unique ability to generate tactical paradigm shifts on stagnation, preventing long plateau periods
- Fair benchmarking: 200+ tasks in a unified platform enable rigorous cross-system comparison
- Minimal configuration: AdaEvolve automates all scheduling decisions that other systems require hand-tuning
- Systems optimization: Strong results on real-world infrastructure tasks beyond academic benchmarks
What SkyDiscover Lacks
- No prompt co-evolution (ShinkaEvolve has this)
- No formal learning logs (Darwinian Evolver has this)
- No structured ASI diagnostics (GEPA has this — artifacts are informal)
- No MAP-Elites quality-diversity (AlphaEvolve/OpenEvolve have this)
- No self-modification (DGM has this)
- No 2-tier novelty filtering (ShinkaEvolve has this)
- No tree search integration (AB-MCTS has this)
[!tip] Integration Opportunity SkyDiscover's three-level hierarchical adaptation is orthogonal to most innovations in other systems. The accumulated improvement signal could drive ShinkaEvolve's prompt co-evolution (mutate prompts more aggressively when G is low). GEPA's ASI could feed richer information into AdaEvolve's meta-guidance generator. Darwinian Evolver's learning logs could provide historical context for tactical generation. These integrations are a natural fit for the OmniEvolve architecture.
SkyDiscover & AdaEvolve Technical Report — Evolutionary AI Systems Survey 2024–2026 Paper: arXiv:2602.20133 | Repository: github.com/skydiscover-ai/skydiscover | Project: skydiscover-ai.github.io