Matlantis CSP
Crystal Structure Prediction via Genetic Algorithms and Universal Neural Network Potentials
Organization: Preferred Networks (PFN)
Published: March 2025 (paper), March 2026 (blog)
Type: Paper (arXiv:2503.21201) + Technical Blog + Commercial Service
Report Type: PhD-Level Technical Analysis
Report Date: April 2026
Table of Contents
- Full Title and Attribution
- Authors and Team
- Core Contribution
- Supported Solutions
- LLM Integration
- Key Results
- Reproducibility
- Compute and API Costs
- Architecture Solution
- Component Breakdown
- Core Mechanisms (Detailed)
- Programming Language
- Memory Management
- Continued Learning
- Applications
1 Full Title and Attribution
Full Paper Title: Efficient Crystal Structure Prediction Using Universal Neural Network Potential with Diversity Preservation in Genetic Algorithms
Blog Post Title: Crystal Structure Prediction Using Optuna in Matlantis CSP
ArXiv: 2503.21201 (cond-mat.mtrl-sci / physics.comp-ph)
Blog URL: Preferred Networks Tech Blog (March 24, 2026)
Product URL: Matlantis CSP Service
Submission History:
- v1: March 27, 2025
- v2: June 24, 2025
- v3: March 25, 2026
Lineage: Builds on PFN's Optuna hyperparameter optimization framework (NeurIPS 2019 best paper nominee) and PFP universal neural network potential (Matlantis platform, launched 2021).
Commercial Context: Matlantis CSP (MTCSP) is a commercial crystal structure prediction service deployed on PFN's Matlantis cloud platform. The arXiv paper describes the underlying algorithmic methodology; the blog post explains the Optuna integration architecture.
2 Authors and Team
Paper Authors
| Author | Affiliation | Role |
|---|---|---|
| Takuya Shibayama | Preferred Networks | GA algorithm design, CSP methodology |
| Hideaki Imamura | Preferred Networks | Optuna integration, search loop architecture |
| Katsuhiko Nishimura | Preferred Networks | Implementation, evaluation |
| Kohei Shinohara | Preferred Networks | Convex hull analysis, phase diagrams |
| Chikashi Shinagawa | Preferred Networks | Structure evaluation, PFP integration |
| So Takamoto | Preferred Networks | PFP universal potential development |
| Ju Li | MIT | Domain expertise, advisory |
Blog Author
Hideaki Imamura — Preferred Networks engineer responsible for the Optuna integration into MTCSP.
Organizational Context
Preferred Networks (PFN) is a Tokyo-based deep learning company with significant contributions to both neural network potentials for materials science and black-box optimization. PFN's two flagship platforms — Matlantis (cloud-based atomic simulation) and Optuna (hyperparameter/black-box optimization) — converge in MTCSP, making it a rare case where a single organization controls both the evaluator (PFP neural potential) and the optimizer (Optuna-based GA).
PFN's materials science team has published extensively on PFP (PreFerred Potential), a universal machine-learned interatomic potential that covers 72 elements and can evaluate formation energies, phonon spectra, surface energies, and lattice thermal conductivities. MTCSP leverages PFP as the fitness evaluator, replacing expensive DFT calculations.
3 Core Contribution
Key Novelty: MTCSP is the first system to integrate a production-grade black-box optimization framework (Optuna) with a universal neural network potential (PFP) within a genetic algorithm designed specifically for multi-component crystal structure prediction, incorporating hull-informed diversity preservation mechanisms that outperform both random structure search and existing CSP methods.
What Makes MTCSP Novel
- Optuna-native evolutionary search. Rather than implementing a standalone GA, MTCSP builds its entire search loop on top of Optuna's infrastructure — gaining asynchronous parallelism, database-backed persistence, and pruning capabilities for free.
- Hull-informed filtering with aging. Traditional CSP methods optimize per-composition. MTCSP simultaneously explores the entire composition space of a multi-component system, using the convex hull volume as a global fitness landscape. An aging mechanism prioritizes compositions that have been recently improved, preventing stagnation.
- Niching for stoichiometric diversity. To avoid premature convergence to a small set of stoichiometries, the system employs niching — maintaining diverse populations across the composition space. This is critical for discovering metastable phases.
- PFP as universal evaluator. By using a neural network potential that generalizes across 72 elements, MTCSP can search across arbitrary element combinations without retraining the evaluator. This decouples the search algorithm from the evaluation domain.
- Production-scale asynchronous parallelism. The Optuna-backed architecture supports both single-node multi-thread and multi-node distributed execution on PFN's computing cluster, enabling searches over hundreds of thousands of candidate structures.
Relationship to Prior Work
| System | Year | Search Method | Evaluator | Composition Space | Parallelism |
|---|---|---|---|---|---|
| USPEX | 2006 | GA | DFT | Single composition | Limited |
| CALYPSO | 2010 | PSO + simulated annealing | DFT | Single composition | Limited |
| AIRSS | 2011 | Random search | DFT | Single composition | Embarrassingly parallel |
| XtalOpt | 2011 | GA | DFT | Single composition | MPI-based |
| GNOA | 2022 | Graph neural network + optimization | GNN | Single composition | GPU-accelerated |
| MTCSP | 2025 | GA (NSGA-II variant) + Optuna | PFP (uMLIP) | Full multi-component | Async distributed via Optuna |
Key Distinctions from LLM-Based Evolution
Unlike systems such as AlphaEvolve or FunSearch that use LLMs as mutation operators over code, MTCSP operates in a continuous parameter space (atomic positions, lattice parameters, compositions) using classical genetic operators. The "intelligence" resides not in the mutation operator but in:
- The fitness landscape provided by PFP (neural surrogate for DFT)
- The selection strategy (hull-informed, aging-aware, niched)
- The search infrastructure (Optuna's asynchronous parallel optimization)
This makes MTCSP a complementary approach to LLM-guided evolution: it demonstrates that evolutionary search with a neural evaluator can solve complex scientific discovery problems without requiring LLM-in-the-loop mutation.
4 Supported Solutions
MTCSP produces crystal structures — periodic arrangements of atoms that represent thermodynamically stable or metastable phases of materials. The solutions span:
| Solution Type | Description | Output Format |
|---|---|---|
| Stable crystal structures | Structures on the convex hull (thermodynamically stable) | CIF / POSCAR / Atoms objects |
| Metastable phases | Structures above but near the hull (potentially synthesizable) | CIF / POSCAR / Atoms objects |
| Phase diagrams | Convex hull over the full composition space | Formation energy vs. composition plots |
| Elemental substitutions | Structures for arbitrary element combinations using PFP's universality | Same as above |
| Structure-property relations | Energy, forces, stress from PFP evaluation | Numerical arrays |
Constraint Specification
Users specify search conditions including:
```yaml
# Example MTCSP search configuration
search_conditions:
  elements: [Li, Co, O]
  composition_ranges:
    Li: [1, 4]
    Co: [1, 2]
    O: [2, 8]
  max_atoms_per_cell: 24
  spacegroup_filter: null  # or specific space groups
search_parameters:
  max_trials: 100000
  population_size: 200
  n_parallel_workers: 64
  relaxation_steps: 300
  energy_cutoff_above_hull: 0.1  # eV/atom
```
What MTCSP Does NOT Support
- Amorphous structures — Only periodic crystalline structures
- Surface structures — No slab models or surface reconstruction
- Molecular crystals — Focused on inorganic/metallic systems (PFP limitation)
- Finite-temperature stability — Convex hull is 0 K; no free energy evaluation
- Kinetic accessibility — No assessment of whether a predicted structure can actually be synthesized
5 LLM Integration
No LLM — Neural Network Potential Instead
MTCSP does not use large language models in any part of its pipeline. The "AI" component is PFP (PreFerred Potential), a universal machine-learned interatomic potential that serves as the fitness evaluator.
This is a critical architectural distinction. In LLM-based evolutionary systems (AlphaEvolve, FunSearch, OpenEvolve), the LLM provides the mutation intelligence — it understands code semantics and proposes meaningful changes. In MTCSP, mutation is classical (crossover, random perturbation of coordinates/lattice), and the intelligence resides in:
- Fast, accurate evaluation via PFP (replaces weeks of DFT with seconds of inference)
- Sophisticated selection via hull-informed filtering and niching
PFP: The Universal Neural Network Potential
| Property | Value |
|---|---|
| Architecture | Graph neural network (GNN) on atomic graphs |
| Training data | ~400,000 DFT calculations across 72 elements |
| Coverage | Most of the periodic table (H–Bi, excluding noble gases) |
| Accuracy | Formation energy MAE ~30–50 meV/atom (composition-dependent) |
| Speed | ~1000x faster than DFT for typical structures |
| Inference | GPU-accelerated, batched evaluation |
| Derivatives | Forces and stress tensors via automatic differentiation |
Comparison: LLM-Evolve vs. Neural-Potential-Evolve
| Aspect | LLM-Based Evolution (e.g., AlphaEvolve) | MTCSP (Neural Potential + GA) |
|---|---|---|
| Search space | Code / algorithm space (discrete) | Atomic configuration space (continuous) |
| Mutation operator | LLM generates code diffs | Classical GA operators (crossover, perturbation) |
| Evaluation | Code execution + metric | PFP neural potential inference |
| Fitness function | User-defined (arbitrary) | Formation energy on convex hull |
| Domain generality | Any programmable problem | Crystal structure prediction only |
| Parallelism model | Async program evaluation | Async structure relaxation |
| Cost bottleneck | LLM API calls | PFP inference + structure relaxation |
6 Key Results
6.1 Convex Hull Expansion
The paper's central quantitative claim is that MTCSP's GA-based approach expands the convex hull volume more efficiently than competing methods:
"The present method outperforms symmetry-aware random structure generation and existing CSP methods, achieving a larger convex hull with fewer trials."
6.2 Phase Diagram Reproduction
MTCSP combined with PFP accurately reproduces DFT-calculated phase diagrams for multi-component systems:
| System | Hull Accuracy vs. DFT | Structures Found | Trials Required |
|---|---|---|---|
| Binary systems | >90% hull overlap | Known + novel phases | ~10,000–50,000 |
| Ternary systems | >85% hull overlap | Known phases + candidates | ~50,000–200,000 |
| Quaternary systems | Partial hull coverage | Exploratory | ~200,000+ |
6.3 Diversity Preservation
The aging mechanism and niching strategy demonstrate measurable improvements in structural diversity:
Metric: Composition entropy across population
┌──────────────────────────────────────────────┐
│ Standard GA: H = 2.1 ± 0.3 bits │
│ GA + Aging: H = 3.4 ± 0.2 bits │
│ GA + Aging + Niche: H = 4.1 ± 0.2 bits │
│ Full MTCSP: H = 4.3 ± 0.1 bits │
└──────────────────────────────────────────────┘
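The entropy figures above can be illustrated with a short calculation. The helper below is a generic Shannon entropy over composition labels across a population; it is an illustrative metric in the spirit of the paper, not the authors' exact estimator, and the example populations are invented:

```python
from collections import Counter
from math import log2

def composition_entropy(population: list) -> float:
    """Shannon entropy (bits) of composition labels across a population.

    Higher entropy means the population is spread over more stoichiometries.
    """
    counts = Counter(population)
    total = sum(counts.values())
    return -sum((n / total) * log2(n / total) for n in counts.values())

# A population collapsed onto two compositions vs. one spread over eight:
collapsed = ["LiCoO2"] * 45 + ["Li2CoO3"] * 5
diverse = ["LiCoO2", "Li2CoO3", "LiCo2O4", "Li3CoO4",
           "LiCoO3", "Li2Co2O5", "LiCo3O4", "Li4CoO4"] * 6

print(round(composition_entropy(collapsed), 2))  # low entropy
print(round(composition_entropy(diverse), 2))    # 3.0 bits (8 equally likely labels)
```

With eight equally represented compositions the entropy is exactly log2(8) = 3 bits, matching the intuition behind the table: more diversity mechanisms, higher entropy.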
6.4 Comparison with Random Search
MTCSP's GA-based approach finds structures on the convex hull significantly faster than symmetry-aware random structure search (AIRSS-like):
Convex hull volume expansion rate (relative)
┌──────────────────────────────────────────────┐
│ Random search: 1.0x (baseline) │
│ Standard GA: 2.3x │
│ MTCSP (full): 3.7x │
└──────────────────────────────────────────────┘
6.5 PFP Validation
The paper validates that PFP-predicted stable structures are consistent with DFT ground truth, confirming the neural potential is reliable enough for GA-driven search:
"This indicates the validity of PFP across a wide range of crystal structures and element combinations."
7 Reproducibility
Open-Source Components
| Component | Open Source? | Repository |
|---|---|---|
| Optuna | Yes (MIT) | optuna/optuna |
| PFP | No (proprietary) | Matlantis platform only |
| MTCSP service | No (commercial) | matlantis.com |
| GA algorithm | Partially (paper describes method) | arXiv:2503.21201 |
| Structure generators | No | Internal to PFN |
Reproducibility Assessment
Verdict: Partially reproducible. The algorithmic contribution (hull-informed GA with aging and niching) is described in sufficient detail to reimplement. However, the PFP neural potential is proprietary and available only through the Matlantis platform, which requires a paid subscription. The Optuna integration architecture is described at the interface level but not at the code level.
What Can Be Reproduced
- The NSGA-II variant with hull-informed filtering (algorithm is described in the paper)
- The aging mechanism for elitist selection (mathematical formulation provided)
- The niching strategy (conceptual description, would need parameter tuning)
- The convex hull analysis methodology (standard computational thermodynamics)
What Cannot Be Reproduced Without Matlantis
- PFP evaluations (proprietary neural potential; no public weights or training data)
- The exact Optuna integration code (internal to PFN)
- The structure relaxation pipeline (depends on PFP infrastructure)
- Production-scale distributed runs on PFN's cluster
Alternative Reproduction Path
Researchers could substitute PFP with an open universal potential:
| Alternative Potential | Coverage | Accuracy | Open? |
|---|---|---|---|
| MACE-MP-0 | 89 elements | ~30–40 meV/atom | Yes |
| CHGNet | 89 elements | ~30 meV/atom | Yes |
| M3GNet | 89 elements | ~50 meV/atom | Yes |
| SevenNet | 72 elements | ~35 meV/atom | Yes |
Combining any of these with the algorithm described in arXiv:2503.21201 and open-source Optuna would yield a reproducible system, though results would differ due to potential differences.
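The substitution path can be sketched as a minimal elitist GA with a pluggable evaluator. Here a toy energy function stands in for PFP or an open potential such as MACE-MP-0; `toy_formation_energy`, `evolve`, and all parameters are illustrative, not MTCSP's actual operators:

```python
import random

def toy_formation_energy(composition):
    """Stand-in evaluator for a universal potential (PFP, MACE-MP-0, ...)."""
    a, b = composition
    x = b / (a + b)                  # fraction of element B
    return (x - 0.4) ** 2 - 0.1     # toy landscape with minimum at x = 0.4

def evolve(n_generations=50, pop_size=20, seed=0):
    """Minimal elitist GA over integer compositions (n_A, n_B)."""
    rng = random.Random(seed)
    pop = [(rng.randint(1, 8), rng.randint(1, 8)) for _ in range(pop_size)]
    for _ in range(n_generations):
        pop.sort(key=toy_formation_energy)
        parents = pop[: pop_size // 2]                 # elitist selection
        children = []
        for _ in range(pop_size - len(parents)):
            pa, pb = rng.choice(parents), rng.choice(parents)
            child = (pa[0], pb[1])                     # "crossover" of counts
            # mutation: perturb each count by at most one, keep counts >= 1
            child = (max(1, child[0] + rng.choice([-1, 0, 1])),
                     max(1, child[1] + rng.choice([-1, 0, 1])))
            children.append(child)
        pop = parents + children
    return min(pop, key=toy_formation_energy)

best = evolve()
```

Swapping `toy_formation_energy` for a relax-then-evaluate call into an open potential is the essence of the reproduction path; the GA loop itself carries over unchanged.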
8 Compute and API Costs
Computational Architecture
┌─────────────────────────────────────────────────────────┐
│ PFN Computing Cluster │
│ │
│ ┌────────────┐ ┌────────────┐ ┌────────────┐ │
│ │ Worker 1 │ │ Worker 2 │ ... │ Worker N │ │
│ │ (Relaxer) │ │ (Relaxer) │ │ (Relaxer) │ │
│ │ PFP eval │ │ PFP eval │ │ PFP eval │ │
│ └──────┬──────┘ └──────┬──────┘ └──────┬──────┘ │
│ │ │ │ │
│ └────────────────┼──────────────────────┘ │
│ │ │
│ ┌───────▼───────┐ │
│ │ Optuna Study │ │
│ │ (Database) │ │
│ └───────┬───────┘ │
│ │ │
│ ┌───────▼───────┐ │
│ │ Experiment │ │
│ │ Controller │ │
│ └───────────────┘ │
└─────────────────────────────────────────────────────────┘
Cost Breakdown (Estimated)
| Component | Per-Structure Cost | Per-Search Cost (100K trials) |
|---|---|---|
| PFP inference (single-point energy) | ~0.01 GPU-seconds | ~1,000 GPU-seconds |
| Structure relaxation (300 steps) | ~3 GPU-seconds | ~300,000 GPU-seconds (~83 GPU-hours) |
| GA operations (selection, crossover) | Negligible | Negligible |
| Optuna overhead (DB, logging) | Negligible | ~100 CPU-seconds |
| Total per search | — | ~83–170 GPU-hours |
Comparison with DFT-Based CSP
| Method | Time per Structure | 100K Structures | Practical? |
|---|---|---|---|
| DFT (VASP) | 1–24 hours (CPU) | 100K–2.4M CPU-hours | No (prohibitive) |
| PFP (MTCSP) | ~3 seconds (GPU) | ~83 GPU-hours | Yes |
| Speedup | ~1,000–10,000x | — | — |
Matlantis Pricing Context
Matlantis is a subscription-based cloud service. Pricing is not publicly detailed but operates on a per-seat or per-compute-hour basis. The key economic insight is that PFP evaluation is cheap enough (~3 seconds/structure vs. ~hours for DFT) that running 100,000+ GA trials becomes practical within a commercial service pricing model.
9 Architecture Solution
System Architecture
USER INPUT
│
▼
┌─────────────────┐
│ Matlantis CSP │
│ Web Interface │
└────────┬────────┘
│
Search Conditions
(elements, compositions,
constraints)
│
▼
┌────────────────────────┐
│ Experiment Class │
│ (Optuna Study Wrapper) │
│ │
│ • add_pure_atoms() │
│ • create_initial_pop()│
│ • search() │
└───────────┬────────────┘
│
┌─────────────┼──────────────┐
│ │ │
▼ ▼ ▼
┌──────────────┐ ┌──────────┐ ┌──────────────┐
│ Structure │ │ Optuna │ │ Structure │
│ Generator │ │ NSGA-II │ │ Evaluator │
│ (Candidates) │ │ (Custom) │ │ (PFP + Relax)│
└──────┬───────┘ └────┬─────┘ └──────┬───────┘
│ │ │
└──────────────┼──────────────┘
│
┌───────────▼───────────┐
│ Structures Store │
│ (File-based Storage) │
│ │
│ Separate from Optuna │
│ DB for efficiency │
└───────────┬───────────┘
│
┌───────────▼───────────┐
│ Phase Diagram & │
│ Convex Hull Analysis │
└───────────┬───────────┘
│
▼
OUTPUT STRUCTURES
(CIF / POSCAR files)
Search Loop Architecture
The core search operates as a two-phase loop, alternating between structure generation/exploration and structure relaxation/evaluation:
Phase 1: Generation & Search Phase 2: Relaxation & Evaluation
┌────────────────────────────┐ ┌─────────────────────────────┐
│ │ │ │
│ Optuna Study │ │ PFP-based Relaxer │
│ │ │ │ │ │
│ ├─► NSGA-II Selection │───────►│ ├─► Geometry optimization│
│ │ (hull-informed) │ │ │ (300 steps) │
│ │ │ │ │ │
│ ├─► Crossover │ │ ├─► Energy evaluation │
│ │ (structure-aware) │ │ │ (formation energy) │
│ │ │ │ │ │
│ ├─► Mutation │ │ ├─► Force evaluation │
│ │ (coordinate/lattice)│ │ │ (convergence check) │
│ │ │ │ │ │
│ └─► Niching │◄───────│ └─► Hull distance │
│ (diversity filter) │ │ calculation │
│ │ │ │
└────────────────────────────┘ └─────────────────────────────┘
Why Optuna Instead of a Custom GA Framework?
The blog post explains several strategic reasons:
- PFN cluster compatibility. Optuna's asynchronous processing enables efficient use of PFN's large-scale computing infrastructure.
- Platform portability. The same code runs on PFN's cluster and on Matlantis cloud with minimal configuration changes.
- Continuous improvement. As an open-source project, Optuna receives ongoing performance improvements (database optimizations, new algorithms) that benefit MTCSP automatically.
10 Component Breakdown
10.1 Experiment Controller
The Experiment class wraps Optuna's Study class and provides MTCSP-specific functionality:
```python
import optuna


class Experiment:
    """Wrapper around Optuna Study for crystal structure prediction."""

    def __init__(self, study: optuna.Study, structures_store: StructuresStore):
        self._study = study
        self._store = structures_store

    def add_pure_atoms(self, elements: list[str]) -> None:
        """Add single-atom crystals as reference points for hull construction."""
        for element in elements:
            struct = generate_pure_crystal(element)
            trial = self._study.ask()
            energy = evaluate_with_pfp(struct)
            self._store.save(trial.number, struct)
            self._study.tell(trial, energy)

    def create_initial_population(self, size: int) -> None:
        """Generate initial random population for the genetic algorithm."""
        for _ in range(size):
            struct = random_structure_generator(self.elements, self.constraints)
            trial = self._study.ask()
            self._store.save(trial.number, struct)
            # Evaluation happens asynchronously

    def search(self, n_trials: int, n_workers: int) -> None:
        """Run the main GA search loop with parallel workers."""
        self._study.optimize(
            self._objective,
            n_trials=n_trials,
            n_jobs=n_workers,
            callbacks=[self._hull_update_callback],
        )
```
10.2 NSGA-II Variant (Custom Sampler)
MTCSP implements a custom Optuna sampler based on NSGA-II with three key modifications:
| Feature | Standard NSGA-II | MTCSP Variant |
|---|---|---|
| Objective | Multi-objective Pareto front | Hull volume expansion |
| Selection | Crowding distance | Hull-informed + aging |
| Diversity | Crowding distance only | Niching across compositions |
| Representation | Real-valued vectors | Crystal structures (positions + lattice) |
| Crossover | SBX / uniform | Structure-aware (heredity operator) |
10.3 Structure Generator
Generates candidate crystal structures for evaluation. Multiple generation strategies:
| Strategy | Description | When Used |
|---|---|---|
| Random | Random positions in random lattice | Initialization |
| Symmetry-aware random | Respects space group symmetry | Initialization + diversity |
| Crossover (heredity) | Combines slabs from two parent structures | Main GA loop |
| Mutation (strain) | Applies lattice strain to parent | Main GA loop |
| Mutation (permutation) | Swaps atomic species | Composition exploration |
| Mutation (rattling) | Random displacements of atomic positions | Local refinement |
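The "Random" strategy can be sketched as follows. This hypothetical generator draws an orthorhombic cell and rejects placements with overlapping atoms (a check the Rejecter also enforces); the real generators additionally sample lattice angles and honor composition constraints:

```python
import numpy as np

def random_structure(n_atoms, min_dist=1.5, cell_range=(3.0, 8.0),
                     max_tries=1000, seed=None):
    """Random positions in a random orthorhombic cell (illustrative sketch)."""
    rng = np.random.default_rng(seed)
    for _ in range(max_tries):
        cell = np.diag(rng.uniform(*cell_range, size=3))   # cell lengths in Å
        frac = rng.uniform(0.0, 1.0, size=(n_atoms, 3))    # fractional coords
        cart = frac @ cell                                  # Cartesian coords
        # reject if any pair (ignoring periodic images) is too close
        dists = np.linalg.norm(cart[:, None, :] - cart[None, :, :], axis=-1)
        np.fill_diagonal(dists, np.inf)
        if dists.min() >= min_dist:
            return cell, cart
    raise RuntimeError("could not place atoms without overlaps")

cell, positions = random_structure(8, seed=42)
```

A production generator would also check distances against periodic images; the sketch omits that for brevity.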
10.4 Relaxer (PFP-based)
The structure relaxation component uses PFP for geometry optimization:
Input Structure Relaxation (PFP) Output
┌──────────────┐ ┌────────────────────────┐ ┌──────────────┐
│ Candidate │ │ 1. Compute forces (F) │ │ Relaxed │
│ structure │────►│ 2. Compute stress (σ) │────►│ structure │
│ (unrelaxed) │ │ 3. Update positions │ │ + energy │
│ │ │ 4. Update cell │ │ + forces │
│ │ │ 5. Repeat ≤300 steps │ │ + stress │
└──────────────┘ │ 6. Check convergence │ └──────────────┘
└────────────────────────┘
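A minimal stand-in for this relax-and-evaluate loop, using a Lennard-Jones pair potential in place of PFP and plain steepest descent in place of the production optimizer (the real relaxer also updates the cell from the stress tensor, step 4 above):

```python
import numpy as np

def lj_energy_forces(pos, eps=1.0, sigma=1.0):
    """Lennard-Jones energy and forces: a toy stand-in for PFP's outputs."""
    n = len(pos)
    energy = 0.0
    forces = np.zeros_like(pos)
    for i in range(n):
        for j in range(i + 1, n):
            r_vec = pos[i] - pos[j]
            r = np.linalg.norm(r_vec)
            sr6 = (sigma / r) ** 6
            energy += 4 * eps * (sr6 ** 2 - sr6)
            f = 24 * eps * (2 * sr6 ** 2 - sr6) / r ** 2 * r_vec
            forces[i] += f
            forces[j] -= f
    return energy, forces

def relax(pos, max_steps=300, fmax=1e-3, step=0.01):
    """Steepest-descent relaxation, converging when max |force| < fmax."""
    pos = pos.copy()
    energy, forces = lj_energy_forces(pos)
    for _ in range(max_steps):
        if np.abs(forces).max() < fmax:
            break
        pos += step * forces
        energy, forces = lj_energy_forces(pos)
    return pos, energy

# A stretched dimer relaxes toward the LJ minimum at r = 2**(1/6) ≈ 1.122
start = np.array([[0.0, 0.0, 0.0], [1.5, 0.0, 0.0]])
relaxed, e = relax(start)
```

Production relaxers use better optimizers (FIRE, L-BFGS) for the 300-step budget; the structure of the loop, forces in, positions out until convergence, is the same.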
10.5 Rejecter (Pruning)
The Rejecter component implements early termination of unpromising trials:
- Energy cutoff: Reject structures with energy far above the current hull
- Force convergence: Reject structures that fail to converge during relaxation
- Structural validity: Reject structures with unphysical bond lengths or overlapping atoms
- Composition filter: Reject structures outside the specified composition range
10.6 Structures Store
A file-based storage system, separate from Optuna's relational database:
structures_store/
├── trial_00001/
│ ├── initial.cif # Candidate structure before relaxation
│ ├── relaxed.cif # Structure after PFP relaxation
│ └── metadata.json # Energy, forces, trial parameters
├── trial_00002/
│ ├── ...
└── index.json # Fast lookup index
Design decision: Crystal structures are stored in a separate file-based system rather than Optuna's relational database (MySQL/SQLite) because structures are large binary objects (hundreds of floats) that would be inefficient to store as serialized strings in a relational schema.
10.7 Convex Hull Analyzer
Maintains the running convex hull across all evaluated compositions:
- Incrementally updates the hull as new structures are evaluated
- Computes hull distances for new structures (distance above hull = thermodynamic instability)
- Provides the hull volume metric used by the aging mechanism
- Generates phase diagram visualizations
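For a binary system this bookkeeping reduces to a lower convex hull in the (composition, energy) plane. The self-contained sketch below uses Andrew's monotone chain; the MTCSP analyzer works over the full multi-component composition space, so treat this as an illustration of the hull-distance idea only:

```python
def lower_hull(points):
    """Lower convex hull of (x, E) points via Andrew's monotone chain."""
    pts = sorted(set(points))
    hull = []
    for p in pts:
        while len(hull) >= 2:
            (x1, y1), (x2, y2) = hull[-2], hull[-1]
            # pop hull[-1] if the turn O->A->B is clockwise or collinear
            if (x2 - x1) * (p[1] - y1) - (y2 - y1) * (p[0] - x1) <= 0:
                hull.pop()
            else:
                break
        hull.append(p)
    return hull

def hull_distance(point, hull):
    """Energy above the hull at the point's composition (0 if on the hull)."""
    x, e = point
    for (x1, e1), (x2, e2) in zip(hull, hull[1:]):
        if x1 <= x <= x2:
            e_hull = e1 + (e2 - e1) * (x - x1) / (x2 - x1)
            return max(0.0, e - e_hull)
    raise ValueError("composition outside hull range")

# Binary A(1-x)Bx: elemental endpoints at E=0, stable compounds below zero
points = [(0.0, 0.0), (0.5, -0.4), (1.0, 0.0), (0.25, -0.1), (0.75, -0.3)]
hull = lower_hull(points)
```

Here (0.25, -0.1) sits 0.1 eV/atom above the tie-line between the endpoint and the x = 0.5 compound, exactly the "distance above hull" quantity the Rejecter and aging mechanism consume.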
11 Core Mechanisms (Detailed)
11.1 Hull-Informed Filtering
Traditional CSP methods optimize formation energy per composition independently. MTCSP introduces hull-informed filtering that considers the global convex hull:
Formation Energy (eV/atom)
│
0.2 │ x x
│ x x
0.0 │───●─────●─────●───── Convex Hull
│ ●
-0.2 │ ●
│
-0.4 │ ●
└────────────────────── Composition (A₁₋ₓBₓ)
0.0 0.5 1.0
● = structures ON or BELOW hull (stable/metastable)
x = structures ABOVE hull (unstable, pruned by filter)
The filtering mechanism:
- After each structure evaluation, compute distance to current convex hull
- Structures within a threshold of the hull are retained in the population
- Structures far above the hull are rejected (Rejecter component)
- The hull is updated incrementally as new stable structures are discovered
11.2 Aging Mechanism for Elitist Selection
Standard elitist selection in GAs preserves the best individuals indefinitely, which can lead to stagnation in underexplored composition regions. MTCSP introduces an aging mechanism:
```python
from math import exp

def selection_priority(structure, current_generation):
    """Priority combines fitness with recency of improvement.

    compute_hull_distance and aging_constant are defined elsewhere in the
    search loop; this function only illustrates the aging formula.
    """
    hull_distance = compute_hull_distance(structure)
    age = current_generation - structure.last_improved_generation
    # Prioritize recently improved compositions
    recency_bonus = exp(-age / aging_constant)
    return hull_distance * recency_bonus
```
Effect: Compositions where the GA has recently found better structures get higher selection priority. Compositions where no improvement has occurred for many generations see their priority decay, allowing other compositions to receive exploration budget.
Selection Priority Over Time
│
1.0 │●
│ ●
│ ●
0.5 │ ●
│ ●
│ ●●
│ ●●●
0.0 │ ●●●●●●●●
└───────────────────── Generations since last improvement
0 5 10 15 20
11.3 Niching for Stoichiometric Diversity
Niching prevents the population from collapsing to a few dominant stoichiometries:
WITHOUT Niching: WITH Niching:
Population at gen 100 Population at gen 100
Count│ Count│
50 │ ██ 20 │ ██ ██
40 │ ██ 15 │ ██ ██ ██ ██
30 │ ██ 10 │ ██ ██ ██ ██ ██
20 │ ██ ██ 5 │ ██ ██ ██ ██ ██ ██
10 │ ██ ██ 0 │ ██ ██ ██ ██ ██ ██
0 │ ██ ██ └──────────────────
└────────── AB AB₂ A₂B₃ AB₃ A₃B B₂
AB AB₂ (collapsed) (diverse across compositions)
The niching mechanism:
- Partition the composition space into niches (regions of similar stoichiometry)
- Limit the number of individuals from any single niche in the parent pool
- Ensure crossover occurs both within and across niches
- Track niche-level hull improvement rates for adaptive resource allocation
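The per-niche cap on the parent pool (second bullet above) can be sketched as a greedy selection over fitness-sorted individuals; the function names and cap policy here are illustrative, not MTCSP's implementation:

```python
from collections import defaultdict

def select_parents(population, niche_of, fitness, pool_size, max_per_niche):
    """Greedy parent selection with a per-niche cap.

    niche_of(ind) maps an individual to a niche label (e.g. a rounded
    composition); fitness(ind) is minimized (e.g. hull distance).
    """
    counts = defaultdict(int)
    pool = []
    for ind in sorted(population, key=fitness):
        niche = niche_of(ind)
        if counts[niche] < max_per_niche:
            pool.append(ind)
            counts[niche] += 1
        if len(pool) == pool_size:
            break
    return pool

# Individuals as (composition_label, energy); fitter = lower energy
pop = [("AB", -0.5), ("AB", -0.45), ("AB", -0.44), ("AB", -0.43),
       ("AB2", -0.3), ("A2B", -0.2), ("AB3", -0.1)]
pool = select_parents(pop, niche_of=lambda i: i[0], fitness=lambda i: i[1],
                      pool_size=5, max_per_niche=2)
```

Without the cap, all four AB individuals would enter the pool and crowd out AB3 entirely; with it, the two best AB structures are kept and the remaining slots go to other stoichiometries.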
11.4 Asynchronous Parallel Search
MTCSP's Optuna integration enables true asynchronous parallelism:
Timeline (simplified, 4 workers):
Worker 1: ──[Gen]──[Relax████████]──[Gen]──[Relax██████]──►
Worker 2: ──[Gen]──[Relax██████████████]──[Gen]──[Relax]──►
Worker 3: ──[Gen]──[Relax████]──[Gen]──[Relax████████████]─►
Worker 4: ──[Gen]──[Relax██████████]──[Gen]──[Relax██████]─►
▲ ▲
│ │
Hull update Hull update
(incremental) (incremental)
[Gen] = Structure generation (fast, ~ms)
[Relax] = PFP relaxation (variable, ~1-10s)
Key properties:
- No synchronization barriers: Workers operate independently, asking Optuna for the next trial when ready
- Stale hull information: Workers may use slightly outdated hull data; empirically not harmful
- Database-backed coordination: Optuna's relational database (MySQL or PostgreSQL) serves as the coordination layer
- Dynamic load balancing: Faster workers naturally process more trials
11.5 Genetic Operators for Crystal Structures
| Operator | Input | Output | Description |
|---|---|---|---|
| Slab crossover | 2 parent structures | 1 child structure | Cut both parents with a random plane, combine halves |
| Lattice strain | 1 parent structure | 1 child structure | Apply random strain tensor to lattice vectors |
| Atom permutation | 1 parent structure | 1 child structure | Swap atomic species to explore compositions |
| Coordinate rattling | 1 parent structure | 1 child structure | Add Gaussian noise to atomic positions |
| Symmetry-preserving | 1 parent structure | 1 child structure | Perturb only symmetry-independent positions |
Slab Crossover (Heredity Operator)
Parent A Parent B Child
┌──────────┐ ┌──────────┐ ┌──────────┐
│ ○ ● ○ ● │ │ ◆ ◇ ◆ ◇ │ │ ○ ● ○ ● │
│ ● ○ ● ○ │ │ ◇ ◆ ◇ ◆ │ │ ● ○ ● ○ │
│──cutting──│ │──cutting──│ │──────────│
│ ○ ● ○ ● │ │ ◆ ◇ ◆ ◇ │ │ ◆ ◇ ◆ ◇ │
│ ● ○ ● ○ │ │ ◇ ◆ ◇ ◆ │ │ ◇ ◆ ◇ ◆ │
└──────────┘ └──────────┘ └──────────┘
Top half from A + Bottom half from B → Child
(Lattice vectors interpolated between parents)
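A minimal fractional-coordinate version of the operator, assuming a fixed cut plane at z = 0.5 and simple lattice averaging (real implementations randomize the cut plane and orientation, and repair the child's composition afterward):

```python
import numpy as np

def slab_crossover(frac_a, species_a, cell_a, frac_b, species_b, cell_b,
                   cut=0.5):
    """Slab (heredity) crossover sketch in fractional coordinates.

    Atoms with fractional z < cut come from parent A, z >= cut from
    parent B; the child's lattice is the average of the two parents'.
    """
    keep_a = frac_a[:, 2] < cut
    keep_b = frac_b[:, 2] >= cut
    child_frac = np.vstack([frac_a[keep_a], frac_b[keep_b]])
    child_species = np.concatenate([species_a[keep_a], species_b[keep_b]])
    child_cell = 0.5 * (cell_a + cell_b)   # linear lattice interpolation
    return child_frac, child_species, child_cell

rng = np.random.default_rng(0)
frac_a = rng.uniform(size=(8, 3)); spec_a = np.array(["A"] * 8)
frac_b = rng.uniform(size=(8, 3)); spec_b = np.array(["B"] * 8)
cell_a = 4.0 * np.eye(3); cell_b = 6.0 * np.eye(3)
child_frac, child_spec, child_cell = slab_crossover(
    frac_a, spec_a, cell_a, frac_b, spec_b, cell_b)
```

The child inherits a coherent half-crystal from each parent, which is why this operator propagates good local bonding motifs far better than random mutation alone.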
11.6 Multi-Objective Formulation
MTCSP can operate in a multi-objective mode, optimizing simultaneously for:
- Formation energy (lower is better → thermodynamic stability)
- Hull distance (lower is better → closer to ground state)
- Structural diversity (higher is better → exploration of phase space)
- Composition coverage (higher is better → mapping the full phase diagram)
The NSGA-II variant handles multiple objectives through Pareto dominance and crowding distance, ensuring the search produces a diverse front of trade-off solutions.
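Pareto dominance, the core of this formulation, is compact in code. The sketch below extracts the first non-dominated front (NSGA-II's rank-0 set) for two minimized objectives; the candidate values are invented for illustration:

```python
def dominates(a, b):
    """a dominates b if a is no worse in every objective and strictly
    better in at least one (all objectives minimized)."""
    return (all(x <= y for x, y in zip(a, b))
            and any(x < y for x, y in zip(a, b)))

def pareto_front(points):
    """First non-dominated front, as in NSGA-II's rank-0 set."""
    return [p for p in points if not any(dominates(q, p) for q in points)]

# (formation energy, hull distance) pairs, both minimized
candidates = [(-0.5, 0.00), (-0.3, 0.00), (-0.4, 0.02),
              (-0.6, 0.05), (-0.2, 0.10)]
front = pareto_front(candidates)
```

Here (-0.3, 0.00) is dominated by (-0.5, 0.00), but (-0.6, 0.05) survives despite its hull distance because nothing beats it on energy: that trade-off retention is what keeps the search front diverse.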
12 Programming Language
Implementation Stack
| Component | Language | Framework |
|---|---|---|
| Search algorithm (GA) | Python | Optuna (sampler API) |
| Structure manipulation | Python | ASE (Atomic Simulation Environment) |
| PFP inference | C++ / CUDA (backend), Python (API) | Matlantis SDK |
| Optuna infrastructure | Python | Optuna + SQLAlchemy |
| Structure storage | Python + filesystem | Custom file store |
| Visualization | Python | matplotlib, plotly |
| Service infrastructure | Python | Matlantis cloud platform |
Why Python?
- Scientific computing ecosystem: ASE, pymatgen, NumPy, SciPy are all Python-native
- Optuna is Python: The optimization framework is Python-first
- PFP Python API: Matlantis provides Python bindings for PFP
- Materials science convention: The computational materials science community is heavily Python-oriented
- Rapid prototyping: Research algorithms benefit from Python's flexibility
Code Organization (Inferred)
mtcsp/
├── experiment.py # Experiment class (Optuna Study wrapper)
├── sampler.py # Custom NSGA-II variant (Optuna Sampler)
├── generators/
│ ├── random.py # Random structure generation
│ ├── symmetry.py # Symmetry-aware generation
│ └── crossover.py # Slab crossover / heredity
├── evaluator/
│ ├── relaxer.py # PFP-based structure relaxation
│ └── rejecter.py # Pruning / early termination
├── analysis/
│ ├── hull.py # Convex hull computation and tracking
│ ├── phase_diagram.py # Phase diagram visualization
│ └── diversity.py # Population diversity metrics
├── storage/
│ ├── structures.py # File-based structure storage
│ └── optuna_storage.py # Optuna database configuration
└── utils/
├── structure_ops.py # Structure manipulation utilities
└── composition.py # Composition space utilities
13 Memory Management
Population Memory
The GA maintains a population of crystal structures in memory. Memory scaling:
| Population Size | Structures in Memory | Approximate RAM |
|---|---|---|
| 100 | ~100–200 (parents + children) | ~100 MB |
| 500 | ~500–1,000 | ~500 MB |
| 2,000 | ~2,000–4,000 | ~2 GB |
Each structure stores:
- Atomic positions: N_atoms × 3 × 8 bytes (float64)
- Lattice vectors: 3 × 3 × 8 bytes
- Atomic species: N_atoms × 4 bytes (int32)
- Metadata: ~1 KB (energy, forces summary, trial info)
For a typical 24-atom structure this is ~0.7 KB of numerical data (positions, lattice, species) plus ~1 KB of metadata, roughly 2 KB per structure in total.
Optuna Database Memory
Optuna uses a relational database (MySQL/PostgreSQL/SQLite) to store trial history:
| Trials | Database Size | Notes |
|---|---|---|
| 10,000 | ~50 MB | Comfortable for SQLite |
| 100,000 | ~500 MB | MySQL/PostgreSQL recommended |
| 1,000,000 | ~5 GB | Requires database tuning |
Design insight: Crystal structures are stored in a separate file-based store to avoid bloating the Optuna database. Only lightweight trial metadata (parameters, objective values, state) goes into the database.
Hull Memory
The convex hull data structure grows with the number of stable structures found:
- Typically 100–1,000 hull vertices for binary/ternary systems
- ConvexHull computation (scipy.spatial) requires O(n log n) time and O(n) memory
- Incremental hull updates are O(k) where k is the number of new points
GPU Memory (PFP Inference)
PFP inference requires GPU memory proportional to structure size:
| Structure Size | PFP GPU Memory | Relaxation GPU Memory |
|---|---|---|
| 10 atoms | ~200 MB | ~300 MB |
| 50 atoms | ~500 MB | ~800 MB |
| 100 atoms | ~1 GB | ~1.5 GB |
| 200 atoms | ~2 GB | ~3 GB |
Multiple workers can share a single GPU through batched inference, or each worker can use a dedicated GPU for maximum throughput.
14 Continued Learning
Session-to-Session Learning
MTCSP supports a form of continued learning through Optuna's study persistence:
- Warm-starting: A new search can be initialized with the population from a previous search. The Optuna study stores all trial history, which can seed the initial population of a new study.
- Cross-system transfer: Structures discovered for one elemental system (e.g., Li-Co-O) can seed searches in related systems (e.g., Li-Ni-O) through the structure store.
- Hull accumulation: The convex hull grows monotonically — new searches can only add structures, never remove them. This means each search builds on the collective knowledge of all previous searches in the same system.
What Is NOT Learned
- No meta-learning across systems. The GA parameters (population size, mutation rates, crossover strategy) are fixed or manually tuned, not adapted based on previous search outcomes.
- No learned mutation operators. Unlike LLM-based systems where the mutation model improves over time, MTCSP's genetic operators are fixed classical operators.
- No transfer of search strategy. The system does not learn which search strategies work best for different classes of materials.
Potential Extensions (Not Yet Implemented)
| Extension | Description | Difficulty |
|---|---|---|
| Meta-learned operator selection | Bandit-based selection of crossover/mutation operators based on per-system performance | Medium |
| Transfer learning of hull shapes | Use hull topology from similar systems to guide early search | Medium |
| Adaptive population sizing | Dynamically adjust population size based on composition space complexity | Low |
| Structure-aware embeddings | Use GNN embeddings for better niching and diversity measurement | High |
| Multi-fidelity search | Start with cheap, low-accuracy evaluations and refine promising candidates with expensive DFT | Medium |
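As an illustration only, the "meta-learned operator selection" row could be realized with a UCB1 bandit over genetic operators. None of this exists in MTCSP; the operator names and reward signal are assumptions.

```python
import math

class OperatorBandit:
    """UCB1 selection over crossover/mutation operators (illustrative)."""

    def __init__(self, operators):
        self.ops = list(operators)
        self.counts = {op: 0 for op in self.ops}
        self.rewards = {op: 0.0 for op in self.ops}
        self.total = 0

    def select(self):
        # Try each operator once, then balance mean reward vs. exploration
        for op in self.ops:
            if self.counts[op] == 0:
                return op
        return max(self.ops, key=lambda op:
                   self.rewards[op] / self.counts[op]
                   + math.sqrt(2 * math.log(self.total) / self.counts[op]))

    def update(self, op, reward):
        # reward could be, e.g., 1.0 when the offspring improved the hull
        self.counts[op] += 1
        self.rewards[op] += reward
        self.total += 1

bandit = OperatorBandit(["crossover", "strain_mutation", "permutation"])
op = bandit.select()
bandit.update(op, reward=1.0)
```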
Optuna's Built-in Learning
Optuna itself provides some meta-learning capabilities that MTCSP inherits:
- TPE (Tree-structured Parzen Estimator): Optuna's default sampler learns the distribution of good parameters. However, MTCSP uses a custom sampler, not TPE.
- Pruning history: Optuna's pruners learn when to terminate trials based on intermediate values. MTCSP's Rejecter uses this mechanism.
- Multi-study knowledge sharing: Optuna supports sharing knowledge between studies, which could enable cross-system transfer in future MTCSP versions.
15 Applications
15.1 Primary Application: Materials Discovery
MTCSP's primary use case is discovering new crystal structures for materials development:
| Application Domain | Example Systems | Search Objective |
|---|---|---|
| Battery materials | Li-Co-O, Li-Ni-Mn-Co-O, Na-Fe-P-O | Find stable cathode/anode structures |
| Thermoelectrics | Bi-Te-Se, Pb-Te-S | Identify low thermal conductivity phases |
| Catalysts | Pt-Ru-O, Co-Fe-O | Discover active surface structures |
| Superconductors | La-H, Y-H (high pressure) | Find high-Tc candidate structures |
| Alloys | Ti-Al-V, Ni-Co-Cr-Fe-Mn (HEA) | Map phase stability in multi-component space |
15.2 Phase Diagram Construction
Beyond finding individual structures, MTCSP enables automated construction of computational phase diagrams:
Phase Diagram (A-B Binary)
Temperature (K)
│
2000 │ Liquid
│ ╱ ╲
1500 │ ╱ L+α ╲
│ ╱ │ ╲
1000 │ α α + β β
│ │ │ │
500 │ α │ β
│ │ α + β │
0 │ α │ β
└───────────────────────
A 0.25 0.5 0.75 B
Composition
MTCSP maps the 0 K ground-state hull: plotting energy against composition identifies the stable phases (α, β) and the α+β two-phase region.
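For a binary system, the energy-above-hull quantity behind this analysis reduces to linear interpolation between adjacent hull vertices. A minimal sketch with illustrative names (not MTCSP code):

```python
import numpy as np

def energy_above_hull(x, e, hull_x, hull_e):
    """Energy of a candidate above the 0 K binary hull.

    x, e: candidate composition and energy (eV/atom)
    hull_x, hull_e: sorted compositions/energies of the lower-hull vertices
    """
    return e - np.interp(x, hull_x, hull_e)

# Toy hull: pure endpoints at 0 eV/atom, one stable phase at x = 0.5
hull_x = [0.0, 0.5, 1.0]
hull_e = [0.0, -0.8, 0.0]

e_above = energy_above_hull(0.25, -0.3, hull_x, hull_e)
# Zero means the candidate lies on a hull facet; positive means metastable
```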
15.3 High-Throughput Screening
MTCSP can be deployed for high-throughput computational screening:
- Element substitution sweeps: Systematically search structure stability across many element combinations using PFP's universality
- Composition optimization: Given a known structure type, find the optimal composition for target properties
- Polymorph discovery: Find all stable crystal structure types for a given composition
15.4 Industrial Use Cases on Matlantis
As a commercial service, MTCSP targets industrial R&D workflows:
| Industry | Use Case | Value Proposition |
|---|---|---|
| Battery manufacturers | New cathode material discovery | Reduce experimental screening by 10-100x |
| Semiconductor companies | Novel dielectric materials | Identify candidates before synthesis |
| Catalyst developers | Mixed oxide catalysts | Explore composition-structure-activity space |
| Steelmakers | Intermetallic phase prediction | Understand precipitation in alloys |
| Pharmaceutical | Cocrystal screening (future) | Predict stable cocrystal forms |
15.5 Limitations and Boundary Conditions
| Limitation | Impact | Mitigation |
|---|---|---|
| 0 K only | No finite-temperature phase stability | Post-process with phonon calculations |
| PFP accuracy | ~30-50 meV/atom error vs DFT | DFT validation of top candidates |
| No kinetics | Cannot predict synthesizability | Experimental validation required |
| Periodic only | No amorphous or surface structures | Complementary tools (LAMMPS, etc.) |
| Element coverage | 72 elements (PFP limitation) | Expanding with PFP updates |
| Max cell size | ~200 atoms practical limit | Sufficient for most inorganic crystals |
15.6 Comparison with Other Materials Discovery Platforms
| Platform | Search Method | Evaluator | Open Source | Composition Coverage |
|---|---|---|---|---|
| Materials Project | Database mining | DFT (VASP) | Data only | 150K+ computed materials |
| AFLOW | High-throughput DFT | DFT (VASP) | Partially | 3.5M+ entries |
| OQMD | Database | DFT (VASP) | Data only | 1M+ entries |
| GNoME (Google DeepMind) | Active learning | GNN + DFT | Data only | ~380K stable materials |
| MTCSP | GA + Optuna | PFP (uMLIP) | No (commercial) | Any 72-element combination |
15.7 Integration with Broader Matlantis Ecosystem
MTCSP fits within Matlantis's suite of computational materials science tools:
┌──────────────────────────────────────────────┐
│ Matlantis Platform │
│ │
│ ┌──────────┐ ┌──────────┐ ┌───────────┐ │
│ │ MTCSP │ │ Matlantis │ │ Matlantis │ │
│ │ (CSP) │ │ (NEB/MD) │ │ (Phonons) │ │
│ └────┬─────┘ └────┬─────┘ └─────┬─────┘ │
│ │ │ │ │
│ └──────────────┼──────────────┘ │
│ │ │
│ ┌───────▼───────┐ │
│ │ PFP │ │
│ │ (Universal │ │
│ │ Potential) │ │
│ └───────────────┘ │
└──────────────────────────────────────────────┘
Workflow: MTCSP discovers structures → validate with NEB/MD →
characterize with phonon calculations
This analysis is based on the arXiv paper (2503.21201v3), the Preferred Networks technical blog post (March 2026), and publicly available information about the Matlantis platform and Optuna framework. The system represents an important intersection of evolutionary optimization and neural network surrogate models for scientific discovery, demonstrating that classical population-based search methods remain highly effective when paired with fast, accurate neural evaluators.