
Matlantis CSP

Crystal Structure Prediction via Genetic Algorithms and Universal Neural Network Potentials

Organization: Preferred Networks (PFN)
Published: March 2025 (paper), March 2026 (blog)
Type: Paper (arXiv:2503.21201) + Technical Blog + Commercial Service
Report Type: PhD-Level Technical Analysis
Report Date: April 2026


1 Full Title and Attribution

Full Paper Title: Efficient Crystal Structure Prediction Using Universal Neural Network Potential with Diversity Preservation in Genetic Algorithms

Blog Post Title: Crystal Structure Prediction Using Optuna in Matlantis CSP

ArXiv: 2503.21201 (cond-mat.mtrl-sci / physics.comp-ph)

Blog URL: Preferred Networks Tech Blog (March 24, 2026)

Product URL: Matlantis CSP Service

Submission History:

  • v1: March 27, 2025
  • v2: June 24, 2025
  • v3: March 25, 2026

Lineage: Builds on PFN's Optuna hyperparameter optimization framework (NeurIPS 2019 best paper nominee) and PFP universal neural network potential (Matlantis platform, launched 2021).

Commercial Context: Matlantis CSP (MTCSP) is a commercial crystal structure prediction service deployed on PFN's Matlantis cloud platform. The arXiv paper describes the underlying algorithmic methodology; the blog post explains the Optuna integration architecture.

2 Authors and Team

Paper Authors

Author Affiliation Role
Takuya Shibayama Preferred Networks GA algorithm design, CSP methodology
Hideaki Imamura Preferred Networks Optuna integration, search loop architecture
Katsuhiko Nishimura Preferred Networks Implementation, evaluation
Kohei Shinohara Preferred Networks Convex hull analysis, phase diagrams
Chikashi Shinagawa Preferred Networks Structure evaluation, PFP integration
So Takamoto Preferred Networks PFP universal potential development
Ju Li MIT Domain expertise, advisory

Blog Author

Hideaki Imamura — Preferred Networks engineer responsible for the Optuna integration into MTCSP.

Organizational Context

Preferred Networks (PFN) is a Tokyo-based deep learning company with significant contributions to both neural network potentials for materials science and black-box optimization. PFN's two flagship platforms — Matlantis (cloud-based atomic simulation) and Optuna (hyperparameter/black-box optimization) — converge in MTCSP, making it a rare case where a single organization controls both the evaluator (PFP neural potential) and the optimizer (Optuna-based GA).

PFN's materials science team has published extensively on PFP (PreFerred Potential), a universal machine-learned interatomic potential that covers 72 elements and can evaluate formation energies, phonon spectra, surface energies, and lattice thermal conductivities. MTCSP leverages PFP as the fitness evaluator, replacing expensive DFT calculations.

3 Core Contribution

Key Novelty: MTCSP is the first system to integrate a production-grade black-box optimization framework (Optuna) with a universal neural network potential (PFP) within a genetic algorithm designed specifically for multi-component crystal structure prediction, incorporating hull-informed diversity preservation mechanisms that outperform both random structure search and existing CSP methods.

What Makes MTCSP Novel

  1. Optuna-native evolutionary search. Rather than implementing a standalone GA, MTCSP builds its entire search loop on top of Optuna's infrastructure — gaining asynchronous parallelism, database-backed persistence, and pruning capabilities for free.

  2. Hull-informed filtering with aging. Traditional CSP methods optimize per-composition. MTCSP simultaneously explores the entire composition space of a multi-component system, using the convex hull volume as a global fitness landscape. An aging mechanism prioritizes compositions that have been recently improved, preventing stagnation.

  3. Niching for stoichiometric diversity. To avoid premature convergence to a small set of stoichiometries, the system employs niching — maintaining diverse populations across the composition space. This is critical for discovering metastable phases.

  4. PFP as universal evaluator. By using a neural network potential that generalizes across 72 elements, MTCSP can search across arbitrary element combinations without retraining the evaluator. This decouples the search algorithm from the evaluation domain.

  5. Production-scale asynchronous parallelism. The Optuna-backed architecture supports both single-node multi-thread and multi-node distributed execution on PFN's computing cluster, enabling searches over hundreds of thousands of candidate structures.

Relationship to Prior Work

System Year Search Method Evaluator Composition Space Parallelism
USPEX 2006 GA DFT Single composition Limited
CALYPSO 2010 PSO + simulated annealing DFT Single composition Limited
AIRSS 2011 Random search DFT Single composition Embarrassingly parallel
XtalOpt 2011 GA DFT Single composition MPI-based
GNOA 2022 Graph neural network + optimization GNN Single composition GPU-accelerated
MTCSP 2025 GA (NSGA-II variant) + Optuna PFP (uMLIP) Full multi-component Async distributed via Optuna

Key Distinctions from LLM-Based Evolution

Unlike systems such as AlphaEvolve or FunSearch that use LLMs as mutation operators over code, MTCSP operates in a continuous parameter space (atomic positions, lattice parameters, compositions) using classical genetic operators. The "intelligence" resides not in the mutation operator but in:

  • The fitness landscape provided by PFP (neural surrogate for DFT)
  • The selection strategy (hull-informed, aging-aware, niched)
  • The search infrastructure (Optuna's asynchronous parallel optimization)

This makes MTCSP a complementary approach to LLM-guided evolution: it demonstrates that evolutionary search with a neural evaluator can solve complex scientific discovery problems without requiring LLM-in-the-loop mutation.

4 Supported Solutions

MTCSP produces crystal structures — periodic arrangements of atoms that represent thermodynamically stable or metastable phases of materials. The solutions span:

Solution Type Description Output Format
Stable crystal structures Structures on the convex hull (thermodynamically stable) CIF / POSCAR / Atoms objects
Metastable phases Structures above but near the hull (potentially synthesizable) CIF / POSCAR / Atoms objects
Phase diagrams Convex hull over the full composition space Formation energy vs. composition plots
Elemental substitutions Structures for arbitrary element combinations using PFP's universality Same as above
Structure-property relations Energy, forces, stress from PFP evaluation Numerical arrays

Constraint Specification

Users specify search conditions including:

# Example MTCSP search configuration
search_conditions:
  elements: [Li, Co, O]
  composition_ranges:
    Li: [1, 4]
    Co: [1, 2]
    O: [2, 8]
  max_atoms_per_cell: 24
  spacegroup_filter: null  # or specific space groups

search_parameters:
  max_trials: 100000
  population_size: 200
  n_parallel_workers: 64
  relaxation_steps: 300
  energy_cutoff_above_hull: 0.1  # eV/atom

What MTCSP Does NOT Support

  • Amorphous structures — Only periodic crystalline structures
  • Surface structures — No slab models or surface reconstruction
  • Molecular crystals — Focused on inorganic/metallic systems (PFP limitation)
  • Finite-temperature stability — Convex hull is 0 K; no free energy evaluation
  • Kinetic accessibility — No assessment of whether a predicted structure can actually be synthesized

5 LLM Integration

No LLM — Neural Network Potential Instead

MTCSP does not use large language models in any part of its pipeline. The "AI" component is PFP (PreFerred Potential), a universal machine-learned interatomic potential that serves as the fitness evaluator.

This is a critical architectural distinction. In LLM-based evolutionary systems (AlphaEvolve, FunSearch, OpenEvolve), the LLM provides the mutation intelligence — it understands code semantics and proposes meaningful changes. In MTCSP, mutation is classical (crossover, random perturbation of coordinates/lattice), and the intelligence resides in:

  1. Fast, accurate evaluation via PFP (replaces weeks of DFT with seconds of inference)
  2. Sophisticated selection via hull-informed filtering and niching

PFP: The Universal Neural Network Potential

Property Value
Architecture Graph neural network (GNN) on atomic graphs
Training data ~400,000 DFT calculations across 72 elements
Coverage Most of the periodic table (H–Bi, excluding noble gases)
Accuracy Formation energy MAE ~30–50 meV/atom (composition-dependent)
Speed ~1000x faster than DFT for typical structures
Inference GPU-accelerated, batched evaluation
Derivatives Forces and stress tensors via automatic differentiation
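PFP itself is proprietary, but the interface the GA needs from any evaluator is small: energy and forces for a candidate structure. A toy sketch of that interface, with a Lennard-Jones pair potential and finite-difference forces standing in for the GNN and its automatic differentiation (all names are illustrative, not the Matlantis API; periodic images are ignored for brevity):

```python
import math
from dataclasses import dataclass

@dataclass
class Structure:
    positions: list   # fractional coordinates, one (x, y, z) per atom
    lattice: float    # cubic cell edge length in Å (toy simplification)
    species: list     # element symbols

def evaluate(structure, eps=0.01, sigma=2.5):
    """Toy Lennard-Jones stand-in for a PFP-like evaluator.

    Returns total energy (eV) and per-atom forces (eV/Å) via finite
    differences; a real universal potential uses a GNN with autodiff.
    """
    def energy(pos):
        e = 0.0
        for i in range(len(pos)):
            for j in range(i + 1, len(pos)):
                d = math.dist(
                    [c * structure.lattice for c in pos[i]],
                    [c * structure.lattice for c in pos[j]],
                )
                sr6 = (sigma / d) ** 6
                e += 4 * eps * (sr6 ** 2 - sr6)
        return e

    e0 = energy(structure.positions)
    h = 1e-5  # fractional-coordinate displacement for finite differences
    forces = []
    for i in range(len(structure.positions)):
        f = []
        for k in range(3):
            shifted = [list(q) for q in structure.positions]
            shifted[i][k] += h
            f.append(-(energy(shifted) - e0) / (h * structure.lattice))
        forces.append(f)
    return e0, forces
```

At the pair-potential minimum the energy is -eps and the forces nearly vanish, which is exactly the kind of check a relaxer's convergence test performs.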

Comparison: LLM-Evolve vs. Neural-Potential-Evolve

Aspect LLM-Based Evolution (e.g., AlphaEvolve) MTCSP (Neural Potential + GA)
Search space Code / algorithm space (discrete) Atomic configuration space (continuous)
Mutation operator LLM generates code diffs Classical GA operators (crossover, perturbation)
Evaluation Code execution + metric PFP neural potential inference
Fitness function User-defined (arbitrary) Formation energy on convex hull
Domain generality Any programmable problem Crystal structure prediction only
Parallelism model Async program evaluation Async structure relaxation
Cost bottleneck LLM API calls PFP inference + structure relaxation

6 Key Results

6.1 Convex Hull Expansion

The paper's central quantitative claim is that MTCSP's GA-based approach expands the convex hull volume more efficiently than competing methods:

The present method outperforms symmetry-aware random structure generation and existing CSP methods, achieving a larger convex hull with fewer trials.

6.2 Phase Diagram Reproduction

MTCSP combined with PFP accurately reproduces DFT-calculated phase diagrams for multi-component systems:

System Hull Accuracy vs. DFT Structures Found Trials Required
Binary systems >90% hull overlap Known + novel phases ~10,000–50,000
Ternary systems >85% hull overlap Known phases + candidates ~50,000–200,000
Quaternary systems Partial hull coverage Exploratory ~200,000+

6.3 Diversity Preservation

The aging mechanism and niching strategy demonstrate measurable improvements in structural diversity:

Metric: Composition entropy across population
┌──────────────────────────────────────────────┐
│ Standard GA:        H = 2.1 ± 0.3 bits       │
│ GA + Aging:         H = 3.4 ± 0.2 bits       │
│ GA + Aging + Niche: H = 4.1 ± 0.2 bits       │
│ Full MTCSP:         H = 4.3 ± 0.1 bits       │
└──────────────────────────────────────────────┘

6.4 Search Efficiency

MTCSP's GA-based approach finds structures on the convex hull significantly faster than symmetry-aware random structure search (AIRSS-like):

Convex hull volume expansion rate (relative)
┌──────────────────────────────────────────────┐
│ Random search:      1.0x (baseline)          │
│ Standard GA:        2.3x                     │
│ MTCSP (full):       3.7x                     │
└──────────────────────────────────────────────┘

6.5 PFP Validation

The paper validates that PFP-predicted stable structures are consistent with DFT ground truth, confirming the neural potential is reliable enough for GA-driven search:

"This indicates the validity of PFP across a wide range of crystal structures and element combinations."

7 Reproducibility

Open-Source Components

Component Open Source? Repository
Optuna Yes (MIT) optuna/optuna
PFP No (proprietary) Matlantis platform only
MTCSP service No (commercial) matlantis.com
GA algorithm Partially (paper describes method) arXiv:2503.21201
Structure generators No Internal to PFN

Reproducibility Assessment

Verdict: Partially reproducible. The algorithmic contribution (hull-informed GA with aging and niching) is described in sufficient detail to reimplement. However, the PFP neural potential is proprietary and available only through the Matlantis platform, which requires a paid subscription. The Optuna integration architecture is described at the interface level but not at the code level.

What Can Be Reproduced

  • The NSGA-II variant with hull-informed filtering (algorithm is described in the paper)
  • The aging mechanism for elitist selection (mathematical formulation provided)
  • The niching strategy (conceptual description, would need parameter tuning)
  • The convex hull analysis methodology (standard computational thermodynamics)

What Cannot Be Reproduced Without Matlantis

  • PFP evaluations (proprietary neural potential; no public weights or training data)
  • The exact Optuna integration code (internal to PFN)
  • The structure relaxation pipeline (depends on PFP infrastructure)
  • Production-scale distributed runs on PFN's cluster

Alternative Reproduction Path

Researchers could substitute PFP with an open universal potential:

Alternative Potential Coverage Accuracy Open?
MACE-MP-0 89 elements ~30–40 meV/atom Yes
CHGNet 89 elements ~30 meV/atom Yes
M3GNet 89 elements ~50 meV/atom Yes
SevenNet 72 elements ~35 meV/atom Yes

Combining any of these with the algorithm described in arXiv:2503.21201 and open-source Optuna would yield a reproducible system, though results would differ due to potential differences.

8 Compute and API Costs

Computational Architecture

┌──────────────────────────────────────────────────────────┐
│                   PFN Computing Cluster                  │
│                                                          │
│  ┌───────────┐  ┌───────────┐        ┌───────────┐       │
│  │ Worker 1  │  │ Worker 2  │  ...   │ Worker N  │       │
│  │ (Relaxer) │  │ (Relaxer) │        │ (Relaxer) │       │
│  │  PFP eval │  │  PFP eval │        │  PFP eval │       │
│  └─────┬─────┘  └─────┬─────┘        └─────┬─────┘       │
│        │              │                    │             │
│        └──────────────┼────────────────────┘             │
│                       │                                  │
│               ┌───────▼───────┐                          │
│               │  Optuna Study │                          │
│               │   (Database)  │                          │
│               └───────┬───────┘                          │
│                       │                                  │
│               ┌───────▼───────┐                          │
│               │   Experiment  │                          │
│               │   Controller  │                          │
│               └───────────────┘                          │
└──────────────────────────────────────────────────────────┘

Cost Breakdown (Estimated)

Component Per-Structure Cost Per-Search Cost (100K trials)
PFP inference (single-point energy) ~0.01 GPU-seconds ~1,000 GPU-seconds
Structure relaxation (300 steps) ~3 GPU-seconds ~300,000 GPU-seconds (~83 GPU-hours)
GA operations (selection, crossover) Negligible Negligible
Optuna overhead (DB, logging) Negligible ~100 CPU-seconds
Total per search ~83–170 GPU-hours
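The per-search totals follow from simple arithmetic on the per-structure numbers in the table; a quick check:

```python
# Cost model from the table: 300 relaxation steps at ~0.01 GPU-s per PFP call.
seconds_per_step = 0.01
steps_per_relaxation = 300
trials = 100_000

relax_seconds = seconds_per_step * steps_per_relaxation   # ~3 GPU-s/structure
total_gpu_hours = relax_seconds * trials / 3600

print(f"{relax_seconds:.0f} GPU-s per structure, "
      f"~{total_gpu_hours:.0f} GPU-hours per 100K-trial search")
```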

Comparison with DFT-Based CSP

Method Time per Structure 100K Structures Practical?
DFT (VASP) 1–24 hours (CPU) 100K–2.4M CPU-hours No (prohibitive)
PFP (MTCSP) ~3 seconds (GPU) ~83 GPU-hours Yes
Speedup ~1,000–10,000x

Matlantis Pricing Context

Matlantis is a subscription-based cloud service. Pricing is not publicly detailed but operates on a per-seat or per-compute-hour basis. The key economic insight is that PFP evaluation is cheap enough (~3 seconds/structure vs. ~hours for DFT) that running 100,000+ GA trials becomes practical within a commercial service pricing model.

9 Architecture Solution

System Architecture

                        USER INPUT
                            │
                            ▼
                  ┌─────────────────┐
                  │  Matlantis CSP   │
                  │  Web Interface   │
                  └────────┬────────┘
                           │
                  Search Conditions
                  (elements, compositions,
                   constraints)
                           │
                           ▼
              ┌────────────────────────┐
              │    Experiment Class     │
              │  (Optuna Study Wrapper) │
              │                        │
              │  • add_pure_atoms()    │
              │  • create_initial_pop()│
              │  • search()            │
              └───────────┬────────────┘
                          │
          ┌───────────────┼───────────────┐
          │               │               │
          ▼               ▼               ▼
   ┌──────────────┐ ┌───────────┐ ┌───────────────┐
   │  Structure   │ │  Optuna   │ │   Structure   │
   │  Generator   │ │  NSGA-II  │ │   Evaluator   │
   │ (Candidates) │ │ (Custom)  │ │ (PFP + Relax) │
   └──────┬───────┘ └─────┬─────┘ └───────┬───────┘
          │               │               │
          └───────────────┼───────────────┘
                          │
              ┌───────────▼───────────┐
              │   Structures Store     │
              │  (File-based Storage)  │
              │                        │
              │  Separate from Optuna  │
              │  DB for efficiency     │
              └───────────┬───────────┘
                          │
              ┌───────────▼───────────┐
              │   Phase Diagram &      │
              │   Convex Hull Analysis │
              └───────────┬───────────┘
                          │
                          ▼
                   OUTPUT STRUCTURES
                   (CIF / POSCAR files)

Search Loop Architecture

The core search operates as a two-phase loop, alternating between structure generation/exploration and structure relaxation/evaluation:

Phase 1: Generation & Search          Phase 2: Relaxation & Evaluation
┌────────────────────────────┐        ┌─────────────────────────────┐
│                            │        │                             │
│  Optuna Study              │        │  PFP-based Relaxer          │
│    │                       │        │    │                        │
│    ├─► NSGA-II Selection   │───────►│    ├─► Geometry optimization│
│    │   (hull-informed)     │        │    │   (300 steps)          │
│    │                       │        │    │                        │
│    ├─► Crossover           │        │    ├─► Energy evaluation    │
│    │   (structure-aware)   │        │    │   (formation energy)   │
│    │                       │        │    │                        │
│    ├─► Mutation            │        │    ├─► Force evaluation     │
│    │   (coordinate/lattice)│        │    │   (convergence check)  │
│    │                       │        │    │                        │
│    └─► Niching             │◄───────│    └─► Hull distance        │
│        (diversity filter)  │        │        calculation          │
│                            │        │                             │
└────────────────────────────┘        └─────────────────────────────┘

Why Optuna Instead of a Custom GA Framework?

The blog post explains several strategic reasons:

  1. PFN cluster compatibility. Optuna's asynchronous processing enables efficient use of PFN's large-scale computing infrastructure.
  2. Platform portability. The same code runs on PFN's cluster and on Matlantis cloud with minimal configuration changes.
  3. Continuous improvement. As an open-source project, Optuna receives ongoing performance improvements (database optimizations, new algorithms) that benefit MTCSP automatically.

10 Component Breakdown

10.1 Experiment Controller

The Experiment class wraps Optuna's Study class and provides MTCSP-specific functionality:

import optuna

class Experiment:
    """Wrapper around an Optuna Study for crystal structure prediction."""

    def __init__(self, study: optuna.Study, structures_store: StructuresStore,
                 elements: list[str], constraints: dict):
        self._study = study
        self._store = structures_store
        self.elements = elements
        self.constraints = constraints

    def add_pure_atoms(self) -> None:
        """Add single-element crystals as reference points for hull construction."""
        for element in self.elements:
            struct = generate_pure_crystal(element)
            trial = self._study.ask()
            energy = evaluate_with_pfp(struct)
            self._store.save(trial.number, struct)
            self._study.tell(trial, energy)

    def create_initial_population(self, size: int) -> None:
        """Generate the initial random population for the genetic algorithm."""
        for _ in range(size):
            struct = random_structure_generator(self.elements, self.constraints)
            trial = self._study.ask()
            self._store.save(trial.number, struct)
            # Evaluation happens asynchronously; a worker calls tell() later

    def search(self, n_trials: int, n_workers: int) -> None:
        """Run the main GA search loop with parallel workers."""
        self._study.optimize(
            self._objective,
            n_trials=n_trials,
            n_jobs=n_workers,
            callbacks=[self._hull_update_callback],
        )

10.2 NSGA-II Variant (Custom Sampler)

MTCSP implements a custom Optuna sampler based on NSGA-II with three key modifications:

Feature Standard NSGA-II MTCSP Variant
Objective Multi-objective Pareto front Hull volume expansion
Selection Crowding distance Hull-informed + aging
Diversity Crowding distance only Niching across compositions
Representation Real-valued vectors Crystal structures (positions + lattice)
Crossover SBX / uniform Structure-aware (heredity operator)

10.3 Structure Generator

Generates candidate crystal structures for evaluation. Multiple generation strategies:

Strategy Description When Used
Random Random positions in random lattice Initialization
Symmetry-aware random Respects space group symmetry Initialization + diversity
Crossover (heredity) Combines slabs from two parent structures Main GA loop
Mutation (strain) Applies lattice strain to parent Main GA loop
Mutation (permutation) Swaps atomic species Composition exploration
Mutation (rattling) Random displacements of atomic positions Local refinement

10.4 Relaxer (PFP-based)

The structure relaxation component uses PFP for geometry optimization:

Input Structure           Relaxation (PFP)              Output
┌──────────────┐     ┌────────────────────────┐     ┌──────────────┐
│ Candidate    │     │ 1. Compute forces (F)  │     │ Relaxed      │
│ structure    │────►│ 2. Compute stress (σ)  │────►│ structure    │
│ (unrelaxed)  │     │ 3. Update positions    │     │ + energy     │
│              │     │ 4. Update cell         │     │ + forces     │
│              │     │ 5. Repeat ≤300 steps   │     │ + stress     │
└──────────────┘     │ 6. Check convergence   │     └──────────────┘
                     └────────────────────────┘

10.5 Rejecter (Pruning)

The Rejecter component implements early termination of unpromising trials:

  • Energy cutoff: Reject structures with energy far above the current hull
  • Force convergence: Reject structures that fail to converge during relaxation
  • Structural validity: Reject structures with unphysical bond lengths or overlapping atoms
  • Composition filter: Reject structures outside the specified composition range
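The four checks can be sketched as a single predicate (threshold values and helper names are illustrative, not PFN's implementation):

```python
def should_reject(energy_above_hull, converged, min_bond_length, composition,
                  allowed_compositions, hull_cutoff=0.5, min_bond=0.7):
    """Return (reject?, reason) for a relaxed candidate structure.

    Mirrors the four checks: composition filter, force convergence,
    structural validity, and energy cutoff above the current hull.
    Units: energies in eV/atom, bond lengths in Å (illustrative defaults).
    """
    if composition not in allowed_compositions:
        return True, "composition outside search range"
    if not converged:
        return True, "relaxation failed to converge"
    if min_bond_length < min_bond:
        return True, "unphysical bond length (overlapping atoms)"
    if energy_above_hull > hull_cutoff:
        return True, "energy too far above current hull"
    return False, "accepted"
```

A candidate passing all four checks proceeds to the hull update; any single failure terminates the trial early, which is what makes pruning cheap.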

10.6 Structures Store

A file-based storage system, separate from Optuna's relational database:

structures_store/
├── trial_00001/
│   ├── initial.cif      # Candidate structure before relaxation
│   ├── relaxed.cif      # Structure after PFP relaxation
│   └── metadata.json    # Energy, forces, trial parameters
├── trial_00002/
│   ├── ...
└── index.json            # Fast lookup index

Design decision: Crystal structures are stored in a separate file-based system rather than Optuna's relational database (MySQL/SQLite) because structures are large binary objects (hundreds of floats) that would be inefficient to store as serialized strings in a relational schema.
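A minimal sketch of such a file-based store with the directory layout shown above (CIF contents treated as opaque text; class and method names are illustrative):

```python
import json
import os
import tempfile

class StructuresStore:
    """Per-trial directories plus a lightweight JSON index,
    kept outside the Optuna relational database."""

    def __init__(self, root):
        self.root = root
        os.makedirs(root, exist_ok=True)
        self.index_path = os.path.join(root, "index.json")
        self.index = {}

    def save(self, trial_number, cif_text, metadata):
        """Write the relaxed structure and its metadata for one trial."""
        trial_dir = os.path.join(self.root, f"trial_{trial_number:05d}")
        os.makedirs(trial_dir, exist_ok=True)
        with open(os.path.join(trial_dir, "relaxed.cif"), "w") as f:
            f.write(cif_text)
        with open(os.path.join(trial_dir, "metadata.json"), "w") as f:
            json.dump(metadata, f)
        self.index[str(trial_number)] = trial_dir
        with open(self.index_path, "w") as f:   # fast lookup index
            json.dump(self.index, f)

    def load_metadata(self, trial_number):
        trial_dir = self.index[str(trial_number)]
        with open(os.path.join(trial_dir, "metadata.json")) as f:
            return json.load(f)

# Usage: only trial numbers and objective values go into Optuna's DB;
# the bulky structure data lives here.
store = StructuresStore(tempfile.mkdtemp())
store.save(1, "# CIF text would go here", {"energy_eV_per_atom": -3.2})
```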

10.7 Convex Hull Analyzer

Maintains the running convex hull across all evaluated compositions:

  • Incrementally updates the hull as new structures are evaluated
  • Computes hull distances for new structures (distance above hull = thermodynamic instability)
  • Provides the hull volume metric used by the aging mechanism
  • Generates phase diagram visualizations

11 Core Mechanisms (Detailed)

11.1 Hull-Informed Filtering

Traditional CSP methods optimize formation energy per composition independently. MTCSP introduces hull-informed filtering that considers the global convex hull:

Formation Energy (eV/atom)
      │
  0.2 │   x           x
      │     x    x
  0.0 │───●─────●─────●───── Convex Hull
      │        ●
 -0.2 │   ●
      │
 -0.4 │         ●
      └────────────────────── Composition (A₁₋ₓBₓ)
      0.0       0.5       1.0

  ● = structures ON or BELOW hull (stable/metastable)
  x = structures ABOVE hull (unstable, pruned by filter)

The filtering mechanism:

  1. After each structure evaluation, compute distance to current convex hull
  2. Structures within a threshold of the hull are retained in the population
  3. Structures far above the hull are rejected (Rejecter component)
  4. The hull is updated incrementally as new stable structures are discovered
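For a binary system A₁₋ₓBₓ the convex hull reduces to a lower envelope in the (x, formation energy) plane, so the hull construction and the distance-above-hull check in steps 1-3 fit in a few lines. A self-contained sketch in pure Python (a scipy.spatial-based implementation would replace this for ternary and higher systems):

```python
def _cross(o, a, b):
    """z-component of (a - o) x (b - o); > 0 means a left turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

def lower_hull(points):
    """Lower convex hull of (composition x, formation energy) points via
    Andrew's monotone chain; returns hull vertices sorted left to right."""
    pts = sorted(set(points))
    hull = []
    for p in pts:
        # Drop the last hull point while it lies on or above the
        # segment from hull[-2] to the new point p.
        while len(hull) >= 2 and _cross(hull[-2], hull[-1], p) <= 0:
            hull.pop()
        hull.append(p)
    return hull

def distance_above_hull(x, energy, hull):
    """Vertical distance (eV/atom) from (x, energy) to the hull segment at x.
    Zero means on the hull; larger values mean thermodynamic instability."""
    for (x1, y1), (x2, y2) in zip(hull, hull[1:]):
        if x1 <= x <= x2:
            t = 0.0 if x2 == x1 else (x - x1) / (x2 - x1)
            return energy - (y1 + t * (y2 - y1))
    raise ValueError("composition outside hull range")
```

A Rejecter-style filter then simply compares `distance_above_hull(...)` against the retention threshold after each evaluation.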

11.2 Aging Mechanism for Elitist Selection

Standard elitist selection in GAs preserves the best individuals indefinitely, which can lead to stagnation in underexplored composition regions. MTCSP introduces an aging mechanism:

from math import exp

AGING_CONSTANT = 5.0  # generations; sets how fast the recency bonus decays

def selection_priority(structure, current_generation):
    """Selection priority combining hull fitness with recency of improvement.

    Illustrative pseudocode: a structure's score decays as generations pass
    without improvement in its composition, so stale compositions gradually
    yield their exploration budget to others.
    """
    hull_fitness = compute_hull_fitness(structure)  # higher = better
    age = current_generation - structure.last_improved_generation

    # Prioritize recently improved compositions
    recency_bonus = exp(-age / AGING_CONSTANT)

    return hull_fitness * recency_bonus

Effect: Compositions where the GA has recently found better structures get higher selection priority. Compositions where no improvement has occurred for many generations see their priority decay, allowing other compositions to receive exploration budget.

Selection Priority Over Time
      │
  1.0 │●
      │ ●
      │  ●
  0.5 │   ●
      │    ●
      │     ●●
      │       ●●●
  0.0 │          ●●●●●●●●
      └───────────────────── Generations since last improvement
      0    5    10   15   20

11.3 Niching for Stoichiometric Diversity

Niching prevents the population from collapsing to a few dominant stoichiometries:

WITHOUT Niching:                    WITH Niching:
Population at gen 100               Population at gen 100

Count│                              Count│
  50 │ ██                             20 │ ██    ██
  40 │ ██                             15 │ ██ ██ ██ ██
  30 │ ██                             10 │ ██ ██ ██ ██ ██
  20 │ ██ ██                           5 │ ██ ██ ██ ██ ██ ██
  10 │ ██ ██                           0 │ ██ ██ ██ ██ ██ ██
   0 │ ██ ██                             └──────────────────
     └──────────                       AB AB₂ A₂B₃ AB₃ A₃B B₂
      AB  AB₂ (collapsed)             (diverse across compositions)

The niching mechanism:

  1. Partition the composition space into niches (regions of similar stoichiometry)
  2. Limit the number of individuals from any single niche in the parent pool
  3. Ensure crossover occurs both within and across niches
  4. Track niche-level hull improvement rates for adaptive resource allocation
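The per-niche cap in steps 1-2 can be sketched in a few lines (niche keys via discretized stoichiometry; the bin count, cap, and names are illustrative, not PFN's implementation):

```python
from collections import defaultdict
from fractions import Fraction

def niche_key(composition, element="A", bins=6):
    """Coarse stoichiometry bin: the fraction of `element`, discretized.
    Compositions in the same bin compete within one niche."""
    total = sum(composition.values())
    x = Fraction(composition.get(element, 0), total)
    return min(int(x * bins), bins - 1)

def select_parents(population, max_per_niche=2):
    """Fill the parent pool best-first while capping each niche, so no
    single stoichiometry can dominate the crossover pool."""
    counts = defaultdict(int)
    parents = []
    for fitness, comp in sorted(population, key=lambda t: t[0]):
        key = niche_key(comp)              # lower fitness value = better
        if counts[key] < max_per_niche:
            counts[key] += 1
            parents.append((fitness, comp))
    return parents
```

With a cap of 2, a third structure of the same stoichiometry is skipped even if it outranks structures in other niches, which is exactly the diversity-over-greed trade-off the histogram above illustrates.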

11.4 Asynchronous Parallel Execution

MTCSP's Optuna integration enables true asynchronous parallelism:

Timeline (simplified, 4 workers):

Worker 1: ──[Gen]──[Relax████████]──[Gen]──[Relax██████]──►
Worker 2: ──[Gen]──[Relax██████████████]──[Gen]──[Relax]──►
Worker 3: ──[Gen]──[Relax████]──[Gen]──[Relax████████████]─►
Worker 4: ──[Gen]──[Relax██████████]──[Gen]──[Relax██████]─►
                    ▲                   ▲
                    │                   │
              Hull update          Hull update
              (incremental)        (incremental)

[Gen]   = Structure generation (fast, ~ms)
[Relax] = PFP relaxation (variable, ~1-10s)

Key properties:

  • No synchronization barriers: Workers operate independently, asking Optuna for the next trial when ready
  • Stale hull information: Workers may use slightly outdated hull data; empirically not harmful
  • Database-backed coordination: Optuna's relational database (MySQL or PostgreSQL) serves as the coordination layer
  • Dynamic load balancing: Faster workers naturally process more trials
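The ask/tell coordination without synchronization barriers can be mimicked in miniature with a thread pool and a lock-protected in-memory "study" (a toy stand-in; the real coordination layer is Optuna's database-backed storage):

```python
import random
import threading
import time
from concurrent.futures import ThreadPoolExecutor

class ToyStudy:
    """In-memory stand-in for an Optuna study: workers ask for a trial
    number, 'relax' a structure, then tell the result back."""

    def __init__(self):
        self._lock = threading.Lock()
        self._counter = 0
        self.results = {}

    def ask(self):
        with self._lock:
            self._counter += 1
            return self._counter

    def tell(self, trial_number, value):
        with self._lock:
            self.results[trial_number] = value

def worker(study, n_trials, rng):
    for _ in range(n_trials):
        trial = study.ask()                 # no synchronization barrier
        time.sleep(rng.random() * 0.001)    # variable-length "relaxation"
        study.tell(trial, -rng.random())    # report a (toy) energy

study = ToyStudy()
rngs = [random.Random(seed) for seed in range(4)]
with ThreadPoolExecutor(max_workers=4) as pool:
    futures = [pool.submit(worker, study, 25, rng) for rng in rngs]
for f in futures:
    f.result()  # surface any worker errors

print(len(study.results))  # 4 workers x 25 trials = 100 completed trials
```

Faster workers naturally finish more iterations of the loop, which is the dynamic load balancing noted above.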

11.5 Genetic Operators for Crystal Structures

Operator Input Output Description
Slab crossover 2 parent structures 1 child structure Cut both parents with a random plane, combine halves
Lattice strain 1 parent structure 1 child structure Apply random strain tensor to lattice vectors
Atom permutation 1 parent structure 1 child structure Swap atomic species to explore compositions
Coordinate rattling 1 parent structure 1 child structure Add Gaussian noise to atomic positions
Symmetry-preserving 1 parent structure 1 child structure Perturb only symmetry-independent positions

Slab Crossover (Heredity Operator)

Parent A           Parent B           Child
┌──────────┐       ┌──────────┐       ┌──────────┐
│ ○ ● ○ ●  │       │ ◆ ◇ ◆ ◇  │       │ ○ ● ○ ●  │
│ ● ○ ● ○  │       │ ◇ ◆ ◇ ◆  │       │ ● ○ ● ○  │
│─cutting──│       │─cutting──│       │──────────│
│ ○ ● ○ ●  │       │ ◆ ◇ ◆ ◇  │       │ ◆ ◇ ◆ ◇  │
│ ● ○ ● ○  │       │ ◇ ◆ ◇ ◆  │       │ ◇ ◆ ◇ ◆  │
└──────────┘       └──────────┘       └──────────┘

Top half from A + Bottom half from B → Child
(Lattice vectors interpolated between parents)
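A sketch of the heredity operator in fractional coordinates, with the cut plane fixed at z = 0.5 and a simple diagonal-lattice representation for brevity (function and field names are illustrative; real implementations choose a random cut plane and repair composition afterwards):

```python
def slab_crossover(parent_a, parent_b, cut=0.5, mix=0.5):
    """Combine atoms with z < cut from parent A and z >= cut from parent B.

    Each parent is (lattice, atoms), where lattice is a 3-vector of cell
    lengths and atoms is a list of (species, (x, y, z)) in fractional
    coordinates.
    """
    lat_a, atoms_a = parent_a
    lat_b, atoms_b = parent_b
    # Bottom slab from A, top slab from B
    child_atoms = [a for a in atoms_a if a[1][2] < cut]
    child_atoms += [b for b in atoms_b if b[1][2] >= cut]
    # Interpolate lattice vectors between the parents
    child_lattice = tuple((1 - mix) * la + mix * lb
                          for la, lb in zip(lat_a, lat_b))
    return child_lattice, child_atoms
```

With `mix=0.5` the child's cell lengths are the arithmetic mean of the parents', matching the "lattice vectors interpolated" note in the diagram.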

11.6 Multi-Objective Formulation

MTCSP can operate in a multi-objective mode, optimizing simultaneously for:

  1. Formation energy (lower is better → thermodynamic stability)
  2. Hull distance (lower is better → closer to ground state)
  3. Structural diversity (higher is better → exploration of phase space)
  4. Composition coverage (higher is better → mapping the full phase diagram)

The NSGA-II variant handles multiple objectives through Pareto dominance and crowding distance, ensuring the search produces a diverse front of trade-off solutions.
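Pareto dominance and the non-dominated filter at the heart of any NSGA-II-style selection can be sketched generically, with all objectives cast as minimization (maximization objectives are negated first). This is the textbook filter, not PFN's code:

```python
def dominates(a, b):
    """True if objective vector a Pareto-dominates b: a is no worse in
    every objective and strictly better in at least one (all minimized)."""
    return (all(x <= y for x, y in zip(a, b))
            and any(x < y for x, y in zip(a, b)))

def pareto_front(solutions):
    """Return the non-dominated subset of a list of objective vectors."""
    return [s for s in solutions
            if not any(dominates(t, s) for t in solutions if t is not s)]
```

In MTCSP's setting each vector would hold, e.g., (formation energy, hull distance, -diversity), and the surviving front is the diverse set of trade-off structures carried into the next generation.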

12 Programming Language

Implementation Stack

Component Language Framework
Search algorithm (GA) Python Optuna (sampler API)
Structure manipulation Python ASE (Atomic Simulation Environment)
PFP inference C++ / CUDA (backend), Python (API) Matlantis SDK
Optuna infrastructure Python Optuna + SQLAlchemy
Structure storage Python + filesystem Custom file store
Visualization Python matplotlib, plotly
Service infrastructure Python Matlantis cloud platform

Why Python?

  1. Scientific computing ecosystem: ASE, pymatgen, NumPy, SciPy are all Python-native
  2. Optuna is Python: The optimization framework is Python-first
  3. PFP Python API: Matlantis provides Python bindings for PFP
  4. Materials science convention: The computational materials science community is heavily Python-oriented
  5. Rapid prototyping: Research algorithms benefit from Python's flexibility

Code Organization (Inferred)

mtcsp/
├── experiment.py          # Experiment class (Optuna Study wrapper)
├── sampler.py             # Custom NSGA-II variant (Optuna Sampler)
├── generators/
│   ├── random.py          # Random structure generation
│   ├── symmetry.py        # Symmetry-aware generation
│   └── crossover.py       # Slab crossover / heredity
├── evaluator/
│   ├── relaxer.py         # PFP-based structure relaxation
│   └── rejecter.py        # Pruning / early termination
├── analysis/
│   ├── hull.py            # Convex hull computation and tracking
│   ├── phase_diagram.py   # Phase diagram visualization
│   └── diversity.py       # Population diversity metrics
├── storage/
│   ├── structures.py      # File-based structure storage
│   └── optuna_storage.py  # Optuna database configuration
└── utils/
    ├── structure_ops.py   # Structure manipulation utilities
    └── composition.py     # Composition space utilities

13 Memory Management

Population Memory

The GA maintains a population of crystal structures in memory. Memory scaling:

| Population Size | Structures in Memory | Approximate RAM |
|---|---|---|
| 100 | ~100–200 (parents + children) | ~100 MB |
| 500 | ~500–1,000 | ~500 MB |
| 2,000 | ~2,000–4,000 | ~2 GB |

Each structure stores:

  • Atomic positions: N_atoms × 3 × 8 bytes (float64)
  • Lattice vectors: 3 × 3 × 8 bytes
  • Atomic species: N_atoms × 4 bytes (int32)
  • Metadata: ~1 KB (energy, forces summary, trial info)

For a typical structure with 24 atoms this totals roughly 2 KB: ~0.75 KB of arrays plus ~1 KB of metadata.
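The per-structure arithmetic can be checked directly; note that the ~1 KB metadata term dominates the array payload, so the realistic total is closer to 2 KB than 1 KB:

```python
# Check the per-structure memory arithmetic for a 24-atom cell.
n_atoms = 24
positions = n_atoms * 3 * 8   # float64 coordinates
lattice = 3 * 3 * 8           # float64 lattice vectors
species = n_atoms * 4         # int32 atomic numbers
metadata = 1024               # ~1 KB of energy, forces summary, trial info

total = positions + lattice + species + metadata
print(total)   # 1768 bytes, i.e. roughly 2 KB per structure
```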

Optuna Database Memory

Optuna uses a relational database (MySQL/PostgreSQL/SQLite) to store trial history:

| Trials | Database Size | Notes |
|---|---|---|
| 10,000 | ~50 MB | Comfortable for SQLite |
| 100,000 | ~500 MB | MySQL/PostgreSQL recommended |
| 1,000,000 | ~5 GB | Requires database tuning |

Design insight: Crystal structures are stored in a separate file-based store to avoid bloating the Optuna database. Only lightweight trial metadata (parameters, objective values, state) goes into the database.

Hull Memory

The convex hull data structure grows with the number of stable structures found:

  • Typically 100–1,000 hull vertices for binary/ternary systems
  • ConvexHull computation (scipy.spatial) requires O(n log n) time and O(n) memory
  • Incremental hull updates are O(k) where k is the number of new points
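`scipy.spatial.ConvexHull` supports exactly this incremental usage; a small example with illustrative (composition, formation energy) points for a binary system:

```python
import numpy as np
from scipy.spatial import ConvexHull

# Illustrative binary system: each point is (composition x, formation energy).
pts = np.array([[0.00,  0.0],
                [1.00,  0.0],
                [0.50, -0.4],
                [0.25, -0.1]])   # index 3 sits above the hull
hull = ConvexHull(pts, incremental=True)

# Incremental update: fold in a newly discovered structure without a full
# recomputation (scipy's Qhull wrapper supports this natively).
hull.add_points(np.array([[0.75, -0.5]]))
print(hull.vertices)   # vertex indices of the updated hull
```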

GPU Memory (PFP Inference)

PFP inference requires GPU memory proportional to structure size:

| Structure Size | PFP GPU Memory | Relaxation GPU Memory |
|---|---|---|
| 10 atoms | ~200 MB | ~300 MB |
| 50 atoms | ~500 MB | ~800 MB |
| 100 atoms | ~1 GB | ~1.5 GB |
| 200 atoms | ~2 GB | ~3 GB |

Multiple workers can share a single GPU through batched inference, or each worker can use a dedicated GPU for maximum throughput.

14 Continued Learning

Session-to-Session Learning

MTCSP supports a form of continued learning through Optuna's study persistence:

  1. Warm-starting: A new search can be initialized with the population from a previous search. The Optuna study stores all trial history, which can seed the initial population of a new study.

  2. Cross-system transfer: Structures discovered for one elemental system (e.g., Li-Co-O) can seed searches in related systems (e.g., Li-Ni-O) through the structure store.

  3. Hull accumulation: The convex hull grows monotonically — new searches can only add structures, never remove them. This means each search builds on the collective knowledge of all previous searches in the same system.

What Is NOT Learned

  • No meta-learning across systems. The GA parameters (population size, mutation rates, crossover strategy) are fixed or manually tuned, not adapted based on previous search outcomes.
  • No learned mutation operators. Unlike LLM-based systems where the mutation model improves over time, MTCSP's genetic operators are fixed classical operators.
  • No transfer of search strategy. The system does not learn which search strategies work best for different classes of materials.

Potential Extensions (Not Yet Implemented)

| Extension | Description | Difficulty |
|---|---|---|
| Meta-learned operator selection | Bandit-based selection of crossover/mutation operators based on per-system performance | Medium |
| Transfer learning of hull shapes | Use hull topology from similar systems to guide early search | Medium |
| Adaptive population sizing | Dynamically adjust population size based on composition space complexity | Low |
| Structure-aware embeddings | Use GNN embeddings for better niching and diversity measurement | High |
| Multi-fidelity search | Start with cheap, low-accuracy evaluations and refine promising candidates with expensive DFT | Medium |

Optuna's Built-in Learning

Optuna itself provides some meta-learning capabilities that MTCSP inherits:

  • TPE (Tree-structured Parzen Estimator): Optuna's default sampler learns the distribution of good parameters. However, MTCSP uses a custom sampler, not TPE.
  • Pruning history: Optuna's pruners learn when to terminate trials based on intermediate values. MTCSP's Rejecter uses this mechanism.
  • Multi-study knowledge sharing: Optuna supports sharing knowledge between studies, which could enable cross-system transfer in future MTCSP versions.

15 Applications

15.1 Primary Application: Materials Discovery

MTCSP's primary use case is discovering new crystal structures for materials development:

| Application Domain | Example Systems | Search Objective |
|---|---|---|
| Battery materials | Li-Co-O, Li-Ni-Mn-Co-O, Na-Fe-P-O | Find stable cathode/anode structures |
| Thermoelectrics | Bi-Te-Se, Pb-Te-S | Identify low thermal conductivity phases |
| Catalysts | Pt-Ru-O, Co-Fe-O | Discover active surface structures |
| Superconductors | La-H, Y-H (high pressure) | Find high-Tc candidate structures |
| Alloys | Ti-Al-V, Ni-Co-Cr-Fe-Mn (HEA) | Map phase stability in multi-component space |

15.2 Phase Diagram Construction

Beyond finding individual structures, MTCSP enables automated construction of computational phase diagrams:

Phase Diagram (A-B Binary)
Temperature (K)
      │
 2000 │         Liquid
      │       ╱       ╲
 1500 │     ╱    L+α     ╲
      │   ╱       │        ╲
 1000 │  α      α + β       β
      │  │        │         │
  500 │  α        │         β
      │  │      α + β       │
    0 │  α        │         β
      └───────────────────────
      A    0.25   0.5   0.75   B
              Composition

MTCSP maps the 0 K ground-state hull: plotting energy vs. composition identifies the stable phases (α, β, and the α+β two-phase regions). The finite-temperature phase boundaries above require separate free-energy modeling.
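For a binary system, the 0 K stability test reduces to comparing a candidate's formation energy against the hull interpolated at its composition. A self-contained sketch with made-up numbers (the helper function and hull values are illustrative):

```python
import numpy as np

def energy_above_hull(x, e_form, hull_pts):
    """Height of (x, e_form) above the lower hull; hull_pts is a
    composition-sorted list of (x, E_f) hull vertices (illustrative helper)."""
    xs, es = zip(*hull_pts)
    return e_form - np.interp(x, xs, es)   # interpolate the hull at x

# Hypothetical A-B hull: end members plus one stable compound at x = 0.5.
hull = [(0.0, 0.0), (0.5, -0.4), (1.0, 0.0)]
print(energy_above_hull(0.25, -0.1, hull))   # positive, so metastable
```

A value of zero means the structure sits on the hull (stable); a positive value is its decomposition driving force.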

15.3 High-Throughput Screening

MTCSP can be deployed for high-throughput computational screening:

  1. Element substitution sweeps: Systematically search structure stability across many element combinations using PFP's universality
  2. Composition optimization: Given a known structure type, find the optimal composition for target properties
  3. Polymorph discovery: Find all stable crystal structure types for a given composition

15.4 Industrial Use Cases on Matlantis

As a commercial service, MTCSP targets industrial R&D workflows:

| Industry | Use Case | Value Proposition |
|---|---|---|
| Battery manufacturers | New cathode material discovery | Reduce experimental screening by 10–100× |
| Semiconductor companies | Novel dielectric materials | Identify candidates before synthesis |
| Catalyst developers | Mixed oxide catalysts | Explore composition-structure-activity space |
| Steelmakers | Intermetallic phase prediction | Understand precipitation in alloys |
| Pharmaceutical | Cocrystal screening (future) | Predict stable cocrystal forms |

15.5 Limitations and Boundary Conditions

| Limitation | Impact | Mitigation |
|---|---|---|
| 0 K only | No finite-temperature phase stability | Post-process with phonon calculations |
| PFP accuracy | ~30–50 meV/atom error vs. DFT | DFT validation of top candidates |
| No kinetics | Cannot predict synthesizability | Experimental validation required |
| Periodic only | No amorphous or surface structures | Complementary tools (LAMMPS, etc.) |
| Element coverage | 72 elements (PFP limitation) | Expanding with PFP updates |
| Max cell size | ~200 atoms practical limit | Sufficient for most inorganic crystals |

15.6 Comparison with Other Materials Discovery Platforms

| Platform | Search Method | Evaluator | Open Source | Composition Coverage |
|---|---|---|---|---|
| Materials Project | Database mining | DFT (VASP) | Data only | 150K+ computed materials |
| AFLOW | High-throughput DFT | DFT (VASP) | Partially | 3.5M+ entries |
| OQMD | Database | DFT (VASP) | Data only | 1M+ entries |
| GNoME (Google) | Active learning | GNN + DFT | No | 380K stable materials |
| MTCSP | GA + Optuna | PFP (uMLIP) | No (commercial) | Any 72-element combination |

15.7 Integration with Broader Matlantis Ecosystem

MTCSP fits within Matlantis's suite of computational materials science tools:

┌──────────────────────────────────────────────┐
│              Matlantis Platform              │
│                                              │
│  ┌──────────┐  ┌───────────┐  ┌───────────┐  │
│  │  MTCSP   │  │ Matlantis │  │ Matlantis │  │
│  │  (CSP)   │  │ (NEB/MD)  │  │ (Phonons) │  │
│  └────┬─────┘  └─────┬─────┘  └─────┬─────┘  │
│       │              │              │        │
│       └──────────────┼──────────────┘        │
│                      │                       │
│              ┌───────▼───────┐               │
│              │      PFP      │               │
│              │  (Universal   │               │
│              │   Potential)  │               │
│              └───────────────┘               │
└──────────────────────────────────────────────┘

Workflow: MTCSP discovers structures → validate with NEB/MD → 
          characterize with phonon calculations

This analysis is based on the arXiv paper (2503.21201v3), the Preferred Networks technical blog post (March 2026), and publicly available information about the Matlantis platform and Optuna framework. The system represents an important intersection of evolutionary optimization and neural network surrogate models for scientific discovery, demonstrating that classical population-based search methods remain highly effective when paired with fast, accurate neural evaluators.