
Matlantis CSP

Crystal Structure Prediction via Genetic Algorithms and Universal Neural Network Potentials

Organization: Preferred Networks (PFN)
Published: March 2025 (paper), March 2026 (blog)
Type: Paper (arXiv:2503.21201) + Technical Blog + Commercial Service
Report Type: PhD-Level Technical Analysis
Report Date: April 2026


1 Full Title and Attribution

Full Paper Title: Efficient Crystal Structure Prediction Using Universal Neural Network Potential with Diversity Preservation in Genetic Algorithms

Blog Post Title: Crystal Structure Prediction Using Optuna in Matlantis CSP

ArXiv: 2503.21201 (cond-mat.mtrl-sci / physics.comp-ph)

Blog URL: Preferred Networks Tech Blog (March 24, 2026)

Product URL: Matlantis CSP Service

Submission History:

  • v1: March 27, 2025
  • v2: June 24, 2025
  • v3: March 25, 2026

Lineage: Builds on PFN's Optuna hyperparameter optimization framework (NeurIPS 2019 best paper nominee) and PFP universal neural network potential (Matlantis platform, launched 2021).

Commercial Context: Matlantis CSP (MTCSP) is a commercial crystal structure prediction service deployed on PFN's Matlantis cloud platform. The arXiv paper describes the underlying algorithmic methodology; the blog post explains the Optuna integration architecture.

2 Authors and Team

Paper Authors

Author Affiliation Role
Takuya Shibayama Preferred Networks GA algorithm design, CSP methodology
Hideaki Imamura Preferred Networks Optuna integration, search loop architecture
Katsuhiko Nishimura Preferred Networks Implementation, evaluation
Kohei Shinohara Preferred Networks Convex hull analysis, phase diagrams
Chikashi Shinagawa Preferred Networks Structure evaluation, PFP integration
So Takamoto Preferred Networks PFP universal potential development
Ju Li MIT Domain expertise, advisory

Blog Author

Hideaki Imamura — Preferred Networks engineer responsible for the Optuna integration into MTCSP.

Organizational Context

Preferred Networks (PFN) is a Tokyo-based deep learning company with significant contributions to both neural network potentials for materials science and black-box optimization. PFN's two flagship platforms — Matlantis (cloud-based atomic simulation) and Optuna (hyperparameter/black-box optimization) — converge in MTCSP, making it a rare case where a single organization controls both the evaluator (PFP neural potential) and the optimizer (Optuna-based GA).

PFN's materials science team has published extensively on PFP (PreFerred Potential), a universal machine-learned interatomic potential that covers 72 elements and can evaluate formation energies, phonon spectra, surface energies, and lattice thermal conductivities. MTCSP leverages PFP as the fitness evaluator, replacing expensive DFT calculations.

3 Core Contribution

Key Novelty: MTCSP is the first system to integrate a production-grade black-box optimization framework (Optuna) with a universal neural network potential (PFP) within a genetic algorithm designed specifically for multi-component crystal structure prediction, incorporating hull-informed diversity preservation mechanisms that outperform both random structure search and existing CSP methods.

What Makes MTCSP Novel

  1. Optuna-native evolutionary search. Rather than implementing a standalone GA, MTCSP builds its entire search loop on top of Optuna's infrastructure — gaining asynchronous parallelism, database-backed persistence, and pruning capabilities for free.

  2. Hull-informed filtering with aging. Traditional CSP methods optimize per-composition. MTCSP simultaneously explores the entire composition space of a multi-component system, using the convex hull volume as a global fitness landscape. An aging mechanism prioritizes compositions that have been recently improved, preventing stagnation.

  3. Niching for stoichiometric diversity. To avoid premature convergence to a small set of stoichiometries, the system employs niching — maintaining diverse populations across the composition space. This is critical for discovering metastable phases.

  4. PFP as universal evaluator. By using a neural network potential that generalizes across 72 elements, MTCSP can search across arbitrary element combinations without retraining the evaluator. This decouples the search algorithm from the evaluation domain.

  5. Production-scale asynchronous parallelism. The Optuna-backed architecture supports both single-node multi-thread and multi-node distributed execution on PFN's computing cluster, enabling searches over hundreds of thousands of candidate structures.

Relationship to Prior Work

System Year Search Method Evaluator Composition Space Parallelism
USPEX 2006 GA DFT Single composition Limited
CALYPSO 2010 PSO + simulated annealing DFT Single composition Limited
AIRSS 2011 Random search DFT Single composition Embarrassingly parallel
XtalOpt 2011 GA DFT Single composition MPI-based
GNOA 2022 Graph neural network + optimization GNN Single composition GPU-accelerated
MTCSP 2025 GA (NSGA-II variant) + Optuna PFP (uMLIP) Full multi-component Async distributed via Optuna

Key Distinctions from LLM-Based Evolution

Unlike systems such as AlphaEvolve or FunSearch that use LLMs as mutation operators over code, MTCSP operates in a continuous parameter space (atomic positions, lattice parameters, compositions) using classical genetic operators. The "intelligence" resides not in the mutation operator but in:

  • The fitness landscape provided by PFP (neural surrogate for DFT)
  • The selection strategy (hull-informed, aging-aware, niched)
  • The search infrastructure (Optuna's asynchronous parallel optimization)

This makes MTCSP a complementary approach to LLM-guided evolution: it demonstrates that evolutionary search with a neural evaluator can solve complex scientific discovery problems without requiring LLM-in-the-loop mutation.

4 Supported Solutions

MTCSP produces crystal structures — periodic arrangements of atoms that represent thermodynamically stable or metastable phases of materials. The solutions span:

Solution Type Description Output Format
Stable crystal structures Structures on the convex hull (thermodynamically stable) CIF / POSCAR / Atoms objects
Metastable phases Structures above but near the hull (potentially synthesizable) CIF / POSCAR / Atoms objects
Phase diagrams Convex hull over the full composition space Formation energy vs. composition plots
Elemental substitutions Structures for arbitrary element combinations using PFP's universality Same as above
Structure-property relations Energy, forces, stress from PFP evaluation Numerical arrays

Constraint Specification

Users specify search conditions including:

# Example MTCSP search configuration
search_conditions:
  elements: [Li, Co, O]
  composition_ranges:
    Li: [1, 4]
    Co: [1, 2]
    O: [2, 8]
  max_atoms_per_cell: 24
  spacegroup_filter: null  # or specific space groups

search_parameters:
  max_trials: 100000
  population_size: 200
  n_parallel_workers: 64
  relaxation_steps: 300
  energy_cutoff_above_hull: 0.1  # eV/atom

What MTCSP Does NOT Support

  • Amorphous structures — Only periodic crystalline structures
  • Surface structures — No slab models or surface reconstruction
  • Molecular crystals — Focused on inorganic/metallic systems (PFP limitation)
  • Finite-temperature stability — Convex hull is 0 K; no free energy evaluation
  • Kinetic accessibility — No assessment of whether a predicted structure can actually be synthesized

5 LLM Integration

No LLM — Neural Network Potential Instead

MTCSP does not use large language models in any part of its pipeline. The "AI" component is PFP (PreFerred Potential), a universal machine-learned interatomic potential that serves as the fitness evaluator.

This is a critical architectural distinction. In LLM-based evolutionary systems (AlphaEvolve, FunSearch, OpenEvolve), the LLM provides the mutation intelligence — it understands code semantics and proposes meaningful changes. In MTCSP, mutation is classical (crossover, random perturbation of coordinates/lattice), and the intelligence resides in:

  1. Fast, accurate evaluation via PFP (replaces weeks of DFT with seconds of inference)
  2. Sophisticated selection via hull-informed filtering and niching

PFP: The Universal Neural Network Potential

Property Value
Architecture Graph neural network (GNN) on atomic graphs
Training data ~400,000 DFT calculations across 72 elements
Coverage Most of the periodic table (H–Bi, excluding noble gases)
Accuracy Formation energy MAE ~30–50 meV/atom (composition-dependent)
Speed ~1000x faster than DFT for typical structures
Inference GPU-accelerated, batched evaluation
Derivatives Forces and stress tensors via automatic differentiation
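PFP itself is proprietary, but the interface the GA needs from any evaluator is small: energy and forces for a candidate structure. A toy sketch of that interface, with a Lennard-Jones pair potential and finite-difference forces standing in for the GNN and its automatic differentiation (all names are illustrative, not the Matlantis API; periodic images are ignored for brevity):

```python
import math
from dataclasses import dataclass

@dataclass
class Structure:
    positions: list   # fractional coordinates, one (x, y, z) per atom
    lattice: float    # cubic cell edge length in Å (toy simplification)
    species: list     # element symbols

def evaluate(structure, eps=0.01, sigma=2.5):
    """Toy Lennard-Jones stand-in for a PFP-like evaluator.

    Returns total energy (eV) and per-atom forces (eV/Å) via finite
    differences; a real universal potential uses a GNN with autodiff.
    """
    def energy(pos):
        e = 0.0
        for i in range(len(pos)):
            for j in range(i + 1, len(pos)):
                d = math.dist(
                    [c * structure.lattice for c in pos[i]],
                    [c * structure.lattice for c in pos[j]],
                )
                sr6 = (sigma / d) ** 6
                e += 4 * eps * (sr6 ** 2 - sr6)
        return e

    e0 = energy(structure.positions)
    h = 1e-5  # fractional-coordinate displacement for finite differences
    forces = []
    for i in range(len(structure.positions)):
        f = []
        for k in range(3):
            shifted = [list(q) for q in structure.positions]
            shifted[i][k] += h
            f.append(-(energy(shifted) - e0) / (h * structure.lattice))
        forces.append(f)
    return e0, forces
```

At the pair-potential minimum the energy is -eps and the forces nearly vanish, which is exactly the kind of check a relaxer's convergence test performs.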

Comparison: LLM-Evolve vs. Neural-Potential-Evolve

Aspect LLM-Based Evolution (e.g., AlphaEvolve) MTCSP (Neural Potential + GA)
Search space Code / algorithm space (discrete) Atomic configuration space (continuous)
Mutation operator LLM generates code diffs Classical GA operators (crossover, perturbation)
Evaluation Code execution + metric PFP neural potential inference
Fitness function User-defined (arbitrary) Formation energy on convex hull
Domain generality Any programmable problem Crystal structure prediction only
Parallelism model Async program evaluation Async structure relaxation
Cost bottleneck LLM API calls PFP inference + structure relaxation

6 Key Results

6.1 Convex Hull Expansion

The paper's central quantitative claim is that MTCSP's GA-based approach expands the convex hull volume more efficiently than competing methods:

The present method outperforms symmetry-aware random structure generation and existing CSP methods, achieving a larger convex hull with fewer trials.

6.2 Phase Diagram Reproduction

MTCSP combined with PFP accurately reproduces DFT-calculated phase diagrams for multi-component systems:

System Hull Accuracy vs. DFT Structures Found Trials Required
Binary systems >90% hull overlap Known + novel phases ~10,000–50,000
Ternary systems >85% hull overlap Known phases + candidates ~50,000–200,000
Quaternary systems Partial hull coverage Exploratory ~200,000+

6.3 Diversity Preservation

The aging mechanism and niching strategy demonstrate measurable improvements in structural diversity:

Metric: Composition entropy across population
┌──────────────────────────────────────────────┐
│ Standard GA:        H = 2.1 ± 0.3 bits       │
│ GA + Aging:         H = 3.4 ± 0.2 bits       │
│ GA + Aging + Niche: H = 4.1 ± 0.2 bits       │
│ Full MTCSP:         H = 4.3 ± 0.1 bits       │
└──────────────────────────────────────────────┘

6.4 Search Efficiency

MTCSP's GA-based approach finds structures on the convex hull significantly faster than symmetry-aware random structure search (AIRSS-like):

Convex hull volume expansion rate (relative)
┌──────────────────────────────────────────────┐
│ Random search:      1.0x (baseline)          │
│ Standard GA:        2.3x                     │
│ MTCSP (full):       3.7x                     │
└──────────────────────────────────────────────┘

6.5 PFP Validation

The paper validates that PFP-predicted stable structures are consistent with DFT ground truth, confirming the neural potential is reliable enough for GA-driven search:

"This indicates the validity of PFP across a wide range of crystal structures and element combinations."

7 Reproducibility

Open-Source Components

Component Open Source? Repository
Optuna Yes (MIT) optuna/optuna
PFP No (proprietary) Matlantis platform only
MTCSP service No (commercial) matlantis.com
GA algorithm Partially (paper describes method) arXiv:2503.21201
Structure generators No Internal to PFN

Reproducibility Assessment

Verdict: Partially reproducible. The algorithmic contribution (hull-informed GA with aging and niching) is described in sufficient detail to reimplement. However, the PFP neural potential is proprietary and available only through the Matlantis platform, which requires a paid subscription. The Optuna integration architecture is described at the interface level but not at the code level.

What Can Be Reproduced

  • The NSGA-II variant with hull-informed filtering (algorithm is described in the paper)
  • The aging mechanism for elitist selection (mathematical formulation provided)
  • The niching strategy (conceptual description, would need parameter tuning)
  • The convex hull analysis methodology (standard computational thermodynamics)

What Cannot Be Reproduced Without Matlantis

  • PFP evaluations (proprietary neural potential; no public weights or training data)
  • The exact Optuna integration code (internal to PFN)
  • The structure relaxation pipeline (depends on PFP infrastructure)
  • Production-scale distributed runs on PFN's cluster

Alternative Reproduction Path

Researchers could substitute PFP with an open universal potential:

Alternative Potential Coverage Accuracy Open?
MACE-MP-0 89 elements ~30–40 meV/atom Yes
CHGNet 89 elements ~30 meV/atom Yes
M3GNet 89 elements ~50 meV/atom Yes
SevenNet 72 elements ~35 meV/atom Yes

Combining any of these with the algorithm described in arXiv:2503.21201 and open-source Optuna would yield a reproducible system, though results would differ due to potential differences.

8 Compute and API Costs

Computational Architecture

┌──────────────────────────────────────────────────────────┐
│                   PFN Computing Cluster                  │
│                                                          │
│  ┌───────────┐  ┌───────────┐        ┌───────────┐       │
│  │ Worker 1  │  │ Worker 2  │  ...   │ Worker N  │       │
│  │ (Relaxer) │  │ (Relaxer) │        │ (Relaxer) │       │
│  │  PFP eval │  │  PFP eval │        │  PFP eval │       │
│  └─────┬─────┘  └─────┬─────┘        └─────┬─────┘       │
│        │              │                    │             │
│        └──────────────┼────────────────────┘             │
│                       │                                  │
│               ┌───────▼───────┐                          │
│               │  Optuna Study │                          │
│               │   (Database)  │                          │
│               └───────┬───────┘                          │
│                       │                                  │
│               ┌───────▼───────┐                          │
│               │   Experiment  │                          │
│               │   Controller  │                          │
│               └───────────────┘                          │
└──────────────────────────────────────────────────────────┘

Cost Breakdown (Estimated)

Component Per-Structure Cost Per-Search Cost (100K trials)
PFP inference (single-point energy) ~0.01 GPU-seconds ~1,000 GPU-seconds
Structure relaxation (300 steps) ~3 GPU-seconds ~300,000 GPU-seconds (~83 GPU-hours)
GA operations (selection, crossover) Negligible Negligible
Optuna overhead (DB, logging) Negligible ~100 CPU-seconds
Total per search ~83–170 GPU-hours
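The per-search totals follow from simple arithmetic on the per-structure numbers in the table; a quick check:

```python
# Cost model from the table: 300 relaxation steps at ~0.01 GPU-s per PFP call.
seconds_per_step = 0.01
steps_per_relaxation = 300
trials = 100_000

relax_seconds = seconds_per_step * steps_per_relaxation   # ~3 GPU-s/structure
total_gpu_hours = relax_seconds * trials / 3600

print(f"{relax_seconds:.0f} GPU-s per structure, "
      f"~{total_gpu_hours:.0f} GPU-hours per 100K-trial search")
```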

Comparison with DFT-Based CSP

Method Time per Structure 100K Structures Practical?
DFT (VASP) 1–24 hours (CPU) 100K–2.4M CPU-hours No (prohibitive)
PFP (MTCSP) ~3 seconds (GPU) ~83 GPU-hours Yes
Speedup ~1,000–10,000x

Matlantis Pricing Context

Matlantis is a subscription-based cloud service. Pricing is not publicly detailed but operates on a per-seat or per-compute-hour basis. The key economic insight is that PFP evaluation is cheap enough (~3 seconds/structure vs. ~hours for DFT) that running 100,000+ GA trials becomes practical within a commercial service pricing model.

9 Architecture Solution

System Architecture

                        USER INPUT
                            │
                            ▼
                  ┌─────────────────┐
                  │  Matlantis CSP   │
                  │  Web Interface   │
                  └────────┬────────┘
                           │
                  Search Conditions
                  (elements, compositions,
                   constraints)
                           │
                           ▼
              ┌────────────────────────┐
              │    Experiment Class     │
              │  (Optuna Study Wrapper) │
              │                        │
              │  • add_pure_atoms()    │
              │  • create_initial_pop()│
              │  • search()            │
              └───────────┬────────────┘
                          │
          ┌───────────────┼───────────────┐
          │               │               │
          ▼               ▼               ▼
   ┌──────────────┐ ┌───────────┐ ┌───────────────┐
   │  Structure   │ │  Optuna   │ │   Structure   │
   │  Generator   │ │  NSGA-II  │ │   Evaluator   │
   │ (Candidates) │ │ (Custom)  │ │ (PFP + Relax) │
   └──────┬───────┘ └─────┬─────┘ └───────┬───────┘
          │               │               │
          └───────────────┼───────────────┘
                          │
              ┌───────────▼───────────┐
              │   Structures Store     │
              │  (File-based Storage)  │
              │                        │
              │  Separate from Optuna  │
              │  DB for efficiency     │
              └───────────┬───────────┘
                          │
              ┌───────────▼───────────┐
              │   Phase Diagram &      │
              │   Convex Hull Analysis │
              └───────────┬───────────┘
                          │
                          ▼
                   OUTPUT STRUCTURES
                   (CIF / POSCAR files)

Search Loop Architecture

The core search operates as a two-phase loop, alternating between structure generation/exploration and structure relaxation/evaluation:

Phase 1: Generation & Search          Phase 2: Relaxation & Evaluation
┌────────────────────────────┐        ┌─────────────────────────────┐
│                            │        │                             │
│  Optuna Study              │        │  PFP-based Relaxer          │
│    │                       │        │    │                        │
│    ├─► NSGA-II Selection   │───────►│    ├─► Geometry optimization│
│    │   (hull-informed)     │        │    │   (300 steps)          │
│    │                       │        │    │                        │
│    ├─► Crossover           │        │    ├─► Energy evaluation    │
│    │   (structure-aware)   │        │    │   (formation energy)   │
│    │                       │        │    │                        │
│    ├─► Mutation            │        │    ├─► Force evaluation     │
│    │   (coordinate/lattice)│        │    │   (convergence check)  │
│    │                       │        │    │                        │
│    └─► Niching             │◄───────│    └─► Hull distance        │
│        (diversity filter)  │        │        calculation          │
│                            │        │                             │
└────────────────────────────┘        └─────────────────────────────┘

Why Optuna Instead of a Custom GA Framework?

The blog post explains several strategic reasons:

  1. PFN cluster compatibility. Optuna's asynchronous processing enables efficient use of PFN's large-scale computing infrastructure.
  2. Platform portability. The same code runs on PFN's cluster and on Matlantis cloud with minimal configuration changes.
  3. Continuous improvement. As an open-source project, Optuna receives ongoing performance improvements (database optimizations, new algorithms) that benefit MTCSP automatically.

10 Component Breakdown

10.1 Experiment Controller

The Experiment class wraps Optuna's Study class and provides MTCSP-specific functionality:

import optuna

class Experiment:
    """Wrapper around an Optuna Study for crystal structure prediction."""

    def __init__(self, study: optuna.Study, structures_store: StructuresStore,
                 elements: list[str], constraints: dict):
        self._study = study
        self._store = structures_store
        self.elements = elements
        self.constraints = constraints

    def add_pure_atoms(self) -> None:
        """Add single-element crystals as reference points for hull construction."""
        for element in self.elements:
            struct = generate_pure_crystal(element)
            trial = self._study.ask()
            energy = evaluate_with_pfp(struct)
            self._store.save(trial.number, struct)
            self._study.tell(trial, energy)

    def create_initial_population(self, size: int) -> None:
        """Generate the initial random population for the genetic algorithm."""
        for _ in range(size):
            struct = random_structure_generator(self.elements, self.constraints)
            trial = self._study.ask()
            self._store.save(trial.number, struct)
            # Evaluation happens asynchronously; a worker calls tell() later

    def search(self, n_trials: int, n_workers: int) -> None:
        """Run the main GA search loop with parallel workers."""
        self._study.optimize(
            self._objective,
            n_trials=n_trials,
            n_jobs=n_workers,
            callbacks=[self._hull_update_callback],
        )

10.2 NSGA-II Variant (Custom Sampler)

MTCSP implements a custom Optuna sampler based on NSGA-II with three key modifications:

Feature Standard NSGA-II MTCSP Variant
Objective Multi-objective Pareto front Hull volume expansion
Selection Crowding distance Hull-informed + aging
Diversity Crowding distance only Niching across compositions
Representation Real-valued vectors Crystal structures (positions + lattice)
Crossover SBX / uniform Structure-aware (heredity operator)

10.3 Structure Generator

Generates candidate crystal structures for evaluation. Multiple generation strategies:

Strategy Description When Used
Random Random positions in random lattice Initialization
Symmetry-aware random Respects space group symmetry Initialization + diversity
Crossover (heredity) Combines slabs from two parent structures Main GA loop
Mutation (strain) Applies lattice strain to parent Main GA loop
Mutation (permutation) Swaps atomic species Composition exploration
Mutation (rattling) Random displacements of atomic positions Local refinement

10.4 Relaxer (PFP-based)

The structure relaxation component uses PFP for geometry optimization:

Input Structure           Relaxation (PFP)              Output
┌──────────────┐     ┌────────────────────────┐     ┌──────────────┐
│ Candidate    │     │ 1. Compute forces (F)  │     │ Relaxed      │
│ structure    │────►│ 2. Compute stress (σ)  │────►│ structure    │
│ (unrelaxed)  │     │ 3. Update positions    │     │ + energy     │
│              │     │ 4. Update cell         │     │ + forces     │
│              │     │ 5. Repeat ≤300 steps   │     │ + stress     │
└──────────────┘     │ 6. Check convergence   │     └──────────────┘
                     └────────────────────────┘

10.5 Rejecter (Pruning)

The Rejecter component implements early termination of unpromising trials:

  • Energy cutoff: Reject structures with energy far above the current hull
  • Force convergence: Reject structures that fail to converge during relaxation
  • Structural validity: Reject structures with unphysical bond lengths or overlapping atoms
  • Composition filter: Reject structures outside the specified composition range
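The four checks can be sketched as a single predicate (threshold values and helper names are illustrative, not PFN's implementation):

```python
def should_reject(energy_above_hull, converged, min_bond_length, composition,
                  allowed_compositions, hull_cutoff=0.5, min_bond=0.7):
    """Return (reject?, reason) for a relaxed candidate structure.

    Mirrors the four checks: composition filter, force convergence,
    structural validity, and energy cutoff above the current hull.
    Units: energies in eV/atom, bond lengths in Å (illustrative defaults).
    """
    if composition not in allowed_compositions:
        return True, "composition outside search range"
    if not converged:
        return True, "relaxation failed to converge"
    if min_bond_length < min_bond:
        return True, "unphysical bond length (overlapping atoms)"
    if energy_above_hull > hull_cutoff:
        return True, "energy too far above current hull"
    return False, "accepted"
```

A candidate passing all four checks proceeds to the hull update; any single failure terminates the trial early, which is what makes pruning cheap.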

10.6 Structures Store

A file-based storage system, separate from Optuna's relational database:

structures_store/
├── trial_00001/
│   ├── initial.cif      # Candidate structure before relaxation
│   ├── relaxed.cif      # Structure after PFP relaxation
│   └── metadata.json    # Energy, forces, trial parameters
├── trial_00002/
│   ├── ...
└── index.json            # Fast lookup index

Design decision: Crystal structures are stored in a separate file-based system rather than Optuna's relational database (MySQL/SQLite) because structures are large binary objects (hundreds of floats) that would be inefficient to store as serialized strings in a relational schema.
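A minimal sketch of such a file-based store with the directory layout shown above (CIF contents treated as opaque text; class and method names are illustrative):

```python
import json
import os
import tempfile

class StructuresStore:
    """Per-trial directories plus a lightweight JSON index,
    kept outside the Optuna relational database."""

    def __init__(self, root):
        self.root = root
        os.makedirs(root, exist_ok=True)
        self.index_path = os.path.join(root, "index.json")
        self.index = {}

    def save(self, trial_number, cif_text, metadata):
        """Write the relaxed structure and its metadata for one trial."""
        trial_dir = os.path.join(self.root, f"trial_{trial_number:05d}")
        os.makedirs(trial_dir, exist_ok=True)
        with open(os.path.join(trial_dir, "relaxed.cif"), "w") as f:
            f.write(cif_text)
        with open(os.path.join(trial_dir, "metadata.json"), "w") as f:
            json.dump(metadata, f)
        self.index[str(trial_number)] = trial_dir
        with open(self.index_path, "w") as f:   # fast lookup index
            json.dump(self.index, f)

    def load_metadata(self, trial_number):
        trial_dir = self.index[str(trial_number)]
        with open(os.path.join(trial_dir, "metadata.json")) as f:
            return json.load(f)

# Usage: only trial numbers and objective values go into Optuna's DB;
# the bulky structure data lives here.
store = StructuresStore(tempfile.mkdtemp())
store.save(1, "# CIF text would go here", {"energy_eV_per_atom": -3.2})
```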

10.7 Convex Hull Analyzer

Maintains the running convex hull across all evaluated compositions:

  • Incrementally updates the hull as new structures are evaluated
  • Computes hull distances for new structures (distance above hull = thermodynamic instability)
  • Provides the hull volume metric used by the aging mechanism
  • Generates phase diagram visualizations

11 Core Mechanisms (Detailed)

11.1 Hull-Informed Filtering

Traditional CSP methods optimize formation energy per composition independently. MTCSP introduces hull-informed filtering that considers the global convex hull:

Formation Energy (eV/atom)
      │
  0.2 │   x           x
      │     x    x
  0.0 │───●─────●─────●───── Convex Hull
      │        ●
 -0.2 │   ●
      │
 -0.4 │         ●
      └────────────────────── Composition (A₁₋ₓBₓ)
      0.0       0.5       1.0

  ● = structures ON or BELOW hull (stable/metastable)
  x = structures ABOVE hull (unstable, pruned by filter)

The filtering mechanism:

  1. After each structure evaluation, compute distance to current convex hull
  2. Structures within a threshold of the hull are retained in the population
  3. Structures far above the hull are rejected (Rejecter component)
  4. The hull is updated incrementally as new stable structures are discovered
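For a binary system A₁₋ₓBₓ the convex hull reduces to a lower envelope in the (x, formation energy) plane, so the hull construction and the distance-above-hull check in steps 1-3 fit in a few lines. A self-contained sketch in pure Python (a scipy.spatial-based implementation would replace this for ternary and higher systems):

```python
def _cross(o, a, b):
    """z-component of (a - o) x (b - o); > 0 means a left turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

def lower_hull(points):
    """Lower convex hull of (composition x, formation energy) points via
    Andrew's monotone chain; returns hull vertices sorted left to right."""
    pts = sorted(set(points))
    hull = []
    for p in pts:
        # Drop the last hull point while it lies on or above the
        # segment from hull[-2] to the new point p.
        while len(hull) >= 2 and _cross(hull[-2], hull[-1], p) <= 0:
            hull.pop()
        hull.append(p)
    return hull

def distance_above_hull(x, energy, hull):
    """Vertical distance (eV/atom) from (x, energy) to the hull segment at x.
    Zero means on the hull; larger values mean thermodynamic instability."""
    for (x1, y1), (x2, y2) in zip(hull, hull[1:]):
        if x1 <= x <= x2:
            t = 0.0 if x2 == x1 else (x - x1) / (x2 - x1)
            return energy - (y1 + t * (y2 - y1))
    raise ValueError("composition outside hull range")
```

A Rejecter-style filter then simply compares `distance_above_hull(...)` against the retention threshold after each evaluation.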

11.2 Aging Mechanism for Elitist Selection

Standard elitist selection in GAs preserves the best individuals indefinitely, which can lead to stagnation in underexplored composition regions. MTCSP introduces an aging mechanism:

from math import exp

AGING_CONSTANT = 5.0  # generations; sets how fast the recency bonus decays

def selection_priority(structure, current_generation):
    """Selection priority combining hull fitness with recency of improvement.

    Illustrative pseudocode: a structure's score decays as generations pass
    without improvement in its composition, so stale compositions gradually
    yield their exploration budget to others.
    """
    hull_fitness = compute_hull_fitness(structure)  # higher = better
    age = current_generation - structure.last_improved_generation

    # Prioritize recently improved compositions
    recency_bonus = exp(-age / AGING_CONSTANT)

    return hull_fitness * recency_bonus

Effect: Compositions where the GA has recently found better structures get higher selection priority. Compositions where no improvement has occurred for many generations see their priority decay, allowing other compositions to receive exploration budget.

Selection Priority Over Time
      │
  1.0 │●
      │ ●
      │  ●
  0.5 │   ●
      │    ●
      │     ●●
      │       ●●●
  0.0 │          ●●●●●●●●
      └───────────────────── Generations since last improvement
      0    5    10   15   20

11.3 Niching for Stoichiometric Diversity

Niching prevents the population from collapsing to a few dominant stoichiometries:

WITHOUT Niching:                    WITH Niching:
Population at gen 100               Population at gen 100

Count│                              Count│
  50 │ ██                             20 │ ██    ██
  40 │ ██                             15 │ ██ ██ ██ ██
  30 │ ██                             10 │ ██ ██ ██ ██ ██
  20 │ ██ ██                           5 │ ██ ██ ██ ██ ██ ██
  10 │ ██ ██                           0 │ ██ ██ ██ ██ ██ ██
   0 │ ██ ██                             └──────────────────
     └──────────                       AB AB₂ A₂B₃ AB₃ A₃B B₂
      AB  AB₂ (collapsed)             (diverse across compositions)

The niching mechanism:

  1. Partition the composition space into niches (regions of similar stoichiometry)
  2. Limit the number of individuals from any single niche in the parent pool
  3. Ensure crossover occurs both within and across niches
  4. Track niche-level hull improvement rates for adaptive resource allocation
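The per-niche cap in steps 1-2 can be sketched in a few lines (niche keys via discretized stoichiometry; the bin count, cap, and names are illustrative, not PFN's implementation):

```python
from collections import defaultdict
from fractions import Fraction

def niche_key(composition, element="A", bins=6):
    """Coarse stoichiometry bin: the fraction of `element`, discretized.
    Compositions in the same bin compete within one niche."""
    total = sum(composition.values())
    x = Fraction(composition.get(element, 0), total)
    return min(int(x * bins), bins - 1)

def select_parents(population, max_per_niche=2):
    """Fill the parent pool best-first while capping each niche, so no
    single stoichiometry can dominate the crossover pool."""
    counts = defaultdict(int)
    parents = []
    for fitness, comp in sorted(population, key=lambda t: t[0]):
        key = niche_key(comp)              # lower fitness value = better
        if counts[key] < max_per_niche:
            counts[key] += 1
            parents.append((fitness, comp))
    return parents
```

With a cap of 2, a third structure of the same stoichiometry is skipped even if it outranks structures in other niches, which is exactly the diversity-over-greed trade-off the histogram above illustrates.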

11.4 Asynchronous Parallel Execution

MTCSP's Optuna integration enables true asynchronous parallelism:

Timeline (simplified, 4 workers):

Worker 1: ──[Gen]──[Relax████████]──[Gen]──[Relax██████]──►
Worker 2: ──[Gen]──[Relax██████████████]──[Gen]──[Relax]──►
Worker 3: ──[Gen]──[Relax████]──[Gen]──[Relax████████████]─►
Worker 4: ──[Gen]──[Relax██████████]──[Gen]──[Relax██████]─►
                    ▲                   ▲
                    │                   │
              Hull update          Hull update
              (incremental)        (incremental)

[Gen]   = Structure generation (fast, ~ms)
[Relax] = PFP relaxation (variable, ~1-10s)

Key properties:

  • No synchronization barriers: Workers operate independently, asking Optuna for the next trial when ready
  • Stale hull information: Workers may use slightly outdated hull data; empirically not harmful
  • Database-backed coordination: Optuna's relational database (MySQL or PostgreSQL) serves as the coordination layer
  • Dynamic load balancing: Faster workers naturally process more trials
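The ask/tell coordination without synchronization barriers can be mimicked in miniature with a thread pool and a lock-protected in-memory "study" (a toy stand-in; the real coordination layer is Optuna's database-backed storage):

```python
import random
import threading
import time
from concurrent.futures import ThreadPoolExecutor

class ToyStudy:
    """In-memory stand-in for an Optuna study: workers ask for a trial
    number, 'relax' a structure, then tell the result back."""

    def __init__(self):
        self._lock = threading.Lock()
        self._counter = 0
        self.results = {}

    def ask(self):
        with self._lock:
            self._counter += 1
            return self._counter

    def tell(self, trial_number, value):
        with self._lock:
            self.results[trial_number] = value

def worker(study, n_trials, rng):
    for _ in range(n_trials):
        trial = study.ask()                 # no synchronization barrier
        time.sleep(rng.random() * 0.001)    # variable-length "relaxation"
        study.tell(trial, -rng.random())    # report a (toy) energy

study = ToyStudy()
rngs = [random.Random(seed) for seed in range(4)]
with ThreadPoolExecutor(max_workers=4) as pool:
    futures = [pool.submit(worker, study, 25, rng) for rng in rngs]
for f in futures:
    f.result()  # surface any worker errors

print(len(study.results))  # 4 workers x 25 trials = 100 completed trials
```

Faster workers naturally finish more iterations of the loop, which is the dynamic load balancing noted above.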

11.5 Genetic Operators for Crystal Structures

Operator Input Output Description
Slab crossover 2 parent structures 1 child structure Cut both parents with a random plane, combine halves
Lattice strain 1 parent structure 1 child structure Apply random strain tensor to lattice vectors
Atom permutation 1 parent structure 1 child structure Swap atomic species to explore compositions
Coordinate rattling 1 parent structure 1 child structure Add Gaussian noise to atomic positions
Symmetry-preserving 1 parent structure 1 child structure Perturb only symmetry-independent positions

Slab Crossover (Heredity Operator)

Parent A           Parent B           Child
┌──────────┐       ┌──────────┐       ┌──────────┐
│ ○ ● ○ ●  │       │ ◆ ◇ ◆ ◇  │       │ ○ ● ○ ●  │
│ ● ○ ● ○  │       │ ◇ ◆ ◇ ◆  │       │ ● ○ ● ○  │
│─cutting──│       │─cutting──│       │──────────│
│ ○ ● ○ ●  │       │ ◆ ◇ ◆ ◇  │       │ ◆ ◇ ◆ ◇  │
│ ● ○ ● ○  │       │ ◇ ◆ ◇ ◆  │       │ ◇ ◆ ◇ ◆  │
└──────────┘       └──────────┘       └──────────┘

Top half from A + Bottom half from B → Child
(Lattice vectors interpolated between parents)
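A sketch of the heredity operator in fractional coordinates, with the cut plane fixed at z = 0.5 and a simple diagonal-lattice representation for brevity (function and field names are illustrative; real implementations choose a random cut plane and repair composition afterwards):

```python
def slab_crossover(parent_a, parent_b, cut=0.5, mix=0.5):
    """Combine atoms with z < cut from parent A and z >= cut from parent B.

    Each parent is (lattice, atoms), where lattice is a 3-vector of cell
    lengths and atoms is a list of (species, (x, y, z)) in fractional
    coordinates.
    """
    lat_a, atoms_a = parent_a
    lat_b, atoms_b = parent_b
    # Bottom slab from A, top slab from B
    child_atoms = [a for a in atoms_a if a[1][2] < cut]
    child_atoms += [b for b in atoms_b if b[1][2] >= cut]
    # Interpolate lattice vectors between the parents
    child_lattice = tuple((1 - mix) * la + mix * lb
                          for la, lb in zip(lat_a, lat_b))
    return child_lattice, child_atoms
```

With `mix=0.5` the child's cell lengths are the arithmetic mean of the parents', matching the "lattice vectors interpolated" note in the diagram.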

11.6 Multi-Objective Formulation

MTCSP can operate in a multi-objective mode, optimizing simultaneously for:

  1. Formation energy (lower is better → thermodynamic stability)
  2. Hull distance (lower is better → closer to ground state)
  3. Structural diversity (higher is better → exploration of phase space)
  4. Composition coverage (higher is better → mapping the full phase diagram)

The NSGA-II variant handles multiple objectives through Pareto dominance and crowding distance, ensuring the search produces a diverse front of trade-off solutions.
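Pareto dominance and the non-dominated filter at the heart of any NSGA-II-style selection can be sketched generically, with all objectives cast as minimization (maximization objectives are negated first). This is the textbook filter, not PFN's code:

```python
def dominates(a, b):
    """True if objective vector a Pareto-dominates b: a is no worse in
    every objective and strictly better in at least one (all minimized)."""
    return (all(x <= y for x, y in zip(a, b))
            and any(x < y for x, y in zip(a, b)))

def pareto_front(solutions):
    """Return the non-dominated subset of a list of objective vectors."""
    return [s for s in solutions
            if not any(dominates(t, s) for t in solutions if t is not s)]
```

In MTCSP's setting each vector would hold, e.g., (formation energy, hull distance, -diversity), and the surviving front is the diverse set of trade-off structures carried into the next generation.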

12 Programming Language

Implementation Stack

Component Language Framework
Search algorithm (GA) Python Optuna (sampler API)
Structure manipulation Python ASE (Atomic Simulation Environment)
PFP inference C++ / CUDA (backend), Python (API) Matlantis SDK
Optuna infrastructure Python Optuna + SQLAlchemy
Structure storage Python + filesystem Custom file store
Visualization Python matplotlib, plotly
Service infrastructure Python Matlantis cloud platform

Why Python?

  1. Scientific computing ecosystem: ASE, pymatgen, NumPy, SciPy are all Python-native
  2. Optuna is Python: The optimization framework is Python-first
  3. PFP Python API: Matlantis provides Python bindings for PFP
  4. Materials science convention: The computational materials science community is heavily Python-oriented
  5. Rapid prototyping: Research algorithms benefit from Python's flexibility

Code Organization (Inferred)

mtcsp/
├── experiment.py          # Experiment class (Optuna Study wrapper)
├── sampler.py             # Custom NSGA-II variant (Optuna Sampler)
├── generators/
│   ├── random.py          # Random structure generation
│   ├── symmetry.py        # Symmetry-aware generation
│   └── crossover.py       # Slab crossover / heredity
├── evaluator/
│   ├── relaxer.py         # PFP-based structure relaxation
│   └── rejecter.py        # Pruning / early termination
├── analysis/
│   ├── hull.py            # Convex hull computation and tracking
│   ├── phase_diagram.py   # Phase diagram visualization
│   └── diversity.py       # Population diversity metrics
├── storage/
│   ├── structures.py      # File-based structure storage
│   └── optuna_storage.py  # Optuna database configuration
└── utils/
    ├── structure_ops.py   # Structure manipulation utilities
    └── composition.py     # Composition space utilities

13 Memory Management

Population Memory

The GA maintains a population of crystal structures in memory. Memory scaling:

| Population Size | Structures in Memory | Approximate RAM |
|---|---|---|
| 100 | ~100–200 (parents + children) | ~100 MB |
| 500 | ~500–1,000 | ~500 MB |
| 2,000 | ~2,000–4,000 | ~2 GB |

Each structure stores:

  • Atomic positions: N_atoms × 3 × 8 bytes (float64)
  • Lattice vectors: 3 × 3 × 8 bytes
  • Atomic species: N_atoms × 4 bytes (int32)
  • Metadata: ~1 KB (energy, forces summary, trial info)

For a typical structure with 24 atoms this totals roughly 2 KB: ~0.75 KB of arrays plus ~1 KB of metadata.
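The per-structure arithmetic can be checked directly; note that the ~1 KB metadata term dominates the array payload, so the realistic total is closer to 2 KB than 1 KB:

```python
# Check the per-structure memory arithmetic for a 24-atom cell.
n_atoms = 24
positions = n_atoms * 3 * 8   # float64 coordinates
lattice = 3 * 3 * 8           # float64 lattice vectors
species = n_atoms * 4         # int32 atomic numbers
metadata = 1024               # ~1 KB of energy, forces summary, trial info

total = positions + lattice + species + metadata
print(total)   # 1768 bytes, i.e. roughly 2 KB per structure
```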

Optuna Database Memory

Optuna uses a relational database (MySQL/PostgreSQL/SQLite) to store trial history:

| Trials | Database Size | Notes |
|---|---|---|
| 10,000 | ~50 MB | Comfortable for SQLite |
| 100,000 | ~500 MB | MySQL/PostgreSQL recommended |
| 1,000,000 | ~5 GB | Requires database tuning |

Design insight: Crystal structures are stored in a separate file-based store to avoid bloating the Optuna database. Only lightweight trial metadata (parameters, objective values, state) goes into the database.

Hull Memory

The convex hull data structure grows with the number of stable structures found:

  • Typically 100–1,000 hull vertices for binary/ternary systems
  • ConvexHull computation (scipy.spatial) requires O(n log n) time and O(n) memory
  • Incremental hull updates are O(k) where k is the number of new points
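`scipy.spatial.ConvexHull` supports exactly this incremental usage; a small example with illustrative (composition, formation energy) points for a binary system:

```python
import numpy as np
from scipy.spatial import ConvexHull

# Illustrative binary system: each point is (composition x, formation energy).
pts = np.array([[0.00,  0.0],
                [1.00,  0.0],
                [0.50, -0.4],
                [0.25, -0.1]])   # index 3 sits above the hull
hull = ConvexHull(pts, incremental=True)

# Incremental update: fold in a newly discovered structure without a full
# recomputation (scipy's Qhull wrapper supports this natively).
hull.add_points(np.array([[0.75, -0.5]]))
print(hull.vertices)   # vertex indices of the updated hull
```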

GPU Memory (PFP Inference)

PFP inference requires GPU memory proportional to structure size:

| Structure Size | PFP GPU Memory | Relaxation GPU Memory |
|---|---|---|
| 10 atoms | ~200 MB | ~300 MB |
| 50 atoms | ~500 MB | ~800 MB |
| 100 atoms | ~1 GB | ~1.5 GB |
| 200 atoms | ~2 GB | ~3 GB |

Multiple workers can share a single GPU through batched inference, or each worker can use a dedicated GPU for maximum throughput.

14 Continued Learning

Session-to-Session Learning

MTCSP supports a form of continued learning through Optuna's study persistence:

  1. Warm-starting: A new search can be initialized with the population from a previous search. The Optuna study stores all trial history, which can seed the initial population of a new study.

  2. Cross-system transfer: Structures discovered for one elemental system (e.g., Li-Co-O) can seed searches in related systems (e.g., Li-Ni-O) through the structure store.

  3. Hull accumulation: The convex hull grows monotonically — new searches can only add structures, never remove them. This means each search builds on the collective knowledge of all previous searches in the same system.

What Is NOT Learned

  • No meta-learning across systems. The GA parameters (population size, mutation rates, crossover strategy) are fixed or manually tuned, not adapted based on previous search outcomes.
  • No learned mutation operators. Unlike LLM-based systems where the mutation model improves over time, MTCSP's genetic operators are fixed classical operators.
  • No transfer of search strategy. The system does not learn which search strategies work best for different classes of materials.

Potential Extensions (Not Yet Implemented)

| Extension | Description | Difficulty |
|---|---|---|
| Meta-learned operator selection | Bandit-based selection of crossover/mutation operators based on per-system performance | Medium |
| Transfer learning of hull shapes | Use hull topology from similar systems to guide early search | Medium |
| Adaptive population sizing | Dynamically adjust population size based on composition space complexity | Low |
| Structure-aware embeddings | Use GNN embeddings for better niching and diversity measurement | High |
| Multi-fidelity search | Start with cheap, low-accuracy evaluations and refine promising candidates with expensive DFT | Medium |

Optuna's Built-in Learning

Optuna itself provides some meta-learning capabilities that MTCSP inherits:

  • TPE (Tree-structured Parzen Estimator): Optuna's default sampler learns the distribution of good parameters. However, MTCSP uses a custom sampler, not TPE.
  • Pruning history: Optuna's pruners learn when to terminate trials based on intermediate values. MTCSP's Rejecter uses this mechanism.
  • Multi-study knowledge sharing: Optuna supports sharing knowledge between studies, which could enable cross-system transfer in future MTCSP versions.

15 Applications

15.1 Primary Application: Materials Discovery

MTCSP's primary use case is discovering new crystal structures for materials development:

| Application Domain | Example Systems | Search Objective |
|---|---|---|
| Battery materials | Li-Co-O, Li-Ni-Mn-Co-O, Na-Fe-P-O | Find stable cathode/anode structures |
| Thermoelectrics | Bi-Te-Se, Pb-Te-S | Identify low thermal conductivity phases |
| Catalysts | Pt-Ru-O, Co-Fe-O | Discover active surface structures |
| Superconductors | La-H, Y-H (high pressure) | Find high-Tc candidate structures |
| Alloys | Ti-Al-V, Ni-Co-Cr-Fe-Mn (HEA) | Map phase stability in multi-component space |

15.2 Phase Diagram Construction

Beyond finding individual structures, MTCSP enables automated construction of computational phase diagrams:

Phase Diagram (A-B Binary)
Temperature (K)
      │
 2000 │         Liquid
      │       ╱       ╲
 1500 │     ╱    L+α     ╲
      │   ╱       │        ╲
 1000 │  α      α + β       β
      │  │        │         │
  500 │  α        │         β
      │  │      α + β       │
    0 │  α        │         β
      └───────────────────────
      A    0.25   0.5   0.75   B
              Composition

MTCSP maps the 0 K ground-state hull: plotting energy vs. composition identifies the stable phases (α, β, and the α+β two-phase regions). The finite-temperature phase boundaries above require separate free-energy modeling.
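For a binary system, the 0 K stability test reduces to comparing a candidate's formation energy against the hull interpolated at its composition. A self-contained sketch with made-up numbers (the helper function and hull values are illustrative):

```python
import numpy as np

def energy_above_hull(x, e_form, hull_pts):
    """Height of (x, e_form) above the lower hull; hull_pts is a
    composition-sorted list of (x, E_f) hull vertices (illustrative helper)."""
    xs, es = zip(*hull_pts)
    return e_form - np.interp(x, xs, es)   # interpolate the hull at x

# Hypothetical A-B hull: end members plus one stable compound at x = 0.5.
hull = [(0.0, 0.0), (0.5, -0.4), (1.0, 0.0)]
print(energy_above_hull(0.25, -0.1, hull))   # positive, so metastable
```

A value of zero means the structure sits on the hull (stable); a positive value is its decomposition driving force.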

15.3 High-Throughput Screening

MTCSP can be deployed for high-throughput computational screening:

  1. Element substitution sweeps: Systematically search structure stability across many element combinations using PFP's universality
  2. Composition optimization: Given a known structure type, find the optimal composition for target properties
  3. Polymorph discovery: Find all stable crystal structure types for a given composition

15.4 Industrial Use Cases on Matlantis

As a commercial service, MTCSP targets industrial R&D workflows:

| Industry | Use Case | Value Proposition |
|---|---|---|
| Battery manufacturers | New cathode material discovery | Reduce experimental screening by 10–100× |
| Semiconductor companies | Novel dielectric materials | Identify candidates before synthesis |
| Catalyst developers | Mixed oxide catalysts | Explore composition-structure-activity space |
| Steelmakers | Intermetallic phase prediction | Understand precipitation in alloys |
| Pharmaceutical | Cocrystal screening (future) | Predict stable cocrystal forms |

15.5 Limitations and Boundary Conditions

| Limitation | Impact | Mitigation |
|---|---|---|
| 0 K only | No finite-temperature phase stability | Post-process with phonon calculations |
| PFP accuracy | ~30–50 meV/atom error vs. DFT | DFT validation of top candidates |
| No kinetics | Cannot predict synthesizability | Experimental validation required |
| Periodic only | No amorphous or surface structures | Complementary tools (LAMMPS, etc.) |
| Element coverage | 72 elements (PFP limitation) | Expanding with PFP updates |
| Max cell size | ~200 atoms practical limit | Sufficient for most inorganic crystals |

15.6 Comparison with Other Materials Discovery Platforms

| Platform | Search Method | Evaluator | Open Source | Composition Coverage |
|---|---|---|---|---|
| Materials Project | Database mining | DFT (VASP) | Data only | 150K+ computed materials |
| AFLOW | High-throughput DFT | DFT (VASP) | Partially | 3.5M+ entries |
| OQMD | Database | DFT (VASP) | Data only | 1M+ entries |
| GNoME (Google) | Active learning | GNN + DFT | No | 380K stable materials |
| MTCSP | GA + Optuna | PFP (uMLIP) | No (commercial) | Any 72-element combination |

15.7 Integration with Broader Matlantis Ecosystem

MTCSP fits within Matlantis's suite of computational materials science tools:

┌──────────────────────────────────────────────┐
│              Matlantis Platform              │
│                                              │
│  ┌──────────┐  ┌───────────┐  ┌───────────┐  │
│  │  MTCSP   │  │ Matlantis │  │ Matlantis │  │
│  │  (CSP)   │  │ (NEB/MD)  │  │ (Phonons) │  │
│  └────┬─────┘  └─────┬─────┘  └─────┬─────┘  │
│       │              │              │        │
│       └──────────────┼──────────────┘        │
│                      │                       │
│              ┌───────▼───────┐               │
│              │      PFP      │               │
│              │  (Universal   │               │
│              │   Potential)  │               │
│              └───────────────┘               │
└──────────────────────────────────────────────┘

Workflow: MTCSP discovers structures → validate with NEB/MD → 
          characterize with phonon calculations

This analysis is based on the arXiv paper (2503.21201v3), the Preferred Networks technical blog post (March 2026), and publicly available information about the Matlantis platform and Optuna framework. The system represents an important intersection of evolutionary optimization and neural network surrogate models for scientific discovery, demonstrating that classical population-based search methods remain highly effective when paired with fast, accurate neural evaluators.