1. Complete title of paper

ShinkaEvolve: Towards Open-Ended and Sample-Efficient Program Evolution


2. Author information

Robert Tjarko Lange
Sakana AI
E-mail: robert@sakana.ai

Yuki Imajuku
Sakana AI
E-mail: imajuku@sakana.ai

Edoardo Cetin
Sakana AI
E-mail: edo@sakana.ai


3. Corresponding author

Robert Tjarko Lange
E-mail: robert@sakana.ai


4. Abstract

ShinkaEvolve is an evolutionary program-discovery framework that automatically creates algorithms and programs exceeding human-designed or best-known solutions across multiple domains of indisputable difficulty, while requiring orders-of-magnitude fewer evaluations than prior systems.

Key results: (1) Applied to the 2025 ICFP Programming Contest, ShinkaEvolve accelerated Team Unagi's solver by up to 10x, contributing to their first-place finish in this international competition. (2) On the canonical 26-circle packing problem, ShinkaEvolve discovered a new state-of-the-art solution (score 2.63598325, improving on AlphaEvolve's 2.63586 and the prior human best of 2.634) using only ~150 program proposals. (3) ShinkaEvolve evolved a novel mixture-of-experts load-balancing loss that outperforms the human-designed baseline across seven downstream benchmarks in large-scale LLM training. (4) On AIME mathematical reasoning and ALE-Bench competitive programming, ShinkaEvolve outperformed hand-designed baselines; on AtCoder contest AHC039, the improved solution would have moved from 5th to 2nd place on the leaderboard.

The system is open-source (Apache-2.0) and independently reproducible; the paper was peer-reviewed and accepted as an ICLR 2026 poster.


5. Criteria claimed

C, D


6. Why the result satisfies the claimed criteria

Criterion C: The result is better than the most recent human-created solution to a long-standing or previously unsolved problem of indisputable difficulty in its field.

ShinkaEvolve satisfies criterion C because it automatically created program-level discoveries that improve on human-designed or best-known solutions across several difficult, externally meaningful scientific and engineering problems.

The clearest example is the canonical 26-circle packing problem: place 26 non-overlapping circles in the unit square while maximizing the sum of radii. This is a long-standing geometric optimization problem with decades of increasingly refined solutions from the operations research and computational geometry communities. ShinkaEvolve discovered a new state-of-the-art solution using only ~150 evaluated program proposals. The reported pack_circ26 score is 2.63598325, compared with AlphaEvolve's (Google DeepMind, 2025) result of 2.63586 and a pre-AlphaEvolve best of 2.634. This represents a 0.00012 improvement over the best result from a system that required thousands of evaluations, and a 0.002 improvement over prior human-designed approaches. The discovered algorithm combines structured initialization, gradient-based refinement, and simulated annealing-style escape from local optima.
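The scoring rule stated above (non-overlapping circles inside the unit square, objective equal to the sum of radii) can be checked mechanically. The following is a minimal sketch of such a checker, not the evaluator used in the paper; the function name packing_score, the tolerance value, and the four-circle toy input are illustrative assumptions.

```python
import math
from itertools import combinations


def packing_score(circles, tol=1e-9):
    """Return the sum of radii if the packing is valid, else raise ValueError.

    `circles` is a list of (x, y, r) tuples. `tol` is an assumed numerical
    tolerance, not a value taken from the paper.
    """
    for x, y, r in circles:
        if r <= 0:
            raise ValueError("radius must be positive")
        # Every circle must lie entirely inside the unit square [0, 1] x [0, 1].
        if x - r < -tol or x + r > 1 + tol or y - r < -tol or y + r > 1 + tol:
            raise ValueError("circle leaves the unit square")
    for (x1, y1, r1), (x2, y2, r2) in combinations(circles, 2):
        # Two circles overlap when their centers are closer than r1 + r2.
        if math.hypot(x1 - x2, y1 - y2) < r1 + r2 - tol:
            raise ValueError("circles overlap")
    return sum(r for _, _, r in circles)


# Toy input (hypothetical): four equal circles, one per quadrant.
toy = [(0.25, 0.25, 0.25), (0.75, 0.25, 0.25),
       (0.25, 0.75, 0.25), (0.75, 0.75, 0.25)]
toy_score = packing_score(toy)
```

For the 26-circle problem the same check would be applied to a list of 26 tuples, and the returned value is the quantity compared against 2.63586 and 2.634 above.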

ShinkaEvolve also produced a new mixture-of-experts load-balancing loss for LLM training. Load balancing in sparse expert models is a central architectural problem in modern language-model training: a router must keep experts used effectively without destroying specialization. Starting from the human-designed global-batch load-balancing loss, ShinkaEvolve evolved an additional regularization term that targets under-specialized experts only when routing entropy indicates concentration. In large-scale follow-up training of a 2.7B parameter MoE model on nearly 30B tokens, this evolved loss improves downstream task accuracy across seven benchmarks and achieves a better perplexity/routing tradeoff than the established global-batch LBL baseline.
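For context, the sketch below shows a commonly used form of the load-balancing loss that such baselines build on: the number of experts times the sum, over experts, of the fraction of tokens routed to an expert multiplied by its mean router probability. This is an assumption about the general form only. The normalization by top_k, the function name, and the toy routing tables are illustrative, and the evolved regularization term from the paper is deliberately not reproduced here.

```python
def load_balancing_loss(router_probs, top_k=1):
    """Baseline-style load-balancing loss (assumed general form).

    `router_probs` is a list of per-token probability lists over experts.
    Statistics are accumulated over every token passed in, which mirrors
    computing them over the whole (global) batch.
    """
    num_tokens = len(router_probs)
    num_experts = len(router_probs[0])
    counts = [0] * num_experts
    mean_prob = [0.0] * num_experts
    for probs in router_probs:
        # Experts selected for this token: the top_k highest probabilities.
        chosen = sorted(range(num_experts), key=lambda e: probs[e],
                        reverse=True)[:top_k]
        for e in chosen:
            counts[e] += 1
        for e, p in enumerate(probs):
            mean_prob[e] += p / num_tokens
    # Fraction of routing slots assigned to each expert.
    frac = [c / (num_tokens * top_k) for c in counts]
    return num_experts * sum(f * p for f, p in zip(frac, mean_prob))


# Hypothetical routing tables: balanced vs. collapsed onto one expert.
balanced = [[0.9, 0.1], [0.1, 0.9], [0.9, 0.1], [0.1, 0.9]]
collapsed = [[0.9, 0.1]] * 4
balanced_loss = load_balancing_loss(balanced)
collapsed_loss = load_balancing_loss(collapsed)
```

Under this form, evenly used experts give the lower value and routing that collapses onto one expert gives a higher one, which is the pressure the baseline applies before any additional term is added.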

Additional results reinforce the breadth of human-competitive performance. On AIME mathematical reasoning, ShinkaEvolve evolved agent scaffolds in 75 generations that outperform hand-designed single-query and majority-vote baselines, and transfer across AIME years and underlying LLMs. On ALE-Bench (a benchmark of AtCoder heuristic programming contests), ShinkaEvolve improved ALE-Agent solutions by approximately 2.3% on average over 10 tasks; on AHC039, the improved solution would have moved from 5th to 2nd place on the AtCoder leaderboard if entered as a competition submission.

These results meet the "arm's length" standard required by the call. Circle packing is a recognized mathematical optimization benchmark that long predates AlphaEvolve and LLMs, with an independent community of researchers and connections to real-world problems. AIME and AtCoder are externally maintained competition settings with public leaderboards. ALE-Bench uses public and private contest test sets. The MoE loss is evaluated through standard language-model training and downstream benchmarks. The paper was peer-reviewed and accepted as an ICLR 2026 poster, and the complete ShinkaEvolve implementation and supplementary material are publicly available under an Apache-2.0 license (https://github.com/SakanaAI/ShinkaEvolve/). The GitHub repository has accumulated more than 1,150 stars, and an independent workshop at Yale University was organized to facilitate its use in interdisciplinary fields (https://yalefds.swoogo.com/aiforscientificdiscovery).

---

Criterion D: The result wins a regulated competition involving human contestants (in the form of either live human players or human-written computer programs).

ShinkaEvolve satisfies criterion D through its direct contribution to Team Unagi's first-place finish in the 2025 ICFP Programming Contest.

The ICFP Programming Contest is an annual international programming competition organized by the International Conference on Functional Programming (ICFP). It is open to all, with no restrictions on team size, programming languages, or tools. Teams of expert programmers compete to solve a complex algorithmic challenge over a 72-hour period. The contest is a well-established and regulated competition with decades of history: results are externally judged, rankings are publicly posted, and prizes are awarded by the organizers.

In the 2025 edition, the task required navigating and mapping an unknown maze using ambiguous hints, demanding sophisticated SAT-based reasoning under strict query budgets. Team Unagi used ShinkaEvolve to optimize the SAT encoding at the heart of their solver. Over 320 evolutionary trials at a total computational cost of approximately $60, ShinkaEvolve evolved the Rust code of the encoding to minimize solver execution time. The results were substantial:

- Mid-scale problems (18 rooms): execution time improved from 2.86s to 0.44s (6.5x speedup).
- Large-scale problems (24 rooms): execution time dropped from 127s to 13s (roughly 10x speedup).
- Previously intractable 30-room instances became solvable within realistic time limits.
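The speedup factors quoted in the list follow directly from the before and after timings. A short check, using only the numbers given above (the dictionary keys are illustrative labels):

```python
# Execution times in seconds (before, after), as reported in the list above.
timings = {"18 rooms": (2.86, 0.44), "24 rooms": (127.0, 13.0)}

# Speedup is the ratio of the original time to the optimized time.
speedups = {size: before / after for size, (before, after) in timings.items()}

for size, factor in speedups.items():
    print(f"{size}: {factor:.1f}x")
```

The 24-room ratio is slightly below 10, which is why the headline figure is stated as "up to 10x".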

These optimizations were immediately integrated into Team Unagi's competition submissions and directly contributed to their ability to solve large problem instances that other teams could not reach. Team Unagi won first place overall.

The ICFP contest satisfies the criterion's requirement of "human contestants (in the form of either live human players or human-written computer programs)": competing teams wrote programs to solve the contest task, and most teams relied entirely on human-written code. ShinkaEvolve's evolutionary optimization of the solver code is the specific contribution of genetic/evolutionary computation to this competition victory.

Documentation: The official contest results are posted at https://icfpcontest2025.github.io/. The case study is described in detail in Sakana AI's public blog post "ShinkaEvolve in Action: How a Human-AI Partnership Conquered a Coding Challenge" (October 16, 2025) at https://sakana.ai/icfp-2025/, with full source code available at https://github.com/icfpc-unagi/icfpc2025.


7. Full citation

Robert Tjarko Lange, Yuki Imajuku, and Edoardo Cetin. "ShinkaEvolve: Towards Open-Ended and Sample-Efficient Program Evolution." The Fourteenth International Conference on Learning Representations (ICLR 2026), poster. OpenReview. 2026. URL: https://openreview.net/forum?id=lKEdGCoDNC


8. Prize-money division

Any prize money is to be divided equally among the co-authors.


9. Why this entry should be considered the best

ShinkaEvolve should be considered the best entry because it demonstrates the strongest combination of external validation, multi-domain breadth, and practical accessibility among plausible human-competitive entries this year.

First: external competition dominance. ShinkaEvolve directly contributed to a first-place finish in the 2025 ICFP Programming Contest, an international competition where teams of expert programmers compete with human-written programs. This is the kind of unambiguous, externally adjudicated result on which prior Humies Gold awards have been based. ShinkaEvolve provided up to 10x solver speedups that made previously unreachable problem instances solvable, giving Team Unagi a decisive advantage over competing teams. Unlike many "human-competitive" claims that rely on author-selected benchmarks, this result was validated by an independent competition with independent judges and public results.

Second: breadth across four distinct problem domains. Most human-competitive entries demonstrate strength on a single problem or a closely related set of benchmarks. ShinkaEvolve produces state-of-the-art results across geometry (circle packing), neural network training (MoE load balancing), mathematical reasoning (AIME), and combinatorial optimization (AtCoder/ALE-Bench). These are not variations on one theme; they span continuous optimization, discrete search, machine learning, and program synthesis. This breadth demonstrates that ShinkaEvolve is a general-purpose evolutionary discovery engine, not a narrow benchmark optimizer.

Third: unprecedented sample efficiency changes what evolutionary discovery can achieve in practice. ShinkaEvolve's circle-packing result used ~150 proposals; AIME used 75 generations; ALE-Bench used 50 generations; the MoE loss search used only 30 iterations; and the ICFP optimization used 320 trials at ~$60 total cost. Comparable systems (AlphaEvolve, FunSearch) require thousands to tens of thousands of evaluations and access to proprietary infrastructure. ShinkaEvolve demonstrates that evolutionary computation can produce top-tier discoveries without requiring top-tier compute budgets. This makes the approach reproducible and accessible to the broader community, a property that the Humies judges have historically valued in prize-winning entries.

Fourth: the work is fully open and independently reproducible. The code is Apache-2.0 licensed, available on PyPI, and includes documentation, examples, a WebUI, and agent-facing workflows. The paper is peer-reviewed and accepted at ICLR 2026. Any researcher can install ShinkaEvolve, run the same experiments, and verify the claims. This transparency contrasts with several recent AI discovery systems that rely on closed proprietary infrastructure.

In summary: ShinkaEvolve combines a competition win, state-of-the-art discoveries across four domains, orders-of-magnitude efficiency gains, and full open-source reproducibility. No single other entry is likely to match this combination of external validation, breadth, efficiency, and openness.


10. General type of genetic or evolutionary computation used

ShinkaEvolve maintains an archive and population of evaluated programs, selects parent and inspiration programs via evolutionary techniques that balance exploration and exploitation, uses LLMs as genetic mutation and recombination operators to propose program edits, evaluates candidate programs with task-specific fitness functions, rejects insufficiently novel candidates via code-novelty rejection sampling, and uses a bandit-based LLM ensemble selection strategy to adapt the mutation operator during evolution. All of these steps use, or draw heavily on, techniques from the genetic improvement, genetic programming, and evolution strategies literature.
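The loop described above can be illustrated with a deliberately small toy. The sketch below is not ShinkaEvolve's implementation or API. As stated assumptions: candidate "programs" are single floats, the two Gaussian perturbation operators stand in for different LLMs, UCB1 stands in for the bandit-based ensemble selection, a distance threshold stands in for code-novelty rejection sampling, and every name and constant is hypothetical.

```python
import math
import random


def fitness(x):
    # Toy objective standing in for a task-specific evaluator (maximum at x = 3).
    return -(x - 3.0) ** 2


# Hypothetical mutation operators standing in for different LLMs.
OPERATORS = {
    "small_step": lambda x, rng: x + rng.gauss(0.0, 0.1),
    "large_step": lambda x, rng: x + rng.gauss(0.0, 1.0),
}


def ucb1_pick(stats):
    """Pick an operator by UCB1: mean reward plus an exploration bonus."""
    untried = [name for name, (count, _) in stats.items() if count == 0]
    if untried:
        return untried[0]
    total = sum(count for count, _ in stats.values())
    return max(stats, key=lambda n: stats[n][1] / stats[n][0]
               + math.sqrt(2.0 * math.log(total) / stats[n][0]))


def evolve(generations=200, novelty_eps=1e-3, seed=0):
    rng = random.Random(seed)
    archive = [(0.0, fitness(0.0))]                  # (candidate, fitness)
    stats = {name: [0, 0.0] for name in OPERATORS}   # [uses, total reward]
    for _ in range(generations):
        # Parent selection: usually exploit the best, sometimes explore.
        if rng.random() < 0.7:
            parent = max(archive, key=lambda m: m[1])
        else:
            parent = rng.choice(archive)
        name = ucb1_pick(stats)
        child = OPERATORS[name](parent[0], rng)
        # Novelty rejection: skip evaluating near-duplicates of the archive.
        if any(abs(child - x) < novelty_eps for x, _ in archive):
            continue
        child_fit = fitness(child)
        # Bandit reward: 1 if the child improves on its parent, else 0.
        stats[name][0] += 1
        stats[name][1] += 1.0 if child_fit > parent[1] else 0.0
        archive.append((child, child_fit))
    return max(archive, key=lambda m: m[1]), stats


best, operator_stats = evolve()
```

In this toy, the bandit statistics shift usage toward whichever operator more often improves on its parent, which is the role the LLM ensemble selection plays in the description above.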


11. Date of publication

OpenReview public posting / ICLR 2026 accept decision:
January 26, 2026.