**1. the complete title of one (or more) paper(s) published in the open literature describing the work that the author claims describes a human-competitive result;
Grammar-obeying program synthesis: A novel approach using large language models and many-objective genetic programming

**2. the name, complete physical mailing address, e-mail address, and phone number of EACH author of EACH paper(s);
Ning Tao, 10 Whitfield Grove, Rathmines, Dublin 6, Dublin, Ireland, ning.tao@ucdconnect.ie, 00353873301628.
Anthony Ventresque, School of Computer Science and Statistics, O’Reilly Institute, Trinity College Dublin, Dublin 2, Ireland, anthony.ventresque@tcd.ie, 0035318962634.
Vivek Nallur, School of Computer Science, University College Dublin, Belfield, Dublin 4, Ireland, vivek.nallur@ucd.ie, 0035317162475.
Takfarinas Saber, School of Computer Science, University of Galway, University Road, Galway, Co. Galway, Ireland, takfarinas.saber@universityofgalway.ie, 0035391492583.

**3. the name of the corresponding author (i.e., the author to whom notices will be sent concerning the competition);
Takfarinas Saber

**4. the abstract of the paper(s);
Program synthesis is an important challenge that has attracted significant research interest, especially in recent years with advancements in Large Language Models (LLMs). Although LLMs have demonstrated success in program synthesis, there remains a lack of trust in the generated code due to documented risks (e.g., code with known and risky vulnerabilities). Therefore, it is important to restrict the search space and avoid bad programs. In this work, pre-defined restricted Backus–Naur Form (BNF) grammars are utilised, which are considered ‘safe’, and the focus is on identifying the most effective technique for grammar-obeying program synthesis, where the generated code must be correct and conform to the predefined grammar. It is shown that while LLMs perform well in generating correct programs, they often fail to produce code that adheres to the grammar. To address this, a novel Similarity-Based Many-Objective Grammar Guided Genetic Programming (SBMaOG3P) approach is proposed, leveraging the programs generated by LLMs in two ways: (i) as seeds following a grammar mapping process and (ii) as targets for similarity measure objectives. Experiments on a well-known and widely used program synthesis dataset indicate that the proposed approach successfully improves the rate of grammar-obeying program synthesis compared to various LLMs and the state-of-the-art Grammar-Guided Genetic Programming. Additionally, the proposed approach significantly improved the solution in terms of the best fitness value of each run for 75% of the problems.

**5. a list containing one or more of the eight letters (A, B, C, D, E, F, G, or H) that correspond to the criteria (see above) that the author claims that the work satisfies;
(B) The result is equal to or better than a result that was accepted as a new scientific result at the time when it was published in a peer-reviewed scientific journal.
(C) The result is equal to or better than a result that was placed into a database or archive of results maintained by an internationally recognised panel of scientific experts.
(D) The result is publishable in its own right as a new scientific result independent of the fact that the result was mechanically created.
(E) The result is equal to or better than the most recent human-created solution to a long-standing problem for which there has been a succession of increasingly better human-created solutions.
(G) The result solves a problem of indisputable difficulty in its field.

**6. a statement stating why the result satisfies the criteria that the contestant claims (see examples of statements of human-competitiveness as a guide to aid in constructing this part of the submission);
Re (B):
The results achieved by our proposed Similarity-Based Many-Objective Grammar Guided Genetic Programming (SBMaOG3P) approach meet and exceed the human-competitive criteria by clearly advancing the state of the art in grammar-obeying program synthesis, a key subfield of program synthesis and genetic programming:
-Superiority over Published Peer-Reviewed Results: Our method significantly outperforms Grammar-Guided Genetic Programming (G3P [1]), a well-established approach previously published in peer-reviewed scientific literature. G3P, while effective at ensuring grammar adherence, is limited by its reliance solely on Input/Output error-rate minimisation and suffers from poor scalability. In contrast, SBMaOG3P integrates outputs from Large Language Models (LLMs) in a dual capacity: as seed inputs (via grammar mapping) and as similarity-based objectives. This dual use of LLMs allows our method to solve 75% of the benchmark problems, nearly doubling the performance of G3P, which solves only 42%. This is a clear and quantifiable improvement over a published scientific benchmark.
-Surpassing State-of-the-Art LLMs: While LLMs such as ChatGPT, Gemma, LLaMa, Mistral, and Zephyr have shown potential in generating correct code, they consistently fail to ensure conformance to predefined grammars. Our SBMaOG3P approach directly addresses this shortcoming by leveraging LLM outputs within a genetic programming framework that enforces grammar constraints. As a result, our method solves four times as many grammar-obeying problems as the best-performing LLM, which solved fewer than 18% of the problems in the benchmark. This showcases our method's superiority in enforcing syntactic correctness, a crucial requirement for safety and trust in synthesised programs.
-Importance of Novel Contributions Verified through Ablation Studies [6]: Through systematic ablation studies, we demonstrate that each component of SBMaOG3P (LLM-based seeding, similarity-driven objectives, and grammar mapping) contributes critically to the final performance. All ablated versions underperform the full approach, confirming the necessity and novelty of the complete SBMaOG3P framework. This comprehensive validation strengthens the argument that our method is not merely a marginal improvement but a scientifically novel and robust advancement over prior methods.

Re (C): 
Our experimental evaluation was conducted using the well-known and widely used General Program Synthesis Benchmark Suite developed by Helmuth and Spector [3], which is a recognised standard in the field of program synthesis. This benchmark comprises diverse problems carefully selected from introductory-level programming courses, ensuring both educational relevance and generality. Each problem in the suite includes natural language descriptions, as well as clearly defined training and testing datasets, allowing for consistent and reproducible evaluation. 

To ensure a fair and meaningful comparison with prior state-of-the-art methods, we adopted the same domain-specific grammars as defined in the Grammar-Guided Genetic Programming (G3P) framework [1], preserving consistency in the representation and constraints of the search space. By employing this benchmark and grammar setup, we ensured that our experimental results are directly comparable to those reported in peer-reviewed literature, thus reinforcing the validity and significance of our contributions.

Re (D):
Our proposed method, Similarity-Based Many-Objective Grammar Guided Genetic Programming (SBMaOG3P), directly addresses two of the most pressing challenges in program synthesis: the lack of scalability in Grammar-Guided Genetic Programming (G3P) and the lack of trustworthiness in Large Language Models (LLMs). We overcome these limitations by innovatively integrating LLM outputs into a many-objective G3P (MaOG3P) framework through two complementary mechanisms:
-LLM-Seeding with Grammar Mapping: We introduce a grammar-mapping phase that transforms LLM-generated code into valid grammatical structures, which are then used as seeds for the evolutionary process. This enables the exploration of high-potential areas of the search space while allowing for incremental improvement and error correction through evolution.
-Similarity-Based Many-Objective Guidance: Beyond traditional Input/Output error-rate optimisation, we introduce similarity-based objectives that guide the evolutionary process toward solutions similar to LLM-generated ones. This dual-objective strategy ensures that evolution remains aligned with both correctness and syntactic structure, leveraging LLM insights without sacrificing reliability.
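
The two mechanisms above can be illustrated with a minimal Python sketch. It is not the implementation from the paper: the token-level `difflib` measure stands in for the paper's similarity objectives, and the names `tokens`, `dissimilarity`, `error_rate`, `objectives`, `dominates`, and the `solve` entry point are all hypothetical. Each candidate is scored on an Input/Output error rate and on its distance to an LLM-generated target, and candidates are compared by Pareto dominance, with both objectives minimised.

```python
import difflib
import re


def tokens(source: str) -> list[str]:
    """Split source text into identifiers, numbers, and single symbols."""
    return re.findall(r"[A-Za-z_]\w*|\d+|\S", source)


def dissimilarity(candidate: str, target: str) -> float:
    """Return 1 - token-sequence similarity; 0.0 means identical token streams."""
    matcher = difflib.SequenceMatcher(
        None, tokens(candidate), tokens(target), autojunk=False
    )
    return 1.0 - matcher.ratio()


def error_rate(source: str, cases, entry_point: str = "solve") -> float:
    """Fraction of I/O cases the candidate gets wrong (crashes count as wrong)."""
    namespace = {}
    try:
        # Illustration only: a real system would run candidates in a sandbox.
        exec(source, namespace)
        program = namespace[entry_point]
    except Exception:
        return 1.0
    failures = 0
    for args, expected in cases:
        try:
            if program(*args) != expected:
                failures += 1
        except Exception:
            failures += 1
    return failures / len(cases)


def objectives(source: str, cases, llm_target: str) -> tuple[float, float]:
    """Objective vector to minimise: (I/O error rate, distance to LLM code)."""
    return (error_rate(source, cases), dissimilarity(source, llm_target))


def dominates(a, b) -> bool:
    """Pareto dominance: a is no worse on every objective and better on one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


# Hypothetical toy problem: add two integers.
LLM_TARGET = "def solve(a, b):\n    return a + b\n"
CANDIDATE = "def solve(a, b):\n    return a - b\n"
CASES = [((1, 2), 3), ((0, 0), 0), ((2, 5), 7)]
```

Here `objectives(CANDIDATE, CASES, LLM_TARGET)` gives a candidate that fails two of the three cases but differs from the target by a single token, so the similarity objective still provides a gradient towards the LLM solution when the error rate alone is uninformative.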

As a result, our method solves 4x more grammar-obeying problems than the best-performing LLM and boosts G3P’s success rate from 42% to 75% on a standard benchmark. This substantial improvement is supported by a comprehensive ablation study, which confirms the indispensable contribution of each component in our framework.
Our work has been peer-reviewed and published in the Q1-ranked journal Computer Standards & Interfaces (Elsevier, H-Index: 80), underscoring both the scientific rigour and significance of our contribution to the field of genetic and evolutionary computation.

Re (E):
In the domains of general program synthesis, and by extension grammar-obeying program synthesis, no human-crafted technique has proven as viable, scalable, or successful as modern Genetic Programming (GP) and Large Language Model (LLM)-based approaches. 
While deductive synthesis, symbolic reasoning, or syntax-guided approaches are effective in narrowly defined or domain-specific contexts, they often struggle to generalise across diverse programming tasks or scale efficiently with increasing problem complexity [4].
Our work builds on these strengths by integrating the generalisation capabilities of LLMs with the robust optimisation power of GP, demonstrating a new state of the art in grammar-obeying program synthesis. This hybrid approach not only harnesses the best of both worlds but also overcomes their individual limitations, setting a new benchmark for future research in the field.

Re (G):
Program synthesis is universally recognised as a problem of exceptional difficulty, requiring the automatic generation of correct and efficient programs from high-level, often ambiguous specifications, a task that is both computationally intensive and theoretically complex [4]. This difficulty is further exacerbated when the synthesis process must obey a specific grammar, which imposes structural constraints on the space of candidate programs. While constrained grammars help narrow the search space by excluding syntactically invalid candidates, they simultaneously impose rigid structural boundaries that drastically increase the complexity of the synthesis process. A successful program synthesiser must not only satisfy the functional intent of the specification but also ensure that the generated programs are semantically meaningful and syntactically valid within the prescribed grammar [5]. This dual demand (correctness and conformance) elevates grammar-obeying program synthesis to a uniquely hard computational problem, underscoring the need for robust, intelligent, and adaptive methods.
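
The gap between correctness and conformance can be illustrated with a small Python sketch. It is only a proxy under stated assumptions: the work itself uses restricted BNF grammars, whereas this example approximates a "restricted grammar" by a whitelist of Python AST node types. The names `ALLOWED_NODES` and `obeys_grammar` are hypothetical.

```python
import ast

# Hypothetical restricted language: functions, returns, names, constants,
# and addition/subtraction only. Anything else is treated as non-conforming.
ALLOWED_NODES = (
    ast.Module, ast.FunctionDef, ast.arguments, ast.arg, ast.Return,
    ast.Name, ast.Load, ast.Constant, ast.BinOp, ast.Add, ast.Sub,
)


def obeys_grammar(source: str) -> bool:
    """Return True if every AST node of the program is in the whitelist."""
    try:
        tree = ast.parse(source)
    except SyntaxError:
        return False
    return all(isinstance(node, ALLOWED_NODES) for node in ast.walk(tree))


# Both programs add two numbers correctly, but only the first one stays
# inside the restricted language; the second uses a call and a list literal.
CONFORMING = "def solve(a, b):\n    return a + b\n"
NON_CONFORMING = "def solve(a, b):\n    return sum([a, b])\n"
```

With these definitions, `obeys_grammar(CONFORMING)` is true while `obeys_grammar(NON_CONFORMING)` is false, even though both programs compute the same result. This is the kind of correct-but-non-conforming output that a grammar-obeying synthesiser has to rule out.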

[1] Forstenlechner, S., Fagan, D., Nicolau, M., & O’Neill, M. (2018, August). Extending program synthesis grammars for grammar-guided genetic programming. In International Conference on Parallel Problem Solving from Nature (pp. 197-208). Cham: Springer International Publishing.
[2] Tao, N., Ventresque, A., Nallur, V., & Saber, T. (2024). Enhancing Program Synthesis with Large Language Models Using Many-Objective Grammar-Guided Genetic Programming. Algorithms, 17(7), 287.
[3] Helmuth, T., & Spector, L. (2015, July). General program synthesis benchmark suite. In Proceedings of the 2015 Annual Conference on Genetic and Evolutionary Computation (pp. 1039-1046).
[4] Gulwani, S., Polozov, O., & Singh, R. (2017). Program synthesis. Foundations and Trends® in Programming Languages, 4(1-2), 1-119.
[5] Alur, R., Bodik, R., Juniwal, G., Martin, M. M., Raghothaman, M., Seshia, S. A., ... & Udupa, A. (2013). Syntax-guided synthesis (pp. 1-8). IEEE.
[6] Tao, N., Multi-Objective Grammar-Guided Genetic Programming for Grammar-Obeying Program Synthesis, PhD Thesis, University College Dublin, 2025.

**7. a full citation of the paper (that is, author names; title, publication date; name of journal, conference, or book in which article appeared; name of editors, if applicable, of the journal or edited book; publisher name; publisher city; page numbers, if applicable);
Ning Tao, Anthony Ventresque, Vivek Nallur, Takfarinas Saber, Grammar-obeying program synthesis: A novel approach using large language models and many-objective genetic programming, Computer Standards & Interfaces, Volume 92, Elsevier B.V., Netherlands, 2025, 103938, ISSN 0920-5489, https://doi.org/10.1016/j.csi.2024.103938. (https://www.sciencedirect.com/science/article/pii/S0920548924001077)

**8. a statement either that "any prize money, if any, is to be divided equally among the co-authors" OR a specific percentage breakdown as to how the prize money, if any, is to be divided among the co-authors;
All prize money, if any, is to be divided equally between Ning Tao and Takfarinas Saber.

**9. a statement stating why the authors expect that their entry would be the "best," and
We confidently assert that our paper presents the most advanced and comprehensive solution to an extremely important and challenging problem (i.e., grammar-obeying program synthesis), a cornerstone problem in program synthesis and evolutionary computation. Our claim is grounded in three compelling pillars:
-Unmatched Performance on a Gold-Standard Benchmark: Our SBMaOG3P approach solves 75% of benchmark problems while strictly adhering to grammatical constraints, outperforming five state-of-the-art LLMs (which solve ≤5 problems under grammar constraints), the latest G3P method (the previous gold standard in grammar-guided synthesis), and various ablations of our method (which show the importance of each of its components).
-Deepest Integration of LLMs and Evolutionary Computation: We introduce the most sophisticated integration of LLM capabilities with evolutionary computation for grammar-obeying program synthesis to date, combining grammar mapping, similarity objectives, and many-objective optimisation. The resulting system not only leverages LLMs effectively but also overcomes their limitations in grammar compliance and trustworthiness.
-Most Thorough Evaluation and Commitment to Reproducibility: We provide the most extensive comparison against both LLM and GP baselines, perform an analysis across the complete General Program Synthesis Benchmark Suite, conduct a rigorous examination of failure cases and limitations, and release all tools and data as open source for full reproducibility.
Together, these contributions position our work as the most complete, impactful, and forward-looking solution available, and we strongly believe it sets a new standard for human-competitive results in genetic and evolutionary computation.


**10. An indication of the general type of genetic or evolutionary computation used, such as GA (genetic algorithms), GP (genetic programming), ES (evolution strategies), EP (evolutionary programming), LCS (learning classifier systems), GI (genetic improvement), GE (grammatical evolution), GEP (gene expression programming), DE (differential evolution), etc.
General Type: Genetic Programming.
Specifically, Grammar-Guided Genetic Programming extended to include seeding and various code similarity objectives.

**11. The date of publication of each paper.  If the date of publication is not on or before the deadline for submission, but instead, the paper has been unconditionally accepted for publication and is “in press” by the deadline for this competition, the entry must include a copy of the documentation establishing that the paper meets the "in press" requirement.
14 November 2024.