1) the complete title of one (or more) paper(s) published in the open literature describing the work that the author claims describes a human-competitive result; Playing Regex Golf with Genetic Programming Proc. ACM Genetic and Evolutionary Computation Conference (GECCO) 2014 2) the name, complete physical mailing address, e-mail address, and phone number of EACH author of EACH paper(s); Alberto Bartoli, bartoli.alberto@univ.trieste.it, +393291899955 Andrea De Lorenzo, mimmuzk25@gmail.com, +393489105896 Eric Medvet, emedvet@units.it, +393283314160 Fabiano Tarlao, ftarlao@gmail.com, +393298479550 The physical mailing address of each author is: "Dipartimento di Ingegneria e Architettura, Universita' di Trieste - Via Valerio 10 - 34125, Trieste - Italy" 3) the name of the corresponding author (i.e., the authorto whom notices will be sent concerning the competition); Alberto Bartoli 4) the abstract of the paper(s); Regex golf has recently emerged as a specific kind of code golf, i.e., unstructured and informal programming competitions aimed at writing the shortest code solving a particular problem. A problem in regex golf consists in writing the shortest regular expression which matches all the strings in a given list and does not match any of the strings in another given list. The regular expression is expected to follow the syntax of a specified programming language, e.g., Javascript or PHP. In this paper, we propose a regex golf player internally based on Genetic Programming. We generate a population of candidate regular expressions represented as trees and evolve such population based on a multi-objective fitness which minimizes the errors and the length of the regular expression. We assess experimentally our player on a popular regex golf challenge consisting of 16 problems and compare our results against those of a recently proposed algorithm---the only one we are aware of. Our player obtains scores which improve over the baseline and are highly competitive also with respect to human players. The time for generating a solution is usually in the order of tens minutes, which is arguably comparable to the time required by human players. 5) a list containing one or more of the eight letters (A, B, C, D, E, F, G, or H) that correspond to the criteria (see above) that the author claims that the work satisfies; H 6) a statement stating why the result satisfies the criteria that the contestant claims (see examples of statements of human-competitiveness as a guide to aid in constructing this part of the submission); A challenge consisting of 16 instances of the problem of generating the shortest Javascript-compatible regular expression matching all strings in a given list and not matching any string in another given list was posted on the web ("http://regex.alfa.nu"). This challenge provoked vibrant discussions within programmers' communities and stimulated programmers to compete for improving the respective score (e.g., "https://www.google.it/search?q=regex.alfa.nu+answers"). Programmers advertised their solutions on web forums, thereby allowing other players to optimize those solutions further. Our results ranked from 6-th to 8-th amongst human players worldwide. Our results did not exploit solutions by other players and also scored significantly better than the only machine-generated solutions that we could find. The challenge was not organized formally, but its rules were stated very clearly and were widely seen as a clever way for allowing programmers to test their skill and creativity in extreme scenarios. 7) a full citation of the paper (that is, author names; publication date; name of journal, conference, technical report, thesis, book, or book chapter; name of editors, if applicable, of the journal or edited book; publisher name; publisher city; page numbers, if applicable); Playing Regex Golf with Genetic Programming Bartoli, De Lorenzo, Medvet, Tarlao Proc. ACM Genetic and Evolutionary Computation Conference (GECCO) 2014. To appear. 8) a statement either that "any prize money, if any, is to be divided equally among the co-authors" OR a specific percentage breakdown as to how the prize money, if any, is to be divided among the co-authors; Any prize money, if any, is to be divided equally among the co-authors. 9) a statement stating why the judges should consider the entry as "best" in comparison to other entries that may also be "human-competitive;" Regular expressions are a powerful tool routinely used, since a long time, in a variety of practical applications in different domains. The generation of a regular expression solving a specific task is a job that, in practice, is manually performed by programmers. This job requires very specific skills and is often performed by tailoring regular expressions previously developed by other programmers for similar tasks. Indeed, writing a regular expression effectively corresponds to writing a small computer program in a specialized language. This entry demonstrates that evolutionary computation has reached a level in which it may successfully compete with human programmers in scenarios explicitly designed to test their practical skills and creativity, and it may do so without any starting hint or external help. Furthermore, we attempt to anticipate three key objections below. i) "The problem of learning a deterministic finite automata (DFA) from a labelled set of examples and counterexamples has been considered many times in the past. Solutions to this classification problem would have solved the challenge easily" To the best of our knowledge, such proposals have always been concerned with classification problems radically different from the challenge, namely (please refer to citations [1-6]): the input stream consisted of short sequences of binary or ternary symbols (in the order of 10-15 symbols); the learning procedure was aimed at inferring a DFA whose number of states is known in advance; the learning procedure succeeded only when the number of examples was a "sufficiently large" and "sufficiently representative" portion of the possible input space. As an aside, we are not aware of any such proposal successfully applied to input streams composed of symbols from, e.g., the full ASCII alphabet. The challenge is radically different from those scenarios: in some problem instances the input stream consists of sequences of similar length but drawn from a much larger alphabet (many ASCII characters); in some other instances it consists of repetitions of one or two characters, the pattern being in the length of input unit; the number of states of the target DFA is not known in advance, and it is not even known whether such target DFA indeed exists; finally, the whole input space is available to the learning procedure in that no requirements are placed on how strings not listed in the problem specification have to be classified. Thus, while we do not claim that our solution is the best solution that one might possibly generate mechanically, we do claim that it is the best mechanically-generated solution proposed so far and that it ranks amongst the best human players worldwide. ii) "There is no guarantee that the proposal will perform equally well on other regex-generation challenges" Of course we cannot predict the performance of our proposal on any possible challenge. However, we have three arguments in this respect. First, this remark applies to human players as well. Second, this specific challenge does not consist of just a few problem instances, but is composed of 16 different instances that exercise a very broad range of regex constructs and programmers' skills: some of those instances may be expressed by a compact pattern while others involve varying levels of explicit enumeration; several instances consist of repetitions of just one character, which implies that the matching and unmatching patterns depend on the length of the input units rather than on their content. Having achieved excellent results on such a highly demanding testbed, we are confident that our proposal may indeed replicate the skills of human programmers on other challenges of this kind. Third, we constructed our proposal by modifying an earlier approach for synthesizing regular expressions for text extraction based on examples of the desired behavior. Such an approach, developed by us, greatly improved over prior state-of-the-art proposals on text extraction benchmarks of practical interest [7]. We believe that these results are a clear demonstration that evolutionary computing has become a tool for synthesizing regular expressions automatically in scenarios of practical interest. Finally, we invite judges to exercise our proposal on-line: http://regex.inginf.units.it/golf/ iii) "Regex-generation challenges of this kind have no practical application" First, none of the criteria for the Humies award explicitly requires an immediate practical application of the entry. Second, the task of binary classifying a given set of known strings may indeed occur in practical text processing applications. 10) An indication of the general type of genetic or evolutionary computation used, such as GA (genetic algorithms), GP (genetic programming), ES (evolution strategies), EP (evolutionary programming), LCS (learning classifier systems), GE (grammatical evolution), GEP (gene expression programming), DE (differential evolution), etc. GP (genetic programming) References ---------- [1] Lang, Kevin J., Barak A. Pearlmutter, and Rodney A. Price. “Results of the Abbadingo One DFA Learning Competition and a New Evidence-Driven State Merging Algorithm.” In Grammatical Inference, 1–12. Springer, 1998. http://link.springer.com/chapter/10.1007/BFb0054059. [2] Cicchello, Orlando, and Stefan C. Kremer. “Inducing Grammars from Sparse Data Sets: A Survey of Algorithms and Results.” J. Mach. Learn. Res. 4 (December 2003): 603–32. doi:10.1162/153244304773936063. [3] “Learning DFA from Noisy Samples A Contest for GECCO 2004,” n.d. http://cswww.essex.ac.uk/staff/sml/gecco/NoisyDFA.html. [4] Bongard, Josh, and Hod Lipson. “Active Coevolutionary Learning of Deterministic Finite Automata.” The Journal of Machine Learning Research 6 (2005): 1651–78. [5] Lucas, Simon M., and T. Jeff Reynolds. “Learning Deterministic Finite Automata with a Smart State Labeling Evolutionary Algorithm.” Pattern Analysis and Machine Intelligence, IEEE Transactions on 27, no. 7 (2005): 1063–74. [6] Walkinshaw, Neil, Bernard Lambeau, Christophe Damas, Kirill Bogdanov, and Pierre Dupont. “STAMINA: A Competition to Encourage the Development and Assessment of Software Model Inference Techniques.” Empirical Software Engineering 18, no. 4 (August 1, 2013): 791–824. doi:10.1007/s10664-012-9210-3. [7] Bartoli, A., G. Davanzo, A. De Lorenzo, E. Medvet, and E. Sorio. “Automatic Synthesis of Regular Expressions from Examples.” IEEE Computer Early Access Online (2014). doi:10.1109/MC.2013.403. http://doi.ieeecomputersociety.org/10.1109/MC.2013.403 (Selected for Hot Off the Press Track at GECCO 2014) (Prototype available as a webapp http://regex.inginf.units.it/)