1. Our entry is based on the following two published papers: W. Weimer, T. Nguyen, C. Le Goues, and S. Forrest ``Automatically finding patches using genetic programming.'' 31st International Conference on Software Engineering (ICSE) (May, 2009). 12.3% acceptance rate. One of 5 ICSE papers selected as ACM SIGSOFT Distinguished Papers, and the one recipient of the IFIP TC2 Manfred Paul Award for Excellence in Software: Theory and Practice (http://www.cs.up.ac.za/cs/jbishop/TC2Award.html). S. Forrest, W. Weimer, T. Nguyen, and C. Le Goes ``A Genetic Programming Approach to Automated Software Repair.'' Genetic and Evolutionary Computation Conference (in press). Nominated for Best Paper Award. 2. The names of the authors of both papers, in alphabetical order: Prof. Stephanie Forrest Dept. of Computer Science University of New Mexico Mail stop: MSC01 1130 1 University of New Mexico Albuquerque, NM 87131-0001 505-277-7104 forrest@cs.unm.edu Ms. Claire Le Goues Department of Computer Science School of Engineering University of Virginia 151 Engineer's Way, P.O. Box 400740 Charlottesville, VA 22904 617-308-0637 Legoues@virginia.edu Mr. ThanhVu Nguyen Dept. of Computer Science University of New Mexico Mail stop: MSC01 1130 1 University of New Mexico Albuquerque, NM 87131-0001 505-277-3112 tnguyen@cs.unm.edu Prof. Westley Weimer Department of Computer Science School of Engineering University of Virginia 151 Engineer's Way, P.O. Box 400740 Charlottesville, VA 22904 434-924-1021 weimer@virginia.edu 3. Corresponding author: Stephanie Forrest 4. Paper abstracts ICSE Abstract: Automatic repair of programs has been a longstanding goal in software engineering, yet debugging remains a largely manual process. We introduce a fully automated method for locating and repairing bugs in software. The approach works on off-the-shelf legacy applications and does not require formal specifications, program annotations or special coding practices. Once a program fault is discovered, an extended form of genetic programming is used to evolve program variants until one is found that both retains required functionality and also avoids the defect in question. Standard test cases are used to exercise the fault and to encode program requirements. After a successful repair has been discovered, it is minimized using structural differencing algorithms and delta debugging. We describe the proposed method and report results from an initial set of experiments demonstrating that it can successfully repair ten different C programs totaling 63,000 lines in under 200 seconds, on average. GECCO Abstract: Genetic programming is combined with program analysis methods to repair bugs in off-the-shelf legacy C programs. Fitness is defined using negative test cases that exercise the bug to be repaired and positive test cases that encode program requirements. Once a successful repair is discovered, structural differencing algorithms and delta debugging methods are used to minimize its size. Several modifications to the GP technique contribute to its success: (1) genetic operations are localized to the nodes along the execution path of the negative test case; (2) high-level statements are represented as single nodes in the program tree; (3) genetic operators use existing code in other parts of the program, so new code does not need to be invented. The paper describes the method, reviews earlier experiments that repaired 11 bugs in over 60,000 lines of code, reports results on new bug repairs, and describes experiments that analyze the performance and efficacy of the evolutionary components of the algorithm. 5. Criterion: G 6. Rationale: Debugging is an indisputably difficulty activity, as anyone who has taught computer programming to humans knows. In fact, debugging and maintenance can account for up to 90% of the total cost of a typical software project, and the number of outstanding software defects exceeds the resources available to fix them. At this time, our method has successfully repaired 16 bugs, 15 of which are well documented problems in production programs and one of which we programmed ourselves as the initial test. These include bugs in several Unix utilities, webserver programs, and several important classes of security vulnerabilities (some of these we solved since the original two papers were submitted for publication). However, people have also successfully fixed the programs that we report on. Thus, our claim for human-competitiveness also rests on the time it takes to discover a repair. We have not directly measured how long it would take a human to find a repair for the programs in our test suite---starting from scratch with source code, an interpreter and a compiler, and nothing else besides some positive and negative test cases. It is notoriously difficult to estimate programmer productivity in general. However, published numbers indicate that it takes 28 days on average for maintainers to develop and distribute fixes for security vulnerabilities (some of which are included in our published papers), and anywhere from 13 to 89 days to develop patches for OS bugs [1]. New code on large projects is created at the average rate of 10 lines of source code per per-day [2], and the average patch produced by our technique produces 4 lines of source code. We note that the task for humans is more difficult if the human is unfamiliar with the program before beginning to debug, as the GP is. These data suggest that on average it could take a human 3.2 hours to analyze a defect and synthesize a novel program modification of equal size to the repairs produced by our technique. Even if humans are an order-of-magnitude faster at this task than traditional metrics suggest, and would thus take only 18 minutes, the repair times reported in the ICSE paper (average of 3 minutes and maximum under 10 minutes) are highly competitive with what a human programmer could hope to achieve. For example, the Zune bug repair described in the GECCO paper was achieved in 37 seconds. We achieved these results using a quad-core 3 GHz machine, which has a limited amount of parallelism. Our prototype is single-threaded, executing only one fitness function evaluation at a time, although during a fitness evaluation all test cases are executed in parallel. Thus, we expect that we could improve the GP's time efficiency by running on faster hardware and exploiting additional parallelism. A third possible metric is repair quality. We have not studied this systematically yet, but we address repair quality in the ICSE paper, and in more recent unpublished results. [1] Symantec Internet Security Threat Report, September 2006 http://www.symantec.com/specprog/threatreport/ent-whitepaper_symantec_internet_security_threat_report_x_09_2006.en-us.pdf [2] Boehm, B. and Basili, V. R. 2000. Gaining Intellectual Control of Software Development. IEEE Computer 33, 5 (May. 2000), 27-33. http://portal.acm.org/citation.cfm?id=621471 7. Full citations: S. Forrest, W. Weimer, T. Nguyen, and C. Le Goes ``A Genetic Programming Approach to Automated Software Repair.'' Genetic and Evolutionary Computation Conference, G. Raidl et al. Ed., Association for Computing Machinery, NY, NY (in press). W. Weimer, T. Nguyen, C. Le Goues, and S. Forrest ``Automatically finding patches using genetic programming.'' 31st International Conference on Software Engineering (ICSE '09), IEEE Computer Society and Association of Computing Machinery, Vancouver CA (2009). 8. Any prize money, if any, is to be divided equally among the co-authors. 9. Statement of why the judges should consider the entry as “best” in comparison to other entries that may also be “human-competitive.” The dream of fully automatic programming has eluded computer scientists for at least 50 years. Although the methods described in the two papers do not evolve new programs from scratch, they do show how to evolve large legacy programs to repair existing faults, an important first step towards the ideal of automatic programming. Even this first step comes as a surprise and a challenge to most reviewers, who have not contemplated the idea of letting a machine evolve generic software on its own. To the best of our knowledge, there is no other extant GP technique that can automatically produce direct, textual program repairs for large, off-the-shelf legacy programs. As such, the entry complements and extends existing work on Search-Based Software Engineering in the evolutinary computation community. Further, despite its name, Genetic Programming (GP) has not yet replaced human programmers, who still develop, maintain, and repair computer programs largely by hand. Showing how to use GP in the context of modern software systems and integrating GP into modern software practice, will help evolutionary computation become more widely accepted by computer scientists. Thus, this entry goes to the heart of what genetic programming is designed to achieve and advances the cause of evolutionary computation in the software engineering commnity. __________ Information from ESET Smart Security, version of virus signature database 4044 (20090430) __________ The message was checked by ESET Smart Security. http://www.eset.com