1. Our entry is based on the following two published papers:

W. Weimer, T. Nguyen, C. Le Goues, and S. Forrest ``Automatically
finding patches using genetic programming.''  31st International
Conference on Software Engineering (ICSE) (May, 2009).  12.3%
acceptance rate.  One of 5 ICSE papers selected as ACM SIGSOFT
Distinguished Papers, and the one recipient of the IFIP TC2 Manfred
Paul Award for Excellence in Software: Theory and Practice
(http://www.cs.up.ac.za/cs/jbishop/TC2Award.html).

S. Forrest, W. Weimer, T. Nguyen, and C. Le Goes ``A Genetic
Programming Approach to Automated Software Repair.'' Genetic and
Evolutionary Computation Conference (in press).  Nominated for Best
Paper Award.

2.  The names of the authors of both papers, in alphabetical order:

Prof. Stephanie Forrest
Dept. of Computer Science
University of New Mexico
Mail stop: MSC01 1130
1 University of New Mexico
Albuquerque, NM 87131-0001
505-277-7104
forrest@cs.unm.edu

Ms. Claire Le Goues
Department of Computer Science
School of Engineering
University of Virginia
151 Engineer's Way, P.O. Box 400740
Charlottesville, VA 22904
617-308-0637
Legoues@virginia.edu 

Mr. ThanhVu Nguyen
Dept. of Computer Science
University of New Mexico
Mail stop: MSC01 1130
1 University of New Mexico
Albuquerque, NM 87131-0001
505-277-3112
tnguyen@cs.unm.edu

Prof. Westley Weimer
Department of Computer Science
School of Engineering
University of Virginia
151 Engineer's Way, P.O. Box 400740
Charlottesville, VA 22904
434-924-1021
weimer@virginia.edu

3. Corresponding author: Stephanie Forrest

4. Paper abstracts

ICSE Abstract: Automatic repair of programs has been a longstanding
goal in software engineering, yet debugging remains a largely manual
process. We introduce a fully automated method for locating and
repairing bugs in software. The approach works on off-the-shelf
legacy applications and does not require formal speciﬁcations, program
annotations or special coding practices. Once a program fault is
discovered, an extended form of genetic programming is used to
evolve program variants until one is found that both retains required
functionality and also avoids the defect in question. Standard test
cases are used to exercise the fault and to encode program
requirements. After a successful repair has been discovered, it is
minimized using structural differencing algorithms and delta
debugging. We describe the proposed method and report results from an
initial set of experiments demonstrating that it can successfully
repair ten different C programs totaling 63,000 lines in under 200
seconds, on average.

GECCO Abstract: Genetic programming is combined with program analysis
methods to repair bugs in off-the-shelf legacy C programs. Fitness is
deﬁned using negative test cases that exercise the bug to be
repaired and positive test cases that encode program
requirements. Once a successful repair is discovered, structural
differencing algorithms and delta debugging methods are used to
minimize its size. Several modiﬁcations to the GP technique contribute
to its success: (1) genetic operations are localized to the nodes
along the execution path of the negative test case; (2) high-level
statements are represented as single nodes in the program tree; (3)
genetic operators use existing code in other parts of the program,
so new code does not need to be invented. The paper describes the
method, reviews earlier experiments that repaired 11 bugs in over
60,000 lines of code, reports results on new bug repairs, and
describes experiments that analyze the performance and efﬁcacy of the
evolutionary components of the algorithm.

5. Criterion: G

6. Rationale:

Debugging is an indisputably difficulty activity, as anyone who has
taught computer programming to humans knows. In fact, debugging and
maintenance can account for up to 90% of the total cost of a typical
software project, and the number of outstanding software defects
exceeds the resources available to fix them.  At this time, our method
has successfully repaired 16 bugs, 15 of which are well documented
problems in production programs and one of which we programmed
ourselves as the initial test.  These include bugs in several Unix
utilities, webserver programs, and several important classes of
security vulnerabilities (some of these we solved since the original
two papers were submitted for publication).

However, people have also successfully fixed the programs that we
report on.  Thus, our claim for human-competitiveness also rests on
the time it takes to discover a repair.  We have not directly measured
how long it would take a human to find a repair for the programs in
our test suite---starting from scratch with source code, an
interpreter and a compiler, and nothing else besides some positive and
negative test cases.  It is notoriously difficult to estimate
programmer productivity in general.  However, published numbers
indicate that it takes 28 days on average for maintainers to develop
and distribute fixes for security vulnerabilities (some of which are
included in our published papers), and anywhere from 13 to 89 days to
develop patches for OS bugs [1]. New code on large projects is created
at the average rate of 10 lines of source code per per-day [2], and
the average patch produced by our technique produces 4 lines of source
code.  We note that the task for humans is more difficult if the human
is unfamiliar with the program before beginning to debug, as the GP
is.

These data suggest that on average it could take a human 3.2 hours to
analyze a defect and synthesize a novel program modification of equal
size to the repairs produced by our technique.  Even if humans are an
order-of-magnitude faster at this task than traditional metrics
suggest, and would thus take only 18 minutes, the repair times
reported in the ICSE paper (average of 3 minutes and maximum under 10
minutes) are highly competitive with what a human programmer could
hope to achieve.  For example, the Zune bug repair described in the
GECCO paper was achieved in 37 seconds.  We achieved these results
using a quad-core 3 GHz machine, which has a limited amount of
parallelism. Our prototype is single-threaded, executing only one
ﬁtness function evaluation at a time, although during a ﬁtness
evaluation all test cases are executed in parallel.  Thus, we expect
that we could improve the GP's time efficiency by running on faster
hardware and exploiting additional parallelism.

A third possible metric is repair quality.  We have not studied this 
systematically yet, but we address repair quality in the ICSE paper,
and in more recent unpublished results. 

[1] Symantec Internet Security Threat Report, September 2006
http://www.symantec.com/specprog/threatreport/ent-whitepaper_symantec_internet_security_threat_report_x_09_2006.en-us.pdf

[2] Boehm, B. and Basili, V. R. 2000. Gaining Intellectual Control of
Software Development. IEEE Computer 33, 5 (May. 2000), 27-33.
http://portal.acm.org/citation.cfm?id=621471

7. Full citations:

S. Forrest, W. Weimer, T. Nguyen, and C. Le Goes ``A Genetic
Programming Approach to Automated Software Repair.'' Genetic and
Evolutionary Computation Conference, G. Raidl et al. Ed., Association
for Computing Machinery, NY, NY (in press).

W. Weimer, T. Nguyen, C. Le Goues, and S. Forrest ``Automatically
finding patches using genetic programming.''  31st International
Conference on Software Engineering (ICSE '09), IEEE Computer Society
and Association of Computing Machinery, Vancouver CA (2009).

8. Any prize money, if any, is to be divided equally among the
co-authors.

9. Statement of why the judges should consider the entry as “best” in
comparison to other entries that may also be “human-competitive.”

The dream of fully automatic programming has eluded computer
scientists for at least 50 years. Although the methods described in
the two papers do not evolve new programs from scratch, they do show
how to evolve large legacy programs to repair existing faults, an
important first step towards the ideal of automatic programming.  Even
this first step comes as a surprise and a challenge to most reviewers,
who have not contemplated the idea of letting a machine evolve generic
software on its own.  To the best of our knowledge, there is no other
extant GP technique that can automatically produce direct, textual
program repairs for large, off-the-shelf legacy programs.  As such,
the entry complements and extends existing work on Search-Based
Software Engineering in the evolutinary computation community.

Further, despite its name, Genetic Programming (GP) has not yet
replaced human programmers, who still develop, maintain, and repair
computer programs largely by hand.  Showing how to use GP in the
context of modern software systems and integrating GP into modern
software practice, will help evolutionary computation become more
widely accepted by computer scientists.  Thus, this entry goes to the
heart of what genetic programming is designed to achieve and advances
the cause of evolutionary computation in the software engineering
commnity.




__________ Information from ESET Smart Security, version of virus signature database 4044 (20090430) __________

The message was checked by ESET Smart Security.

http://www.eset.com