----- Entry for the 2009 "Humies" Awards for Human-Competitive Results -----
----------------------------------------------------------------------------
(1) the complete title of one (or more) paper(s) published in the open literature describing the work that the author claims describes a human-competitive result:

"Evolvable Malware"
----------------------------------------------------------------------------
(2) the name, complete physical mailing address, e-mail address, and phone number of EACH author of EACH paper:

Name: Sadia Noreen
Physical address: FAST National University of Computer & Emerging Sciences
A.K. Brohi Road, H-11/4, Islamabad, Pakistan
Email: sadia.noreen@nu.edu.pk

Name: Shafaq Murtaza
Physical address: FAST National University of Computer & Emerging Sciences
A.K. Brohi Road, H-11/4, Islamabad, Pakistan
Email: shafaq.murtaza@nu.edu.pk

Name: M. Zubair Shafiq
Physical address: Next Generation Intelligent Networks Research Center
FAST National University of Computer & Emerging Sciences
A.K. Brohi Road, Sector H-11/4, Islamabad, Pakistan
Email: zubair.shafiq@nexginrc.org
Tel: +92 51 111 128 128 (Ext. 190)

Name: Muddassar Farooq
Physical address: Next Generation Intelligent Networks Research Center
FAST National University of Computer & Emerging Sciences
A.K. Brohi Road, Sector H-11/4, Islamabad, Pakistan
Email: muddassar.farooq@nexginrc.org
Tel: +92 51 111 128 128 (Ext. 206)
----------------------------------------------------------------------------
(3) the name of the corresponding author (i.e., the author to whom notices will be sent concerning the competition):

Sadia Noreen
----------------------------------------------------------------------------
(4) the abstract of the paper(s):

The concept of artificial evolution has been applied to numerous real world applications in different domains. In this paper, we use this concept in the domain of virology to evolve computer viruses. We call this domain "Evolvable Malware".
To this end, we propose an evolutionary framework that consists of three modules: (1) a code analyzer that generates a high-level genotype representation of a virus from its machine code, (2) a genetic algorithm that uses the standard selection, cross-over and mutation operators to evolve viruses, and (3) a code generator that converts the genotype of a newly evolved virus to its machine-level code. In this paper, we validate the notion of evolution in viruses on a well-known virus family, called Bagle. The results of our proof-of-concept study show that we have successfully evolved new viruses -- previously unknown and known variants of Bagle -- starting from a random population of individuals. To the best of our knowledge, this is the first empirical work on evolution of computer viruses. In the future, we want to improve this proof-of-concept framework into a full-blown virus evolution engine.
----------------------------------------------------------------------------
(5) a list containing one or more of the eight letters (A, B, C, D, E, F, G, or H) that correspond to the criteria (see above) that the author claims that the work satisfies:

D, E, F, G
----------------------------------------------------------------------------
(6) a statement stating why the result satisfies the criteria that the contestant claims:

Several techniques are used by virus writers to create new malware. Commonly used techniques include encryption, which hides the true structure and functionality of the original virus. Such 'polymorphic' techniques only change the structure of existing malware, evading its signatures in the database of an antivirus product by using different encryption/decryption routines. Only recently have 'metamorphic' techniques been introduced, which mutate the code of a given malware to produce new malware. The mutation may be as simple as adding garbage instructions to produce a new signature that evades existing signature-based commercial anti-virus products.
Some other techniques simply replace an existing instruction with a combination of other instructions. The above-mentioned techniques do not bring evolution "in the true sense" because the functionality of the new malware is effectively the same as that of the non-mutated malware.

In our research, we have applied a genetic algorithm to evolve the high-level representation of a given malware. The high-level (abstract) representation models the structure and functionality of a given malware and is automatically extracted from its machine-level code by our malware evolution engine. Once we have evolved the individuals of a new population by using the genetic algorithm, we translate these high-level genotypes back to machine-level code by utilizing a code generation module. The fitness of the evolved population is validated, without human intervention, by using commercial anti-virus products.

We started with a random population of machine-level samples that contains only a few valid malware samples. The final population at the end of our evolution process contains at least 41% guaranteed legitimate malware samples, which are detected by commercial anti-virus products. Moreover, about half of the remaining 59% are legitimate malware samples that are not detected by commercial anti-virus products. Interestingly, a reasonable portion of the newly evolved malware samples were never present in the initial population. The newly evolved malware samples have significantly different functionality compared with the malware samples of the original population.

To the best of our knowledge, this is the first work which has truly generated "artificial life like evolvable viruses". It is a well-known fact among malware writers that automatically generating functionally different viruses from existing malware is a significantly challenging task.
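The select / cross-over / mutate loop described above can be illustrated with a deliberately abstract toy. This is a minimal sketch only, not the authors' engine: the genotype here is a plain list of integers, and the fitness function `matches` simply counts agreements with an arbitrary `TARGET` vector, standing in for the anti-virus-based validation that the entry describes. The function names, tournament selection, elitism, and all parameter values are assumptions introduced for illustration.

```python
import random


def evolve(fitness, gene_count, gene_values, pop_size=20, generations=50,
           mutation_rate=0.05, seed=0):
    """Toy genetic algorithm over fixed-length lists of abstract genes."""
    rng = random.Random(seed)
    # Random initial population, as in the entry's description.
    pop = [[rng.choice(gene_values) for _ in range(gene_count)]
           for _ in range(pop_size)]

    def tournament():
        # Assumed selection scheme: the fitter of two random individuals wins.
        a, b = rng.sample(pop, 2)
        return a if fitness(a) >= fitness(b) else b

    for _ in range(generations):
        # Elitism (an assumption): carry the current best over unchanged.
        next_pop = [max(pop, key=fitness)[:]]
        while len(next_pop) < pop_size:
            p1, p2 = tournament(), tournament()
            cut = rng.randrange(1, gene_count)      # one-point cross-over
            child = p1[:cut] + p2[cut:]
            # Point mutation: occasionally replace a gene with a random value.
            child = [rng.choice(gene_values) if rng.random() < mutation_rate
                     else g for g in child]
            next_pop.append(child)
        pop = next_pop
    return max(pop, key=fitness)


# Hypothetical stand-in fitness: number of positions that agree with TARGET.
TARGET = [3, 1, 4, 1, 5, 9, 2, 6]


def matches(genotype):
    return sum(g == t for g, t in zip(genotype, TARGET))


best = evolve(matches, gene_count=len(TARGET), gene_values=list(range(10)))
print(best, matches(best))
```

Because the best individual is copied into each new generation, the best fitness in this sketch never decreases; a framework without elitism would not have that property.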
The following four criteria for human-competitive performance are satisfied by our results:

D: The result is publishable in its own right as a new scientific result independent of the fact that the result was mechanically created.
E: The result is equal to or better than the most recent human-created solution to a long-standing problem for which there has been a succession of increasingly better human-created solutions.
F: The result is equal to or better than a result that was considered an achievement in its field at the time it was first discovered.
G: The result solves a problem of indisputable difficulty in its field.
----------------------------------------------------------------------------
(7) a full citation of the paper (that is, author names; publication date; name of journal, conference, technical report, thesis, book, or book chapter; name of editors, if applicable, of the journal or edited book; publisher name; publisher city; page numbers, if applicable):

Sadia Noreen, Shafaq Murtaza, M. Zubair Shafiq, and Muddassar Farooq, "Evolvable Malware", In Proceedings of the Genetic and Evolutionary Computation Conference (GECCO), ACM Press, Montreal, Canada, July 2009.
----------------------------------------------------------------------------
(8) a statement either that "any prize money, if any, is to be divided equally among the co-authors" OR a specific percentage breakdown as to how the prize money, if any, is to be divided among the co-authors:

Any prize money is to be divided equally among the authors.
----------------------------------------------------------------------------
(9) a statement stating why the judges should consider the entry as "best" in comparison to other entries that may also be "human-competitive."

We believe that successfully realizing the idea of "Evolvable Malware" is a significant achievement that would revolutionize the way antivirus products are designed, implemented and tested.
We envision that our "virus evolution engine" would become a core component of the regression testing and validation frameworks of antivirus products. It will help in generating a large repository of new malware samples, so that defenses against them can be prepared in a timely fashion, before attackers develop them, in contrast to the existing practice. In the future, this idea can be extended to develop a co-evolution engine in which antivirus products evolve together with the virus evolution engine. The framework could be easily extended to include network-based malware such as worms and botnets. As a result, our framework will add great value to computer security research and security products.