The PDF files are attached to the e-mail.

(1)
a) A hyper-heuristic evolutionary algorithm for automatically designing decision-tree algorithms
b) Automatic Design of Decision-Tree Algorithms with Evolutionary Algorithms
c) Automatic design of decision-tree induction algorithms tailored to flexible-receptor docking data

(2)
- Full authors' names:
a) Rodrigo Coelho Barros
b) Márcio Porto Basgalupp
c) André Carlos Ponce de Leon Ferreira de Carvalho
d) Alex Alves Freitas
e) Ana Trindade Winck
f) Karina Machado
g) Duncan Dubugras Alcoba Ruiz
h) Osmar Norberto de Souza

- Physical address for authors a) and c):
Instituto de Ciências Matemáticas e de Computação
Universidade de São Paulo
Avenida Trabalhador São-Carlense, 400 - Centro, 13566-590
São Carlos, SP, Brazil
Phone: +55 16 3373-9700
Emails: rcbarros@icmc.usp.br andre@icmc.usp.br

- Physical address for author b):
Instituto de Ciência e Tecnologia - UNIFESP
Rua Talim, 330, 12231-280
São José dos Campos, SP, Brazil
Phone: +55 12 3309 9582
e-mail: basgalupp@unifesp.br

- Physical address for author d):
School of Computing
University of Kent
Canterbury, CT2 7NF
United Kingdom
e-mail: A.A.Freitas@kent.ac.uk

- Physical address for author e):
Universidade Federal de Santa Maria (UFSM)
Centro de Tecnologia (CT), Departamento de Computação Aplicada (DCOM)
Prédio 07, sala 1204A - Anexo C
Av. Roraima, 1000. Bairro Camobi, 97105-900
Santa Maria, RS, Brazil
Phone: +55 55 3320 8418
e-mail: ana@inf.ufsm.br

- Physical address for author f):
Universidade Federal do Rio Grande, Centro de Ciências Computacionais - C3
Av. Itália, Km 08, Prédio 02, Sala 2105, 96201-900
Rio Grande, RS, Brazil
Phone: +55 53 3233 6807
e-mail: karina.machado@furg.br

- Physical address for authors g) and h):
Faculdade de Informática
Pontifícia Universidade Católica do Rio Grande do Sul - PUCRS
Av. Ipiranga, 6681 - Prédio 32, Sala 608, 90619-900
Porto Alegre, RS, Brazil
Phone: +55 51 3320 3611 Ext.
8608
e-mail: duncan.ruiz@pucrs.br osmar.norberto@pucrs.br

(3) Rodrigo Coelho Barros

(4)
- Abstract for (a):
Decision tree induction is one of the most employed methods to extract knowledge from data, since the representation of knowledge is very intuitive and easily understandable by humans. The most successful strategy for inducing decision trees, the greedy top-down approach, has been continuously improved by researchers over the years. This work, following recent breakthroughs in the automatic design of machine learning algorithms, proposes a hyper-heuristic evolutionary algorithm for automatically generating decision-tree induction algorithms, named HEAD-DT. We perform extensive experiments in 20 public data sets to assess the performance of HEAD-DT, and we compare it to traditional decision-tree algorithms such as C4.5 and CART. Results show that HEAD-DT can generate algorithms that significantly outperform C4.5 and CART regarding predictive accuracy and F-Measure.

- Abstract for (b):
This study reports the empirical analysis of a hyper-heuristic evolutionary algorithm that is capable of automatically designing top-down decision-tree induction algorithms. Top-down decision-tree algorithms are of great importance, considering their ability of providing an intuitive and accurate knowledge representation for classification problems. The automatic design of these algorithms seems timely, given the large literature accumulated in more than 40 years of research in the manual design of decision-tree induction algorithms. The proposed hyper-heuristic evolutionary algorithm, HEAD-DT, is extensively tested using 20 public UCI data sets and 10 microarray gene expression data sets. The algorithms automatically designed by HEAD-DT are compared with traditional decision-tree induction algorithms, such as C4.5 and CART. Experimental results show that HEAD-DT is capable of generating algorithms which are significantly more accurate than C4.5 and CART.

- Abstract for (c):
BACKGROUND: This paper addresses the prediction of the free energy of binding of a drug candidate with enzyme InhA associated with Mycobacterium tuberculosis. This problem is found within rational drug design, where interactions between drug candidates and target proteins are verified through molecular docking simulations. In this application, it is important not only to correctly predict the free energy of binding, but also to provide a comprehensible model that could be validated by a domain specialist. Decision-tree induction algorithms have been successfully used in drug-design related applications, specially considering that decision trees are simple to understand, interpret, and validate. There are several decision-tree induction algorithms available for general-use, but each one has a bias that makes it more suitable for a particular data distribution. In this article, we propose and investigate the automatic design of decision-tree induction algorithms tailored to particular drug-enzyme binding data sets. We investigate the performance of our new method for evaluating binding conformations of different drug candidates to InhA, and we analyze our findings with respect to decision tree accuracy, comprehensibility, and biological relevance.
RESULTS: The empirical analysis indicates that our method is capable of automatically generating decision-tree induction algorithms that significantly outperform the traditional C4.5 algorithm with respect to both accuracy and comprehensibility. In addition, we provide the biological interpretation of the rules generated by our approach, reinforcing the importance of comprehensible predictive models in this particular bioinformatics application.
CONCLUSIONS: We conclude that automatically designing a decision-tree algorithm tailored to molecular docking data is a promising alternative for the prediction of the free energy from the binding of a drug candidate with a flexible-receptor.

(5) Criteria:
(E) The result is equal to or better than the most recent human-created solution to a long-standing problem for which there has been a succession of increasingly better human-created solutions.
(F) The result is equal to or better than a result that was considered an achievement in its field at the time it was first discovered.
(G) The result solves a problem of indisputable difficulty in its field.

(6) Why the work satisfies the criteria:
In machine learning and data mining, top-down decision-tree induction algorithms are widely used for generating comprehensible and accurate classification models. Even though there have been more than 40 years of research in this area, the best decision-tree induction algorithms to date are still algorithms that date from the 1980s (CART) and early 1990s (C4.5). Many new classification algorithms proposed in recent years considerably outperform CART and C4.5 in overall accuracy, though it should be noted that these algorithms (which are not decision-tree algorithms) are usually black-box approaches, which do not explain to users the reasons that led to a given prediction. The authors of this entry have proposed a hyper-heuristic evolutionary algorithm that is capable of automatically generating different (and tailor-made) top-down decision-tree induction algorithms. These machine-made solutions were shown to outperform the state-of-the-art decision-tree algorithms CART and C4.5 regarding predictive performance (measured by two well-known criteria: accuracy and F-Measure). The approach is called HEAD-DT, and it was successfully tested on well-known UCI benchmark datasets, as well as on specific gene-expression datasets and flexible-receptor docking data. The latter two are important and challenging bioinformatics problems.
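
To make the idea of evolving a complete algorithm more concrete, the following is a minimal Python sketch, under stated assumptions, of how a candidate decision-tree algorithm could be encoded as a combination of building blocks and searched with a simple evolutionary loop. It is not the HEAD-DT implementation: the building-block names and values in BUILDING_BLOCKS, the genetic operators, and the parameter defaults are all hypothetical, and toy_fitness is only a placeholder for what would in practice be the predictive performance (for example accuracy or F-Measure) of the trees induced by the decoded algorithm.

```python
import random

# Hypothetical search space: each gene selects one design choice of a
# top-down decision-tree induction algorithm. Names and values are
# illustrative assumptions, not the search space used in the papers.
BUILDING_BLOCKS = {
    "split_criterion": ["information_gain", "gini", "gain_ratio"],
    "min_instances_to_split": [2, 5, 10, 20],
    "pruning": ["none", "error_based", "cost_complexity"],
}


def random_genome(rng):
    """Sample one candidate algorithm: one option per building block."""
    return {gene: rng.choice(opts) for gene, opts in BUILDING_BLOCKS.items()}


def mutate(genome, rng, rate=0.2):
    """Resample each gene independently with probability `rate`."""
    child = dict(genome)
    for gene, opts in BUILDING_BLOCKS.items():
        if rng.random() < rate:
            child[gene] = rng.choice(opts)
    return child


def crossover(a, b, rng):
    """Uniform crossover: each gene is copied from one of the two parents."""
    return {g: (a if rng.random() < 0.5 else b)[g] for g in BUILDING_BLOCKS}


def tournament(population, scores, rng, k=2):
    """Return the fittest of k distinct, randomly chosen individuals."""
    contenders = rng.sample(range(len(population)), k)
    return population[max(contenders, key=lambda i: scores[i])]


def evolve(fitness, generations=20, pop_size=12, seed=0):
    """Evolve genomes and return (best_genome, best_score) seen so far.

    `fitness` maps a genome to a number to be maximised. In a real
    hyper-heuristic it would decode the genome into an algorithm, induce
    trees with it, and measure their predictive performance.
    """
    rng = random.Random(seed)
    population = [random_genome(rng) for _ in range(pop_size)]
    best, best_score = None, float("-inf")
    for _ in range(generations):
        scores = [fitness(g) for g in population]
        for genome, score in zip(population, scores):
            if score > best_score:
                best, best_score = dict(genome), score
        next_population = [dict(best)]  # elitism: keep the best found so far
        while len(next_population) < pop_size:
            parent_a = tournament(population, scores, rng)
            parent_b = tournament(population, scores, rng)
            next_population.append(mutate(crossover(parent_a, parent_b, rng), rng))
        population = next_population
    return best, best_score


def toy_fitness(genome):
    """Placeholder fitness: number of genes matching an arbitrary target."""
    target = {
        "split_criterion": "gain_ratio",
        "min_instances_to_split": 5,
        "pruning": "error_based",
    }
    return sum(genome[g] == target[g] for g in target)


if __name__ == "__main__":
    best_genome, best_score = evolve(toy_fitness)
    print(best_genome, best_score)
```

In this sketch the fitness function is the only link to data: replacing toy_fitness with a function that evaluates the decoded algorithm on one or several data sets is what would tailor the resulting algorithm to a given application domain.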

(E) The proposed hyper-heuristic (namely HEAD-DT) was shown to perform significantly better, overall, than the state-of-the-art top-down decision-tree algorithms CART and C4.5, which are still extensively used in both academia and industry. Whereas several studies have proposed enhancements for particular building blocks of either CART or C4.5, HEAD-DT is capable of evolving an entire decision-tree algorithm (with all its required building blocks), therefore creating a novel algorithm that is tailor-made to a given (set of) dataset(s), without any human instruction about how to combine different building blocks into a coherent decision-tree algorithm.

(F) HEAD-DT provides algorithms that in general outperform both CART and C4.5. The latter algorithms were considered a big achievement in the fields of machine learning and data mining at the time they were developed, and they are still widely employed for solving a great variety of classification problems.

(G) HEAD-DT solves a problem of indisputable difficulty in its field, since, to the best of our knowledge, there is no known algorithm to date that is capable of automatically designing a complete top-down decision-tree induction algorithm, which can in turn be tailored to a particular application domain or data distribution. HEAD-DT is the pioneering hyper-heuristic in the field of constructing complete decision-tree algorithms, and it received the best-paper award in the "IGEC+S*S+SBSE" tracks at GECCO 2012 (paper a), given its good results and originality. It was then empirically extended and published in the Evolutionary Computation journal (paper b). A particular application to flexible-receptor docking data (a challenging bioinformatics problem) was published in the BMC Bioinformatics journal (paper c).

(7) Bibliographic Details:
a) BARROS, Rodrigo C. ; BASGALUPP, M. P. ; CARVALHO, A.C.P.L.F. de ; FREITAS, A. A. A Hyper-Heuristic Evolutionary Algorithm for Automatically Designing Decision-Tree Algorithms.
In: Genetic and Evolutionary Computation Conference, 2012, Philadelphia, USA. Genetic and Evolutionary Computation Conference (GECCO 2012), 2012. p. 1237-1244.
b) BARROS, Rodrigo C. ; BASGALUPP, M. P. ; CARVALHO, A.C.P.L.F. de ; FREITAS, A. A. Automatic Design of Decision-Tree Algorithms with Evolutionary Algorithms. Evolutionary Computation. Available online, 2013.
c) BARROS, Rodrigo C. ; WINCK, A. T. ; MACHADO, K. S. ; BASGALUPP, M. P. ; CARVALHO, A.C.P.L.F. de ; RUIZ, D. ; SOUZA, O. N. Automatic design of decision-tree induction algorithms tailored to flexible-receptor docking data. BMC Bioinformatics, v. 13, p. 310, 2012.

(8) Prize division:
- 50% Rodrigo Barros
- 20% Márcio Basgalupp
- 15% André Carvalho
- 15% Alex Freitas

(9) Comparison to other human-competitive entries:
- The approach is the first one (and so far the only one, to the best of our knowledge) to automatically design top-down decision-tree induction algorithms. This is particularly important since a recent poll from the kdnuggets website (http://www.kdnuggets.com/polls/2011/algorithms-analytics-data-mining.html) indicated that decision-tree algorithms were the most used data analysis tool in 2011.

- The automatic design of machine learning algorithms is a fairly recent research field. Our approach and the work of Pappa and Freitas (Automating the Design of Data Mining Algorithms: An Evolutionary Computation Approach) are pioneers in this particular field, and the results so far have been quite promising. Because tailor-made, machine-designed predictive algorithms can be built very quickly (in a matter of hours), automatic design may help solve a large number of difficult classification problems.
By contrast, developing a human-designed decision-tree algorithm that is tailor-made for a given application domain or a particular dataset might take several months (or years), since such a design would require a lot of expertise not only in decision-tree algorithms, but also in the target application domain. In fact, we are not aware of any human-designed decision-tree algorithm in the literature that is tailor-made for a given application domain, presumably due to the sheer difficulty of performing that algorithm-design task manually.

- The approach received the best-paper award in the "IGEC+S*S+SBSE" tracks at GECCO 2012, which shows it was recognized by its peers as a consistent and exciting solution to machine learning and data mining problems. We also believe the approach may help consolidate research on the automatic design of other types of machine learning algorithms, opening a new and exciting research field in its own right.