1. The complete title of one (or more) paper(s) published in the open literature describing the work that the author claims describes a human-competitive result.

Policy Optimization by Genetic Distillation

2. The name, complete physical mailing address, e-mail address, and phone number of EACH author of EACH paper(s).

Tanmay Gangwani
1010 W University Ave, Urbana, IL 61801
Email: gangwan2@illinois.edu
Phone: +1-2178197228

Jian Peng
2118 Siebel Center, 201 N Goodwin Ave, Urbana, IL 61801
Email: jianpeng@illinois.edu

3. The name of the corresponding author (i.e., the author to whom notices will be sent concerning the competition).

Tanmay Gangwani (gangwan2@illinois.edu)

4. The abstract of the paper(s).

Genetic algorithms have been widely used in many practical optimization problems. Inspired by natural selection, operators, including mutation, crossover and selection, provide effective heuristics for search and black-box optimization. However, they have not been shown useful for deep reinforcement learning, possibly due to the catastrophic consequence of parameter crossovers of neural networks. Here, we present Genetic Policy Optimization (GPO), a new genetic algorithm for sample-efficient deep policy optimization. GPO uses imitation learning for policy crossover in the state space and applies policy gradient methods for mutation. Our experiments on MuJoCo tasks show that GPO as a genetic algorithm is able to provide superior performance over the state-of-the-art policy gradient methods and achieves comparable or higher sample efficiency.

5. A list containing one or more of the eight letters (A, B, C, D, E, F, G, or H) that correspond to the criteria (see above) that the author claims that the work satisfies.

(D), (F)

6. A statement stating why the result satisfies the criteria that the contestant claims (see examples of statements of human-competitiveness as a guide to aid in constructing this part of the submission).

(D) The result is publishable in its own right as a new scientific result, independent of the fact that the result was mechanically created.

* We show how operators from the genetic algorithms literature (crossover, mutation, selection) can be applied to a population of reinforcement learning agents to achieve high performance without requiring an extensive number of samples, unlike previous evolution-based proposals (see the illustrative sketch following this item). We believe it is promising that our algorithm achieves better sample efficiency than state-of-the-art policy gradient methods; more importantly, the constituent ideas are general and can be applied to any ensemble-based deep neural network training. For instance, in computer vision, using crossover, mutation and selection operators similar to those in our paper, one could train an ensemble of convolutional neural networks for image classification on multiple sources to achieve domain adaptation. Put succinctly, the simplicity and wide applicability of the techniques in our approach make it useful to the larger ML community, in our humble opinion.

(F) The result is equal to or better than a result that was considered an achievement in its field at the time it was first discovered.

* Advantage Actor-Critic (A2C) and Proximal Policy Optimization (PPO) are two popular and extensively used policy-gradient-based reinforcement learning algorithms, offering stable and high performance on a range of tasks (environments). We compare our algorithm to A2C and PPO in the paper, and report better performance across a range of simulated environments.
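To make the operator-to-policy mapping in point (D) concrete, the sketch below shows the shape of a GPO-style loop: mutation as a policy-gradient step, crossover as imitation in state space, and selection by episodic return. It is not code from the paper; the toy 1-D control task, the linear-Gaussian policy class, and the helper names (rollout, fitness, mutate, crossover) are simplifications chosen here for exposition. The paper's actual algorithm uses PPO/A2C updates for mutation and a critic-guided two-level mixture policy distilled into the child for crossover; the sketch only gestures at that structure.

# Minimal, illustrative GPO-style loop on a toy 1-D control problem
# (an expository simplification, not the paper's implementation).
import numpy as np

rng = np.random.default_rng(0)

def rollout(theta, horizon=20):
    # Toy dynamics: the agent is rewarded for an action that cancels the state.
    s = rng.normal()
    states, actions, rewards = [], [], []
    for _ in range(horizon):
        a = theta[0] * s + theta[1] + 0.1 * rng.normal()  # linear-Gaussian policy
        r = -(s + a) ** 2                                  # reward peaks when a is about -s
        states.append(s); actions.append(a); rewards.append(r)
        s = s + a + 0.1 * rng.normal()
    return np.array(states), np.array(actions), np.array(rewards)

def fitness(theta, episodes=10):
    # Selection criterion: average episodic return.
    return np.mean([rollout(theta)[2].sum() for _ in range(episodes)])

def mutate(theta, lr=0.05):
    # Mutation: one crude REINFORCE-style policy-gradient step
    # (the paper uses PPO / A2C updates here).
    states, actions, rewards = rollout(theta)
    returns = np.cumsum(rewards[::-1])[::-1]
    adv = returns - returns.mean()
    mean = theta[0] * states + theta[1]
    grad = np.array([np.mean(adv * (actions - mean) * states),
                     np.mean(adv * (actions - mean))])
    return theta + lr * grad

def crossover(theta_a, theta_b):
    # Crossover in state space: the child is fit by behaviour cloning on the union
    # of states visited by both parents, imitating each parent's mean action on its
    # own states. (The paper instead builds a critic-guided two-level mixture policy
    # and distills it into the child; this least-squares fit is a simplified stand-in.)
    s_a, s_b = rollout(theta_a)[0], rollout(theta_b)[0]
    states = np.concatenate([s_a, s_b])
    targets = np.concatenate([theta_a[0] * s_a + theta_a[1],
                              theta_b[0] * s_b + theta_b[1]])
    X = np.stack([states, np.ones_like(states)], axis=1)
    child, *_ = np.linalg.lstsq(X, targets, rcond=None)
    return child

# Generational loop: mutate every member, cross the fittest pairs, select survivors.
population = [rng.normal(size=2) for _ in range(6)]
for gen in range(10):
    population = [mutate(p) for p in population]                      # mutation
    ranked = sorted(population, key=fitness, reverse=True)
    children = [crossover(ranked[i], ranked[i + 1]) for i in (0, 2)]  # crossover
    population = ranked[:4] + children                                # selection
    print(f"generation {gen}: best average return {fitness(ranked[0]):.2f}")

In the paper, the same loop structure is applied to deep policies on MuJoCo tasks, with these toy mutation and crossover steps replaced by the full policy-gradient and imitation-learning machinery described in the abstract.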
7. A full citation of the paper (that is, author names; publication date; name of journal, conference, technical report, thesis, book, or book chapter; name of editors, if applicable, of the journal or edited book; publisher name; publisher city; page numbers, if applicable).

Tanmay Gangwani, Jian Peng. Policy Optimization by Genetic Distillation. Published as a conference paper at the International Conference on Learning Representations (ICLR), 2018.

8. A statement either that "any prize money, if any, is to be divided equally among the co-authors" OR a specific percentage breakdown as to how the prize money, if any, is to be divided among the co-authors.

Prize money, if any, will be divided equally among the co-authors.

9. A statement stating why the authors expect that their entry would be the "best".

Evolving deep neural networks (DNNs), both connectivity and weights, for non-convex optimization has long been an active research topic. However, few approaches, if any, have achieved reliable performance for large DNNs without requiring an extensive amount of computation or a large number of genome fitness evaluations. In reinforcement learning, this "evaluation" is particularly taxing since it involves expensive interactions of the agent with the environment. We have proposed an algorithm that takes the salient ideas from genetic algorithms, such as crossover to create stronger offspring, mutation to improve each genome, and selection to maintain quality diversity, and places them in the context of reinforcement learning policies parameterized by large DNNs. To the best of our knowledge, our approach is more sample-efficient (in terms of the required environment interactions) than other works that use standard policy-gradient algorithms or those based on evolution strategies. Moreover, its general applicability (described in detail in point 6) creates interesting future research opportunities.

10. An indication of the general type of genetic or evolutionary computation used, such as GA (genetic algorithms), GP (genetic programming), ES (evolution strategies), EP (evolutionary programming), LCS (learning classifier systems), GE (grammatical evolution), GEP (gene expression programming), DE (differential evolution), etc.

Genetic Algorithms

11. The date of publication of each paper. If the date of publication is not on or before the deadline for submission, but instead, the paper has been unconditionally accepted for publication and is "in press" by the deadline for this competition, the entry must include a copy of the documentation establishing that the paper meets the "in press" requirement.

Date of publication: April 30, 2018