Dear Humies Competition Committee,

We respectfully wish to register the following entry in the 15th Annual (2018) "Humies" Competition. The 11 items requested for entry are detailed below.

1) Complete title of the work submitted:
"Emergent Solutions to High-Dimensional Multi-Task Reinforcement Learning"

2) Names, contact addresses, email, and phone numbers for EACH author:

Stephen Kelly
Dalhousie University, Faculty of Computer Science, 6050 University Av., Halifax, NS, B3H 4R2 Canada
skelly@cs.dal.ca
902 233 0758

Malcolm I. Heywood
Dalhousie University, Faculty of Computer Science, 6050 University Av., Halifax, NS, B3H 4R2 Canada
mheywood@cs.dal.ca
902 712 2005

3) Corresponding author: Malcolm I. Heywood

4) Paper abstract:
Algorithms that learn through environmental interaction and delayed rewards, or reinforcement learning (RL), increasingly face the challenge of scaling to dynamic, high-dimensional, and partially observable environments. Significant attention is being paid to frameworks from deep learning, which scale to high-dimensional data by decomposing the task through multi-layered neural networks. While effective, the representation is complex and computationally demanding. In this work we propose a framework based on Genetic Programming which adaptively complexifies policies through interaction with the task. We make a direct comparison with several deep reinforcement learning frameworks in the challenging Atari video game environment, as well as with more traditional reinforcement learning frameworks based on a priori engineered features. Results indicate that the proposed approach matches the quality of deep learning while being a minimum of three orders of magnitude simpler with respect to model complexity. This results in real-time operation of the champion RL agent without recourse to specialized hardware support. Moreover, the approach is capable of evolving solutions to multiple game titles simultaneously with no additional computational cost. In this case, agent behaviours for an individual game, as well as single agents capable of playing *all* games, emerge from the same evolutionary run.

5) Criteria that the work satisfies:
(B) The result is equal to or better than a result that was accepted as a new scientific result at the time when it was published in a peer-reviewed scientific journal.
(F) The result is equal to or better than a result that was considered an achievement in its field at the time it was first discovered.
(G) The result solves a problem of indisputable difficulty in its field.

6) Statement of why the result satisfies the criteria:
Visual reinforcement learning was first demonstrated in the general case by a group from Google DeepMind in 2015, with a result published in the journal Nature that combined deep learning and reinforcement learning (the DQN framework). The authors compared DQN's performance, given a limited number of interactions with the Atari console frame buffer, against that of a human player across a total of 49 game titles. DQN was able to produce better-than-human results for half of the games. This was the basis for the claim of human-competitive results under the same gaming experience. That is to say, no hand-designed features were involved, and no use was made of forward models such as Monte Carlo Tree Search (a fundamental requirement of DeepMind's results for games such as Go and Chess).
In our work, we demonstrate that we at the very least match the quality of the original DQN results, *and* of a set of improved DQN approaches, under the same suite of 49 Atari game titles. In addition, we demonstrate that the *same* algorithm is able to evolve a policy for playing 5 game titles simultaneously, just by interacting with the frame buffer, and still beat the DQN results obtained when DQN is trained independently on *each* game title. Finally, all results using deep learning entail a significant computational overhead, both to produce the solution *and* to deploy the solution post training. Conversely, all our solutions are evolved on a single CPU core and are multiple orders of magnitude simpler. This means that our solutions execute on a laptop computer faster than any solution employing deep learning.

7) Full citation of the paper:
Stephen Kelly, Malcolm I. Heywood (2018) Emergent Solutions to High-Dimensional Multi-Task Reinforcement Learning. Evolutionary Computation 26(3), MIT Press, Fall 2018 issue (in press).

8) Prize money: 100% of the monetary aspect of the prize will go to Stephen Kelly.

9) Statement of why this entry would be the 'best':
The 2015 result from Google DeepMind demonstrated: 1) human-competitive results, 2) operation without any hand-designed features, and 3) generality of the approach across multiple game titles. Since this DQN result, all other developments have been either optimizations of the DQN approach or predefined deep learning architectures with weights parameterized by an evolutionary method. Our result demonstrates for the first time that: 1) solutions are discovered using an emergent process, i.e. the complexity of a solution matches the complexity of the task, with no predefinition of the solution topology; 2) solution quality is statistically equivalent to that of solutions from deep learning, which are themselves competitive with human results; 3) solutions are far more efficient than any solution using a deep learning architecture because no convolution operator is employed; and 4) our approach is able to discover solutions for playing multiple game titles from a single policy that are still better than those discovered by DQN.

10) General type of Evolutionary Computation used: Genetic Programming

11) Publication date: The above work is to appear in the Evolutionary Computation journal (MIT Press) in the Fall 2018 issue. The work was submitted in July 2017 and accepted in June 2018. See the additional PDF for confirmation of paper status.