1. Title of the paper
Evaluating Medical Aesthetics Treatments through Evolved Age-Estimation Models

2. Authors
Risto Miikkulainen (risto@cognizant.com)^1^2
Elliot Meyerson (elliot.meyerson@cognizant.com)^1
Xin Qiu (xin.qiu@cognizant.com)^1
Ujjayant Sinha (ujjayant.sinha@cognizant.com)^3
Raghav Kumar (raghav.kumar@cognizant.com)^3
Karen Hofmann (karen.hofmann@cognizant.com)^4
Yiyang Matt Yan (matt.yan@abbvie.com)^5
Michael Ye (michael.ye@abbvie.com)^5
Jingyuan Yang (jingyuan.yang@abbvie.com)^5
Damon Caiazza (damon.caiazza@abbvie.com)^5
Stephanie Manson Brown (stephanie.mansonbrown@abbvie.com)^6

1 Cognizant AI Labs, San Francisco, CA, USA
2 The University of Texas at Austin, Austin, TX, USA
3 Cognizant Technology Solutions, New Delhi, India
4 Cognizant Technology Solutions, Bethlehem, PA, USA
5 AbbVie Inc., Irvine, CA, USA
6 AbbVie Inc., Marlow, UK

3. Corresponding author
Risto Miikkulainen, risto@cs.utexas.edu

4. Abstract
Estimating a person's age from a facial image is a challenging problem with clinical applications. Several medical aesthetics treatments have been developed that alter the skin texture and other facial features, with the goal of potentially improving the patient's appearance and perceived age. In this paper, this effect was evaluated using evolutionary neural networks with uncertainty estimation. First, a realistic dataset was obtained from clinical studies that makes it possible to estimate age more reliably than, e.g., datasets of celebrity images. Second, a neuroevolution approach was developed that customizes the architecture, learning, and data augmentation hyperparameters and the loss function to this task. Using state-of-the-art computer vision architectures as a starting point, evolution improved their original accuracy significantly, eventually outperforming the best human optimizations in this task. Third, the reliability of the age predictions was estimated using RIO, a Gaussian-Process-based uncertainty model.
Evaluation on a real-world Botox treatment dataset shows that the treatment has a quantifiable result: the patients' estimated age is reduced significantly compared to placebo treatments. The study thus shows how AI can be harnessed in a new role: to provide an objective quantitative measure of a subjective perception, in this case the proposed effectiveness of medical aesthetics treatments.

5. Criteria satisfied
B, D, E, F, G

6. Justification of criteria satisfied
Designs for deep learning networks were evolved to estimate a person's age from their facial image, exceeding the accuracy of the best networks designed by humans. Age estimation is a challenging benchmark task for deep learning: given a picture of a subject's face, the goal is to estimate their age as accurately as possible. The task is difficult for humans as well: under controlled conditions, they can do it with a mean absolute error (MAE) of 3-4 years, and under more diverse conditions, 6-8 years [52,54]. Several deep learning network architectures have been evaluated in this task, with accuracy comparable to humans. The goal of this project was to improve upon them using evolutionary metalearning, i.e. by evolving the designs of the deep learning networks using genetic algorithms. Several aspects of the design were evolved simultaneously, including network architecture (choice of base model and its output block structure), combination of loss functions (MAE, cross-entropy), data augmentation (e.g. rotation, width, height, shear, zoom, cutout), and learning parameters (e.g. optimizer, learning rate, momentum, weight decay; see Table 1 in the paper for details). Each design was evaluated through partial training with patient images that were taken prior to medical aesthetics treatment. Performance was evaluated with two different datasets.
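The evolutionary loop described above can be sketched roughly as follows. This is a minimal illustration only, not the implementation used in the paper: the option lists in SEARCH_SPACE, the elitist selection scheme, the mutation rate, and the function toy_validation_mae (a synthetic stand-in for partial training) are all assumptions introduced here to keep the example self-contained and runnable.

```python
import random

# Hypothetical search space. The dimension names follow the text, but every
# option list below is an illustrative assumption, not the paper's Table 1.
SEARCH_SPACE = {
    "base_model": ["DenseNet", "EfficientNet"],
    "mae_loss_weight": [0.0, 0.25, 0.5, 0.75, 1.0],  # remainder: cross-entropy
    "rotation_deg": [0, 5, 10, 15],
    "zoom": [0.0, 0.1, 0.2],
    "cutout": [False, True],
    "optimizer": ["sgd", "adam", "rmsprop"],
    "learning_rate": [1e-4, 3e-4, 1e-3, 3e-3],
    "weight_decay": [0.0, 1e-5, 1e-4],
}


def random_design(rng):
    """Sample one network design uniformly from the search space."""
    return {name: rng.choice(options) for name, options in SEARCH_SPACE.items()}


def crossover(parent_a, parent_b, rng):
    """Uniform crossover: each gene is copied from one of the two parents."""
    return {name: rng.choice([parent_a[name], parent_b[name]]) for name in SEARCH_SPACE}


def mutate(design, rng, rate=0.15):
    """Resample each gene from its option list with probability `rate`."""
    return {
        name: rng.choice(SEARCH_SPACE[name]) if rng.random() < rate else value
        for name, value in design.items()
    }


def toy_validation_mae(design):
    """Synthetic stand-in for partial training plus validation (lower is better).

    A real run would train the network described by `design` for a few epochs
    and return its validation MAE in years. This score has no empirical meaning.
    """
    target = {"mae_loss_weight": 0.5, "rotation_deg": 10, "learning_rate": 1e-3}
    penalty = sum(design[name] != value for name, value in target.items())
    return 2.0 + penalty


def evolve(evaluate, generations=20, population_size=16, elite_count=4, seed=0):
    """Elitist genetic algorithm: keep the best designs, breed the rest."""
    rng = random.Random(seed)
    population = [random_design(rng) for _ in range(population_size)]
    for _ in range(generations):
        elites = sorted(population, key=evaluate)[:elite_count]
        children = []
        while len(children) < population_size - elite_count:
            parent_a, parent_b = rng.sample(elites, 2)
            children.append(mutate(crossover(parent_a, parent_b, rng), rng))
        population = elites + children
    return min(population, key=evaluate)


best = evolve(toy_validation_mae)
print(best, toy_validation_mae(best))
```

In an actual experiment, the evaluate argument would be replaced by a function that partially trains the candidate network and returns its validation error, which is by far the most expensive step of the loop.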
With a smaller and more homogeneous dataset D0, an existing DenseNet architecture was used as a starting point, and mean absolute error was improved from 4.35 to 3.30 years, thus exceeding the best human design for the same architecture and dataset, which reached 3.79 years. With the larger and more diverse set D1, an existing EfficientNet architecture was used as the starting point, and its mean absolute error was improved from 3.65 to 2.19 years, again exceeding the accuracy of the best human design, 2.33 years. These results are presented graphically in Figure 1 in the paper.

Evolution made several useful discoveries: It converged on flips that swapped the left and right side of the face, regardless of the image orientation. It adjusted the width and height to focus on the most informative parts of the face and to avoid overfitting. It preferred MAE loss early on and cross-entropy later, presumably to avoid overfitting. It preferred architectures that made decisions based on later layers rather than significant skip connections. All of these discoveries are meaningful post hoc, although they are difficult to anticipate ahead of time. Also, evolution was more effective in combining their effects than people were, suggesting that there are interactions between network design dimensions that are still not well understood, yet evolution can take advantage of them. Such complexity is precisely where evolution provides an advantage over human design.

In terms of the criteria:

(E,G) Since 2017, age estimation has been one of the well-known deep learning benchmark tasks, with indisputable and useful difficulty: As new deep learning architectures emerge (such as VGG, DenseNet, MobileNet, EfficientNet), their power is often demonstrated in this task [47,56,57].

(D) Improved age estimation is publishable in its own right: First, it is a benchmark task, i.e. such results demonstrate progress in deep learning. Second, age estimation is useful in real-world applications such as medical aesthetics.
As a case in point, our paper shows that it can be used to evaluate quantitatively whether various such treatments are effective.

(B,F) We replicated the previously published state-of-the-art result for DenseNet on the IMDB-WIKI dataset [56], obtaining mean absolute error of 7.43 years. Next, we demonstrated an initial MAE of 3.56 for DenseNet on our dataset (a more comprehensive and realistic dataset than IMDB-WIKI). We then improved this MAE to 2.16 using evolutionary optimization, thus demonstrating improvement over the published state of the art. To our knowledge, this is the best accuracy so far using any approach on any age estimation dataset; it is also significantly better than human performance on this task.

7. Citation
Miikkulainen, R., Meyerson, E., Qiu, X., Sinha, U., Kumar, R., Hofmann, K., Yan, Y. M., Ye, M., Yang, J., Caiazza, D., Manson Brown, S. (2021). Evaluating Medical Aesthetics Treatments through Evolved Age-Estimation Models. In Proceedings of the Genetic and Evolutionary Computation Conference (GECCO-2021).

8. Prize money, if any, will be divided equally among the co-authors.

9. Why this entry is the best
There are now many results in the literature showing that evolutionary optimization of deep learning neural networks can improve performance over hand-designed architectures. What sets this result apart is that it was obtained in a head-to-head comparison with human experts. That is, at the same time as the Cognizant team worked on optimizing the architectures through evolution, a team of data scientists at AbbVie worked on optimizing them by hand, using the same datasets, the same initial architectures, and approximately the same amount of time. Periodically, insights obtained in this process were shared across teams, so that both teams had access to the same information. In the end, evolution came up with significantly better designs.
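One way to make the size of this gap concrete is to express the MAE figures reported in Section 6 as relative error reductions. The short calculation below is only a worked illustration of that arithmetic; the helper names relative_reduction and summarize are introduced here and do not come from the paper, and the calculation says nothing about statistical significance.

```python
# MAE figures (in years) exactly as reported in Section 6 of this entry.
RESULTS = {
    "D0 (DenseNet)": {"initial": 4.35, "human": 3.79, "evolved": 3.30},
    "D1 (EfficientNet)": {"initial": 3.65, "human": 2.33, "evolved": 2.19},
}


def relative_reduction(before, after):
    """Fraction by which the error dropped going from `before` to `after`."""
    return (before - after) / before


def summarize(results):
    """Compute relative MAE reductions for each dataset."""
    summary = {}
    for dataset, mae in results.items():
        summary[dataset] = {
            "evolved_vs_initial": relative_reduction(mae["initial"], mae["evolved"]),
            "human_vs_initial": relative_reduction(mae["initial"], mae["human"]),
            "evolved_vs_human": relative_reduction(mae["human"], mae["evolved"]),
        }
    return summary


summary = summarize(RESULTS)
for dataset, row in summary.items():
    print(dataset, {name: f"{value:.1%}" for name, value in row.items()})
```

Under these figures, evolution reduced the starting error by roughly 24% on D0 and 40% on D1, and in both cases ended below the error of the best hand-tuned design.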
The experiment thus demonstrates not only that evolution is competitive with human experts, but also that it actually adds value. The entry thus shows that evolution can have a large impact in solving complex engineering challenges beyond the ability of humans to do so.

10. Type of EC used
Genetic Algorithms

11. Date of Publication
July 10, 2021

References (with the same numbering as in the paper)
[47] Rothe, R., Timofte, R., and Van Gool, L. 2018. Deep expectation of real and apparent age from a single image without facial landmarks. International Journal of Computer Vision 126, 2 (2018), 144–157.
[52] Sörqvist, P. and Eriksson, M. 2007. Effects of training on age estimation. Applied Cognitive Psychology 21, 1 (2007), 131–135.
[54] Voelkle, M. C., Ebner, N. C., Lindenberger, U., and Riediger, M. 2012. Let me guess how old you are: Effects of age, gender, and facial expression on perceptions of age. Psychology and Aging 27 (2012), 265–277.
[56] Yang, T.-Y., Huang, Y.-H., Lin, Y.-Y., Hsiu, P.-C., and Chuang, Y.-Y. 2018. SSR-Net: A Compact Soft Stagewise Regression Network for Age Estimation. In International Joint Conference on Artificial Intelligence.
[57] Qawaqneh, Z. et al. 2017. Deep Convolutional Neural Network for Age Estimation based on VGG-Face Model. arXiv:1709.01664.