1. Title of paper

"Classification of EEG Signals using Genetic Programming for Feature Construction"

2. Authors

Ícaro Marcelino Miranda
Department of Computer Science
University of Brasília
Brasília, Brasil
icaro.marcelino@hotmail.com
+555 61 98115 9289

Claus Aranha
Graduate School of Systems and Information Engineering
University of Tsukuba
Tsukuba, Japan
caranha@cs.tsukuba.ac.jp
+81 29 853 6574

Marcelo Ladeira
Department of Computer Science
University of Brasília
Brasília, Brasil
mladeira@unb.br

3. Corresponding Author

Ícaro Marcelino Miranda (If necessary, presentation at GECCO 2019 will be done by Claus Aranha)

4. Abstract

The analysis of electroencephalogram (EEG) waves is of critical importance for the diagnosis of sleep disorders, such as sleep apnea and insomnia, as well as seizures, epilepsy, head injuries, dizziness, headaches and brain tumors. In this context, one important task is the identification of visible structures in the EEG signal, such as sleep spindles and K-complexes. The identification of these structures is usually performed through visual inspection by human experts, a process that can be error-prone and susceptible to biases. Therefore, there is interest in developing technologies for the automated analysis of EEG. In this paper, we propose a new Genetic Programming (GP) framework for feature construction and dimensionality reduction from EEG signals. We use these features to automatically identify spindles and K-complexes on data from the DREAMS project. Using 5 different classifiers, the set of attributes produced by GP obtained better AUC scores than those obtained from PCA or the full set of attributes. Also, the proposed framework achieved a better balance of Specificity and Recall than other models recently proposed in the literature. Analysis of the features most used by GP also suggested improvements for data acquisition protocols in future EEG examinations.

5. Criteria List

(B) The result is equal to or better than a result that was accepted as a new scientific result at the time when it was published in a peer-reviewed scientific journal.
(D) The result is publishable in its own right as a new scientific result, independent of the fact that the result was mechanically created.
(G) The result solves a problem of indisputable difficulty in its field.

6. Statement on Criteria

Criterion B: Our model outperforms current models from the literature (Lachner-Piza et al. (Journal of Neuroscience Methods 2018), Tsanas and Clifford (Frontiers in Human Neuroscience 9, 2015), Zhuang et al. (Applied Informatics 2016)). It provides a better trade-off between precision and recall without loss of specificity, and generates classifiers with greater recall. For this particular domain, false negatives are a bigger concern than false positives, so high recall is a desirable property in a classifier.

Criterion D: In addition to the more general claim of using Genetic Programming for Feature Construction, this paper shows two properties of the EEG data that were found by GP and are of interest in their own right.
- D-1: GP identified that only one of the three channels used is necessary for the task of model identification of sleep EEG signals. This implies that a simplified data collection procedure is possible, which would make the process less uncomfortable for human patients.
- D-2: The accuracy of model identification in the analysis of sleep EEG signals can be improved by the use of short signal samples (2-second samples). The size of the signal samples is of great importance for the design of experiments in this field.

Criterion G: The identification of sleep spindles and K-complexes in EEG signals is challenging for both human specialists and automatic classifiers. For human specialists, it is a tiresome, error-prone and possibly biased process.
For automatic classifiers, it is a complex problem because of the strong presence of noise and the high dimensionality of the signal features. Moreover, sleep spindles in patients with sleep disorders may have distorted shapes, which makes them more difficult to identify. K-complexes, in turn, are easily confused with any high peaks that occur in the EEG.

7. Citation

Ícaro Marcelino Miranda, Claus Aranha, Marcelo Ladeira, "Classification of EEG Signals using Genetic Programming for Feature Construction", In Proceedings of the Annual Genetic and Evolutionary Computation Conference, ACM, Prague, 2019.

8. Prize Money

Any prize money is to be divided evenly among the co-authors.

9. Authors' Statement to the Judges

In this paper, a GP system is used to perform signal analysis in an important task that is still done by human experts. The GP system not only finds solutions competitive with the human experts, it also finds new important features of this problem which will improve future results for both automated and human analysts. Because of this, we believe that the result is not only human-competitive in the domain of EEG signal analysis, but that the approach will also generalize and find similar results in other similar domains.

- Importance of the problem: Identification of structures in sleep EEG signals is of extreme importance for sleep staging, for understanding brain behavior during sleep and for the identification of multiple sleep-related disorders, such as sleep apnea and insomnia. Undiagnosed sleep disturbances generate economic losses. In the United States, a study by Kapur et al. (Sleep, 1999) estimated the losses from undiagnosed obstructive sleep apnea at approximately 3 billion dollars per year.
- Significance of the Results: The feature set found by GP improved the performance of all classifiers tested, and achieved results equal to or better than the reference human practitioners.
Additionally, the approach requires neither filters nor normalization, which facilitates the interpretation of the results. Finally, GP found that a reduced set of signals was sufficient for identification, which implies an improved data collection method that saves resources and, by using fewer electrodes, is less uncomfortable for human patients.
- Generalization to other domains: The lack of filters or normalization increases the generality of the models, and the new attributes, formed by explicit relations between variables, increase the interpretability of the models. These two characteristics facilitate the communication of the results to other domains in the medical field.

10. General Type

GP: Feature selection and construction with Genetic Programming

11. The date of publication

In press (Accepted for GECCO 2019)