1) TITLE

Automatic Transcription of Polyphonic Piano Music using Genetic Algorithms, Adaptive Spectral Envelope Modeling and Dynamic Noise Level Estimation

--------------------------------------------------------------------------------
2) AUTHORS

Gustavo Reis
Department of Computer Science
School of Technology and Management
Polytechnic Institute of Leiria
Address: Apt. 3063 – Morro do Lena – Alto do Vieiro, 2401-951 Leiria – Portugal
e-mail: gustavo.reis@ipleiria.pt
phone: +351 933 451 711

Francisco Fernandez
Department of Computer Science and Communications
University Center of Merida
University of Extremadura
Address: C/Sta. Teresa de Jornet, 38. 06800 Mérida - Badajoz. Spain
e-mail: fcofdez@unex.es
phone: +346 924 38 70 68

Anibal Ferreira
Department of Electrical and Computer Engineering
School of Engineering of the University of Porto
Address: Rua Dr. Roberto Frias, 4200-465 Porto, Portugal
e-mail: ajf@fe.up.pt
phone: +351-22-508-1471

--------------------------------------------------------------------------------
3) CORRESPONDING AUTHOR

Gustavo Reis

--------------------------------------------------------------------------------
4) ABSTRACT

This paper presents a new method for multiple fundamental frequency (F0) estimation in piano recordings. We propose a framework based on a genetic algorithm that analyzes the overlapping overtones and searches for the most likely F0 combination. The search is aided by adaptive spectral envelope modeling and dynamic noise level estimation: while the noise is estimated dynamically, the spectral envelope of previously recorded piano samples (an internal database) is adapted to best match the piano played in the input signal. For comparison, several state-of-the-art algorithms were run on various musical pieces played on different pianos and evaluated using three different metrics.
The proposed algorithm ranked first on the Hybrid Decay/Sustain Score metric, which correlates better with human hearing perception, and second on both the Onset-only and Onset-Offset metrics. A previous genetic algorithm approach is also included in the comparison to show that the proposed system brings significant improvements in both the quality of the results and computing time.

--------------------------------------------------------------------------------
5) CRITERIA SATISFIED BY THE WORK

(B) The result is equal to or better than a result that was accepted as a new scientific result at the time when it was published in a peer-reviewed scientific journal.

(E) The result is equal to or better than the most recent human-created solution to a long-standing problem for which there has been a succession of increasingly better human-created solutions.

(H) The result holds its own or wins a regulated competition involving human contestants (in the form of either live human players or human-written computer programs).

--------------------------------------------------------------------------------
6) RATIONALE

This work satisfies the criteria listed above for the following reasons.

(B) The result is equal to or better than a result that was accepted as a new scientific result at the time when it was published in a peer-reviewed scientific journal.
and
(E) The result is equal to or better than the most recent human-created solution to a long-standing problem for which there has been a succession of increasingly better human-created solutions.

Several state-of-the-art algorithms published in peer-reviewed scientific journals were compared with our approach. Our algorithm ranked first on the metric that most closely resembles human hearing perception, and second on the other two metrics: Onset-Offset and Onset-only.
Furthermore, our proposal also ranked as the second-best algorithm in the MIREX 2011 contest, in the Piano subtask (Multiple Fundamental Frequency Estimation and Tracking).

(H) The result holds its own or wins a regulated competition involving human contestants (in the form of either live human players or human-written computer programs).

Our algorithm was submitted to MIREX 2011, Piano subtask (Multiple Fundamental Frequency Estimation and Tracking), and ranked second. The other competing algorithms were created by humans.

--------------------------------------------------------------------------------
7) FULL CITATION OF THE PAPER

Reis, G.; Fernandez de Vega, F.; Ferreira, A., "Automatic Transcription of Polyphonic Piano Music Using Genetic Algorithms, Adaptive Spectral Envelope Modeling, and Dynamic Noise Level Estimation," IEEE Transactions on Audio, Speech, and Language Processing, vol. 20, no. 8, pp. 2313-2328, Oct. 2012.
doi: 10.1109/TASL.2012.2201475
keywords: {Adaptation models; Estimation; Gain; Genetic algorithms; Harmonic analysis; Noise; Noise level; Acoustic signal analysis; automatic music transcription; fundamental frequency (F0) estimation; music information retrieval; pitch perception}
URL: http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6205337&isnumber=6263279

--------------------------------------------------------------------------------
8) STATEMENT ABOUT THE PRIZE DISTRIBUTION

Prize money, if any, is to be divided equally among the co-authors.

--------------------------------------------------------------------------------
9) WHY THE JUDGES SHOULD CONSIDER THIS ENTRY

a) Despite being an evolutionary approach, our algorithm also mimics how musicians learn to play a tune by ear: the algorithm first listens to the audio and then tries to play it as best it can, improving the generated tune from iteration to iteration.
This is the main reason why our approach is the best according to human hearing perception: the algorithm tries to reproduce the sound it has heard. In this way, besides behaving like a human, our algorithm also outperforms human-designed algorithms according to human hearing perception.

b) Our algorithm is also the first bio-inspired algorithm to reach the state of the art.

c) Our research addresses a very difficult problem in automatic music transcription, and the problem remains open. Nevertheless, our results show that an evolutionary algorithm (EA) is able to outperform any known human-devised methodology.

d) The work has been positively evaluated by experts in the Music Information Retrieval community. For this community, the methodology employed is irrelevant: what matters is the efficiency of the final results.