1. the complete title of one (or more) paper(s) published in the open literature describing the work that the author claims describes a human-competitive result;
A series of papers:
[1] Hvatov, A., & Maslyaev, M. (2020, July). The data-driven physical-based equations discovery using evolutionary approach. In Proceedings of the 2020 Genetic and Evolutionary Computation Conference Companion (pp. 129-130).
[2] Maslyaev, M., Hvatov, A., & Kalyuzhnaya, A. V. (2021). Partial differential equations discovery with EPDE framework: application for real and synthetic data. Journal of Computational Science, 101345.
[3] Hvatov, A., & Maslyaev, M. Multi-Objective Discovery of PDE Systems Using Evolutionary Approach (CEC-2021 conference, unconditionally accepted, in press)
2. the name, complete physical mailing address, e-mail address, and phone number of EACH author of EACH paper(s);
Alexander Hvatov
email: alex_hvatov@itmo.ru
phone: +7 952 220 32 76
ITMO University
49 Kronverksky Pr.
St. Petersburg
197101
Russian Federation
Mikhail Maslyaev
email: mikemaslyaev@itmo.ru
phone: +7 915 145 97 25
ITMO University
49 Kronverksky Pr.
St. Petersburg
197101
Russian Federation
Anna Kalyuzhnaya
email: anna.kalyuzhnaya@itmo.ru
phone: +7 911 038 27 68
ITMO University
49 Kronverksky Pr.
St. Petersburg
197101
Russian Federation
3. the name of the corresponding author (i.e., the author to whom notices will be sent concerning the competition);
Alexander Hvatov (AH)
4. the abstract of the paper(s);
[1] The modern machine learning methods allow one to obtain the data-driven models in various ways. However, the more complex the model is, the harder it is to interpret. In the paper, we describe the algorithm for the mathematical equations discovery from the given observations data. The algorithm combines genetic programming with the sparse regression.
This algorithm allows obtaining different forms of the resulting models. As an example, it could be used for governing analytical equation discovery as well as for partial differential equations (PDE) discovery.
The main idea is to collect a bag of the building blocks (it may be simple functions or their derivatives of arbitrary order) and consequently take them from the bag to create combinations, which will represent terms of the final equation. The selected terms pass to the evolutionary algorithm, which is used to evolve the selection. The evolutionary steps are combined with the sparse regression to pick only the significant terms. As a result, we obtain a short and interpretable expression that describes the physical process that lies beyond the data.
In the paper, two examples of the algorithm application are described: the PDE discovery for the metocean processes and the function discovery for the acoustics.
[2] Data-driven methods provide model creation tools for systems where the application of conventional analytical methods is restrained. The proposed method involves the data-driven derivation of a partial differential equation (PDE) for process dynamics, helping process simulation and study. The paper describes the methods that are used within the EPDE (Evolutionary Partial Differential Equations) partial differential equation discovery framework. The framework involves a combination of evolutionary algorithms and sparse regression. Such an approach is versatile compared to other commonly used data-driven partial differential derivation methods by making fewer assumptions about the resulting equation. This paper highlights the algorithm features that allow data processing with noise, which is similar to the algorithm's real-world applications. This paper is an extended version of the ICCS-2020 conference paper.
[3] Usually, the data-driven methods of the systems of partial differential equations (PDEs) discovery are limited to the scenarios, when the result can be manifested as the single vector equation form. However, this approach restricts the application to the real cases, where, for example, the form of the external forcing is of interest for the researcher and can not be described by the component of the vector equation. In the paper, a multi-objective co-evolution algorithm is proposed. The single equations within the system and the system itself are evolved simultaneously to obtain the system. This approach allows discovering the systems with the form-independent equations. In contrast to the single vector equation, a component-wise system is more suitable for expert interpretation and, therefore, for applications. The example of the two-dimensional Navier-Stokes equation is considered.
5. a list containing one or more of the eight letters (A, B, C, D, E, F, G, or H) that correspond to the criteria (see above) that the author claims that the work satisfies;
(B) The result is equal to or better than a result that was accepted as a new scientific result at the time when it was published in a peer-reviewed scientific journal.
(F) The result is equal to or better than a result that was considered an achievement in its field at the time it was first discovered.
(G) The result solves a problem of indisputable difficulty in its field.
6. a statement stating why the result satisfies the criteria that the contestant claims (see examples of statements of human-competitiveness as a guide to aid in constructing this part of the submission);
The classical idea of deriving a general law from particular observations is a cornerstone of science. Humankind spent centuries sitting under the apple tree before discovering the law of gravitation.
Without computer aid, a modern expert has only a slight chance of discovering new equations. The primary tool is variational principles. However, most equations and variational principles have already been described, so the possibilities for discovery are very restricted. Therefore, the synergy of expert and computer may yield new equations and different viewpoints in the future.
(B) Discovery of partial differential equations (PDEs) is a well-known problem. However, as stated in both classical and recent works, it is usually done as a regression on a pre-defined set of differential terms. The first attempts to obtain equations from observational data combined terms from known equations to improve the quality of data reproduction. Later, the terms were generalized, and the discovery process turned into a regression on pre-defined atomic differential terms. This means that the set of all possible products of differential operators up to a given order is created in advance. That requires an initial guess and expertise to decide which terms should be in the library and which should not.
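To make the library-based regression described above concrete, here is a minimal sketch. It is not the method of papers [1]-[3]: the synthetic heat-equation data, the four-term library, the threshold value, and the helper name thresholded_least_squares are all assumptions chosen for illustration, and sequentially thresholded least squares is used only as one common way to make the regression sparse.

```python
import numpy as np

# Synthetic data (assumption for illustration): u_t = a * u_xx with a = 0.5,
# built from two decaying sine modes so that u and u_xx are not collinear.
a = 0.5
x = np.linspace(0.0, 2.0 * np.pi, 200)
t = np.linspace(0.0, 1.0, 100)
T, X = np.meshgrid(t, x, indexing="ij")
u = np.exp(-a * T) * np.sin(X) + 0.5 * np.exp(-4.0 * a * T) * np.sin(2.0 * X)

# Finite-difference derivatives; points near the edges are trimmed below
# because np.gradient is less accurate at the boundaries.
u_t = np.gradient(u, t, axis=0)
u_x = np.gradient(u, x, axis=1)
u_xx = np.gradient(u_x, x, axis=1)

# Pre-defined library of candidate terms (hypothetical, chosen by hand).
inner = (slice(3, -3), slice(3, -3))
names = ["u", "u_x", "u_xx", "u*u_x"]
library = np.column_stack([
    u[inner].ravel(),
    u_x[inner].ravel(),
    u_xx[inner].ravel(),
    (u * u_x)[inner].ravel(),
])
target = u_t[inner].ravel()


def thresholded_least_squares(A, b, threshold=0.05, n_iter=10):
    """Fit b ~ A @ coef, repeatedly zeroing small coefficients and refitting."""
    coef = np.linalg.lstsq(A, b, rcond=None)[0]
    for _ in range(n_iter):
        small = np.abs(coef) < threshold
        coef[small] = 0.0
        active = ~small
        if not active.any():
            break
        coef[active] = np.linalg.lstsq(A[:, active], b, rcond=None)[0]
    return coef


coef = thresholded_least_squares(library, target)
equation = " + ".join(f"{c:.3f}*{n}" for c, n in zip(coef, names) if c != 0.0)
print("u_t =", equation)
```

On this clean synthetic data only the u_xx term should survive, with a coefficient close to 0.5. The sketch also shows the limitation discussed above: the result can only contain terms that someone decided to put into the library.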
We note that there are other approaches, such as neural-network-aided ones. However, they are less controllable and interpretable.
We propose an evolutionary approach to differential equation discovery that starts from a small library of atomic differential operators and builds the equation in the most general form, including equations with operator (and, in perspective, coefficient) non-linearity. We shift from an extensive pre-defined library of terms to a small library of building blocks, which is passed to an algorithm that creates products of terms representing the data.
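The difference between the two kinds of library can be sketched as follows. The token names and the helper functions all_products, random_term, and random_equation are hypothetical and are not taken from the EPDE code; the sketch only contrasts enumerating every product in advance with sampling products from a small bag of building blocks.

```python
import random
from itertools import combinations_with_replacement

# Hypothetical bag of elementary building blocks (tokens).
tokens = ["u", "du/dx", "d2u/dx2", "du/dt"]


def all_products(tokens, max_factors):
    """Enumerate every product of 1..max_factors tokens (order-insensitive)."""
    products = []
    for k in range(1, max_factors + 1):
        products.extend(combinations_with_replacement(tokens, k))
    return products


def random_term(tokens, max_factors, rng):
    """Draw one candidate term as a random product of tokens from the bag."""
    k = rng.randint(1, max_factors)
    return tuple(sorted(rng.choice(tokens) for _ in range(k)))


def random_equation(tokens, n_terms, max_factors, rng):
    """A candidate equation is a set of distinct terms; weights are fitted later."""
    terms = set()
    while len(terms) < n_terms:
        terms.add(random_term(tokens, max_factors, rng))
    return sorted(terms)


# A pre-defined library must list every product up front.
full_library = all_products(tokens, max_factors=3)
print(len(tokens), "tokens ->", len(full_library), "pre-defined terms")

# The evolutionary alternative starts from random products and evolves them.
rng = random.Random(0)
candidate = random_equation(tokens, n_terms=3, max_factors=3, rng=rng)
print(" + ".join(" * ".join(term) for term in candidate))
```

With four tokens and products of up to three factors, the pre-defined library already holds 34 terms, and it grows combinatorially with the number of tokens and factors, whereas the bag itself stays at four entries.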
Apart from the classical genetic programming operators such as crossover and mutation, we introduce a regularization operator to reduce the excessive model growth present in the classical genetic programming workflow. Canceling the terms that do not describe a sufficient amount of variability is crucial for PDE discovery systems. The regularization is done mainly to preserve the generality of the model: describing general laws is the essential property of such models.
As a recent advancement, we used a multi-objective approach to gain additional control over the discovery process. For example, introducing a model complexity objective yields a series of models of different scales. Moreover, it adds interpretability, since the expert may draw conclusions from a Pareto frontier instead of a single model.
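The idea of reading results from a Pareto frontier can be illustrated with a minimal sketch. The candidate equations, their error values, and the function name pareto_front are hypothetical and serve only to show how non-dominated models are selected when both fitting error and complexity are to be minimized.

```python
def pareto_front(candidates):
    """Return candidates not dominated in (error, complexity); lower is better."""
    front = []
    for name, err, cx in candidates:
        # A candidate is dominated if another one is no worse in both
        # objectives and strictly better in at least one of them.
        dominated = any(
            (e2 <= err and c2 <= cx) and (e2 < err or c2 < cx)
            for _, e2, c2 in candidates
        )
        if not dominated:
            front.append((name, err, cx))
    return sorted(front, key=lambda item: item[2])


# Hypothetical candidates: (equation, fitting error, number of terms).
candidates = [
    ("u_t = 0.5*u_xx", 0.020, 1),
    ("u_t = 0.5*u_xx + 0.01*u", 0.019, 2),
    ("u_t = 0.4*u_x", 0.300, 1),
    ("u_t = 0.5*u_xx + 0.01*u + 0.002*u*u_x", 0.019, 3),
    ("u_t = 0.3*u + 0.2*u_x", 0.150, 2),
]
front = pareto_front(candidates)
for name, err, cx in front:
    print(f"{cx} term(s), error {err:.3f}: {name}")
```

In this toy set, the expert is left with two models to compare: the one-term equation and a two-term equation that fits only marginally better.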
The second application of the multi-objective approach is PDE system discovery. Usually, it is done in vector form, so the model is reduced to a single vector equation. We therefore made the first step toward discovering systems of independent equations using multi-objective optimization.
(F) Even though the advancement from regression on a prescribed library to multi-objective evolution of an independent (non-vector) system may appear to be a giant leap, we are still on a slippery floor. We mean that every upgrade requires a significant addition of concrete to the thin metal carcass built so far. However, we are on our way to building a universal equation discovery system that will allow obtaining a wide range of PDEs.
We still lack a solution framework for arbitrary-form PDEs that would allow more sophisticated regularization. We require speed improvements. We need a built-in algebraic equation discovery system to obtain non-constant coefficients. It is still a long way. However, we have solid ground and significant improvement.
(G) From a non-expert point of view, partial differential equations and mathematical physics overall are well-studied areas. Regretfully, this is not the actual state of the art. The number of known models is very restricted. Equations related to Newton's second law, the Navier-Stokes equations, Maxwell's equations, the Schrödinger equation, the equations of elastic media (meaning Hooke's law and related equations), the equations of general relativistic gravitation, and possibly a few more make up an almost exhaustive list of the known models.
The appealing possibility of extracting new equations directly from data and discovering new terms for known models is spreading more and more. Thus, we made a small step toward computer-aided PDE discovery, which should inspire experts in different areas to use the computer to enrich applied areas with mathematical models. We want to shift from the equation-free approach, where the terms of known equations restrict the possible model form, to completely form-agnostic discovery.
Nevertheless, it seems a good idea to add a knowledge base in the future, albeit in a less explicit form. For instance, classifying the type of the equation would give the expert additional information and provide another steering mechanism for the discovery process. As an example, for some applications, only linear equations may be used. For now, however, we want to free the computer from any a priori knowledge and leave the filtering of equations for the future. In our opinion, such an approach is more productive and complete than one that restricts the search from the beginning.
7. a full citation of the paper (that is, author names; title, publication date; name of journal, conference, or book in which article appeared; name of editors, if applicable, of the journal or edited book; publisher name; publisher city; page numbers, if applicable);
[1] Hvatov, A., & Maslyaev, M. (2020, July). The data-driven physical-based equations discovery using evolutionary approach. In Proceedings of the 2020 Genetic and Evolutionary Computation Conference Companion (pp. 129-130).
[2] Maslyaev, M., Hvatov, A., & Kalyuzhnaya, A. V. (2021). Partial differential equations discovery with EPDE framework: application for real and synthetic data. Journal of Computational Science, 101345.
[3] Hvatov, A., & Maslyaev, M. Multi-Objective Discovery of PDE Systems Using Evolutionary Approach (CEC-2021 conference, unconditionally accepted, in press)
8. a statement either that "any prize money, if any, is to be divided equally among the co-authors" OR a specific percentage breakdown as to how the prize money, if any, is to be divided among the co-authors;
Any prize money, if any, is to be divided equally among all co-authors AH, MM and AK.
9. a statement stating why the authors expect that their entry would be the "best"
The field of differential equation discovery is, in our opinion, undeservedly avoided by most researchers. We hope that our work will at least inspire researchers around the world to develop their own solutions.
Partial differential equations are understandable and interpretable, unlike neural networks. Thus, they may be considered an alternative to classical machine learning models.
The promising and unusual friendship between the computer and chalkboard mathematics is, in our opinion, a better use of computing power than most machine learning applications. Therefore, we hope that in the bright future more and more scientists will use the computer as a transparent and handy research instrument, without fear of black boxes.
10. An indication of the general type of genetic or evolutionary computation used, such as GA (genetic algorithms), GP (genetic programming), ES (evolution strategies), EP (evolutionary programming), LCS (learning classifier systems), GI (genetic improvement), GE (grammatical evolution), GEP (gene expression programming), DE (differential evolution), etc.
GP
11. The date of publication of each paper. If the date of publication is not on or before the deadline for submission, but instead, the paper has been unconditionally accepted for publication and is "in press" by the deadline for this competition, the entry must include a copy of the documentation establishing that the paper meets the "in press" requirement.
[1] 2020, July
[2] 26 March 2021
[3] Conference date: June 29, 2021 - July 1, 2021