1. the complete title of one (or more) paper(s) published in the open literature describing the work that the author claims describes a human-competitive result;
Collaborative feature location in models through automatic query expansion

--------------------------------------------------------------------------------------


2. the name, complete physical mailing address, e-mail address, and phone number of EACH author of EACH paper(s);

Francisca Pérez
SVIT Research Group
Universidad San Jorge
Autovía A-23 Zaragoza-Huesca Km. 299
50830 Zaragoza, Spain
email: mfperez@usj.es
tel: +34 671 006 334


Jaime Font
SVIT Research Group
Universidad San Jorge
Autovía A-23 Zaragoza-Huesca Km. 299 
50830 Zaragoza, Spain
email: jfont@usj.es
tel: +34 976 060 100


Lorena Arcega
SVIT Research Group
Universidad San Jorge
Autovía A-23 Zaragoza-Huesca Km. 299 
50830 Zaragoza, Spain
email: larcega@usj.es
tel: +34 976 060 100


Carlos Cetina
SVIT Research Group
Universidad San Jorge
Autovía A-23 Zaragoza-Huesca Km. 299 
50830 Zaragoza, Spain
email: ccetina@usj.es
tel: +34 976 060 100

--------------------------------------------------------------------------------------


3. the name of the corresponding author (i.e., the author to whom notices will be sent concerning the competition);

Francisca Pérez 

--------------------------------------------------------------------------------------


4. the abstract of the paper(s);

Collaboration with other people is a major theme in the information-seeking process. However, most existing works that address the location of features during the maintenance or evolution of software do not support collaboration, or they are focused on code as the main software artifact. Hence, collaborative feature location in models has not enjoyed much attention to date. In this work, we address this concern by proposing an approach, CoFLiM, that enables the collaboration of several domain experts in order to locate the model fragment of a target feature. CoFLiM uses the feature descriptions of the domain experts and their self-rated confidence level to automatically reformulate the relevant feature descriptions in a single query. This query guides the evolutionary algorithm of our approach that finds the model fragment of the feature being located. We evaluate CoFLiM in a real-world case study from our industrial partner. We analyze the impact of CoFLiM in terms of recall, precision, and the F-measure. Moreover, we compare the reformulation of CoFLiM with four baselines. We also perform a statistical analysis to show that the impact of the results is significant. Our results show that collaboration pays off in the location of features in models. The results also show that the self-rated confidence level can be used to locate features in models. Finally, the results show that there are no significant improvements when more than three domain experts are involved, which is relevant in those industrial contexts where the availability of domain experts is scarce.
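As a rough illustration of the reformulation step described in the abstract, the sketch below merges several feature descriptions into a single weighted query, using each expert's self-rated confidence as the weight of the terms they contribute. This is a hypothetical simplification and not the reformulation technique of the paper: the function name `reformulate`, the `min_confidence` threshold, the assumption that confidence lies in [0, 1], and the whitespace tokenization are all assumptions made for the example.

```python
from collections import Counter


def reformulate(descriptions, min_confidence=0.0):
    """Merge (text, confidence) pairs into one weighted query.

    Hypothetical sketch: each term's weight is the sum of the
    confidence values of the experts whose description contains it.
    Confidence is assumed to lie in [0, 1].
    """
    weights = Counter()
    for text, confidence in descriptions:
        # Skip descriptions from experts below the (assumed) threshold.
        if confidence < min_confidence:
            continue
        # Naive whitespace tokenization; real text would need more care.
        for term in text.lower().split():
            weights[term] += confidence
    return dict(weights)


# Example descriptions are invented for illustration only.
query = reformulate([("door open", 1.0), ("door close", 0.5)])
```

In a setup like this, a term mentioned by several confident experts ends up with a higher weight in the query that guides the search.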

--------------------------------------------------------------------------------------


5. a list containing one or more of the eight letters (A, B, C, D, E, F, G, or H) that correspond to the criteria (see above) that the author claims that the work satisfies;

C, E, G

--------------------------------------------------------------------------------------


6. a statement stating why the result satisfies the criteria that the contestant claims (see examples of statements of human-competitiveness as a guide to aid in constructing this part of the submission);


(C) The result is equal to or better than a result that was placed into a database or archive of results maintained by an internationally recognized panel of scientific experts.

Software Product Lines are well known for reducing development costs and time to market while improving the quality of the software systems produced, by exploiting commonalities and variability across a set of similar software products. However, one of the biggest obstacles to the adoption of Software Product Lines by large organizations is the upfront investment needed to establish the Software Product Line. The initial investment pays off as more products are generated, but it can be a barrier that many organizations are not able to overcome. Fortunately, Feature Location, which the techniques presented in our work address, is the core task of the extractive approach to Software Product Lines, which capitalizes on a set of existing software products and formalizes the differences and commonalities among them into a Software Product Line. This greatly reduces the initial investment and facilitates the transition to a Software Product Line. The Software Product Lines of our industrial partners (produced following the techniques presented in our work) are good enough to be part of the Software Product Line Hall of Fame (http://splc.net/fame.html), an archive of Software Product Lines serving as models of what can be achieved and how. This Hall of Fame is maintained by the community associated with the International Systems and Software Product Line Conference and their Program Board.

Therefore, we consider the results of our work (Software Product Lines) to be equal to or better than those already present in the Hall of Fame of the Software Product Line Conference community.

    
(E) The result is equal to or better than the most recent human-created solution to a long-standing problem for which there has been a succession of increasingly better human-created solutions.

In the presented work, we include a baseline in which we compare the results obtained by our approach with those produced by humans (domain experts who have been working with that software for years). The comparison is based on common performance metrics: precision, recall, and the F-measure of the results obtained. The results from our approach outperform those produced by the human domain experts by more than 15% for recall, more than 10% for precision, and more than 12% for the F-measure.
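For readers unfamiliar with these metrics, the following minimal sketch shows how precision, recall, and the F-measure are commonly computed when a located model fragment is compared with a reference fragment. Representing fragments as sets of element identifiers, the function name `precision_recall_f`, and the example identifiers are assumptions made for illustration; the paper's own evaluation setup may differ.

```python
def precision_recall_f(retrieved, relevant):
    """Standard precision, recall, and F-measure over two sets.

    retrieved: element ids in the located model fragment (assumed form)
    relevant:  element ids in the reference fragment (assumed form)
    """
    retrieved, relevant = set(retrieved), set(relevant)
    true_positives = len(retrieved & relevant)
    precision = true_positives / len(retrieved) if retrieved else 0.0
    recall = true_positives / len(relevant) if relevant else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    # F-measure: harmonic mean of precision and recall.
    f_measure = 2 * precision * recall / (precision + recall)
    return precision, recall, f_measure


# Invented ids: two of four retrieved elements are correct,
# and two of three reference elements are found.
p, r, f = precision_recall_f({"e1", "e2", "e3", "e4"}, {"e1", "e2", "e5"})
```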

In addition, the software assets where the features are located are too big to be manually explored. A human would take almost five days just to manually inspect all of the software elements related to one of the features (and would then still need to identify the relevant parts). Moreover, the knowledge of a single human is not always enough to identify the relevant parts; software is built collaboratively, and so is feature location. This requires more humans inspecting the code and discussing the relevant parts of a feature. In the case study presented in our work, the estimated time needed for a single human to locate the features manually was more than 30 years.

Before our work, our industrial partners located the features manually, trying to use all of the tools available to them, e.g., spreadsheets or search tools. However, those methods did not scale as the complexity and amount of software increased. Our approach has replaced their previous methods.

Therefore, we consider the results produced by our feature location approach to be better than those produced by humans.

 
(G) The result solves a problem of indisputable difficulty in its field.

Feature Location is one of the cornerstones of variability management and Software Product Lines. The establishment of such Software Product Lines is a complex endeavor. One indicator of this is the existence of international organizations that have based their business on providing solutions for the establishment and maintenance of Software Product Lines over the last 20 years, such as BigLever, founded in 1999 (https://biglever.com/company/about/), and pure-systems, founded in 2001 (https://www.pure-systems.com/).

In addition, the existence of a large body of research in the field of feature location suggests that it is a non-trivial problem and that the problem is not solved. The latest reviews on the topic [1,2] show the complexity of the problem and the need for new approaches to address it.

Therefore, we consider feature location to be a problem of indisputable difficulty, and our approach addresses this problem for a specific scenario.
 
[1] Rubin, J., & Chechik, M. (2013). A survey of feature location techniques. In Domain Engineering (pp. 29-58). Springer, Berlin, Heidelberg.
[2] Dit, B., Revelle, M., Gethers, M., & Poshyvanyk, D. (2013). Feature location in source code: a taxonomy and survey. Journal of Software: Evolution and Process, 25(1), 53-95.

--------------------------------------------------------------------------------------


7. a full citation of the paper (that is, author names; publication date; name of journal, conference, technical report, thesis, book, or book chapter; name of editors, if applicable, of the journal or edited book; publisher name; publisher city; page numbers, if applicable);

Francisca Pérez, Jaime Font, Lorena Arcega, Carlos Cetina; Collaborative feature location in models through automatic query expansion; Automated Software Engineering; 26(1); 161-202 (2019); https://doi.org/10.1007/s10515-019-00251-9

--------------------------------------------------------------------------------------


8. a statement either that "any prize money, if any, is to be divided equally among the co-authors" OR a specific percentage breakdown as to how the prize money, if any, is to be divided among the co-authors;

Prize money, if any, is to be divided equally among the co-authors.

--------------------------------------------------------------------------------------


9. a statement stating why the authors expect that their entry would be the "best," and

A Software Product Line is “a set of software-intensive systems that share a common, managed set of features satisfying the specific needs of a particular market segment or mission and that are developed from a common set of core assets in a prescribed way” [1]. There is broad consensus that the cornerstone of Software Product Lines is the set of features that are reused across the product family. A savings of $584 million in development costs, a 2x-4x reduction in time to market, and a reduction in maintenance costs of around 60% are documented real-world examples of the benefits of Software Product Lines [2].

The benefits of Software Product Lines are very appealing, especially in the current age of digital transformation where more and more classic products (from cars to coffee machines) are run on software or augmented with software services. However, there is a big catch: the manual work of locating features in already existing products is daunting for organizations. The estimate reported in our paper for locating the reusable features of CAF [3], a railway company (in business since 1917), was more than 30 years of work by a single engineer.

In the paper of our entry to the “Humies” awards, we are the first to propose a new dimension, collaboration, for the problem of feature location. Our work leverages locally crowd-sourced information to guide a genetic algorithm that locates the features of a Software Product Line. The features located by our approach are influencing and reshaping the way in which world-class industries currently produce software for their products.

The features located by our work are currently being applied in the Train Control and Management software of CAF trains. In other words, software engineers at one of the world's top six train manufacturers prefer to use the features located by our approach instead of their own previous reusable assets.

Our work is also currently being applied to the induction hob division of BSH [4], which develops induction hobs under the brands of Siemens, Bosch, Gaggenau, and Neff (among other brands). BSH is the top European manufacturer, and one of the top-three world manufacturers. BSH’s software engineers replaced their long-standing (13+ years) reusable assets with the features located by our approach. As a result, the features located by our approach are now present in the firmware of millions of induction hobs worldwide.

“Your tool has changed my life for the better” is a memorable statement from one of BSH’s senior software engineers after tool adoption. “The tool” is the development environment where the features located with our work are assembled to produce the firmware of induction hobs.

Our work is published in a leading journal of the software engineering community. One of the reviewers stated: “The detailed explanation provided in the paper also clearly shows the need for this approach, as the manual work would otherwise be daunting for developers”. Another reviewer stated: “Collaborative feature location (i.e., taking multiple feature descriptions as input) is a new dimension to this problem”.

REVE 18 (6th International Workshop on Reverse Variability Engineering), arguably the most relevant forum for reengineering products into Software Product Lines, invited one of the authors of the paper to present the work as a keynote speaker [5]. Because of this work, we were also recently invited to contribute two book chapters to a Springer handbook on Reengineering Software Intensive Systems into Software Product Lines.

Not only do world-class industries such as CAF and BSH already fund research contracts with us to reengineer their product families using our work, but BSH also plans to extend the application to the software of their factory robots. At the time of writing, this new application for BSH is under evaluation within the H2020 program [7]. Other world-class industry players have also shown interest in applying our work, among them Teltronic [6], a world leader in the design and manufacture of communications systems.
 
[1] Clements, P., Northrop, L.: Software Product Lines: Practices and Patterns. Addison-Wesley, Boston (2002)
[2] http://www.productlineengineering.com/benefits/key-benefits.html
[3] https://www.caf.net/en
[4] https://www.bsh-group.com/
[5] http://reveworkshop.github.io/2018/ 
[6] https://www.teltronic.es/en/
[7] https://ec.europa.eu/programmes/horizon2020/what-horizon-2020

--------------------------------------------------------------------------------------


10. An indication of the general type of genetic or evolutionary computation used, such as GA (genetic algorithms), GP (genetic programming), ES (evolution strategies), EP (evolutionary programming), LCS (learning classifier systems), GE (grammatical evolution), GEP (gene expression programming), DE (differential evolution), etc.

GA

--------------------------------------------------------------------------------------


11. The date of publication of each paper.  If the date of publication is not on or before the deadline for submission, but instead, the paper has been unconditionally accepted for publication and is “in press” by the deadline for this competition, the entry must include a copy of the documentation establishing that the paper meets the "in press" requirement.

Published online: 28 January 2019