1. Title of the Publication

CryptOpt: Verified Compilation with Randomized Program Search for Cryptographic Primitives

2.  Author Information
>> the name, complete physical mailing address, e-mail address, and phone number of EACH author of EACH paper(s);

Joel Kuepper; School of Computer Science, The University of Adelaide, 5005 Adelaide, Australia; joel.kuepper@adelaide.edu.au; +61 8 8313 4792
Andres Erbsen; 446 Highland Ave, Malden, MA 02148, USA; andreser@mit.edu; +1 774 316 2048
Jason Gross; 2055 Center St #519, Berkeley, CA 94704; jgross@mit.edu; +1 631-790-8963
Owen Conoly; 17821 52nd St NE, Baldwin, ND 58521, USA; owenc@mit.edu; +1 701-516-7156
Chuyue Sun; Unit 4416, 65 E Monroe St, Chicago, IL 60603; chuyues@stanford.edu 
Samuel Tian; swtian@mit.edu 
David Wu; david.wu@adelaide.edu.au 
Adam Chlipala; adamc@csail.mit.edu 
Chitchanok Chuengsatiansup; c.chuengsatiansup@unimelb.edu.au 
Daniel Genkin; genkin@gatech.edu 
Markus Wagner; Faculty of IT, Office 242, Woodside Building, Monash University, Clayton, VIC 3171, Australia; markus.wagner@monash.edu; +61 3 9902 6011
Yuval Yarom; yuval.yarom@rub.de 


3. Corresponding Author

Joel Kuepper
joel.kuepper@adelaide.edu.au

4. Paper Abstract

Most software domains rely on compilers to translate high-level code to
multiple different machine languages, with performance not too much worse than
what developers would have the patience to write directly in assembly language.
However, cryptography has been an exception, where many performance-critical
routines have been written directly in assembly (sometimes through
metaprogramming layers). Some past work has shown how to do formal verification
of that assembly, and other work has shown how to generate C code automatically
along with formal proof, but with consequent performance penalties vs. the
best-known assembly. We present CryptOpt, the first compilation pipeline that
specializes high-level cryptographic functional programs into assembly code
significantly faster than what GCC or Clang produce, with mechanized proof (in
Coq) whose final theorem statement mentions little beyond the input functional
program and the operational semantics of x86-64 assembly. On the optimization
side, we apply randomized search through the space of assembly programs, with
repeated automatic benchmarking on target CPUs. On the formal-verification
side, we connect to the Fiat Cryptography framework (which translates
functional programs into C-like IR code) and extend it with a new formally
verified program-equivalence checker, incorporating a modest subset of known
features of SMT solvers and symbolic-execution engines. The overall prototype
is quite practical, e.g. producing new fastest-known implementations of
finite-field arithmetic for both Curve25519 (part of the TLS standard) and the
Bitcoin elliptic curve secp256k1 for the Intel 12th and 13th generations.

5. Criteria that the author claims that the work satisfies;

(D) The result is publishable in its own right as a new scientific result
independent of the fact that the result was mechanically created.

(E) The result is equal to or better than the most recent human-created
solution to a long-standing problem for which there has been a succession of
increasingly better human-created solutions.

(F) The result is equal to or better than a result that was considered an
achievement in its field at the time it was first discovered.

(G) The result solves a problem of indisputable difficulty in its field.

6. Statement Why the Results Satisfy the Criteria

It is a bit hard to distinguish between the criteria, as it solves a meta problem.
To set the scene:

Searching the Cryptology ePrint Archive for terms like "speed", "records",
"faster", "implementation" yields hundreds to thousands of results; e.g.
Searching for "speed record" [1] yields 1160 results. The techniques employed
to achieve faster and faster implementations span from conceptual changes to
tweaking single instructions. The reason for this plethora of research is that
there are many different challenges to address, including the trading off
between competing goals. Consequently, in order to make cryptography
ubiquitously feasible, as for example in SSL libraries such as OpenSSL and
BoringSSL, core components of many cryptographic implementations are typically
written manually by domain experts to achieve best-performing results.

First, we see that speed records are often publishable on its own (D),
regardless of how the record was obtained (c.f. ePrint Archive above)

Second, our submission satisfies (E,F) in the sense that every increasingly
better solution is outperformed by the next publication. 

Third, the breadth of existing research in the area underlines the indisputable
difficulty (c.f. (G)) of generating fast and correct cryptographic
implementations.

Lastly, in regulated, known benchmarks like SUPERCOP [2], the results that we
generate automatically with CryptOpt are on par and sometimes outperform human
written computer programs (c.f. Section 7 of our paper); this would meet
criterion (H), however, as our comparison is with the current best SUPERCOP
results and not in a "competition-like event", we do not list (H) above as an
officially met criteria.

[1] https://eprint.iacr.org/search?q=speed+record
[2] https://bench.cr.yp.to/supercop.html
 
7. Full Citation
Joel Kuepper, Andres Erbsen, Jason Gross, Owen Conoly, Chuyue Sun, Samuel Tian,
David Wu, Adam Chlipala, Chitchanok Chuengsatiansup, Daniel Genkin, Markus
Wagner, and Yuval Yarom; CryptOpt: Verified compilation with randomized program
search for cryptographic primitives, 2023; To appear in PLDI, 2023. doi:
10.1145/3591272. arXiv preprint arXiv: 2211:10665

8. Prize Money Breakdown
Prize money, if any, will be divided equally among those co-authors who wish to
receive an equal share.  Clarified on 6/30/23 to specify equal shares for 
Kuepper, Erbsen, Gross, Conoly, and Sun, at the addresses given above.

9. A Statement Indicating Why this Entry Could Be the "Best"

As the paper shows, with our tool we can generate competitively fast and
verified code by using a set of perhaps surprisingly simple techniques from the
domain of Search-Based Software Engineering (SBSE). Using two examples, we aim
to show that this project is not only a proof-of-concept, but also reusable and
useful.

First, during the submission process to PLDI, the Research Artifact has been
evaluated independently thrice and the claimed results have been reproduced;
the artifact is publicly available [3] and the paper will carry the "reusable
badge".

Second, picking one of the results from the paper, we would like to highlight
that we are able to generate assembly code for core primitives that Bitcoin
uses for its block signature. Our assembly code is faster than the existing
hand-written code, now formally verified correct and even outperforms the
fastest known C code. We have reached out to the core maintainers of the
Bitcoin library to merge this newly generated code. The concept has been
acknowledged (see [4]) and we are now in the engineering works of getting it
included into the library. With Bitcoin being used for hundreds of thousands of
transactions each day (see [5]), the integration of automatically generated,
human-competitive code has the potential to create substantial press coverage
and research interest.

To conclude, we hope that CryptOpt can be employed in a myriad of projects
where it has (so far) been too cumbersome or simply too expensive to write fast
code by hand. ***

[*** As only one example of a potential target project, see BLS12-381 [6]:
here, the Multiply function was written by hand, but the Square function simply
uses the hand-written multiplication (with the same parameter twice) instead of
using a custom, hand-written implementation.]

[3] https://zenodo.org/record/7710435
[4] https://github.com/bitcoin-core/secp256k1/issues/1261#issuecomment-1517933216
[5] https://ycharts.com/indicators/bitcoin_transactions_per_day
[6] https://github.com/relic-toolkit/relic/blob/main/src/low/x64-asm-6l/relic_fp_sqr_low.c#L43

10. Evolutionary Computation Type
Genetic Improvement

11. Publication Date
The publication has been unconditionally accepted at PLDI 2023, as can be seen
on the website
https://pldi23.sigplan.org/details/pldi-2023-pldi/53/CryptOpt-Verified-Compilation-with-Randomized-Program-Search-for-Cryptographic-Primi
PLDI will take place in June 2023 in Orlando, Florida, USA.