A basic question about cryptography that we pretend is not there.

Victor Shoup is one of the top experts in cryptography. He is well known for many things including a soon to be released book that is joint with Dan Boneh on, what else, cryptography; and the implementation of many of the basic functions of cryptography.

Today I want to talk about my recent visit to the Simons Institute in Berkeley where I heard Victor give a special lecture.

This lecture led to an interesting question:

How do we know if a crypto function is correctly implemented?

Before I discuss this let me report that this summer the institute is running an excellent program on cryptography. It was organized by Tal Rabin, Shafi Goldwasser, and Guy Rothblum. While I just visited for a short time last week, it seems to be an incredibly well run program, with much of the thanks going to Tal. Tal added social interaction as part of the program, about which more below. The institute director, Dick Karp, told me he thought these additional social activities were wonderful and they helped make the program so enjoyable and productive.

## Victor’s Talk

Victor gave his special talk on mostly known—to experts—results in basic cryptography. His talk was well attended, well structured, and well delivered. He made three points during the talk that were striking to me:

• He wrote all of the operations used in his talk using “additive” notation. Most crypto systems use some abelian group as the base of operations, of course some use more than one. Usually these groups are written using multiplicative notation: ${x \times y}$. Victor switched this to ${x + y}$. This is of course not a major difference, but it had some nice properties. Also, for those of us who are comfortable with the usual notation, it made us have to rethink things a bit.
• He repeatedly made the comment that while many of the provable methods he presented where fast enough to be practical, they were not used. He essentially said that provable methods seemed to be avoided by real system builders. As you might image this was not a welcome comment to the group that consisted mostly of researchers who work in cryptography. Oh well.

• He also made another point: For the basic type of systems under discussion, he averred that the mathematics needed to describe and understand them was essentially high school algebra. Or as he said, “at least high school algebra outside the US.” This is not a negative point: there is nothing wrong with the systems requiring only simple operations from algebra. Deep mathematics is great, but not necessary to make a great cryptographic system.

## A Question

Kathryn Farley, a researcher in performance arts, was at Victor’s talk, since she was also in town. She could not stay for the entire 90-minute lecture—she left after about one hour for another appointment. Later that day we talked and she said she had thought about asking a question to Victor, but not being an expert in cryptography, was unsure if her question was naive or not. I asked her what it was and she replied with the question atop this post. To say it again:

How do we know if a crypto function is correctly implemented?

I told her that it was, in my opinion a terrific question, one that we often avoid. Or at best we assume it is handled by others, not by cryptographers. I immediately said that I have discussed this and related questions before here and I still am unsure what can be done to be ensure that crypto functions are correctly implemented.

Suppose that ${\mathsf{rand}()}$ is claimed to return a random number: of course it is a pseudo-random number generator (PRNG). Lets assume that it claims to use a strong PRNG method. How can we know? We can look at the code and check that it really works as claimed, but that is messy and time consuming. Worse it defeats the whole purpose of having a library of crypto functions.

A nastier example: Suppose that ${\mathsf{rand}()}$ uses the following trick. If the date is before January 1, 2016, it will use the correct strong generator. If the date is later, then it will use a simple PRNG that is easy to break. Further it makes this happen in a subtle manner, which is extremely hard to detect by code inspection. How would we discover this?

## The Tea Party

Friday at the institute was “The Tea Party.” At the end of the day all were invited to be outside near the institute’s building, and join in eating, drinking, and playing a variety of games.

The food was quite impressive—much better than the standard fare that we see at most university gatherings. It was also well attended, which probably is related to the quality of the food.

Kathryn and I were there. And it was a perfect time and place to ask some of the cryptographers about her question. We got several answers, most of which were not very satisfying. I will list the answers without directly quoting anyone—protecting the innocent, as they say.

${\bullet}$ Some said that the issue was not about crypto, but about software engineering. So it was someone else’s issue.

${\bullet}$ Some were more explicit and said that software engineers should use verification methods to check correctness.

${\bullet}$ Others said that the code should be checked carefully by making it open source: basically relying on the “crowd” for correctness.

None of these methods is foolproof. Which raises the question: how can we actually be sure that all is well?

## Some Ideas

I was not very happy with any of the answers that we got. I would suggest that this correctness problem is special and because it is about crypto systems it may have approaches that work that do not work for arbitrary code.

Here are some ideas that perhaps can be used: I think they are just examples of what we may be able to do, and welcome better ideas.

Property Testing:

Many crypto functions satisfy nontrivial mathematical properties, which is quite different from arbitrary code. Consider RSA. One could check that

$\displaystyle \mathsf{Encode}(xy) = \mathsf{Encode}(x)\mathsf{Encode}(y).$

And the same with the decode function. This doe not imply correctness, but a failure would mean the code is incorrect.

${N}$ Version Testing:

If the code is critical one could in principle implement the function twice, or more times. This is best done by different vendors. Then the outputs would be checked to see if they are identical. Note since crypto functions usually operate over finite fields or rings, there is no rounding errors. Thus, the different implementations should get identical values.

Note this is a standard idea in software engineering—see this. The key assumption is that different versions should have different errors. This appears not to be the case in practice, but the method still has something behind it. Wikipedia says: ${N}$-version programming has been applied to software in switching trains, performing flight control computations on modern airliners, electronic voting (the SAVE System), and the detection of zero-day exploits, among other uses.

Check No State:

A crypto function should be a function and have no state. The date “attack” I gave earlier shows that functions should be forced to not be able to create any state. I believe that it may be possible to “sandbox” the code of a alleged function so that it cannot keep any state. This obviously makes the date trick impossible.

## Open Problems

How do we be sure that crypto functions are correctly implemented? I suggest simply that cryptographers should start to view this issue as another type of attack that can be used against their system. Perhaps viewed this way will lead to new ideas and better and more secure systems.

August 14, 2015 4:47 pm

Could you run some tests (e.g. DIEHARD) to verify that the output is properly random? You might then run it on a wide variety of inputs.

Of course, if your generator tends to fail on specific input/dates, this won’t check that, but that’s the problem with all “proofs of a negative”.

(As an aside, a common interview question at the hedge fund I used to work at was to prove a given random generator was really random. My thought was to gzip up the results; that would be an indication of its entropy, though of course it could be non-entropic in ways gzip couldn’t compress).

August 15, 2015 9:49 am

if we have N different implementations of a program and at least one is correct is it clear that the most efficient way of getting the correct output (or at least not getting the wrong one) requires running all N implementations separately? Can something more efficient be done at least for specific programs of interest?

August 15, 2015 11:18 am

This is an interesting thought. The original use of N-version was often for realtime stuff so they had to run all at once. But if did a few random versions and all agreed then…

Cool idea.

August 15, 2015 10:25 am

This is where Zero Knowledge (ZK) proofs, and more generally the compilation of protocols to be secure against malicious adversaries, should come into play. If the implementation has to prove in ZK that it meets the required specification then we are convinced that it satisfies the correct input-output functionality and moreover this does not come at the the expense of security (if imperfect randomness,is a concern then use a coin-tossing protocol). Note that once the implementation is forced to use ZK proofs then there is no need to even look at the source code (at least not for the sake of correctness). Everybody can simply write their own ZK verifier to check the program (this is also very useful debugging tool, by the way). You only need to trust your own implementation of the verifier!

Somehow, even though ZK proofs have been around for over thirty years, and have certainly influenced how cryptographic security is defined (via simulation), it still seems that their potential has not been fully realized by practitioners. One possible explanation is that they require a paradigm shift from verifying the actual source code to verifying the functionality (though to be fair this is not unlike test driven development).

Another relevant concept is that of a “combiner”, which enables you to combine several implementations of a cryptographic primitive in a way that guarantees security as long as some of the implementations are correct and secure. See for instance:

– Amir Herzberg: On Tolerant Cryptographic Constructions. CT-RSA 2005: 172-190 (and also https://eprint.iacr.org/2002/135.pdf)

– Danny Harnik, Joe Kilian, Moni Naor, Omer Reingold, Alon Rosen: On Robust Combiners for Oblivious Transfer and Other Primitives. EUROCRYPT 2005: 96-113

4. August 15, 2015 10:32 am

One could in principle use a proof assistant such as Coq to develop a program and prove it’s correctness at the same time. I believe there are a number of projects on formal proofs and cryptography around these days.

5. August 15, 2015 7:13 pm

This is exactly what Galois SAW is for:

https://galois.com/project/software-analysis-workbench/

6. August 16, 2015 9:30 am

Proving implementations correct is a goal of the EasyCrypt project:

https://www.easycrypt.info/trac/wiki/Team

There are other efforts, which I am less familiar with.

• August 24, 2015 6:39 am

Hi all, Justin,

Some more pointers for projects that focus on proving that a given implementation is correct with respect to a given _cryptographic_ specification (I am underlining “cryptographic” because there are lots of projects whose goal is to prove functional correctness for non-cryptographic software, the difficulty with crypto coming from the probabilistic behaviours it includes.)

As Justin mentioned, the EasyCrypt team and some collaborators (notably Barbosa and Almeida, from U. Porto and U. Minho, respectively), are focusing on using general-purpose C verifiers (say, Frama-C, for example), to prove that C implementations are memory-safe and correct with respect to specifications of the relevant crypto algorithms. Using CompCert (a C compiler that is certified to produce assembly code equivalent to the C source), we get for free that the compiled code is functionally equivalent to the original specification. In addition to this, we are also attempting to capture some side-channels in the analysis, and to show that the resulting assembly does not have any side-channel weaknesses.

Beringer and Appel (and their team), at Princeton, use a similar approach for the verification of assembly programs, but rely entirely on the Coq proof assistant in which CompCert is implemented. This tightens the formal result slightly (because they can guarantee that the C semantics used by the verifier and those used by the compiler are the same), but greatly increases the amount of effort involved in the correctness proof. They do not analyse side-channel security.

Bernstein and Schwabe (and collaborators) are also working on proving functional correctness of assembly implementation. Their first effort (which involved using SAT and Coq to obtain a highly non-automated proof of functional correctness for an implementation of elliptic curve operations. They recently demonstrated a tool relying on computer-algebra systems that greatly automates this process. However, it currently works on C code (so the compiler can still screw things up) and is very specialized. They are also concerned with side-channel attacks, but do not treat them as formally as functional correctness.

Another approach entirely is to extract code from specifications. I guess Galois’s Cryptol language can do this (extracting to C). Blanchet and Cade produce Ocaml code (in particular, they produced an implementation of SSH from a CryptoVerif specification where everything that could be proved secure had been proved secure). The drawback here is that this produces fresh implementations rather than proving properties of code that is already in use.

Finally, there are some projects that aim at proving provable security properties of code directly. This includes the miTLS project (which finds more attacks than proof principles), and some prototype tools that extract specifications from code in order to analyze them. Often, these approaches require the implementation to be well-structured and generally do not apply to existing code either.

All of these are a lot more work than testing, but also provide much stronger guarantees, given a trustworthy specification of the algorithm (for example, a rigorous mathematical description). Most of them also guarantee memory-safety, preventing Heartbleed-style attacks. None of them consider the generation and use of (pseudo-)random bits and numbers, and therefore do not prevent issues in key generation (such as the 2008 Debian OpenSSL bug). This might be an interesting thing to look at next, although a lot more complex to figure out.

I’ll come back and add refs as people make requests for them.

Cheers,
Francois

• August 24, 2015 10:34 pm

Thanks very much for this comment! Incidentally, I am a Princeton’81 classmate of Andrew Appel.

7. August 16, 2015 2:33 pm

this is a complex issue. it popped up on cstheory awhile back. there is widespread evidence of failure to implement software security well (2nd recent link), and crypto is a subarea of this.

http://cstheory.stackexchange.com/questions/12579/how-to-create-mission-critical-software

https://vzn1.wordpress.com/2015/08/14/cybersec-2015-summer/

August 16, 2015 2:39 pm

The problem is actually worse than you assume.

Typically a cryptographic function comes with a set of test-vectors, input this, you get output that, and all supposedly “independent” implementations are tested against the test-vectors, rather than against the actual algorithm description.

I have yet to see any cryptographic function author claim that the test-vectors were produced any other way than by running the proof-of-concept code for the function.

In reality that means that the proof-of-concept code, bugs and all, becomes the oracle for the “correct” implementation of the function.

9. August 16, 2015 3:08 pm

The algorithm absolutely must be open source. If the keys are not sufficient to encrypt / decrypt, then it is not encryption. We are not talking about cryptanalysis here. I remember reading that the Manhattan Project had to switch to key locks when the found Richard Feynman could open any / every combination lock. He was also able to exit the perimeter undetected.

10. August 16, 2015 5:30 pm

Post-Snowden, the question is (or ought to be) on everyone’s mind in the crypto community. There’ve been a few recent papers on the subject of cryptosystems that are resistant, or can recover from, malicious implementations:
1. Bellare, Paterson, Rogaway, “Security of Symmetric Encryption against Mass Surveillance”, CRYPTO 2014, https://eprint.iacr.org/2014/438
2. Mironov and Stephens-Davidowitz, “Cryptographic Reverse Firewalls”, Eurocrypt 2015, https://eprint.iacr.org/2014/758
3. Russell, Tang, Yung, Zhou “Cliptography: Clipping the Power of Kleptographic Attacks”, https://eprint.iacr.org/2015/695

11. August 17, 2015 11:13 am

some further musings on this. there are a few different ways that crypto implementations may not be “correct”. suppose one implements RSA. a strong test of whether it is implemented correctly as far as encryption/ decryption is to run the encryption/ decryption and make sure it passes the message correctly. then there are much more subtle problems such as that ones random number generator is “weak” and “subject to attack”. this is the sort of pandoras box. think youve raised a key issue that is sometimes glossed over. have been thinking lately that maybe all CS needs a ground-up rethink wrt security. how would network protocols, operating systems & languages & coding patterns etc look different if we started with security as a core design problem? they would look significantly different than what we had now. this is an achilles heel of computer science. “information wants to be free” but cyber security requires secure systems. some of this double edged sword is playing out in the bitcoin design where one can look at recent cutting edge trends & strategies.

12. August 19, 2015 6:41 am

This is a topic I spend quite some time thinking about lately.

I can start by giving a very real example: In January a bug was found in OpenSSL that a squaring function (BN_sqr) would give wrong results in rare corner cases (1 out of 2^128 cases).
It is easy to test squaring. You could for example compare the result of squaring with the result of multiplying a number by itself. However we run into a problem here: Usually these implementations are right most of the time and wrong in very rare cornercases (else someone would’ve found the error already). In this specific example – remember 1 out of 2^128 cases – you can test as long as you like, you will practically never hit one of these cases by pure chance.

Ralf-Philipp Weinmann recently talked about this issue at the Black Hat conference and he tried to find this specific bug with a fuzzer. For those not in IT security: A fuzzer is a tool that will randomly feed an application with error-containing input to find bugs. The tool american fuzzy lop (afl) was able to find that bug.

You may wonder how that’s possible because I just said it is so unlikely to hit this bug by chance. But afl is not just a “dumb” fuzzer, it monitors an application and tries to identify inputs that are “interesting” by triggering new code paths. It then uses these inputs and modifies them further. This smart fuzzing was apparantly enough to find this bug. I’m pretty sure this will work in a lot of cases. However by definition fuzzing will never be able to proove the correctness of some code.

Now the “verification methods” answer is also interesting, but it needs to be put into context. Formal methods to proove the correctness of code are a hot research topic, but one needs to be aware that they a) require a lot of work and b) are usually very limited.

In the Black Hat talk it was also mentioned that one could try to compare two different implementations of an algorithm by a method called “symbolic execution”. However this basically only works for very limited functions. Even for something like “squaring a large number” this method fails. I’m no expert in formal verification, but I don’t think it’s realistic to “proove” something like an RSA function any time soon.

One final remark where things get really messy: Cryptographic correctness is not always just defined by a mathematical result. An issue of increasing interest are sidechannels and especially timing. That means calculating an RSA function not only should produce the correct result, it must also make sure that the time it takes to do this calculation does not reveal any information about the message or the key. Testing this is pretty much a nightmare, because it involves all kinds of assumptions about compilers, processors etc. – still this is a very real issue, there was some research presented last year about such timing leaks in the RSA implementation of Java and OpenSSL. (There was a talk about this by Sebastian Schinzel at 31C3.)

August 19, 2015 8:49 am

Why don’t people actually use provable secure systems? I always thought it was that they are too slow. Since thats not the reason… what is.

August 19, 2015 10:30 am

If RSA’s secure, then P doesn’t equal NP. So in that case at least, efficiency was prefered over provability. But the question raised here is about the correct implementation of a protocol, regardless of its being secure or not. Reminds me of the possible undecidability of the undecidability of P=NP…

August 31, 2015 10:40 pm

Have a look at this entry from this very blog!

https://rjlipton.wordpress.com/2010/03/13/breaking-provably-secure-systems/

Nature/Physics is not a mathematical system and assumptions are always made about what attackers are are not allowed to do, of course at different levels of abstraction/concreteness.

August 19, 2015 10:26 am

Crypto primitives have usually been implemented many, many times over by different people, so any error in the algorithm is quickly spotted. That’s not an issue in practice.

The main problems are:

1. Side-channel attacks. Does the code unintentionally leak information? (“does the amount of time it takes to run depend on the data?”) You will rarely see crypto code written in a language other than C or assembly for this reason. The further away the language is from the hardware, the more places there are for information to hide.

2. Composing primitives into a working system. While the primitives themselves are probably algorithmically correct, errors crop up when they are assembled together. There needs to be a more holistic approach, with researchers specifying complete systems. This is an area of active research, see: http://competitions.cr.yp.to/caesar.html

15. August 24, 2015 11:02 am

How do we be sure that crypto functions are correctly implemented?

We may not do it, by means of Rice’s Theorem, but we can do secure communication mathematically proven poly-unbreakable, using the Dead Cryptographers Society Problem (DCS), defined in http://arxiv.org/ftp/arxiv/papers/1501/1501.03872.pdf.