Why cryptography is not based on NP-complete problems
Cryptography is not based on NP-complete problems - let me explain why.
Hard problems in cryptography
Cryptographic schemes are based on the computational difficulty of solving some ‘hard’ problem. For example, RSA (specifically, RSA-2048) is based on the difficulty of a problem like this:
I generate two large, random prime numbers, p and q, each of length 1024 bits. I then give you the product N = pq. Find p and q.
The RSA cryptosystem cannot be secure unless this problem is hard[^1]. When a cryptographer says ‘hard’, they mean something blunt: that it would take more than 2^128 operations to solve. They picked this number because it’s really big, and all the computers in the world working for a very long time would not be able to perform that many operations[^2].
Notice that this problem is randomized. A specific instance of the problem (some concrete values of p and q) could be easy or hard - for example, if p = q, the problem would be easy, since N would be a perfect square and taking its square root recovers p. However, overall, a randomly chosen instance will be hard with overwhelming probability.
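To make the factoring problem concrete, here is a minimal sketch in Python. The helper names (`random_prime`, `factor`) are illustrative, not from any library, and the bit lengths are toy-sized: trial division cracks a 32-bit N instantly, while the same approach is hopeless against the 2048-bit N used by real RSA.

```python
import random

def is_prime(n: int) -> bool:
    """Trial-division primality test -- fine for toy bit lengths."""
    if n < 2:
        return False
    d = 2
    while d * d <= n:
        if n % d == 0:
            return False
        d += 1
    return True

def random_prime(bits: int) -> int:
    """Sample a random prime of the given bit length (toy sizes only)."""
    while True:
        candidate = random.randrange(2 ** (bits - 1), 2 ** bits) | 1
        if is_prime(candidate):
            return candidate

def factor(N: int):
    """Recover p and q by trial division: instant for a 32-bit N,
    utterly infeasible for a 2048-bit one -- which is the whole point."""
    d = 2
    while d * d <= N:
        if N % d == 0:
            return d, N // d
        d += 1
    return None

# A toy instance of the RSA-style problem: sample p and q, publish N.
p, q = random_prime(16), random_prime(16)
N = p * q
print(sorted(factor(N)) == sorted([p, q]))  # True: toy instances are easy
```

Real implementations use Miller–Rabin primality testing and 1024-bit primes; the structure of the problem, though, is exactly this.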
Why does the problem have to be randomized? Well, intuitively, each time we encrypt or sign something, we need a fresh instance of the problem - if we all used the same problem, that wouldn’t be good. More concretely, cryptographic schemes rely on clients holding some kind of secret key, sampled at random, which instantiates the hard problem[^3]. For example, in the RSA problem above, p and q are part of the secret key, and selected randomly, so the problem must be hard for a randomly selected p and q.
So, to be useful for cryptography, a problem must have the property that a randomly-sampled instance of it is difficult (takes more than 2^128 operations to solve) with overwhelming probability. This is an important property we will come back to.
NP-complete problems
There is a whole branch of computer science called complexity theory that analyzes the difficulty of problems based on the asymptotic runtimes of the best-known algorithms to solve them. As we’ll see, a complexity theorist’s definition of what makes a problem hard is different from a cryptographer’s.
I quit the complexity class in college just before the add/drop deadline; the following explanation will be clumsy as a result.
There is a class of problems called NP-complete problems that are:
- ‘Hard’ to solve quickly,
- ‘Easy’ to verify a solution for, and
- Sort of ‘equivalent’ to each other.
Examples include the knapsack problem, subset sum problem, and the 3-color graph coloring problem. To a complexity theorist, easy means ‘solvable in polynomial time’. That is, if the problem is of size n, then the polynomial-time algorithm to solve it has a runtime of O(n^k) for some constant value k. Similarly, to a complexity theorist, a hard problem is one for which no polynomial-time algorithm exists (for example, the best algorithms for these sometimes have an exponential runtime, of O(2^n)).
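The easy-to-verify/hard-to-solve gap is visible even in a tiny sketch of subset sum (one of the NP-complete problems mentioned above). The function names here are illustrative: the solver tries all 2^n subsets, while the verifier only sums the claimed indices.

```python
from itertools import combinations

def solve_subset_sum(nums, target):
    """Brute force: tries every subset, so runtime grows like O(2^n)."""
    for r in range(len(nums) + 1):
        for indices in combinations(range(len(nums)), r):
            if sum(nums[i] for i in indices) == target:
                return list(indices)
    return None  # no subset sums to the target

def verify_subset_sum(nums, target, indices):
    """Checking a candidate solution is linear in the size of the input."""
    return sum(nums[i] for i in indices) == target

nums = [3, 34, 4, 12, 5, 2]
solution = solve_subset_sum(nums, 9)
print(solution is not None and verify_subset_sum(nums, 9, solution))  # True
```

Doubling the list length roughly doubles the verifier’s work but squares the solver’s search space - that asymmetry is the heart of NP.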
Let’s look at an example: say you are given a map with n countries on it, and asked to assign one of three colors to each of the countries, such that no two touching countries have the same color. Here’s an example map, colored with a solution:
This is an NP-complete problem, so there is no algorithm that can solve any instance of the problem in polynomial time (i.e. O(n^k) for some constant k, given a map with n countries)[^4]. If I gave you a candidate solution to the problem (a coloring of the map), you could check it easily in polynomial time, using breadth-first search.
So NP-complete problems seem like the perfect ‘hard problem’ for cryptography - no efficient solution is possible! Why not make a cryptosystem based on 3-color map coloring?
You might think that the problem with using an NP-complete problem for cryptography is that we only know that its asymptotic runtime is bad (say, exponential as in O(2^n)), but for cryptography, we need a concrete number of operations (2^128). Perhaps the big O notation is hiding a very small constant?
In practice, running an exponential-time algorithm on a reasonably-sized problem often does, in fact, require more than 2^128 operations, so this isn’t the main issue. I explain the real issue below.
Average-case hardness
The main issue is that, in cryptography, we need any random instance of the problem to be difficult. Remember that the RSA problem starts with choosing a random p and q, so for it to be secure, the problem must be hard for randomly selected p and q. This is called average-case hardness.
Unfortunately, a problem being NP-complete just means that there is no polynomial-time algorithm that can solve all instances of the problem. NP-completeness doesn’t imply anything about whether a random instance of the problem is hard. For example, it could be possible that, if we generated a random map in some way, 99% of the time, the three-color map coloring problem would be easy, but 1% of the time, it would be very difficult.
This would still be NP-complete, but would make for a terrible cryptosystem - a scheme based on this problem would be easily broken 99% of the time!
Instead, cryptography needs problems that are hard in the average case, like the RSA problem, the discrete logarithm problem on elliptic curves, and the shortest vector problem for lattices. We don’t technically know whether these are NP-complete, but we are pretty confident that a random instance of any of them is ‘hard’ (i.e. would take 2^128 operations to solve) with overwhelming probability.
Conclusion
So, cryptography is not based on solving NP-complete problems, which are hard only in the sense that no efficient algorithm solves every instance of them (problems that are hard in the ‘worst case’). Instead, it’s based on problems for which a random instance is concretely hard to solve (problems that are hard in the ‘average case’). These are sort of orthogonal notions of hardness - one does not imply the other.
There’s one footnote to add here: there are some problems based on lattices that connect worst-case and average-case hardness. Basically, for some lattice problems, there are proofs that an efficient algorithm that can solve the problem in the average case actually implies an algorithm for solving the problem in the worst case. These reductions have not been enough to obtain an encryption scheme based on the worst-case hardness of an NP-complete problem[^5].
[^1]: The true RSA problem is a little bit more complicated, but this is sort of the gist of it.

[^2]: 2^128 is sort of an arbitrary number that has come to be standard. It’s large enough to have a significant safety margin, and it’s a nice, round power of two.

[^3]: Cryptographic schemes do this because they rely on problems where knowledge of the secret key makes the underlying problem easy. This behavior is called a trapdoor.

[^4]: I am assuming P ≠ NP, since my livelihood depends on it. We’re in a sad world otherwise.

[^5]: For example, GapSVP is NP-hard (under a randomized reduction) for some parameter settings, but not the ones used in cryptographic schemes.