Beyond k-anonymity: private password breach checking

February 10, 2023 · 4 minutes

You can check whether a password has appeared in a previous data breach at haveibeenpwned.com/passwords. The only issue is, of course, that you have to enter your password to find out if it’s been in a breach.

I built a tool that uses homomorphic encryption to perform these lookups completely privately. Here’s why I built it, and an explanation of how it works.

When “k-anonymity” isn’t enough

HIBP uses a simple technique called “k-anonymity” to avoid sending the full hash of your password to the service. Clients send only the first 20 bits of the hash, and the service returns all the hashes of breached passwords that begin with those bits. The (average) number of hashes returned is called “k”, and the idea is that the service only learns that your password could be one of these k values. In the HIBP case, the API generally sets k = 500. For sufficiently large k, this seems like pretty decent privacy.

As Jack Cable pointed out in this great blog post, in the real world, using an API like this can leak much more information to the server than you’d expect:

How k-anonymity can leak a surprising amount of information

Suppose you use this API to check the passwords of users at signup (as NIST actually recommends…):

I enter my bad password, “pa55w0rd”, into a signup form.

The website or my browser calls the HIBP API, sending the SHA1 prefix “5ecfa”.

The website tells me this is a compromised password, and asks me to pick a new one.

I, an information security ninja, change my password to “pa55w0rd123”.

The website or my browser calls the HIBP API, sending the SHA1 prefix “8370c”.

This password is deemed fine, and I sign up.

Now, given the first password, the server can quite simply infer the second. A script hashing every possible password within an edit distance of 3 takes minutes to find the second password on a laptop. The first password was in a public breach, so it’s not a huge leap to assume that the plaintext version of it (that is, the hash preimage) is already out there.

In that sense, the logs of a password checking service using k-anonymity could actually be quite valuable: patterns of activity in the logs might reveal which passwords are being used, and by which IP addresses.

At a higher level, because some information is getting leaked, k-anonymity requires an ongoing trust relationship with the password checking service. The logs of such a service are subject to data breaches themselves, and if the service eventually gets, say, bought by someone else, you might be exposing yourself to risk you did not originally intend.

Completely private retrievals, with homomorphic encryption

The holy grail for performing these kinds of lookups privately is called “Private Information Retrieval” (PIR); basically, it would be nice to perform the lookup and know, provably, that server has learned nothing about the password. This has been theoretically possible for some time, but until recently, it would have been really computationally expensive for the server. This kind of lookup would have taken took days in 2005, and minutes in 2017. The exciting thing is that it really does takes seconds now.

Homomorphic encryption is a special kind of cryptography that makes PIR practical. You can read more details about how it works in the explanation on this page, or for an even more technical explanation, read this blog post. The one sentence summary is: you can homomorphically encrypt a really big vector of 0’s and 1’s, and then the server can compute a dot product of these encryptions with the plaintext database; you encrypt a ‘1’ in the position of the data you want to retrieve, and the server never learns what you’re retrieving.

Google, Apple, and Microsoft all use variations of PIR to perform some scanning, though they usually make some small compromises to make things more efficient. They also have relatively large research and engineering teams that focus on this kind of privacy technology.

I built a tool that performs lookups against the ~900 million passwords in the 2021 HIBP database using homomorphic encryption; you can check it out here. Your password is encrypted under a key only your device knows before leaving it, and the server can never learn any information about your password. It’s not incredibly fast: it takes 10-20 seconds depending on your internet connection - but I think it’s still pretty cool.

If you want to just hack around using this cool new kind of encryption, I’ll plug the company I just co-founded, Blyss (YC W23), which makes using homomorphic encryption as easy as using something like Firebase or S3. There’s no snake oil; everything is open source, and our underlying scheme is in a peer-reviewed academic paper. If you’re directly interested in a private API for password breach lookup, email us.

(P.S. Passwords are of course not a great thing to begin with! The long-term solution is to replace passwords with YubiKeys / passkeys / FIDO2. I used to work at Yubico, though, so this might be a bit biased…)