The Million-Key Question – Investigating the Origins of RSA Public Keys [Usenix Sec 2016, Best Paper Award]

Authors: Petr Svenda, Matus Nemec, Peter Sekan, Rudolf Kvasnovsky, David Formanek, David Komarek and Vashek Matyas

---> Try online classification tool!

Primary contact: Petr Svenda

Abstract: Can bits of an RSA public key leak information about design and implementation choices such as the prime generation algorithm? We analysed over 60 million freshly generated key pairs from 22 open- and closed-source libraries and from 16 different smartcards, revealing significant leakage. The bias introduced by different choices is sufficiently large to classify a probable library or smartcard with high accuracy based only on the values of public keys. Such a classification can be used to decrease the anonymity set of users of anonymous mailers or operators of linked Tor hidden services, to quickly detect keys from the same vulnerable library or to verify a claim of use of secure hardware by a remote party. The classification of the key origins of more than 10 million RSA-based IPv4 TLS keys and 1.4 million PGP keys also provides an independent estimation of the libraries that are most commonly used to generate the keys found on the Internet. Our broad inspection provides a sanity check and deep insight regarding which of the recommendations for RSA key pair generation are followed in practice, including closed-source libraries and smartcards.

Bibtex (regular paper)

   author = {Petr Svenda \and Matus Nemec \and Peter Sekan \and Rudolf Kvasnovsky \and David Formanek \and David Komarek \and Vashek Matyas},
   title = {The Million-Key Question – Investigating the Origins of RSA Public Keys},
   booktitle = {The 25th USENIX Security Symposium (UsenixSec'2016)},
   year = {2016},
   pages = {893--910},
   isbn = {978-1-931971-32-4},
   publisher = {USENIX}

Bibtex (technical report)

   author = {Petr Svenda \and Matus Nemec \and Peter Sekan \and Rudolf Kvasnovsky \and David Formanek \and David Komarek \and Vashek Matyas},
   title = {The Million-Key Question – Investigating the Origins of RSA Public Keys},
   booktitle = {FI MU Report Series, FIMU-RS-2016-03},
   year = {2016},
   pages = {1--83},
   publisher = {Masaryk University}

Q: So what did you do?

A: We figured out that RSA public key is leaking info about the library which created it. Hence we can tell which library you used to generate your key - based on public key only.

Q: Is single key enough to identify source library?

A: Sometimes yes, but mostly no. If you have 5 keys from the same source, it will be quite accurate. Try our automatic tool at

Q: Can I mutually distinguish all libraries?

A: Not always. Source libraries introducing exactly same bias to the value of generated public moduli will be indistinguishable.

Q: Can I also identify the version of used library?

A: Sometimes. The new version of a library that did not change source code of key generation method will not be distinguishable from the older one. E.g., OpenSSL 1.0.2f is not distinguishable from OpenSSL 1.0.2g, but OpenSSL 1.0.2g is distinguishable from OpenSSL 2.0.12 FIPS.

Q: Have you tested all libraries of the world?

A: No. We tested a lot of them, but not all. We also did not test all possible version of given library. We are also missing hardware sources such as SSL accelerators (contact us please, if you have one and like to contribute).

Q: How quickly will be the information leakage vulnerability you found fixed?

A: Probably not soon. The fix would require changing code of key generation method for the most libraries. And developers don't like to mess with that part of crypto too often. Even if fixed in the new version, a lot of old legacy libraries will be used for a long time.

Q: So how can I protect my key(s)?

A: If you need just one key, it is easy - just generate 5 keys instead of one, let all be classified by our tool ( and then keep the one which is classified with the least accuracy. If you need more keys to keep, it is slightly more tricky, but still can be done (with more keys generated and discarded).

Q: Are the data you gathered and used publicly available?

A: Definitely! Download everything in the datasets section and try your own analysis. Please don't forget to cite our Usenix paper if you will use it.

Q: I want to know more details!

A: Great, then read original paper and technical report for even more details.

  • Classification matrix (2016-07-16)
    • Download precomputed classification matrix: csv, xlsx
      • Contains 512 rows with mapping from Mask value to probability vector for Group I to Group XIII ()
      • Mask value (first column) is computed as: 2nd-7th most significant bit of modulus | 2nd least significant bit of modulus | modulus mod 3 | modulus_length_in_bits mod 2
      • Probability for given group is given in percentage. If a group never produces modulus with particular mask value, sign '-' is listed instead.