Measuring Popularity of Cryptographic Libraries in Internet-Wide Scans [ACSAC 2017]

Authors: Matus Nemec, Dusan Klinec, Petr Svenda, Peter Sekan and Vashek Matyas

Primary contact: Petr Svenda svenda@fi.muni.cz

Abstract: We measure the popularity of cryptographic libraries in large datasets of RSA public keys. We do so by improving a recently proposed method based on biases introduced by alternative implementations of prime selection in different cryptographic libraries. We extend the previous work by applying statistical inference to approximate a share of libraries matching an observed distribution of RSA keys in an inspected dataset (e.g., Internet-wide scan of TLS handshakes). The sensitivity of our method is sufficient to detect transient events such as a periodic insertion of keys from a specific library into Certificate Transparency logs and inconsistencies in archived datasets.

We apply the method to keys from multiple Internet-wide scans collected in the years 2010 through 2017, to Certificate Transparency logs, and to separate datasets of PGP keys and SSH keys. The results quantify a strong dominance of OpenSSL, with more than 84% of TLS keys for Alexa 1M domains, steadily increasing since the first measurement. OpenSSL is even more popular for GitHub client-side SSH keys, with a share larger than 96%. Surprisingly, new certificates inserted in Certificate Transparency logs on certain days contain more than 20% keys most likely originating from Java libraries, while TLS scans contain less than 5% of such keys.

Since the ground truth is not known, we compared our measurements with other estimates and simulated different scenarios to evaluate the accuracy of our method. To the best of our knowledge, this is the first accurate measurement of the popularity of cryptographic libraries not based on proxy information like web server fingerprinting, but directly on the number of observed unique keys.

Conference page: ACSAC 2017 | Paper page
Download author pre-print of the paper: pdf
Download presentation: Handout-PDF | Conference-PDF

Bibtex (regular paper)

@inproceedings{2017-acsac-nemec,
  author = {Nemec, Matus and Klinec, Dusan and Svenda, Petr and Sekan, Peter and Matyas, Vashek},
  title = {Measuring Popularity of Cryptographic Libraries in Internet-Wide Scans},
  booktitle = {Proceedings of the 33rd Annual Computer Security Applications Conference},
  series = {ACSAC 2017},
  year = {2017},
  isbn = {978-1-4503-5345-8},
  pages = {162--175},
  url = {http://doi.acm.org/10.1145/3134600.3134612},
  doi = {10.1145/3134600.3134612},
  publisher = {ACM}
}

Measurement (classification) tool: GitHub link
Dataset of all collected RSA keys (39GB)
Data processing (TLS, PGP): GitHub link
Data processing (Certificate Transparency): GitHub link

Q: What did you do?

A: RSA public keys generated by different cryptographic libraries follow slightly biased distributions. We used these biases to measure the popularity of cryptographic libraries in Internet-wide scans.

Q: Does it mean the biased RSA key generation methods are broken?

A: No, in general, the bias is not enough for key factorization. However, we did break the Infineon implementation in our recent paper The Return of Coppersmith's Attack (ROCA).

Q: What parts of an RSA public key are biased?

A: We extract an 8-bit feature vector from the public modulus N: the remainder of N modulo 3 (1 bit), the remainder of N modulo 4 (1 bit), and the 2nd through 7th most significant bits of N (6 bits).
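The extraction described above can be sketched in a few lines of Python. This is a minimal illustration, not the code of our measurement tool: the function names `extract_features` and `pack_features` are hypothetical, and the order in which the three parts are packed into one byte is an assumption made only for this example. It relies on the fact that a product of two large primes is odd and not divisible by 3, so each remainder carries a single bit of information.

```python
def extract_features(n: int) -> tuple[int, int, int]:
    """Return (n mod 3, n mod 4, bits 2..7 counted from the most significant bit)."""
    if n <= 0 or n.bit_length() < 7:
        raise ValueError("modulus too short")
    length = n.bit_length()
    # Keep the top 7 bits, then mask away the leading bit (always 1).
    top_bits = (n >> (length - 7)) & 0b111111
    return n % 3, n % 4, top_bits


def pack_features(n: int) -> int:
    """Pack the three features into one 8-bit integer (bit order is an assumption)."""
    mod3, mod4, top_bits = extract_features(n)
    if mod3 == 0 or mod4 % 2 == 0:
        raise ValueError("not a plausible RSA modulus")
    mod3_bit = mod3 - 1   # remainder 1 -> 0, remainder 2 -> 1
    mod4_bit = mod4 >> 1  # remainder 1 -> 0, remainder 3 -> 1
    return (top_bits << 2) | (mod4_bit << 1) | mod3_bit


# Toy 8-bit modulus (13 * 17); a real RSA modulus has 1024 bits or more.
toy_modulus = 221
print(extract_features(toy_modulus), pack_features(toy_modulus))
```

The same functions work unchanged on real moduli, since Python integers have arbitrary precision.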

Q: What was the motivation for the measurement?

A: We developed a method for probabilistic classification of keys based on their source in our paper The Million-Key Question at USENIX Security 2016. However, we were missing an accurate estimate of library popularity and could not find any papers that provided one. We also needed to measure the impact of the ROCA vulnerability, and this is a general method for such measurements.

Q: What libraries did you analyze? Can you tell all libraries apart?

A: You can see all the analyzed sources in the following graph. Libraries in the same Group (Group number in square brackets) produce very similar distributions. The popularity of individual Groups can be measured.

Q: Does the popularity of libraries change over time?

A: Yes. For example, the share of OpenSSL keys has been increasing steadily since the first measurement.

Q: I want to know the popularity of library X, why wasn't it included?

A: To suggest other sources that we can add to our analysis, please get in touch with us. If you can also provide keys generated by hardware, open-source and proprietary libraries, we will add them to the Collection of RSA keys from reference libraries.

Q: Why can't you associate a key with its source with certainty?

A: The features extracted from the keys are not unique. Different (groups of) libraries can produce keys with the same features. Only the distribution of the features differs, as illustrated here:
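To make this concrete, here is a small sketch of why a single key yields only a probability for each group. It applies Bayes' rule to feature distributions that are invented for this example: the group names, the three feature values and all probabilities are hypothetical and are not measured data.

```python
def posterior(feature, distributions, priors):
    """Bayes' rule: P(group | feature) is proportional to P(feature | group) * P(group)."""
    joint = {g: distributions[g].get(feature, 0.0) * priors[g] for g in distributions}
    total = sum(joint.values())
    if total == 0:
        raise ValueError("feature not produced by any known group")
    return {g: p / total for g, p in joint.items()}


# Hypothetical distributions: both groups can produce every feature value,
# only the probabilities differ.
toy_distributions = {
    "Group A": {0: 0.5, 1: 0.3, 2: 0.2},
    "Group B": {0: 0.2, 1: 0.3, 2: 0.5},
}
toy_priors = {"Group A": 0.5, "Group B": 0.5}

for value in (0, 1, 2):
    print(value, posterior(value, toy_distributions, toy_priors))
```

In this toy setting, a key with feature value 1 is equally likely to come from either group, so no single key can be attributed with certainty. The difference between the groups only becomes visible across many keys.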

Q: What is the accuracy of the measurement?

A: We performed simulations to determine the accuracy. The expected error of the measurement was within 1 percentage point of the estimate (e.g., if OpenSSL is estimated at 70%, we expect its true share to be between 69% and 71%). The error might be larger in some cases; however, the ground truth is not always known. Our estimate of ROCA-vulnerable keys in a PGP dataset was 0.10%, which is within 0.02 percentage points of the correct proportion found by a much more reliable method specific to ROCA keys.
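The idea of such a simulation can be sketched as follows: draw keys from a mixture with known shares, estimate the shares from the observed features alone, and compare. This sketch is an illustration under stated assumptions, not the procedure from the paper. The estimator is a standard expectation-maximization (EM) loop for mixture weights, swapped in here because it is short and needs no external libraries; the function names, the two groups and their distributions are all hypothetical.

```python
import random


def sample_features(shares, distributions, count, rng):
    """Draw `count` feature values from a mixture of groups with the given shares."""
    groups = list(shares)
    group_weights = [shares[g] for g in groups]
    features = []
    for _ in range(count):
        group = rng.choices(groups, weights=group_weights)[0]
        values = list(distributions[group])
        value_weights = [distributions[group][v] for v in values]
        features.append(rng.choices(values, weights=value_weights)[0])
    return features


def estimate_shares(features, distributions, iterations=200):
    """Estimate mixture shares with EM, given known per-group feature distributions."""
    groups = list(distributions)
    counts = {}
    for value in features:
        counts[value] = counts.get(value, 0) + 1
    shares = {g: 1.0 / len(groups) for g in groups}  # start from equal shares
    for _ in range(iterations):
        expected = {g: 0.0 for g in groups}
        for value, count in counts.items():
            weights = {g: shares[g] * distributions[g].get(value, 0.0) for g in groups}
            norm = sum(weights.values())
            if norm == 0:
                continue  # value that no known group can produce
            for g in groups:
                expected[g] += count * weights[g] / norm
        total = sum(expected.values())
        shares = {g: expected[g] / total for g in groups}
    return shares


# Hypothetical ground truth and distributions, used only for this simulation.
true_shares = {"Group A": 0.7, "Group B": 0.3}
toy_distributions = {
    "Group A": {0: 0.5, 1: 0.3, 2: 0.2},
    "Group B": {0: 0.2, 1: 0.3, 2: 0.5},
}

rng = random.Random(1)
observed = sample_features(true_shares, toy_distributions, 20000, rng)
estimated = estimate_shares(observed, toy_distributions)
for group in true_shares:
    print(group, "true:", true_shares[group], "estimated:", round(estimated[group], 3))
```

Repeating the run with different seeds, sample sizes and true shares gives an impression of how the estimation error depends on the dataset size and on how similar the group distributions are.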