====== Biased RSA private keys: Origin attribution of GCD-factorable keys [ESORICS 2020] ======
~~NOTOC~~

{{fa>user}} //Authors:// Adam Janovsky, Matus Nemec, Petr Svenda, Peter Sekan and Vashek Matyas

{{fa>user-circle-o}} //Primary contact:// Adam Janovsky %%<%%%%>%%

{{fa>bullhorn}} //Conference:// [[https://www.surrey.ac.uk/esorics-2020/|ESORICS 2020]]

  @InProceedings{2020-esorics-biasedrsaprivatekeys,
    Title = {Biased RSA private keys: Origin attribution of GCD-factorable keys},
    Author = {Adam Janovsky and Matus Nemec and Petr Svenda and Peter Sekan and Vashek Matyas},
    BookTitle = {25th European Symposium on Research in Computer Security (ESORICS) 2020},
    Year = {2020},
    Publisher = {Springer},
    crocsweb = {https://crocs.fi.muni.cz/public/papers/privrsa_esorics20},
    Keywords = {Cryptographic library, RSA factorization, Measurement, RSA key classification, Statistical model},
  }

In 2016, Švenda et al. (USENIX 2016, The Million-key Question) reported that implementation choices in cryptographic libraries allow for qualified guessing about the origin of public RSA keys. We extend the technique to two new scenarios in which not only public but also private keys are available for origin attribution: the analysis of the source of GCD-factorable keys in IPv4-wide TLS scans, and the forensic investigation of an unknown source. We learn several representatives of the bias from the private keys to train a model on more than 150 million keys collected from 70 cryptographic libraries, hardware security modules and cryptographic smartcards. Our model not only doubles the number of distinguishable groups of libraries (compared to public keys from Švenda et al.) but also more than doubles the accuracy relative to random guessing when a single key is classified. For a forensic scenario where at least 10 keys from the same source are available, the origin library is correctly identified with an average accuracy of 89% compared to the 4% accuracy of a random guess.
The technique was also used to identify libraries producing GCD-factorable TLS keys, showing that only three groups are the probable suspects.

===== Artifacts, tools... =====

{{fa>database}} [[https://owncloud.cesnet.cz/index.php/s/Ihhw3BKKzKTaxB9|Dataset of all collected RSA keys (39GB)]]

===== Further research =====

This research is related to our previous papers:

  * [[https://crocs.fi.muni.cz/papers/acsac2017|Measuring Popularity of Cryptographic Libraries in Internet-Wide Scans (ACSAC 2017)]]
  * [[https://crocs.fi.muni.cz/papers/rsa_ccs17|The Return of Coppersmith’s Attack: Practical Factorization of Widely Used RSA Moduli (CCS 2017)]]
  * [[http://crcs.cz/rsa|The Million-Key Question – Investigating the Origins of RSA Public Keys (USENIX 2016)]]

===== Key points =====

  * We investigated the properties of keys generated by 70 cryptographic libraries, identified biased features in the primes they produce, and compared three models based on Bayes classifiers for private key attribution.
  * The information available in private keys significantly increases classification performance compared to the results achieved on public keys. Our work makes it possible to distinguish 26 groups of sources (compared to 13 on public keys) while more than doubling the accuracy relative to random guessing.
  * Finally, we designed a method that also works for datasets of keys in which one prime is significantly correlated. Such primes are found in GCD-factorable TLS keys, where one prime was generated with insufficient randomness. As a result, we can identify the libraries responsible for producing these GCD-factorable keys, showing that only three groups are a relevant source of such keys.

===== Summary video =====

{{ youtube>CXCkdmFUGwU?900x520 | Biased RSA private keys}}
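
===== Illustration: GCD-factorable keys =====

The key points above refer to GCD-factorable keys, i.e. RSA moduli that share a prime because it was generated with insufficient randomness. The sketch below is only meant to illustrate why such keys expose their private primes; it is not the attribution method from the paper. The function name ''factor_shared_primes'' and the toy moduli are assumptions made for this example, and real RSA primes are far larger.

```python
from itertools import combinations
from math import gcd


def factor_shared_primes(moduli):
    """Map each modulus that shares a prime with another modulus to (p, q).

    Hypothetical helper for illustration; it compares every pair of moduli.
    """
    factored = {}
    for n1, n2 in combinations(moduli, 2):
        g = gcd(n1, n2)
        # A non-trivial common divisor is a prime shared by both moduli.
        # Identical moduli (g == n1 == n2) reveal nothing and are skipped.
        if 1 < g < min(n1, n2):
            factored[n1] = (g, n1 // g)
            factored[n2] = (g, n2 // g)
    return factored


# Toy moduli built from small primes. The prime 101 is deliberately reused,
# mimicking a key generator that ran with insufficient randomness.
toy_moduli = [101 * 103, 101 * 107, 109 * 113]
result = factor_shared_primes(toy_moduli)

for n, (p, q) in sorted(result.items()):
    print(f"{n} = {p} * {q}")
```

Once both primes of a key are recovered this way, the private key is available for analysis. The pairwise loop is quadratic in the number of keys, so it only suits small examples; Internet-wide scans typically rely on a product-tree based batch GCD instead.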