The Shocking Disclosure After GitHub Was Scanned!

A group of researchers from North Carolina State University (NCSU) studied GitHub, the service for hosting and collaboratively developing IT projects. They found that more than 100 thousand GitHub repositories contain API keys, tokens, and cryptographic keys.

The problem of unintentional leakage of critical information (encryption keys, tokens and API keys from various online services) has long been one of the hottest topics.

Such leaks have already led to several major incidents involving personal data:

  • Uber,
  • DJI,
  • DXC Technology, etc.

From October 31, 2017 to April 20, 2018, researchers from NCSU scanned 4,394,476 files in 681,784 repositories via the GitHub search API and 2,312,763,353 files in 3,374,973 repositories previously collected in the Google BigQuery database.

During the scan, the researchers searched for strings matching the patterns of API keys (Stripe, MailChimp, YouTube, etc.), tokens (Amazon MWS, PayPal Braintree, Amazon AWS, etc.) or cryptographic keys (RSA, PGP, etc.).
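As a rough illustration of this kind of pattern matching, here is a minimal Python sketch. The three regular expressions are commonly cited formats for these secret types and are assumptions made for this example, not the rule set the NCSU researchers actually used. The names `PATTERNS` and `find_candidates` are hypothetical, and the sample key is the placeholder value used in AWS documentation.

```python
import re

# Illustrative patterns only (assumptions, not the researchers' actual rules).
PATTERNS = {
    "aws_access_key_id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "stripe_live_secret_key": re.compile(r"\bsk_live_[0-9a-zA-Z]{24}\b"),
    "rsa_private_key": re.compile(r"-----BEGIN RSA PRIVATE KEY-----"),
}


def find_candidates(text):
    """Return (kind, matched_string) pairs for every pattern hit in text."""
    hits = []
    for kind, pattern in PATTERNS.items():
        for match in pattern.finditer(text):
            hits.append((kind, match.group(0)))
    return hits


# Hypothetical file contents with a placeholder key and a key header.
sample = (
    'aws_key = "AKIAIOSFODNN7EXAMPLE"\n'
    "-----BEGIN RSA PRIVATE KEY-----\n"
    'name = "nothing secret here"\n'
)

for kind, value in find_candidates(sample):
    print(kind, value)
```

A match on a pattern like this is only a candidate: a string can have the right shape without being a working credential, so a real scanner would need some further filtering or validation step.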

In total, the researchers found 575,476 tokens, API keys, and cryptographic keys, of which 201,642 were unique. 93.58% of the findings were related to accounts with one owner.

A manual check of part of the selected results turned up AWS credentials for the website of a large government department in a Western European country and for a server holding millions of applications for admission to an American college.

The study revealed an interesting trend: of the leaked data the researchers monitored, only 19% was “deleted” (the quotation marks are explained below) within 16 days (12% within the first day), while 81% was not removed at all during the observation period.

Most interesting of all, none of the “deleted” data that the researchers observed was actually removed: the owners simply made a new commit, so the earlier version containing the secret remained in the repository history.
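Why a new commit does not remove a secret can be shown with a small Python sketch. It does not call git at all: the hypothetical `history` list simply models a repository as a sequence of file snapshots, oldest first, and the pattern is the same illustrative AWS key format assumed for this example.

```python
import re

# Illustrative AWS access key ID format (an assumption for this sketch).
AWS_KEY_ID = re.compile(r"\bAKIA[0-9A-Z]{16}\b")

# Hypothetical history, oldest first: each commit maps file paths to contents.
history = [
    {"config.py": 'AWS_KEY = "AKIAIOSFODNN7EXAMPLE"\n'},   # key committed
    {"config.py": 'AWS_KEY = os.environ["AWS_KEY"]\n'},    # key "deleted"
]


def scan_snapshot(snapshot):
    """Return the set of key-like strings found in one commit's files."""
    found = set()
    for content in snapshot.values():
        found.update(AWS_KEY_ID.findall(content))
    return found


def scan_history(commits):
    """Return the set of key-like strings found in any commit."""
    found = set()
    for snapshot in commits:
        found |= scan_snapshot(snapshot)
    return found


latest_only = scan_snapshot(history[-1])   # what the current files show
full_history = scan_history(history)       # what the repository still holds

print("latest commit:", latest_only)
print("full history:", full_history)
```

Scanning only the latest snapshot finds nothing, while scanning every snapshot still finds the key. This is why a leaked key is generally treated as compromised and revoked, not just deleted from the file.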

At the end of last year, I wrote a short note explaining how to use a DLP solution to prevent unintended leaks by monitoring the data uploaded to GitHub.

News about individual data leak cases is published regularly and promptly on the information leakage channel.