A Quick Look at What's Inside the 1/4/2023 Twitter Leaked Data

You may have read that Twitter was hacked and hundreds of millions of users' data was stolen. In this post we'll talk about what happened and what's in the data. This wasn't a breach in the way most people think of a breach. Twitter's API had a flaw: if you provided an email address, it would reveal whether that address belonged to an account, and which account it belonged to. Someone used that flaw to compile over 220,000,000 email addresses along with the user accounts those addresses were tied to. Here is a (censored) look at what the data looks like:
One of the biggest questions I had was whether the data contained phone numbers for users who authenticated that way instead of with an email address. It doesn't look like that was the case, at least in this dataset. Everything that matches the pattern of a phone number looks to be part of the user's screen name.
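
For anyone who wants to repeat that kind of check, here is a minimal sketch of one way to do it. It is not the exact method I used: the record layout (a dict with `email`, `screen_name`, and `name` keys), the `PHONE_LIKE` pattern, and the function name are all assumptions made up for illustration, and the pattern is deliberately loose (10 to 15 digits in a row, optionally prefixed with `+`).

```python
import re

# Assumption: a "phone-like" value is a run of 10-15 digits, optionally
# preceded by "+", that is not embedded in a longer run of digits.
PHONE_LIKE = re.compile(r"(?<!\d)\+?\d{10,15}(?!\d)")


def fields_with_phone_like_values(record):
    """Return the names of the fields whose value contains a phone-like string.

    `record` is a hypothetical dict of field name -> value; the real
    dataset's field names and layout may differ.
    """
    return [
        name
        for name, value in record.items()
        if PHONE_LIKE.search(str(value))
    ]


# Example with a made-up record: the digits sit in the screen name,
# not in the email field.
example = {
    "email": "user@example.com",
    "screen_name": "jane15551234567",
    "name": "Jane",
}
matched_fields = fields_with_phone_like_values(example)
```

If every hit lands in the screen name field and none in the email field, that supports the conclusion above. A pattern this loose will also flag long numeric IDs, so any hits are worth eyeballing by hand.
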
Hudson Rock co-founder Alon Gal pointed out on the @RockHudsonRock Twitter account that a threat actor may have used the same technique to build a list of phone numbers tied to Twitter accounts as well, and that this data may surface at some point in the future.

The data was released in two files: a massive (14GB compressed, 62GB uncompressed) file with all of the data, and a much smaller file containing the same information for over 95,000 verified accounts.
Is this leak a big deal? Kind of, yeah. It's not as bad as if it contained phone numbers, passwords, etc., but revealing the "real" email address of a high-profile account can make it much easier to find out other information about its owner. For those tasked with protecting and monitoring for threats against public figures, the job just got a little tougher. I'm also sure that multiple threat actors are already taking the most popular accounts from this dataset, grabbing the email addresses associated with them, and running those against other breach datasets that do contain passwords. I have no doubt that Twitter accounts will be hacked using this and other methods.

From an OSINT perspective, this data will almost assuredly help reveal individuals tied to some accounts. This is an unfortunately great reminder of how difficult it is to have good OPSEC retroactively.

If you're curious whether your organization was exposed, I've posted a list on GitHub of all of the email domains in the data (everything with over two occurrences), along with how many times each domain was found.
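
A domain tally like that can be built with a few lines of Python. The sketch below is illustrative rather than the script behind the published list: the function name and the `min_occurrences` parameter are made up for the example, and it assumes the email addresses have already been pulled out of the dump into an iterable of strings.

```python
from collections import Counter


def count_email_domains(emails, min_occurrences=3):
    """Count email domains and keep those seen at least `min_occurrences` times.

    `emails` is any iterable of strings (for example, lines of a text file
    with one address per line; that layout is an assumption). The default
    of 3 corresponds to "over two occurrences".
    """
    counts = Counter()
    for email in emails:
        # Normalize case and whitespace, then split on the last "@".
        local, sep, domain = email.strip().lower().rpartition("@")
        if sep and local and domain:  # skip lines that are not addresses
            counts[domain] += 1
    # most_common() returns (domain, count) pairs, highest count first.
    return [(d, n) for d, n in counts.most_common() if n >= min_occurrences]


# Example with made-up addresses.
sample = [
    "a@example.com",
    "b@example.com",
    "c@Example.com ",
    "d@corp.test",
    "not-an-email",
    "e@corp.test",
]
domain_counts = count_email_domains(sample)
```

Because the function takes any iterable, an open file handle can be passed in directly, so a multi-gigabyte file is processed line by line instead of being loaded into memory at once.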
