Hackers Are Passing Around a Megaleak of 2.2 Billion Records

The so-called Collections #1–5 represent a gargantuan, patched-together Frankenstein of rotting personal data.
Alyssa Foote

When hackers breached companies like Dropbox and LinkedIn in recent years—stealing 71 million and 117 million passwords, respectively—they at least had the decency to exploit those stolen credentials in secret, or sell them for thousands of dollars on the dark web. Now, it seems, someone has cobbled together those breached databases and many more into a gargantuan, unprecedented collection of 2.2 billion unique usernames and associated passwords and is freely distributing them on hacker forums and torrents, throwing out the private data of a significant fraction of humanity like last year's phone book.

Earlier this month, security researcher Troy Hunt identified the first tranche of that mega-dump, named Collection #1 by its anonymous creator, a patched-together set of breached databases Hunt said represented 773 million unique usernames and passwords. Now other researchers have obtained and analyzed an additional vast database called Collections #2–5, which amounts to 845 gigabytes of stolen data and 25 billion records in all. After accounting for duplicates, analysts at the Hasso Plattner Institute in Potsdam, Germany, found that the total haul represents close to three times the Collection #1 batch.

"This is the biggest collection of breaches we’ve ever seen," says Chris Rouland, a cybersecurity researcher and founder of the IoT security firm Phosphorus.io, who pulled Collections #1–5 in recent days from torrented files. He says the collection has already circulated widely among the hacker underground: He could see that the tracker file he downloaded was being "seeded" by more than 130 people who possessed the data dump, and that it had already been downloaded more than 1,000 times. "It's an unprecedented amount of information and credentials that will eventually get out into the public domain," Rouland says.

Size Over Substance

Despite its unthinkable size, which was first reported by the German news site Heise.de, most of the stolen data appears to come from previous thefts, like the breaches of Yahoo, LinkedIn, and Dropbox. WIRED examined a sample of the data and confirmed that the credentials are indeed valid, but mostly represent passwords from years-old leaks.

But the leak is still significant for its quantity of privacy violation, if not its quality. WIRED asked Rouland to search for more than a dozen people's email addresses; all but a couple turned up at least one password they had used for an online service that had been hacked in recent years.

As another measure of the data's importance, Hasso Plattner Institute's researchers found that 750 million of the credentials weren't previously included in their database of leaked usernames and passwords, Info Leak Checker, and that 611 million of the credentials in Collections #2–5 weren't included in the Collection #1 data. Hasso Plattner Institute researcher David Jaeger suggests that some parts of the collection may come from the automated hacking of smaller, obscure websites to steal their password databases, which means that a significant fraction of the passwords are being leaked for the first time.
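The counting described above, removing duplicates and then asking how many credentials are absent from an earlier collection, can be pictured as simple set operations. The sketch below is only an illustration on toy data, not the institute's actual method: the `user:password` line format, the lowercase normalization of usernames, and the function names `normalize` and `summarize` are all assumptions made for the example.

```python
def normalize(record: str) -> tuple[str, str]:
    """Split an assumed 'user:password' line; lowercase the username only."""
    user, _, password = record.strip().partition(":")
    return user.lower(), password


def summarize(new_dump, known):
    """Return (unique credentials in new_dump, how many are not in known)."""
    # Lines without a ':' separator are treated as malformed and skipped.
    unique = {normalize(r) for r in new_dump if ":" in r}
    known_set = {normalize(r) for r in known if ":" in r}
    return len(unique), len(unique - known_set)


if __name__ == "__main__":
    dump = ["A@x.com:pw1", "a@x.com:pw1", "b@y.com:pw2", "garbage"]
    earlier = ["b@y.com:pw2"]
    print(summarize(dump, earlier))
```

At the scale the researchers describe, in-memory sets like these would not be practical; the point is only what "unique" and "not previously included" mean.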

The sheer size of the collection also means it could offer a powerful tool for unskilled hackers to simply try previously leaked usernames and passwords on any public internet site in the hopes that people have reused passwords—a technique known as credential stuffing. "For the internet as a whole, this is still very impactful," Rouland says.

Rouland notes that he's in the process of reaching out to affected companies, and will also share the data with any chief information security officer that contacts him seeking to protect staff or users.

You can check for your own username in the breach using Hasso Plattner Institute's Info Leak Checker tool, and you should change your password on any breached site it flags if you haven't already. As always, don't reuse passwords, and use a password manager. (Troy Hunt's service HaveIBeenPwned offers another helpful check of whether your passwords have been compromised, though as of this writing it doesn't yet include Collections #2–5.)
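For readers who would rather script the password check, here is a minimal sketch against HaveIBeenPwned's Pwned Passwords range endpoint. It sends only the first five characters of the password's SHA-1 hash and compares the returned suffixes locally, so the password itself never leaves your machine. The function names and the User-Agent string are illustrative assumptions, and as noted above, the service's coverage of any particular collection may vary.

```python
import hashlib
import urllib.request

RANGE_URL = "https://api.pwnedpasswords.com/range/"


def split_hash(password: str) -> tuple[str, str]:
    """Return the 5-character prefix and the remaining suffix of the SHA-1 hex digest."""
    digest = hashlib.sha1(password.encode("utf-8")).hexdigest().upper()
    return digest[:5], digest[5:]


def count_in_range(suffix: str, body: str) -> int:
    """Find our suffix in a 'SUFFIX:COUNT' response body; 0 means not listed."""
    for line in body.splitlines():
        candidate, _, count = line.strip().partition(":")
        if candidate.upper() == suffix and count.strip().isdigit():
            return int(count)
    return 0


def pwned_count(password: str) -> int:
    """Query the range endpoint; only the hash prefix is sent over the network."""
    prefix, suffix = split_hash(password)
    request = urllib.request.Request(
        RANGE_URL + prefix,
        headers={"User-Agent": "pwned-check-sketch"},  # hypothetical client name
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return count_in_range(suffix, response.read().decode("utf-8"))
```

Calling `pwned_count("your password here")` returns how many times that password appears in the service's corpus; any nonzero result is a reason to stop using it everywhere.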

Bargain Bin

Rouland speculates that the data may have been stitched together from older breaches and put up for sale, but then stolen or bought by a hacker who, perhaps to devalue an enemy's product, leaked it more broadly. The torrent tracker file he used to download the collection included a "readme" that requested downloaders "please seed for as long as possible," Rouland notes. "Someone wants this out there," he says. (The "readme" also noted that another dump of data missing from the current torrent collection might be coming soon.)

But other researchers say that such a massive database being freely shared represents something else: That enough old megabreaches of personal information have piled up in the hacker underground over the years that they can comprise a sprawling, impactful amount of personal information and yet be practically worthless.

"Probably the skilled hackers, the guys really interested in getting money from this, had it for multiple years already," says David Jaeger, a researcher at Hasso Plattner Institute who analyzed the collections. "After some time, they've tried all these on the major services, so it doesn’t make sense to keep them any longer, they sell it for a small amount of money."

Below a certain price, Jaeger adds, hackers often barter the information for other data, spreading it further and devaluing it until it's practically free. But it could still be used for smaller scale hacking, such as breaking into social media accounts, or cracking lesser-known sites. "Maybe it’s worthless for the people who originally created these data dumps, but for random hackers it can still be used for many services," Jaeger adds.

Hunt says that after he reported on the initial Collection #1 earlier this month, he was surprised to find multiple people immediately offering to send him links to Collections #2–5. "What this represents that's unprecedented is the volume of data and the extent it’s circulating in big public channels," Hunt says. "It’s not the world's biggest hack, it's the fact that it’s circulating with an unprecedented fluidity."

In that sense, Collections #1–5 represent a new kind of milestone: That the rotting detritus of the internet's privacy breaches has gotten so voluminous and devalued that it's become virtually free and therefore public, degrading any last private information it might have held. "When enough people have secret data, someone shares it," Rouland says. "It's entropy. When the data is out there, it’s going to leak."
