Toolpile

Hashing in 2026: when MD5 is fine, when it's a vulnerability

MD5 is broken for cryptography. It's still fine for a lot of things, and it's been fine for those things since the 1990s.

7 min read · By Umur Yavuz

Two things are true at once. MD5 is cryptographically broken: practical collisions were shown in 2004, and by 2008 the attack was good enough to forge a trusted certificate. And MD5 is fine for an enormous number of things people use it for, including some you'd be surprised by. The skill is knowing which job you're doing.

What a hash function actually does

A hash function takes any input and produces a fixed-size output that looks random. Same input, same output, every time. Different input, different output, almost always. The two properties that matter:

  • Determinism: hash("hello") returns the same value on your laptop, my laptop, and a server in Frankfurt.
  • Avalanche: hash("hello") and hash("hellp") look completely unrelated. Change a single bit of the input and about half of the output bits flip.
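
Both properties are easy to check for yourself. The following is a minimal sketch using Python's standard hashlib module; the helper names md5_hex and differing_bits are made up for this illustration, and the usedforsecurity=False flag assumes Python 3.9 or newer.

```python
import hashlib


def md5_hex(data: bytes) -> str:
    """Return the MD5 digest of data as 32 hex characters."""
    # usedforsecurity=False (Python 3.9+) marks this as a non-security use.
    return hashlib.md5(data, usedforsecurity=False).hexdigest()


def differing_bits(a: bytes, b: bytes) -> int:
    """Count the bit positions where two equal-length byte strings differ."""
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))


# Determinism: the same input gives the same digest on every call.
assert md5_hex(b"hello") == md5_hex(b"hello")
print(md5_hex(b"hello"))  # 5d41402abc4b2a76b9719d911017c592

# Avalanche: the inputs differ only in their last character,
# yet the two 128-bit digests disagree in roughly half their bits.
d1 = hashlib.md5(b"hello", usedforsecurity=False).digest()
d2 = hashlib.md5(b"hellp", usedforsecurity=False).digest()
print(differing_bits(d1, d2), "of 128 output bits differ")
```

The exact bit count varies from one input pair to the next, but it should land somewhere near 64.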

What separates a cryptographic hash from a regular one is a third kind of property: resistance to deliberate attack. The one that matters most here is collision resistance. It must be computationally infeasible to find two different inputs that hash to the same output. This is the property MD5 has lost.

Why MD5 is broken — and what "broken" means

In 2004, Xiaoyun Wang and colleagues showed how to find MD5 collisions in about an hour of computation. By the end of 2008 a research team had used a chosen-prefix collision to obtain a rogue certificate authority certificate from a commercial CA that still signed with MD5, demonstrating that you could forge a certificate authority's signature if it used MD5. Today you can find a basic collision on a laptop in seconds. A chosen-prefix collision (where you pick the start of both inputs) takes more work, but it is within reach of commodity hardware.

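But notice what "broken" means here. It means: I can produce two inputs that share an MD5 hash. It does not mean: given an MD5 hash, I can recover the original input. That's a different property (preimage resistance), and no practical attack on it is known for MD5. Working backwards from an MD5 hash to an unknown, high-entropy input is still computationally infeasible. Short or guessable inputs such as passwords are another matter, because an attacker can simply hash every candidate and compare.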

The distinction matters because most uses of hashing don't care about collision resistance.

Where MD5 is fine in 2026

  • File deduplication. You have 200,000 photos and want to find duplicates. MD5 each file, group by hash. Two genuinely-different files with the same MD5 would require a collision attack against your specific files, which nobody is mounting against your photo library.
  • Cache keys. Hash a long URL or a complex query into a fixed-length key for memcached. If a collision did happen, the worst case is a request being served the entry cached for a different key, and you've already designed around that if the cache only holds non-personalised entries.
  • Content-addressable storage where the contents are not adversarially controlled. Git still uses SHA-1 by default, which is broken in the same way as MD5 (a public collision was demonstrated in 2017), and Git is fine because your developers are not trying to forge commits against each other.
  • ETags for HTTP caching. The browser sends "give me /file if its ETag isn't xyz" and the server compares hashes. A collision means the browser thinks it has the latest version when it doesn't: a bug, not a breach.
  • Database partition keys, sharding, bloom filters, hash tables. All uses where you want a fast, deterministic, well-distributed integer from arbitrary input. MD5 is overkill for these (FNV or xxHash are faster) but not insecure.
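
The deduplication case from the list above can be sketched in a few lines of Python. This is an illustration under assumptions, not a finished tool: the function names file_md5 and find_duplicates are hypothetical, the scratch files stand in for a real photo library, and usedforsecurity=False needs Python 3.9 or newer.

```python
import hashlib
import tempfile
from collections import defaultdict
from pathlib import Path


def file_md5(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large files never sit fully in memory."""
    digest = hashlib.md5(usedforsecurity=False)
    with path.open("rb") as handle:
        while chunk := handle.read(chunk_size):
            digest.update(chunk)
    return digest.hexdigest()


def find_duplicates(root: Path) -> list[list[Path]]:
    """Return groups of files under root that share an MD5 digest."""
    groups: dict[str, list[Path]] = defaultdict(list)
    for path in sorted(root.rglob("*")):
        if path.is_file():
            groups[file_md5(path)].append(path)
    return [paths for paths in groups.values() if len(paths) > 1]


# Usage: two identical files and one different file in a scratch directory.
with tempfile.TemporaryDirectory() as scratch:
    root = Path(scratch)
    (root / "a.jpg").write_bytes(b"same bytes")
    (root / "b.jpg").write_bytes(b"same bytes")
    (root / "c.jpg").write_bytes(b"other bytes")
    duplicate_names = [[p.name for p in group] for group in find_duplicates(root)]

print(duplicate_names)  # [['a.jpg', 'b.jpg']]
```

If the files ever come from people who might want to fool the deduplicator, swapping hashlib.md5 for hashlib.sha256 is a one-line change.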

Where MD5 is a vulnerability

Anywhere an attacker can submit input and benefit from finding a collision. Specifically:

  • Digital signatures over documents. If you sign "contract A" with MD5 and the attacker has crafted "contract A" and "contract B" with the same MD5, your signature now verifies on contract B too. Use SHA-256 or SHA-3.
  • Certificate authority signing. Already deprecated everywhere; included for completeness.
  • Anything labeled "checksum" where the integrity guarantee is supposed to defend against a malicious party, not a flipped bit on a noisy network. A download checksum published on the same vendor website as the file is a special case: an attacker who controls both the file and the checksum text can make them match under any hash function, so the algorithm is not the weak point.
  • Anything that produces an integrity proof bound into a token. JWTs signed with HS256 use HMAC with SHA-256 for this kind of proof.
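
For the token case, a keyed hash is the usual shape of the fix. Here is a minimal sketch with Python's standard hmac module. SERVER_KEY, sign, and verify are hypothetical names, and the key is generated on the spot only so the example runs; a real service would load it from secret storage.

```python
import hashlib
import hmac
import secrets

# Hypothetical server-side key; generated here only so the sketch is runnable.
SERVER_KEY = secrets.token_bytes(32)


def sign(message: bytes, key: bytes = SERVER_KEY) -> str:
    """Return an HMAC-SHA-256 tag for message as a hex string."""
    return hmac.new(key, message, hashlib.sha256).hexdigest()


def verify(message: bytes, tag: str, key: bytes = SERVER_KEY) -> bool:
    """Recompute the tag and compare in constant time."""
    return hmac.compare_digest(sign(message, key), tag)


tag = sign(b"user=42;role=reader")
print(verify(b"user=42;role=reader", tag))  # True
print(verify(b"user=42;role=admin", tag))   # False
```

Without the key, an attacker cannot compute a valid tag for a message of their choosing, which is the guarantee a bare hash does not give you.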

The password hashing distinction

Storing user passwords is a separate category from "hashing" as discussed above. You don't want a fast hash for passwords. You want a deliberately slow hash with a configurable cost factor: bcrypt, scrypt, or Argon2id. The reason: an attacker who steals your password database wants to brute-force it. Fast hashes (MD5, SHA-256) let them try billions of guesses per second on a GPU. Argon2id can be tuned to cost around 100 ms and tens of megabytes of memory per hash, which cuts the attacker's guess rate by several orders of magnitude.

If you read "we hash passwords with SHA-256" in a security audit, that's still a finding. SHA-256 is not broken, but using it for passwords is the wrong tool. Use Argon2id with sensible parameters (m=64 MiB, t=3, p=4 is a 2026 starting point) or bcrypt with cost 12.
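
To show the shape of slow password hashing without a third-party dependency, the sketch below swaps in scrypt, because it ships in Python's standard hashlib while Argon2id and bcrypt need extra packages. The function names and the cost parameters are assumptions chosen for illustration, not tuned recommendations.

```python
import hashlib
import hmac
import secrets

# Illustrative scrypt cost parameters (assumptions, not a recommendation).
# n=2**14 with r=8 needs about 16 MiB of memory per hash.
SCRYPT_PARAMS = {"n": 2**14, "r": 8, "p": 1, "maxmem": 64 * 1024 * 1024, "dklen": 32}


def hash_password(password: str) -> tuple[bytes, bytes]:
    """Return (salt, derived_key); both are stored with the user record."""
    salt = secrets.token_bytes(16)  # fresh random salt for every password
    key = hashlib.scrypt(password.encode("utf-8"), salt=salt, **SCRYPT_PARAMS)
    return salt, key


def check_password(password: str, salt: bytes, expected_key: bytes) -> bool:
    """Re-derive the key with the stored salt and compare in constant time."""
    key = hashlib.scrypt(password.encode("utf-8"), salt=salt, **SCRYPT_PARAMS)
    return hmac.compare_digest(key, expected_key)


salt, key = hash_password("correct horse battery staple")
print(check_password("correct horse battery staple", salt, key))  # True
print(check_password("Tr0ub4dor&3", salt, key))                   # False
```

The per-password salt means two users with the same password get different stored values, so an attacker cannot crack them both with one guess.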

Don't generate passwords with Math.random either, because it is not a cryptographically secure generator. Use a CSPRNG (in browsers, crypto.getRandomValues) and generate enough entropy that brute-forcing is infeasible.
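
The same idea outside the browser, sketched in Python: the standard secrets module draws from the operating system's CSPRNG, much as crypto.getRandomValues does in a browser. The ALPHABET constant and the generate_password helper are hypothetical names for this example.

```python
import math
import secrets
import string

# 52 letters + 10 digits + 10 symbols = 72 possible characters.
ALPHABET = string.ascii_letters + string.digits + "!@#$%^&*-_"


def generate_password(length: int = 20) -> str:
    """Pick each character independently using the OS-backed CSPRNG."""
    if length < 1:
        raise ValueError("length must be positive")
    return "".join(secrets.choice(ALPHABET) for _ in range(length))


password = generate_password()
entropy_bits = len(password) * math.log2(len(ALPHABET))
print(password)
print(f"about {entropy_bits:.0f} bits of entropy")
```

With 72 symbols, each character contributes a little over 6 bits, so 20 characters come to roughly 123 bits.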


Practical recommendation by job

  • Need a fast unique key for a non-adversarial use? MD5 or xxHash, doesn't matter.
  • Need a fingerprint that nobody is trying to forge? SHA-256.
  • Need a fingerprint where someone is trying to forge it? HMAC-SHA-256, with the key kept server-side.
  • Need to verify a download? SHA-256 over the file, compared against a digest fetched from a different origin than the file itself.
  • Need to store a password? Argon2id, parameters tuned to take 100-200 ms on your hardware.
  • Need a session token? Random bytes from a CSPRNG, then encode for transport.
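
Two of these jobs, download verification and session tokens, fit in one short Python sketch. The helper names are hypothetical, and the scratch file holding b"abc" stands in for a real download whose digest you obtained from a separate, trusted origin.

```python
import hashlib
import hmac
import secrets
import tempfile
from pathlib import Path


def sha256_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Stream a file through SHA-256 and return the hex digest."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        while chunk := handle.read(chunk_size):
            digest.update(chunk)
    return digest.hexdigest()


def download_matches(path: Path, published_hex: str) -> bool:
    """Compare the file's digest with one published on a separate origin."""
    return hmac.compare_digest(sha256_file(path), published_hex.strip().lower())


def new_session_token(num_bytes: int = 32) -> str:
    """Random bytes from the CSPRNG, URL-safe base64 encoded for transport."""
    return secrets.token_urlsafe(num_bytes)


# Usage: the well-known SHA-256 digest of b"abc" plays the published value.
ABC_SHA256 = "ba7816bf8f01cfea414140de5dae2223b00361a396177a9cb410ff61f20015ad"
with tempfile.TemporaryDirectory() as scratch:
    artifact = Path(scratch) / "artifact.bin"
    artifact.write_bytes(b"abc")
    ok = download_matches(artifact, ABC_SHA256)
    tampered = download_matches(artifact, "0" * 64)

print(ok, tampered)  # True False
print(new_session_token())
```

Note that the session token involves no hashing at all: its security comes entirely from the randomness of the bytes.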

MD5 isn't dead. It's been demoted from "general-purpose cryptographic hash" to "fast non-cryptographic hash that everyone happens to support" — which is a real and useful job. Knowing which job you're doing is most of the work.
