Using Z3 to invert non-cryptographic hashes
A few months back I was reading some Java code using MurmurHash3. This algorithm is quite commonly used for quick hashing, examples where it’s used:
At 28c3 in 2007 Alexander Klink and Julian Wälde did a talk that I really liked called “Efficient Denial of Service Attacks on Web Application Platforms” (video)
This talk was deeper than the content I’m about to discuss as this is about the issues with non-cryptographic hash functions not being fully mixed (and in many cases invertible) such that you can “flood” a hash bucket and the lookup which would normally be an O(1) operation decends to that of a linked list - O(n).
After this talk there was a fever in open source land to fix these hashing issues in Ruby, PHP, Java etc. Allegedly the only language that knew of this already was Perl, and they’d kept the details to themselves since 2002 😂
The one that still puzzles me is Microsoft’s .NET framework (the lanuage I’m most familiar with the internals of), they use something that was patented by Niels Fergurson (who I greatly respect), it’s distribution allegedly doesn’t suffer the issues in the above talk, yet the hash appears to be invertible, so how can this be so?
It’s name is
Marvin (github link)
Back on Track
Would you want to use MurmurHash3 for cryptographic hashing?!
Obviously this is a rhetorical question, why would anyone use an invertible hash function for cryptographic purposes 🤔
‡ which he notes is in “public domain” to allow people to use or license as they see fit”, he’s clearly not to blame if someone uses it for cryptographic purposes, he never stated that anywhere
Final Thoughts and Future Work
As the size of the input array increases the collisions also do, so you can add hints based on known structure of the plaintext, or you can iteratively exclude inputs that collide to the same hash until you get a meaningful input. I’m not sure on how coupled the values are that are chosen, or whether the relationship of hints to length is linear or otherwise. But I’d like to look into that.
I’d also like to tidy up some of the dynamic name assignment, potentially as a helper python library so that it can be used to tackle other solver problems.
SipHash: a fast short-input PRF - this expands on the 28c3 work and also provides a solution to this problem which is to use a cryptographic hash function that is faster than any of the incumbent, non-cryptographic ones.
Breaking Murmur: Hash-flooding DoS Reloaded - Martin Boßlet’s extensive coverage on the topic of hash-flooding.