Using Z3 to invert non-cryptographic hashes

Intro

A few months back I was reading some Java code using MurmurHash3. This algorithm is quite commonly used for quick hashing, examples where it’s used:

Some other examples in the same category are cityhash, FNV hash etc

Related Tangent

At 28c3 in 2007 Alexander Klink and Julian Wälde did a talk that I really liked called “Efficient Denial of Service Attacks on Web Application Platforms” (video)

This talk was deeper than the content I’m about to discuss as this is about the issues with non-cryptographic hash functions not being fully mixed (and in many cases invertible) such that you can “flood” a hash bucket and the lookup which would normally be an O(1) operation decends to that of a linked list - O(n).

After this talk there was a fever in open source land to fix these hashing issues in Ruby, PHP, Java etc. Allegedly the only language that knew of this already was Perl, and they’d kept the details to themselves since 2002 😂

The one that still puzzles me is Microsoft’s .NET framework (the lanuage I’m most familiar with the internals of), they use something that was patented by Niels Fergurson (who I greatly respect), it’s distribution allegedly doesn’t suffer the issues in the above talk, yet the hash appears to be invertible, so how can this be so? It’s name is Marvin (github link)

Back on Track

Would you want to use MurmurHash3 for cryptographic hashing?!

Obviously this is a rhetorical question, why would anyone use an invertible hash function for cryptographic purposes 🤔

Most of the Java implementations I listed in the Intro reference from Yonik’s implementation . So this is the one I have analyzed with the Z3 theorem prover.

So this is the example I analyzed with Z3. At this point it could be useful to read my previous post Z3 Constraint Solver Tips as it’s useful for things like dynamic SSA in the following.

which he notes is in “public domain” to allow people to use or license as they see fit”, he’s clearly not to blame if someone uses it for cryptographic purposes, he never stated that anywhere

Final Thoughts and Future Work

As the size of the input array increases the collisions also do, so you can add hints based on known structure of the plaintext, or you can iteratively exclude inputs that collide to the same hash until you get a meaningful input. I’m not sure on how coupled the values are that are chosen, or whether the relationship of hints to length is linear or otherwise. But I’d like to look into that.

I’d also like to tidy up some of the dynamic name assignment, potentially as a helper python library so that it can be used to tackle other solver problems.

Related Reading