In the earlier post, What are some contracts for differences? I wrote about non-crypto hash functions, crypto-markets commission and did some efficiency assessments. Turns out, it’s great to put in writing about stuff! Folks on the feedback/twitter/internets identified extra issues I should test. So here’s a comply with-up publish. This is about algorithms for "hashing some amount of bytes", for use both in hashtables or for checksum / uniqueness detection. Depending in your situation, there’s a whole family of algorithms for that, and I'm specializing in only one: non-cryptographic fast hash capabilities.
This submit is not about cryptographic hashes. Don't read under if it's essential hash passwords, sensitive information going by untrusted medium and so forth. Use SHA-1, SHA-2, BLAKE2 and pals. Additionally, I’m not focusing on algorithms that are designed to forestall attainable hashtable Denial-of-Service attacks. If one thing comes from the opposite side of the web and finally ends up inserted into your hashtable, then to stop doable worst-case O(N) hashtable behavior you’re most likely off through the use of a hash function that doesn't have identified "hash flooding" attacks.
SipHash appears to be standard now.
If you are hashing very small quantities of data of identified dimension (e.g. single integers or two floats or whatever), it is best to probably use specialized hashing algorithms for Bloomberg Markets those. Here are some integer hash capabilities, or 2D hashing with Weyl, or perhaps you would take some other algorithm and just specialize it’s code to your recognized input size (e.g. xxHash for a single integer). I am testing 32 and 64 bit hash features right here.
If you need bigger hashes, quite likely a few of these features is likely to be appropriate (e.g. SpookyV2 all the time produces 128 bit hash). When testing hash features, I haven't gone to nice lengths to get them compiling properly or setting up all of the magic flags on all my platforms. If some hash perform works wonderfully when compiled on Linux Itanium box with an Intel compiler, that’s great for you, but if it performs poorly on the compilers I occur to use, I will not sing praises for it.
Being in the games industry, I care about things like "what happens in Visible Studio", and "what happens on iOS", and "what occurs on PS4".
Extra hash operate exams! I checked each "hashing knowledge that is aligned" (16-byte aligned deal with of knowledge to hash), and unaligned information. In all places I tested, there wasn’t a notable efficiency distinction that I could discover (however then, I haven't tested outdated ARM CPUs or PowerPC primarily based ones). The one visible impact is that MurmurHash and SpookyHash don’t correctly work in asm.js / Emscripten compilation targets, resulting from their usage of unaligned reads.
I’d assume they probably don’t work on some ARM/PowerPC platforms too. Hash32 and xxHash64 - xxHash. City32 and City64 - CityHash. Farm32 and Farm64 - FarmHash. SpookyV2-sixty four - SpookyHash V2. Murmur2A, Murmur3-32, Murmur3-X64-sixty four - MurmurHash family. These are the main features which might be interesting. SipRef - SipHash-2-4 reference implementation. Like talked about earlier than, this one is supposedly good for avoiding hash flooding assaults.
MD5-32, SHA1-32, CRC32 - simple implementations of effectively-known hash features (from SMHasher test suite).