Rust recently ran into a nasty bug which led to quadratic behaviour when appending the contents of one HashMap to another. Swift has the same problem for Dictionary/Set. The problem is explained excellently here: http://accidentallyquadratic.tumblr.com/post/153545455987/rust-hash-iteration-reinsertion
In essence, the problem requires:
- iteration order to be tied to where elements hash in the collection (linear scan of the array)
- a linear-probing-based collision resolution algorithm (FCFS for Swift, Robin Hood for Rust)
- hashes mapped to indices via modulo (implemented by masking off the high bits, since capacities are powers of two)
- a max load factor greater than 50% (75% for Swift, 90% for Rust) (not 100% sure whether <50% is actually adequate to prevent this)
- inserting the elements of one map into another with smaller capacity (as naturally occurs when inserting all the elements of a map into a freshly created one without reserving space)
With all of these combined, the larger map becomes a soft hash collision computer for the smaller map. Elements near the start of the first and second halves of the larger map's array are guaranteed to be mapped near the start of the smaller map's. Once iteration of the larger map passes the midpoint of its array, this produces tons of collisions at the start of the smaller map's array, snowballing into linear search times for empty buckets.
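To make the mechanism concrete, here is a toy simulation (Python, not Swift's actual implementation) combining the ingredients above: FCFS linear probing, identity hashes, indices via masking, doubling growth at 90% load (Rust's factor; Swift's 75% behaves similarly), and iteration order that is a linear scan of the array. Copying a table in its own iteration order costs vastly more probes than inserting the same keys in an order unrelated to its layout:

```python
import random

class Table:
    """Toy FCFS linear-probing hash table: identity hash, index via
    masking, doubling growth at 90% load, array-scan iteration order."""

    def __init__(self):
        self.slots = [None] * 8
        self.len = 0
        self.probes = 0     # total slots inspected across all insertions

    def insert(self, key):
        if (self.len + 1) * 10 > len(self.slots) * 9:   # would exceed 90% load
            old = self.slots
            self.slots = [None] * (2 * len(old))
            self.len = 0
            for k in old:
                if k is not None:
                    self._place(k)
        self._place(key)

    def _place(self, key):
        mask = len(self.slots) - 1
        i = key & mask                      # modulo via masking off high bits
        self.probes += 1
        while self.slots[i] is not None:    # first come, first served
            i = (i + 1) & mask
            self.probes += 1
        self.slots[i] = key
        self.len += 1

    def scan_order(self):
        """Iteration order tied to where elements hash: scan the array."""
        return [k for k in self.slots if k is not None]

random.seed(0)
keys = random.sample(range(1 << 30), 40_000)

big = Table()
for k in keys:
    big.insert(k)

copy_scan = Table()                 # pathological: big's iteration order
for k in big.scan_order():
    copy_scan.insert(k)

copy_rand = Table()                 # control: order unrelated to big's layout
for k in keys:
    copy_rand.insert(k)

print("probes, iteration order:", copy_scan.probes)
print("probes, original order: ", copy_rand.probes)
```

The first number should dwarf the second, and the gap widens quadratically with the key count: once the scan passes the big table's midpoint, every insertion's home slot lands inside the already-packed front of the copy's array and has to walk to the end of that ever-growing run.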
Rust's solution was to use a different seed for its hash function for every hashmap (in fact, older versions of Rust already did this; the change to a global seed was an attempted optimization). I'm not sure whether this is even possible while we're using Foundation's hashcode system.
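A sketch of why per-table seeding helps, using the same sort of toy model (an assumption for illustration, not Rust's or Swift's real hash function): index by the low bits of key * seed for a random odd per-table seed. With one "global" seed shared by every table the copy degrades as described above; with a fresh seed per table, the source's layout says nothing about where keys land in the copy:

```python
import random

def build(keys, seed):
    """Toy FCFS linear-probing table (doubling at 90% load) whose index is
    the low bits of key * seed, for an odd multiplicative seed.
    Returns (backing array, total probes performed)."""
    slots = [None] * 8
    probes = 0

    def place(k):
        nonlocal probes
        mask = len(slots) - 1
        i = (k * seed) & mask           # seeded hash, index via masking
        probes += 1
        while slots[i] is not None:
            i = (i + 1) & mask
            probes += 1
        slots[i] = k

    count = 0
    for k in keys:
        count += 1
        if count * 10 > len(slots) * 9:         # keep load under 90%
            old, slots = slots, [None] * (2 * len(slots))
            for kk in old:                      # rehash into the bigger array
                if kk is not None:
                    place(kk)
        place(k)
    return slots, probes

random.seed(1)
keys = random.sample(range(1 << 30), 40_000)
global_seed = random.randrange(1 << 32) | 1     # one seed shared by all tables

big, _ = build(keys, global_seed)
scan = [k for k in big if k is not None]        # big's iteration order

_, shared = build(scan, global_seed)                       # global seed: blowup
_, reseeded = build(scan, random.randrange(1 << 32) | 1)   # per-table seed
print("probes with shared seed:   ", shared)
print("probes with per-table seed:", reseeded)
```

With the shared seed, the copy's index for a key is just the big table's index reduced mod the smaller capacity, so iteration folds both halves of the big array onto the front of the copy; a fresh seed breaks that correspondence and probe counts return to normal.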
Alternatively, we could come up with some iteration scheme that is decoupled from the hash; naively shifting the start of the linear scan is inadequate. Any such scheme is likely to hurt iteration performance under normal operation, as it uses the cache worse.
Alternatively, we could move to a fancier collision resolution scheme (quadratic probing?). This is likely to hurt performance under normal operation, as it trashes the cache.
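For reference, by quadratic probing I mean something like the triangular-number probe sequence (sketch below; assumes today's power-of-two capacities, under which this sequence provably visits every slot exactly once):

```python
def probe_sequence(home, capacity, count):
    """Triangular-number quadratic probing: offsets 0, 1, 3, 6, 10, ...
    With a power-of-two capacity this visits every slot exactly once."""
    return [(home + i * (i + 1) // 2) % capacity for i in range(count)]

# Successive probes jump 1, 2, 3, ... slots apart instead of scanning
# adjacent slots -- which is exactly the cost to cache locality.
print(probe_sequence(5, 16, 16))
# [5, 6, 8, 11, 15, 4, 10, 1, 9, 2, 12, 7, 3, 0, 14, 13]
```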
Alternatively, we could lower the max load factor. This would hurt iteration speed and increase memory usage under common usage.
We could also just decide that this code pattern is "bad" and that its performing poorly is acceptable. I'm not a fan of this solution, but, y'know, it's there.
Here's a sample program exhibiting the problematic behaviour.
Running this on different input sizes shows clearly quadratic behaviour:
The dip in run time at 4,000,000 is a result of the behaviour being very sensitive to load factor.