Uploaded image for project: 'Swift'
  1. Swift
  2. SR-3268

Hash-Dependent Iteration Order Produces Quadratic Behaviour in Dictionary/Set



    • Type: Bug
    • Status: Resolved
    • Priority: Medium
    • Resolution: Done
    • Component/s: Standard Library
    • Labels:
    • Environment:

      Should be true in at least Swift 2 and Swift 3, and all platforms.


      Rust recently ran into a nasty bug which led to quadratic behaviour in appending the contents of one HashMap to another. Swift has the same problem for Dictionary/Set. The problem is explained excellently here: http://accidentallyquadratic.tumblr.com/post/153545455987/rust-hash-iteration-reinsertion

      In essence, the problem requires:

      • iteration order to be tied to where elements hash in the collection (linear scan of the array)
      • a linear-probing-based collision resolution algorithm (FCFS for Swift, Robin Hood for Rust)
      • hashes mapped to indices via modulo (masking off bits)
      • max load factor greater than 50% (75% for Swift, 90% for Rust) (? not 100% sure if <50% is adequate)
      • inserting the elements of one map into another with less capacity (as would naturally occur when inserting all the elements of a map into a freshly created one without reserving space)

      With all of these combined, the larger map has become a soft hash collision computer for the smaller map. Elements near the start of the first and second half of the larger map are guaranteed to be mapped near the start of the smaller map. Once iteration of the larger map hits the midpoint of its array, this leads to tons of collisions at the start of the smaller map's array, snowballing into linear search times for empty buckets.

      Rust's solution was to use a different seed for its hash function for every hashmap (in fact, older versions of Rust already did this, the change to a global seed was an attempted optimization). I'm not sure if this is even possible while we're using Foundation's hashcode system.

      Alternatively you could come up with some iteration scheme which is decoupled from the hash. Naively shifting the linear scan is inadequate. This is likely to hurt iteration performance under normal operation, as it uses cache worse.

      Alternatively we could move to a fancier collision resolution scheme quadratic probing?). This is likely to hurt performance on normal operation as it trashes the cache.

      Alternatively we could lower the load factor. This will hurt iteration speed and increase memory usage in common usage.

      We could also just decide that this code is "bad" and it performing poorly is acceptable. I'm not a fan of this solution, but y'know, it's there.

      Here's a sample program exhibiting the problematic behaviour.

      let size = Int(CommandLine.arguments[1])!
      var dict1 = [Int:Int]()
      for i in 0..<size {
          dict1[i] = i * 2
      var dict2 = [Int:Int]()
      for (k, v) in dict1 {
          dict2[k] = v
      print("computed value: \(dict2[size/2]!)")

      Running this on different input sizes shows clearly quadratic behaviour:

      swiftc -O dict.swift
      time ./dict 1000000
      computed value: 1000000
      real	0m0.250s
      user	0m0.197s
      sys	0m0.045s
      time ./dict 2000000
      computed value: 2000000
      real	0m0.498s
      user	0m0.400s
      sys	0m0.094s
      time ./dict 3000000
      computed value: 3000000
      real	0m9.617s
      user	0m9.498s
      sys	0m0.102s
      time ./dict 4000000
      computed value: 4000000
      real	0m1.089s
      user	0m0.880s
      sys	0m0.195s
      time ./dict 5000000
      computed value: 5000000
      real	4m45.307s
      user	4m43.069s
      sys	0m1.143s

      The dip in run time at 4,000,000 is a result of the behaviour being very load factor sensitive.




            lorentey Karoy Lorentey
            Gankro Alexis Beingessner
            2 Vote for this issue
            9 Start watching this issue