Uploaded image for project: 'Swift'
  1. Swift
  2. SR-68

SequenceType._preprocessingPass can cause duplicate evaluation of lazy collections

    XMLWordPrintable

    Details

    • Type: Bug
    • Status: Open
    • Priority: Medium
    • Resolution: Unresolved
    • Component/s: Standard Library
    • Labels:
      None
    • Environment:

      Swift version 2.2-dev (LLVM 46be9ff861, Clang 4deb154edc, Swift 3dbfefab2f)

      Description

      Code that uses SequenceType._preprocessingPass will have the closure executed when the collection is a lazy collection. This is a problem because the closure almost certainly is going to access some property that forces lazy evaluation (for maps, that would be iterating over the elements; for filters, even just accessing the count will evaluate the filter). And this is a bad idea because the overhead of extra evaluations of the lazy computations is probably worse than the performance gain from doing the preprocessing pass.

      I've reproduced this with <SequenceType where Generator.Element == String>.joinWithSeparator(String) on a lazy map. This should apply to other situations as well. The test case for joinWithSeparator looks like

      JoinTestSuite.test("StringCollection.joinWithSeparator()") {
        // make sure we only evaluate each element once
        func testLazyJoin(separator: String, _ elements: [String]) {
          var expectSeen: [Int: Int] = [:]
          for i in 0..<elements.count {
            expectSeen[i] = 1
          }
          var seen: [Int: Int] = [:]
          // elements needs to be a collection, not a sequence
          let elements = Array(elements.enumerate()).lazy.map({ (i, x) -> String in
            seen[i] = (seen[i] ?? 0) + 1
            return x
          })
          let _: String = elements.joinWithSeparator(separator)
          expectEqual(expectSeen, seen)
        }
      
        testLazyJoin("", ["a"])
        testLazyJoin("", ["a", "b", "c"])
        testLazyJoin("xy", ["ab", "cd", "ef"])
      }
      

      I'm unsure what the correct behavior here is. If we make _preprocessingPass skip the pass for lazy collections, that throws away some potential valid optimizations, such as relying on an exact count for lazy maps, or optimizing on lazy reverses. But the alternative is to define several classifications of "safe" preprocessing and have separate preprocessing methods for each classification, so e.g. a lazy filter will never preprocess, a lazy map can preprocess for computations involving just the collection indexes, and lazy reverses can do everything. If we do go down this route, we'll have to make sure the lazy collections actually query their underlying base collection too if they think they can handle the pass, because they might be wrapping some other lazy collection that can't.

        Attachments

          Activity

            People

            Assignee:
            Unassigned
            Reporter:
            lily Lily Ballard
            Votes:
            1 Vote for this issue
            Watchers:
            3 Start watching this issue

              Dates

              Created:
              Updated: