Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SR-12849] Inefficient/broken .prefix() implementation on .lazy collection views #55295

Open
ktoso opened this issue May 21, 2020 · 2 comments
Open
Labels
bug A deviation from expected or documented behavior. Also: expected but undesirable behavior. Collection Area → standard library: The `Collection` protocol run-time performance standard library Area: Standard library umbrella swift 6.0 unexpected behavior Bug: Unexpected behavior or incorrect output

Comments

@ktoso
Copy link
Member

ktoso commented May 21, 2020

Previous ID SR-12849
Radar rdar://problem/63487263
Original Reporter @ktoso
Type Bug
Additional Detail from JIRA
Votes 1
Component/s Standard Library
Labels Bug
Assignee None
Priority Medium

md5: 6ecff5e3f62cd93e9d95eb3621436f7f

Issue Description:

There's two issues here but they are very related to each other.
Consider the following snippet (will be simplified more below):

var alreadySeen: Set<Int> = [3, 2, 1]

func take(_ ss: [Int]) -> [Int] {
    let x = ss.lazy.filter { s in
        print("already seen: \(alreadySeen)")
        print("s = \(s)")
        let ins = alreadySeen.insert(s).inserted
        print("ins = \(ins)")
        return ins
    }.prefix(1)

    return Array(x)
}
take([222,333])

This program results in a crash (and other not nice effects, explained below):

already seen: [3, 2, 1]
s = 1221
ins = true
already seen: [1, 3, 222, 2]
s = 23
ins = true
already seen: [1, 3, 222, 2, 333]
s = 1221
ins = false
already seen: [1, 3, 222, 2, 333]
s = 23
ins = false
Fatal error: Can't form Range with upperBound < lowerBound: file /AppleInternal/BuildRoot/Library/Caches/com.apple.xbs/Sources/swiftlang/swiftlang-1103.8.25.8/swift/stdlib/public/core/Range.swift, line 709
{{ Execution interrupted. Enter code to recover and continue.

Problems with this output:

  • The crash. Sure, one may argue that the collection view should side effect in the filter, but the crash is pretty nasty and a result of:

  • The collection gets iterated twice

  • Last but not least: the lazy...prefix combination does not stop the iteration when n elements are found, rather, ALL elements are traversed each time.

More mini examples:

func test(_ arr: [Int]) -> [Int] {
     var invocationCount = 0
     let x: LazyFilterCollection<Array<Int>> = arr.lazy.filter { s in
         invocationCount += 1
         return s == 1 // "search for 1"
     }
     defer {
         print("invoked \(invocationCount) times")
     }
     return Array(x.prefix(1)) // we only want 1 element
}
> test(Array(1...10000))
invoked 20001 times
$R6: [Int] = 1 value {
  [0] = 1
}

That is a lot of extra iterations for finding the first element in this array.

Expected:

  • when prefix(1) is used, there should not be a need to iterate over the rest of the collection once n elements are collected

  • there should not be a need to traverse the collection twice.

We did some digging with @Lukasa, and the troublesome places for this behavior are around:

  • LazyFilterSequence not specializing prefix()

  • and thus defaulting to use the Collection's impl which does:

    @inlinable
    public _consuming func prefix( maxLength: Int) -> SubSequence {
     _precondition( maxLength >= 0, "Can't take a prefix of negative length from a collection") 
     let end = index(startIndex, offsetBy: maxLength, limitedBy: endIndex) ?? endIndex 
     return self[startIndex..<end] 
    }

which calls startIndex multiple times (we suspect that's the "multiple times" traverse), since the startIndex is implemented as:

public var startIndex: Index {
  var index = _base.startIndex
  while index != _base.endIndex && !_predicate(_base[index]) { 
    _base.formIndex(after: &index)
  }
  return index
}

on the lazy filter view.

And that explains the crasher as well – since the contents of the view changed depending on how many times it's run (it collects the "seen" elements), the indexes don't align anymore and we get the Range crash.

Fixing the lazy view's prefix to correctly iterate only once should fix this behavior.

Also, it should stop iteration once enough elements are collected.


For reference, a similar feature in Scala exhibits the correct behavior:

scala> var i = 0; val v = (1 until 10000).view.filter { x =>
     |   i +=1; x == 1
     | }.take(1).toList
var i: Int = 1
val v: List[Int] = List(1)
@ktoso
Copy link
Member Author

ktoso commented May 21, 2020

@swift-ci create

@swift-ci swift-ci transferred this issue from apple/swift-issues Apr 25, 2022
@oscbyspro
Copy link
Contributor

oscbyspro commented Feb 16, 2024

I had a look at the "more mini examples" part, and it does indeed call this Collection method:

@inlinable
public __consuming func prefix(_ maxLength: Int) -> SubSequence {
_precondition(
maxLength >= 0,
"Can't take a prefix of negative length from a collection")
let end = index(startIndex,
offsetBy: maxLength, limitedBy: endIndex) ?? endIndex
return self[startIndex..<end]
}

As a result, Array(Array(1...10000).lazy.filter({ $0 == 1 }).prefix(1)) invokes its closure 20001 times:

  1. 1x to find the startIndex
  2. 9999x to find the next index end (it does not exist, so endIndex)
  3. 1x to find the startIndex
  4. 10000x to filter each element in self[startIndex..<end] in Array.init(_:)

The problem in this case is that .prefix(1) returns a SubSequence. Luckily, there's a different overload:

var a = 0
var b = 0

Array(Array(1...10000).lazy.filter({ a += 1; return $0 == 1 }).prefix(1) as LazyFilterSequence<[Int]>.SubSequence)
Array(Array(1...10000).lazy.filter({ b += 1; return $0 == 1 }).prefix(1) as PrefixSequence<LazyFilterSequence<[Int]>>)

print(a) // 20001
print(b) // 1

I thought I'd solve this issue with some collection algorithm, but it appears to be about overload resolution.

@AnthonyLatsis AnthonyLatsis added run-time performance swift 6.0 Collection Area → standard library: The `Collection` protocol unexpected behavior Bug: Unexpected behavior or incorrect output labels Feb 21, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug A deviation from expected or documented behavior. Also: expected but undesirable behavior. Collection Area → standard library: The `Collection` protocol run-time performance standard library Area: Standard library umbrella swift 6.0 unexpected behavior Bug: Unexpected behavior or incorrect output
Projects
None yet
Development

No branches or pull requests

3 participants