Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SR-15173] FlattenSequence.count has awful, awful performance #57496

Open
karwa opened this issue Sep 8, 2021 · 1 comment
Open

[SR-15173] FlattenSequence.count has awful, awful performance #57496

karwa opened this issue Sep 8, 2021 · 1 comment
Labels
bug A deviation from expected or documented behavior. Also: expected but undesirable behavior. standard library Area: Standard library umbrella

Comments

@karwa
Copy link
Contributor

karwa commented Sep 8, 2021

Previous ID SR-15173
Radar rdar://problem/82934328
Original Reporter @karwa
Type Bug

Attachment: Download

Environment

Apple Swift version 5.5-dev (LLVM bcf6482631963a8, Swift 37aa203)

This is just a random nightly I happen to have installed. As of reporting, Godbolt is running:

Swift version 5.6-dev (LLVM ca88d53176e346a, Swift dc60996)

Additional Detail from JIRA
Votes 0
Component/s Standard Library
Labels Bug
Assignee None
Priority Medium

md5: 09ab6e8a9dcadc52c76efb301ff669b5

Issue Description:

FlattenSequence.count has really, really bad performance.

Check out the following code:

func test(_ input: Array<Array<Int>>) -> Int {
    input.joined().count
}

Trying this in Godbolt (nightly, -O), it generates an enormous amount of code. 349 lines, full of generic specializations of this and that, retains, releases, __swift_instantiateConcreteTypeFromMangledName, and, worst of all, this thing:

generic specialization <[Swift.Int]> of protocol witness for Swift.Collection.subscript.read : (A.Index) -> A.Element in conformance [A] : Swift.Collection in Swift:
        push    r15
        push    r14
        push    r12
        push    rbx
        push    rax
        mov     r14, rdx
        mov     r15, rsi
        mov     r12, rdi
        mov     edi, 40
        call    malloc@PLT
        mov     rbx, rax
        mov     qword ptr [r12], rax
        mov     rdi, rax
        mov     rsi, r15
        mov     rdx, r14
        call    (generic specialization <[Swift.Int]> of Swift.Array.subscript.read : (Swift.Int) -> A)
        mov     qword ptr [rbx + 32], rax
        lea     rax, [rip + (protocol witness for Swift.Collection.subscript.read : (A.Index) -> A.Element in conformance [A] : Swift.Collection in Swiftgeneric specialization <[Swift.Int]> of  with unmangled suffix ".resume.0")]
        add     rsp, 8
        pop     rbx
        pop     r12
        pop     r14
        pop     r15
        ret

Yes, you read that correctly. This is an Array 'read' accessor - the thing that will access every element, and it has a big, fat malloc in the middle.

It's really bad. Here you can see it (and the resulting frees) absolutely dominate my actual collection code.

My benchmarks got 500% slower when using it. I couldn't believe it.

Anyway, what's really interesting is what happens when you try the following instead:

func test2(_ input: Array<Array<Int>>) -> Int {
    var count = 0
    for _ in input.joined() { count += 1}
    return count
}

Boom. 59 lines, all in a single function. None of these generic specializations floating around, no __swift_instantiateConcreteTypeFromMangledName, and most importantly: no mallocs (okay, there's a couple of retain/releases. I wonder if they're really necessary, but I can live with them). This code is hundreds of times faster, not to mention much, much smaller. Again, compiling at -O.

Anyway, I looked at the implementation of FlattenSequence, and nothing really jumps out at me. It seems to do a full index traversal in its `distance(from:to🙂`, which seems a bit unnecessary, and it doesn't appear to make use of the wrapped collection's version of `distance(from:to🙂` AFAICT - so, room for improvement, but nothing that would cause the behaviour I'm seeing.

I get a feeling this is going to be a tricky one. Bring a shovel, because there's some serious digging to be done.

Here's the Godbolt link with both versions of the function. Un-comment the one you'd like to see.

@typesanitizer
Copy link

@swift-ci create

@swift-ci swift-ci transferred this issue from apple/swift-issues Apr 25, 2022
oscbyspro added a commit to oscbyspro/swift that referenced this issue Feb 15, 2024
…e#57496), which points out the poor performance of `FlattenSequence/count`. The current `count` implementation calls `distance(from:to:)`, which iterates through the entire collection, so I thought I'd improve that method. The new version computes the distance as the sum of three parts, with a fast path when both indices belong to the same underlying collection. Note that it does this by calling either `count` or `distance(from:to:)` on each underlying collection, which makes `[repeatElement(0, count: Int.max)].joined().count` instant, for example.
oscbyspro added a commit to oscbyspro/swift that referenced this issue Feb 15, 2024
I was looking through Swift's issues tab and found this issue (apple#57496), which points out the poor performance of `FlattenSequence/count`. The current `count` implementation calls `distance(from:to:)`, which iterates through the entire collection, so I thought I'd improve that method. The new version computes the distance as the sum of three parts, with a fast path when both indices belong to the same underlying collection. Note that it does this by calling either `count` or `distance(from:to:)` on each underlying collection, which makes `[repeatElement(0, count: Int.max)].joined().count` instant, for example.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug A deviation from expected or documented behavior. Also: expected but undesirable behavior. standard library Area: Standard library umbrella
Projects
None yet
Development

No branches or pull requests

2 participants