Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SR-11335] excessive ARC on String algorithm #53736

Open
weissi opened this issue Aug 20, 2019 · 9 comments
Open

[SR-11335] excessive ARC on String algorithm #53736

weissi opened this issue Aug 20, 2019 · 9 comments
Labels
bug A deviation from expected or documented behavior. Also: expected but undesirable behavior. compiler The Swift compiler in itself standard library Area: Standard library umbrella

Comments

@weissi
Copy link
Member

weissi commented Aug 20, 2019

Previous ID SR-11335
Radar rdar://problem/54516144
Original Reporter @weissi
Type Bug

Attachment: Download

Additional Detail from JIRA
Votes 0
Component/s Compiler, Standard Library
Labels Bug
Assignee None
Priority Medium

md5: f85968cf346327433d0a176cb463e22d

Issue Description:

From the example program https://github.com/adtrevor/ParserBuilder which seems to retain a little to much, I compiled a reduced example which uses the same sort of things from the Swift language: Strings, Substrings, index(after🙂, (Sub)String subscript with index and (Sub)String subscript with a range.

func find<S: StringProtocol>(_ string: S,
                             needle1: Character,
                             needle2: Character?,
                             needle3: Character?) -> String.Index? {
    var index = string.startIndex
    while index < string.endIndex {
        if string[index] == needle1 {
            if let needle2 = needle2 {
                if find(string[string.index(after: index)...],
                        needle1: needle2,
                        needle2: needle3,
                        needle3: nil) == string.index(after: index) {
                  return index
                }
            } else {
                return index
            }
        }
        index = string.index(after: index)
    }
    return nil
}

let howMany = 1_000_000
let end = CommandLine.arguments.dropFirst().reduce(into: "") { $0.append($1) }
print("xx[\(howMany)]xx\(end)")
let s = String(repeating: "x", count: howMany) + end
if let found = find(s, needle1: "f", needle2: "o", needle3: "o") {
    print(s[found ... s.index(after: s.index(after: found))])
} else {
    print(":( didn't find foo")
}

The algorithm is a simple algorithm to find the letters foo in a string that is 1M {{x}}s followed by a concatenation of all command line args.

If we run this like this:

$ swiftc -O test.swift && sudo dtrace -n 'pid$target::swift_retain*:entry { @c = count(); } :::END { printa(@c); } ' -c './test foo'
dtrace: description 'pid$target::swift_retain*:entry ' matched 4 probes
xx[1000000]xxfoo
foo
dtrace: pid 25468 has exited
CPU     ID                    FUNCTION:NAME
  8      2                             :END 
          3000030

we see that this program calls swift_retain over 3M 2M times (for a string that's 1M+3 characters long...) with roughly Xcode 11 beta 6.

I think that's way too many.

With swift-DEVELOPMENT-SNAPSHOT-2019-08-03-a.xctoolchain the number of retains drops to 2M 1M, quite good but not good enough 😉.

Update:

When running with

jw-swift-latest swiftc -O test.swift && sudo dtrace -n 'pid$target::swift_retain*:entry { @c[ustack()] = count(); } :::END { printa(@c); } ' -c './test foo'

we can see that 1M of those retains are from String(repeating: "x", count: howMany) which I don't care much about.

              [smallfry]

              libswiftCore.dylib`swift_retain
              libswiftCore.dylib`swift_bridgeObjectRetain+0x46
              libswiftCore.dylib`_StringGuts.append(_:)+0x21c
              libswiftCore.dylib`specialized String.init(repeating:count:)+0x16e
              libswiftCore.dylib`String.init(repeating:count:)+0x12
              test`main+0x26a
              libdyld.dylib`start+0x1
              test`0x2
          1000000

              libswiftCore.dylib`swift_retain
              libswiftCore.dylib`swift_bridgeObjectRetain+0x46
              test`specialized find<A>(_:needle1:needle2:needle3:)+0x133
              test`main+0x2d8
              libdyld.dylib`start+0x1
              test`0x2
          1000001

Xcode 11 beta 6ish has as expected two retains in find:

              libswiftCore.dylib`swift_retain
              libswiftCore.dylib`swift_bridgeObjectRetain+0x46
              test`specialized find<A>(_:needle1:needle2:needle3:)+0xf8
              test`main+0x2d8
              libdyld.dylib`start+0x1
              test`0x2
          1000001

              libswiftCore.dylib`swift_retain
              libswiftCore.dylib`swift_bridgeObjectRetain+0x46
              test`specialized find<A>(_:needle1:needle2:needle3:)+0x1e8
              test`main+0x2d8
              libdyld.dylib`start+0x1
              test`0x2
          1000001
@weissi
Copy link
Member Author

weissi commented Aug 20, 2019

CC the Michael *mans: @milseman for String & @gottesmm for ARC 🙂

@belkadan
Copy link
Contributor

Drive-by comment that this seems sensible to me at the moment: a Substring really is an independent entity that might escape and so it can't not retain its base while live. That does have unfortunate implications about passing around Substring rather than Range<String.Index>, though, and we definitely try to encourage the former.

@weissi
Copy link
Member Author

weissi commented Aug 20, 2019

@belkadan Agreed. But I'd have expected that it does it once, not once per loop iteration. Note that it will only recurse if it actually finds needle1 (which is an f) but the string is 1M {{x}}s, then whatever you pass.

Actually, I just found an even worse program (which operates only on substring):

func find<S: StringProtocol>(_ string: S,
                             needle1: Character,
                             needle2: Character?,
                             needle3: Character?) -> String.Index? {
    var index = string.startIndex
    while index < string.endIndex {
        if string[index] == needle1 {
            if let needle2 = needle2 {
                if find(string[string.index(after: index)...],
                        needle1: needle2,
                        needle2: needle3,
                        needle3: nil) == string.index(after: index) {
                  return index
                }
            } else {
                return index
            }
        }
        index = string.index(after: index)
    }
    return nil
}

let howMany = 1_000_000
let end = CommandLine.arguments.dropFirst().reduce(into: "") { $0.append($1) }
print("xx[\(howMany)]xx\(end)")
let s = String(repeating: "x", count: howMany) + end
if let found = find(s[...], needle1: "f", needle2: "o", needle3: "o") {
    print(s[found ... s.index(after: s.index(after: found))])
} else {
    print(":( didn't find foo")
}

which (with the 3rd August master snapshot) is up to 4M retains (3M of them real ones, 1M for the String(repeating:count):

              libswiftCore.dylib`swift_retain
              libswiftCore.dylib`swift_bridgeObjectRetain+0x46
              test`specialized find<A>(_:needle1:needle2:needle3:)+0x20e
              test`main+0x309
              libdyld.dylib`start+0x1
              test`0x2
          1000000

              libswiftCore.dylib`swift_retain
              libswiftCore.dylib`swift_bridgeObjectRetain+0x46
              libswiftCore.dylib`_StringGuts.append(_:)+0x21c
              libswiftCore.dylib`specialized String.init(repeating:count:)+0x16e
              libswiftCore.dylib`String.init(repeating:count:)+0x12
              test`main+0x26a
              libdyld.dylib`start+0x1
              test`0x2
          1000000

              libswiftCore.dylib`swift_retain
              libswiftCore.dylib`swift_bridgeObjectRetain+0x46
              test`specialized find<A>(_:needle1:needle2:needle3:)+0x98
              test`main+0x309
              libdyld.dylib`start+0x1
              test`0x2
          1000001

              libswiftCore.dylib`swift_retain
              libswiftCore.dylib`swift_bridgeObjectRetain+0x46
              libswiftCore.dylib`specialized Slice.subscript.getter+0x4ef
              test`specialized find<A>(_:needle1:needle2:needle3:)+0xb2
              test`main+0x309
              libdyld.dylib`start+0x1
              test`0x2
          1000001

@weissi
Copy link
Member Author

weissi commented Aug 20, 2019

@swift-ci create

@gottesmm
Copy link
Member

@weissi can you submit this as a benchmark? It seems useful for us to track and seems like it would fit easily.

@weissi
Copy link
Member Author

weissi commented Aug 20, 2019

@gottesmm sure, working on it now.

@weissi
Copy link
Member Author

weissi commented Aug 20, 2019

@gottesmm #26744

@belkadan
Copy link
Contributor

Hm, I guess so. Even though find is recursive, taking the argument at +0 means that the base string ought to be able to be "rescued" at +1 out of the Substring once it's done with it, and then that extra +1 can be hoisted out of the loop.

@swift-ci
Copy link
Collaborator

Comment by Pop Flamingo (JIRA)

Thanks to everyone for taking a look at this. In case somebody ever needs to check how this impacts the performance of the project @weissi linked, the master branch will probably have changed quite a lot in other to get rid of those performance bottlenecks the commit that will still expose the issue is this one : PopFlamingo/ParserBuilder@4ced9d2

@swift-ci swift-ci transferred this issue from apple/swift-issues Apr 25, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug A deviation from expected or documented behavior. Also: expected but undesirable behavior. compiler The Swift compiler in itself standard library Area: Standard library umbrella
Projects
None yet
Development

No branches or pull requests

4 participants