Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SR-13111] String allocates twice when constructing from non-contiguous collection #55557

Open
Lukasa opened this issue Jun 29, 2020 · 3 comments · May be fixed by #39201
Open

[SR-13111] String allocates twice when constructing from non-contiguous collection #55557

Lukasa opened this issue Jun 29, 2020 · 3 comments · May be fixed by #39201
Labels
bug A deviation from expected or documented behavior. Also: expected but undesirable behavior. performance standard library Area: Standard library umbrella

Comments

@Lukasa
Copy link
Contributor

Lukasa commented Jun 29, 2020

Previous ID SR-13111
Radar rdar://problem/64947782
Original Reporter @Lukasa
Type Bug
Status In Progress
Resolution
Environment

Swift 5.3

Additional Detail from JIRA
Votes 0
Component/s Standard Library
Labels Bug, Performance
Assignee hassaneldesouky (JIRA)
Priority Medium

md5: dfd669c7f10cade2c786ee7ff78dbb26

Issue Description:

Consider the Swift program below:

@inline(never)
func test<Bytes: RandomAccessCollection>(_ string: Bytes) -> Int where Bytes.Element == UInt8 {
    let baseString = String(decoding: string, as: UTF8.self)
    return baseString.utf8.count
}


func main() -> Int {
    let coll = UInt8(1)..<UInt8(100)
    var acc = 0
    for _ in 0..<100 {
        acc += test(coll)
    }
    return acc
}

if main() == 10101010 {
    fatalError("Bang")
}

This code constructs 100 Strings from ranges. It does a few extra things just to prevent the compiler from optimising that work out.

I'd expect this program to allocate 100 times on String construction (as none of these strings are small). However, in practice, it allocates 200 times:

              libsystem_malloc.dylib`malloc_zone_malloc
              libswiftCore.dylib`swift_slowAlloc+0x28
              libswiftCore.dylib`swift_allocObject+0x27
              test`specialized _ContiguousArrayBuffer.init(_uninitializedCount:minimumCapacity:)+0x39
              test`specialized Array.init<A>(_:)+0x35
              test`specialized static String._fromNonContiguousUnsafeBitcastUTF8Repairing<A>(_:)+0x10
              test`specialized test<A>(_:)+0x114
              test`main+0x1a
              libdyld.dylib`start+0x1
              test`0x1
              100

              libsystem_malloc.dylib`malloc_zone_malloc
              libswiftCore.dylib`swift_slowAlloc+0x28
              libswiftCore.dylib`swift_allocObject+0x27
              libswiftCore.dylib`_allocateStringStorage(codeUnitCapacity:)+0xc6
              libswiftCore.dylib`specialized static __StringStorage.create(initializingFrom:codeUnitCapacity:isASCII:)+0x47
              libswiftCore.dylib`specialized static String._uncheckedFromUTF8(_:asciiPreScanResult:)+0x26
              libswiftCore.dylib`specialized static String._fromUTF8Repairing(_:)+0x370
              test`specialized static String._fromNonContiguousUnsafeBitcastUTF8Repairing<A>(_:)+0x20
              test`specialized test<A>(_:)+0x114
              test`main+0x1a
              libdyld.dylib`start+0x1
              test`0x1
              100

This is because of the implementation of String(decoding:as🙂, which ultimately calls String.fromNonContiguousUnsafeBitcastUTF8Repairing<C>(🙂. That function looks like this:

  @_alwaysEmitIntoClient
  @inline(never) // slow-path
  private static func _fromNonContiguousUnsafeBitcastUTF8Repairing<
    C: Collection
  >(_ input: C) -> (result: String, repairsMade: Bool) {
    _internalInvariant(C.Element.self == UInt8.self)
    return Array(input).withUnsafeBufferPointer {
      let raw = UnsafeRawBufferPointer($0)
      return String._fromUTF8Repairing(raw.bindMemory(to: UInt8.self))
    }
  }

Note that this constructs an intermediary Array, before dropping into String._fromUTF8Repairing.

Given that this call is specialised for Collection and not Sequence, there seems to be no good reason that this call path cannot be reimplemented on top of String.init(unsafeUninitializedCapacity:initializingWith🙂 and then simply copying the bytes over that way. This should be a net performance win: we'll halve the allocations of this code path, and we'll also remove an intermediary memcpy that we otherwise have to do.

@Lukasa
Copy link
Contributor Author

Lukasa commented Jun 29, 2020

@swift-ci create

@swift-ci
Copy link
Collaborator

swift-ci commented Dec 2, 2021

Comment by Hassan ElDesouky (JIRA)

I opened a PR for this a while ago. Can someone take a look whenever you're free? #40225

cc @Lukasa @glessard

@glessard
Copy link
Contributor

glessard commented Dec 5, 2021

An in-progress PR by @karwa to address this:
#39201

@swift-ci swift-ci transferred this issue from apple/swift-issues Apr 25, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug A deviation from expected or documented behavior. Also: expected but undesirable behavior. performance standard library Area: Standard library umbrella
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants