[SR-13063] String(decoding:as:) is 10x slower for ASCII than for UTF8 #55509

Open
karwa opened this issue Jun 23, 2020 · 0 comments
Labels
improvement performance standard library Area: Standard library umbrella

Comments


karwa commented Jun 23, 2020

Previous ID SR-13063
Radar None
Original Reporter @karwa
Type Improvement

Attachment: Download

Environment

Xcode Version 11.5 (11E608c), macOS 10.15.2

Additional Detail from JIRA
Votes 0
Component/s Standard Library
Labels Improvement, Performance
Assignee None
Priority Medium

md5: 69390e8b97a32575471cd6fb5ab50fa4

Issue Description:

Take the following code, which is part of a parser for URL schemes:

struct ASCII: Equatable {  // Equatable is needed for the == and pattern-match comparisons below
  var codePoint: UInt8
  init(_unchecked: UInt8) { self.codePoint = _unchecked }
  
  static var h: ASCII { ASCII(_unchecked: 0x68) }
  // ...
}

enum Scheme {
        case ftp
        case file
        case http
        case https
        case ws
        case wss
        case other(String)

        static func parse<C>(asciiBytes: C) -> Scheme where C: Collection, C.Element == UInt8 {
            func notRecognised() -> Scheme {
                return .other(String(decoding: asciiBytes, as: Unicode.ASCII.self))
            }

            var iter = asciiBytes.lazy.map { ASCII(_unchecked: $0) }.makeIterator()
            switch iter.next() {
            case .h?:
                guard iter.next() == .t, iter.next() == .t, iter.next() == .p else { return notRecognised() }
                switch iter.next() {
                case .s?:
                    guard iter.next() == nil else { return notRecognised() }
                    return .https
                case .none:
                    return .http
                case .some(_):
                    return notRecognised()
                }
            case .f?:
                switch iter.next() {
                case .i?:
                    guard iter.next() == .l, iter.next() == .e, iter.next() == nil else { return notRecognised() }
                    return .file
                case .t?:
                    guard iter.next() == .p, iter.next() == nil else { return notRecognised() }
                    return .ftp
                default:
                    return notRecognised()
                }

<<<<< snip. Et cetera. Full code is here: https://github.com/karwa/base/blob/eb914b42cf26adaf63b5b65368c53eef07c2903c/Sources/URL/URL.swift#L42 >>>>>

            }
        }
}

While benchmarking, I found that this was taking an incredible amount of time: up to 15% of the total time spent parsing a URL, with almost all of that coming from String creation.

With one modification, changing the decoding to `UTF8.self`, we get literally an order of magnitude better performance.
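To make the comparison concrete, here is a minimal sketch of a micro-benchmark that isolates the two decoding calls. The helper `elapsedNanoseconds`, the sample input `schemeBytes`, and the iteration count are assumptions made for illustration; they are not the benchmark used for the measurements in this report, and the numbers it prints will vary by machine, compiler version, and optimization level.

```swift
import Dispatch

// Hypothetical micro-benchmark helper (not from the original report).
// Returns the elapsed time, in nanoseconds, for `iterations` calls of `makeString`.
func elapsedNanoseconds(iterations: Int, _ makeString: () -> String) -> UInt64 {
  var checksum = 0  // consume each result so the work is less likely to be optimized away
  let start = DispatchTime.now().uptimeNanoseconds
  for _ in 0..<iterations {
    checksum &+= makeString().utf8.count
  }
  let end = DispatchTime.now().uptimeNanoseconds
  precondition(checksum >= 0)
  return end - start
}

// Sample ASCII-only input standing in for an unrecognised scheme (an assumption).
let schemeBytes: [UInt8] = Array("gopher".utf8)

// The original call: decode as ASCII.
let asciiTime = elapsedNanoseconds(iterations: 100_000) {
  String(decoding: schemeBytes, as: Unicode.ASCII.self)
}

// The one-line modification: decode as UTF8 instead.
let utf8Time = elapsedNanoseconds(iterations: 100_000) {
  String(decoding: schemeBytes, as: UTF8.self)
}

print("ASCII: \(asciiTime) ns, UTF8: \(utf8Time) ns")
```

Build with optimizations enabled (for example `swiftc -O`) before comparing the two timings; a debug build is unlikely to be representative.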

I understand that UTF8 is String's native encoding and has certain fast-paths, but ASCII is a restricted subset of UTF8 and should be even easier to validate and repair (if needed). I would certainly not expect it to be 10x slower! That's just too much.
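As an illustration of the subset argument, the sketch below checks that every byte is below 0x80 and, if so, decodes through the UTF8 path; otherwise it falls back to the ASCII decoder so that the result for non-ASCII input stays whatever `String(decoding:as:)` with `Unicode.ASCII.self` would have produced. The function name `makeASCIIString` is hypothetical, this is not how the standard library implements either path, and whether it is actually faster would need to be measured.

```swift
// Hypothetical helper, not part of the standard library or the original code.
func makeASCIIString<C>(_ bytes: C) -> String where C: Collection, C.Element == UInt8 {
  if bytes.allSatisfy({ $0 < 0x80 }) {
    // Every byte is ASCII, so the same bytes are also valid UTF-8
    // and both decoders produce the same string.
    return String(decoding: bytes, as: UTF8.self)
  }
  // Non-ASCII input: defer to the ASCII decoder so its handling of
  // invalid bytes is preserved unchanged.
  return String(decoding: bytes, as: Unicode.ASCII.self)
}

let recognised = makeASCIIString(Array("https".utf8))
let withInvalidByte = makeASCIIString([0x66, 0xFF] as [UInt8])
```

The extra pass over the bytes is a deliberate trade-off: it costs one linear scan, but it keeps the behaviour for invalid input identical to the original ASCII call.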

@swift-ci swift-ci transferred this issue from apple/swift-issues Apr 25, 2022