  1. Swift
  2. SR-8905

Gaps in String benchmarking




      While inspecting our benchmarking story, I found many micro-benchmark gaps that we should fill. This bug holds a list of those gaps and is a starter task. Anyone interested in covering a gap should create a new bug for it, assign it to themselves, and apply the fix. I can review the PR or provide further guidance.


      • Some String RangeReplaceableCollection operations. We're missing benchmarking for:
        • insert<C: Collection>(contentsOf: C, at: Index)
          • Arguments of types String, Substring, Array<Character>, Repeated<Character>, etc.
        • See benchmark/single-source/RemoveWhere.swift for an example of some RRC operations.
      • Grapheme breaking
        • We have many benchmarks present, but they're disabled (for bad historical reasons). We should re-enable them.
        • We also have unicode-scalar-breaking variants that are enabled, but this suite is highly redundant because unicode-scalar breaking is far more uniform. The list of workloads for unicode-scalar breaking should be pruned to ascii, russian, chinese, and emoji, and the benchmarks renamed so they do not imply Character iteration.
        • We don't benchmark grapheme breaking on bridged NSStrings. Historically this hasn't exhibited much perf difference, but it's currently a blind spot.
        • We also count via iteration, but in theory, String.count could be made faster than raw iteration. We should pick one workload to just run String.count on.
        • See benchmark/single-source/StringWalk.swift.gyb 
      • Substring-based comparison/hashing and benchmark unification
        • Substring.swift has some benchmarks for Substring, but without a very diverse payload; StringComparison.swift has diverse payloads, but only for String.
        • We should merge these two benchmarks, combining the payload diversity of StringComparison.swift with the same-buffer-but-different-pointer variants from Substring.swift, applied to Substrings as well as Strings.
        • Similarly for Strings and Substrings from bridged NSStrings, though we might prune the datasets to avoid a combinatorial explosion in the number of benchmarks.
        • See benchmark/single-source/Substring.swift and benchmark/single-source/StringComparison.swift
      • Transcoding chunks of data from one encoding to another
        • Each encoding in Unicode.Encoding has a transcode<>() method, which we can benchmark.
        • UTF8 -> UTF16 is likely to be an increasingly important one for the future.
        • See benchmark/single-source/UTF8Decode.swift for some inspiration
      • Case conversion: String.lowercased() and String.uppercased()
        • ASCII and non-ASCII bridged NSStrings
        • See benchmark/single-source/AngryPhonebook.swift for some guidance
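
As a rough illustration of the first gap above, here is a minimal standalone sketch of workloads that insert each of the listed argument types into a String. The `measure` helper, the `insertInMiddle` function, and the payloads are hypothetical stand-ins written for this example; a real benchmark would be registered with the suite's own driver, following RemoveWhere.swift.

```swift
import Dispatch

// Hypothetical stand-in for the suite's driver: runs `body` `iterations`
// times and returns the elapsed time in nanoseconds.
@inline(never)
func measure(iterations: Int, _ body: () -> Void) -> UInt64 {
  let start = DispatchTime.now().uptimeNanoseconds
  for _ in 0..<iterations { body() }
  return DispatchTime.now().uptimeNanoseconds - start
}

// Inserts `payload` at the midpoint of a copy of `base` and returns the result.
@inline(never)
func insertInMiddle<C: Collection>(_ payload: C, into base: String) -> String
where C.Element == Character {
  var s = base
  let mid = s.index(s.startIndex, offsetBy: s.count / 2)
  s.insert(contentsOf: payload, at: mid)
  return s
}

// Illustrative payloads, one per argument type named in the list above.
let base = "abcdefghij"
let padded = "__XYZ__"
let asString: String = "XYZ"
let asSubstring: Substring = padded.dropFirst(2).dropLast(2)
let asArray: [Character] = ["X", "Y", "Z"]
let asRepeated: Repeated<Character> = repeatElement("X", count: 3)

let insertWorkloads: [(name: String, run: () -> String)] = [
  ("String", { insertInMiddle(asString, into: base) }),
  ("Substring", { insertInMiddle(asSubstring, into: base) }),
  ("Array<Character>", { insertInMiddle(asArray, into: base) }),
  ("Repeated<Character>", { insertInMiddle(asRepeated, into: base) }),
]

for w in insertWorkloads {
  let nanos = measure(iterations: 10_000) { _ = w.run() }
  print("\(w.name): \(nanos) ns")
}
```

The timings printed by this sketch are only indicative: a tiny base string and a hand-rolled timer are enough to show the shape of the workloads, not to produce stable numbers.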

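For the transcoding gap, a UTF8 -> UTF16 workload could look roughly like the sketch below. Note that it uses the standard library's free function `transcode(_:from:to:stoppingOnError:into:)` over a whole stream of code units, not the per-encoding method mentioned in the list; the `utf8ToUTF16` helper and the sample texts are assumptions made for illustration.

```swift
// Transcodes UTF-8 code units to UTF-16 with the standard library's free
// function. Returns nil if the input is not valid UTF-8.
@inline(never)
func utf8ToUTF16(_ input: [UInt8]) -> [UInt16]? {
  var output: [UInt16] = []
  output.reserveCapacity(input.count)
  // `transcode` returns true when it detected an encoding error.
  let hadError = transcode(
    input.makeIterator(),
    from: UTF8.self,
    to: UTF16.self,
    stoppingOnError: true
  ) { output.append($0) }
  return hadError ? nil : output
}

// Illustrative payloads only; a real benchmark would use larger texts.
let transcodeWorkloads: [(name: String, text: String)] = [
  ("ascii", "Hello, world"),
  ("russian", "Привет"),
  ("emoji", "👍🎉"),
]

for w in transcodeWorkloads {
  let units = utf8ToUTF16(Array(w.text.utf8))
  // Sanity check: the result should equal the string's own UTF-16 view.
  print(w.name, units == Array(w.text.utf16))
}
```

Checking the output against the string's `utf16` view keeps the workload honest without adding much cost outside the timed region, provided the check is done once rather than per iteration.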





            milseman Michael Ilseman