Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SR-8905] Gaps in String benchmarking #51411

Open
milseman mannequin opened this issue Oct 3, 2018 · 10 comments
Open

[SR-8905] Gaps in String benchmarking #51411

milseman mannequin opened this issue Oct 3, 2018 · 10 comments
Labels
benchmarks good first issue Good for newcomers standard library Area: Standard library umbrella task

Comments

@milseman
Copy link
Mannequin

milseman mannequin commented Oct 3, 2018

Previous ID SR-8905
Radar None
Original Reporter @milseman
Type Task
Additional Detail from JIRA
Votes 1
Component/s Standard Library
Labels Task, Benchmark, StarterBug
Assignee None
Priority Medium

md5: 310808e3ff76ad0c9651e7a3550b6d27

Sub-Tasks:

  • SR-9226 Breadcrumb benchmarks
  • SR-10855 Benchmark non-ASCII with AngryPhonebook

Issue Description:

While inspecting our benchmarking story, there are many micro-benchmark gaps that we should fill. This bug holds a listing of such gaps, and is a start task. Anyone interested in covering a gap should create a new bug for it, assign to themselves, and apply the fix. I can review the PR or provide further guidance.

  • Some String RangeReplaceableCollection operations. We're missing benchmarking for:

    • insert<C: Collection>(_: C)

      • Arguments of types String, Substring, Array<Character>, Repeated<Character>, etc
    • See benchmark/single-source/RemoveWhere.swift for an example of some RRC operations.

  • Grapheme breaking

    • We have many benchmarks present, but they're disabled (for bad historical reasons). We should re-enable them.

    • Also, we have unicode-scalar breaking variants that are enabled, however this is a highly redundant suite as unicode-scalar-breaking is far more uniform. The list of workloads for unicode-scalar breaking should be pruned to: ascii, russian, chinese, and emoji, and renamed to not imply Character iteration

    • We don't benchmark grapheme-breaking on bridged NSStrings strings. Historically this hasn't exhibited much perf difference, but it's a bind spot currently.

    • We also count via iteration, but in theory, String.count could be made faster than raw iteration. We should pick one workload to just run String.count on.

    • See benchmark/single-source/StringWalk.swift.gyb

  • Substring-based comparison/hashing and benchmark unification

    • We have some for Substring without a very diverse payload in Substring.swift, and diverse payloads only for String in StringComparison.swift.

    • We should merge these two benchmarks together, getting the diversity of StringComparison.swift and the same-buffer-but-different-pointer variants for Substring.swift, applied to Substrings as well as Strings.

    • Similarly, for Strings and Substrings from bridged NSStrings, though we might prune the datasets to reduce a combinatorial explosion in number of benchmarks

    • See benchmark/single-source/Substring.swift and benchmark/single-source/StringComparison.swift

  • Transcoding chunks of data from one encoding to another

    • Each encoding in Unicode.Encoding has a transcode<>() method, which we can benchmark.

    • UTF8 -> UTF16 is likely to be an increasingly important one for the future.

    • See benchmark/single-source/UTF8Decode.swift for some inspiration

  • Case conversion: String.lowercased() and String.uppercased()

    • ASCII and non-ASCII bridged NSStrings

    • See benchmarks/single-source/AngryPhonebook.swift for some guidance

Completed

Wait for approval:

@palimondo
Copy link
Mannequin

palimondo mannequin commented Oct 3, 2018

When implementing these benchmarks please keep in mind that Swift Benchmark Suite is mostly a set of micro benchmarks, so aim for base workloads that can execute in under 10ms, ideally under 1ms (or 1000 microseconds — the unit reported from Benchmark_O). Longer runtimes accumulate measurement errors due to context switching between processes by OS scheduler.

@milseman
Copy link
Mannequin Author

milseman mannequin commented Oct 3, 2018

Good point! I updated each major bullet point with an existing benchmark to refer to as inspiration

@swift-ci
Copy link
Collaborator

swift-ci commented Oct 4, 2018

Comment by Patrick Balestra (JIRA)

This seems like a good starter issue for a newcomer in the standard library. I just created SR-8908 to try and fill the `insert(_:Character)` gap with some benchmarks.

@milseman
Copy link
Mannequin Author

milseman mannequin commented Oct 6, 2018

@palimondo to clarify, is the goal for a run time of 1-10ms when passed `1` as an argument, or some other value?

@palimondo
Copy link
Mannequin

palimondo mannequin commented Oct 13, 2018

@milseman If you’re asking about command line argument num-iters for Benchmark_O, that gets passed as iteration multiplier N to the benchmark testFunction, it doesn’t affect the reported values directly. Driver divides the measured value by N, reporting an average. Keeping runtimes reasonably short, ideally under 1ms, lowers the chance of external factors, like system load, to introduce measurement errors.

@swift-ci
Copy link
Collaborator

swift-ci commented Jun 7, 2019

Comment by Armen Mkrtchian (JIRA)

Hey, I will work on benchmarking non-ASCII on AngryPhonebook

@milseman
Copy link
Mannequin Author

milseman mannequin commented Jun 10, 2019

armenm (JIRA User) could you link the PR?

keitaito (JIRA User) opened #25310 for replaceSubrange calls.

@swift-ci
Copy link
Collaborator

Comment by Armen Mkrtchian (JIRA)

Hi @milseman
Here is the link to PR.
#25309
I have also created a subtask for this ticket.
https://bugs.swift.org/browse/SR-10855

@swift-ci
Copy link
Collaborator

swift-ci commented Aug 4, 2019

Comment by Keita Ito (JIRA)

Benchmarks for `String.replaceSubrange(_:with:)` are added by this merged PR #25310 Thank you Pavol and Michael for your patient review and support!

@swift-ci
Copy link
Collaborator

Comment by Valeriy Van (JIRA)

Benchmarks for UTF16 decoding were added by PR #34435.

@swift-ci swift-ci transferred this issue from apple/swift-issues Apr 25, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
benchmarks good first issue Good for newcomers standard library Area: Standard library umbrella task
Projects
None yet
Development

No branches or pull requests

1 participant