[SR-8905] Gaps in String benchmarking #51411

milseman · 2018-10-03T20:00:27Z


Previous ID	SR-8905
Radar	None
Original Reporter	@milseman
Type	Task

Additional Detail from JIRA


Votes	1
Component/s	Standard Library
Labels	Task, Benchmark, StarterBug
Assignee	None
Priority	Medium

md5: 310808e3ff76ad0c9651e7a3550b6d27

Sub-Tasks:

SR-9226 Breadcrumb benchmarks
SR-10855 Benchmark non-ASCII with AngryPhonebook

Issue Description:

While inspecting our benchmarking story, there are many micro-benchmark gaps that we should fill. This bug holds a listing of such gaps, and is a start task. Anyone interested in covering a gap should create a new bug for it, assign to themselves, and apply the fix. I can review the PR or provide further guidance.

Some String RangeReplaceableCollection operations. We're missing benchmarking for:
- insert<C: Collection>(_: C)
  - Arguments of types String, Substring, Array<Character>, Repeated<Character>, etc
- See benchmark/single-source/RemoveWhere.swift for an example of some RRC operations.
Grapheme breaking
- We have many benchmarks present, but they're disabled (for bad historical reasons). We should re-enable them.
- Also, we have unicode-scalar breaking variants that are enabled, however this is a highly redundant suite as unicode-scalar-breaking is far more uniform. The list of workloads for unicode-scalar breaking should be pruned to: ascii, russian, chinese, and emoji, and renamed to not imply Character iteration
- We don't benchmark grapheme-breaking on bridged NSStrings strings. Historically this hasn't exhibited much perf difference, but it's a bind spot currently.
- We also count via iteration, but in theory, String.count could be made faster than raw iteration. We should pick one workload to just run String.count on.
- See benchmark/single-source/StringWalk.swift.gyb
Substring-based comparison/hashing and benchmark unification
- We have some for Substring without a very diverse payload in Substring.swift, and diverse payloads only for String in StringComparison.swift.
- We should merge these two benchmarks together, getting the diversity of StringComparison.swift and the same-buffer-but-different-pointer variants for Substring.swift, applied to Substrings as well as Strings.
- Similarly, for Strings and Substrings from bridged NSStrings, though we might prune the datasets to reduce a combinatorial explosion in number of benchmarks
- See benchmark/single-source/Substring.swift and benchmark/single-source/StringComparison.swift
Transcoding chunks of data from one encoding to another
- Each encoding in Unicode.Encoding has a transcode<>() method, which we can benchmark.
- UTF8 -> UTF16 is likely to be an increasingly important one for the future.
- See benchmark/single-source/UTF8Decode.swift for some inspiration
Case conversion: String.lowercased() and String.uppercased()
- ASCII and non-ASCII bridged NSStrings
- See benchmarks/single-source/AngryPhonebook.swift for some guidance

Completed

String Breadcrumbing - https://bugs.swift.org/browse/SR-9226
insert(_:Character) - https://bugs.swift.org/browse/SR-8908
replaceSubrange<C: Collection>(_:C) - [benchmark] Add ReplaceSubrange benchmark #25310
case conversion ASCII and non-ASCII - https://bugs.swift.org/browse/SR-10855

Wait for approval:

UTF16 decoding - Add benchmarks for UTF16 decoding #34435

palimondo · 2018-10-03T20:40:59Z

When implementing these benchmarks please keep in mind that Swift Benchmark Suite is mostly a set of micro benchmarks, so aim for base workloads that can execute in under 10ms, ideally under 1ms (or 1000 microseconds — the unit reported from Benchmark_O). Longer runtimes accumulate measurement errors due to context switching between processes by OS scheduler.

milseman · 2018-10-03T20:51:04Z

Good point! I updated each major bullet point with an existing benchmark to refer to as inspiration

swift-ci · 2018-10-04T07:19:42Z

Comment by Patrick Balestra (JIRA)

This seems like a good starter issue for a newcomer in the standard library. I just created SR-8908 to try and fill the `insert(_:Character)` gap with some benchmarks.

milseman · 2018-10-06T01:38:16Z

@palimondo to clarify, is the goal for a run time of 1-10ms when passed `1` as an argument, or some other value?

palimondo · 2018-10-13T04:47:23Z

@milseman If you’re asking about command line argument num-iters for Benchmark_O, that gets passed as iteration multiplier N to the benchmark testFunction, it doesn’t affect the reported values directly. Driver divides the measured value by N, reporting an average. Keeping runtimes reasonably short, ideally under 1ms, lowers the chance of external factors, like system load, to introduce measurement errors.

swift-ci · 2019-06-07T22:05:37Z

Comment by Armen Mkrtchian (JIRA)

Hey, I will work on benchmarking non-ASCII on AngryPhonebook

milseman · 2019-06-10T23:12:43Z

armenm (JIRA User) could you link the PR?

keitaito (JIRA User) opened #25310 for replaceSubrange calls.

swift-ci · 2019-06-11T00:48:12Z

Comment by Armen Mkrtchian (JIRA)

Hi @milseman
Here is the link to PR.
#25309
I have also created a subtask for this ticket.
https://bugs.swift.org/browse/SR-10855

swift-ci · 2019-08-04T21:33:24Z

Comment by Keita Ito (JIRA)

Benchmarks for `String.replaceSubrange(_:with:)` are added by this merged PR #25310 Thank you Pavol and Michael for your patient review and support!

swift-ci · 2020-10-26T18:35:44Z

Comment by Valeriy Van (JIRA)

Benchmarks for UTF16 decoding were added by PR #34435.

swift-ci transferred this issue from apple/swift-issues Apr 25, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[SR-8905] Gaps in String benchmarking #51411

[SR-8905] Gaps in String benchmarking #51411

milseman mannequin commented Oct 3, 2018

palimondo mannequin commented Oct 3, 2018

milseman mannequin commented Oct 3, 2018

swift-ci commented Oct 4, 2018

milseman mannequin commented Oct 6, 2018

palimondo mannequin commented Oct 13, 2018

swift-ci commented Jun 7, 2019

milseman mannequin commented Jun 10, 2019

swift-ci commented Jun 11, 2019

swift-ci commented Aug 4, 2019

swift-ci commented Oct 26, 2020

[SR-8905] Gaps in String benchmarking #51411

[SR-8905] Gaps in String benchmarking #51411

Comments

milseman mannequin commented Oct 3, 2018

palimondo mannequin commented Oct 3, 2018

milseman mannequin commented Oct 3, 2018

swift-ci commented Oct 4, 2018

milseman mannequin commented Oct 6, 2018

palimondo mannequin commented Oct 13, 2018

swift-ci commented Jun 7, 2019

milseman mannequin commented Jun 10, 2019

swift-ci commented Jun 11, 2019

swift-ci commented Aug 4, 2019

swift-ci commented Oct 26, 2020