
[SR-871] Benchmark runner misreports minimum values #43483

Closed
lorentey opened this issue Mar 3, 2016 · 8 comments
Labels: benchmarks, bug (a deviation from expected or documented behavior; also: expected but undesirable behavior), performance

lorentey commented Mar 3, 2016

Previous ID: SR-871
Radar: None
Original Reporter: @lorentey
Type: Bug
Status: Resolved
Resolution: Done

Attachment: Download

Additional Detail from JIRA
Votes: 0
Component/s: Project Infrastructure
Labels: Bug, Benchmark, Performance
Assignee: @gottesmm
Priority: Medium

md5: c88582f5d81eeaf1671d9f61a060a88f

Issue Description:

While looking at some benchmarking anomalies for PR #1194, I noticed that benchmark results are sometimes logged with minimum values that are far greater than either the maximum or mean.

I've noticed these appearing on stdout before, but I assumed they were an artifact of the code that prints the results, and would not lead to miscalculated results. I was wrong.

I attached an example log containing numerous such mistakes. (I increased the number of samples to 25 to get more stable results; otherwise I haven't touched the benchmarking infrastructure.)

For example, look at the Histogram line in the snippet below:

33,HashTest,25,1700,1785,1731,0,1731,10512138
34,Histogram,25,1023,761,756,0,756,3915940
35,Integrate,25,244,253,248,0,248,3910697

Since benchmark comparisons are based on minimum values only, this bug makes all results produced by the current version of the benchmark suite highly suspect. For example, here is a diff for the snippet above:

 33 HashTest                              1697             1700        +3     +0.2%     1.00x (?)
 34 Histogram                              730             1023      +293    +40.1%     0.71x
 35 Integrate                              242              244        +2     +0.8%     0.99x (?)
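
Anomalous lines like this are easy to flag mechanically. A minimal sketch, assuming the column order shown above is #, TEST, SAMPLES, MIN, MAX, MEAN, SD, MEDIAN, … and a hypothetical log file name:

    import csv

    # Flag any result line whose reported minimum exceeds its maximum.
    # Assumed column order: #, TEST, SAMPLES, MIN, MAX, MEAN, SD, MEDIAN, ...
    with open("benchmark_log.csv") as f:          # hypothetical file name
        for row in csv.reader(f):
            if len(row) < 5 or not (row[3].isdigit() and row[4].isdigit()):
                continue                          # skip headers or malformed lines
            reported_min, reported_max = int(row[3]), int(row[4])
            if reported_min > reported_max:
                print("suspect:", row[1], reported_min, reported_max)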
jckarter commented Mar 3, 2016

Any ideas, Michael?

lorentey commented Mar 3, 2016

For what it's worth, I haven't yet seen such output when I run Benchmark_O directly.
It does occur a handful of times in every Benchmark_Driver-generated log file that I've looked at so far.

gottesmm commented Mar 3, 2016

This is bizarre since the code that calculates the minimum is:

 var samples = [UInt64](count: c.numSamples, repeatedValue: 0)

  let sampler = SampleRunner()
  for s in 0..<c.numSamples {
    let time_per_sample: UInt64 = 1_000_000_000 * UInt64(c.iterationScale)

    var scale : UInt
    var elapsed_time : UInt64 = 0
    // Compute the scaling factor if a fixed c.fixedNumIters is not specified.
    scale = c.fixedNumIters

    // Rerun the test with the computed scale factor.
    elapsed_time = sampler.run(name, fn: fn, num_iters: scale)
    ...
    // save result in microseconds or k-ticks
    samples[s] = elapsed_time / UInt64(scale) / 1000
    ...
  }

  ...

  // Return our benchmark results.
  return BenchResults(delim: c.delim, sampleCount: UInt64(samples.count),
                      min: samples.minElement()!, max: samples.maxElement()!,
                      mean: mean, sd: sd, median: internalMedian(samples))
}

If the min element is greater than the maximum element, then that would suggest that Array.minElement/Array.maxElement are incorrect.

EDITED: Formatted the pasted code.

gottesmm commented Mar 3, 2016

The problem may be at a level higher than the raw Swift driver, though. I will add some asserts that check bounds to the Swift driver. If they don't trigger, it means that a script on top is misinterpreting the output of the low-level driver.

lorentey commented Mar 3, 2016

Gotcha! It's because Benchmark_Driver uses string comparison to calculate the minimum/maximum:

    avg_test_output[NUM_SAMPLES_INDEX] = num_samples
    avg_test_output[MIN_INDEX] = min(test_outputs,
                                     key=lambda x: x[MIN_INDEX])[MIN_INDEX]
    avg_test_output[MAX_INDEX] = max(test_outputs,
                                     key=lambda x: x[MAX_INDEX])[MAX_INDEX]
    avg_test_output = map(str, avg_test_output)

The cases I've seen confirm that a lexical comparison caused them.
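
A quick demonstration of the effect (Python compares strings lexicographically, so "1023" sorts before "761"):

    # String comparison is lexicographic: '1' < '7', so "1023" is picked
    # as the "minimum" -- exactly the Histogram anomaly above.
    samples = ["1023", "761", "756"]
    print(min(samples))                   # '1023' -- wrong
    print(min(int(s) for s in samples))   # 756    -- correct numeric minimum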

lorentey commented Mar 3, 2016

I'll create a pull request in a few moments. (My Python is a bit rusty.)
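
A minimal sketch of one possible fix, coercing the relevant columns to integers before comparing (this mirrors the snippet above; it is not necessarily what the eventual pull request does):

    avg_test_output[NUM_SAMPLES_INDEX] = num_samples
    # Compare the MIN/MAX columns numerically rather than lexically.
    avg_test_output[MIN_INDEX] = min(int(t[MIN_INDEX]) for t in test_outputs)
    avg_test_output[MAX_INDEX] = max(int(t[MAX_INDEX]) for t in test_outputs)
    avg_test_output = map(str, avg_test_output)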

gottesmm commented Mar 3, 2016

Simple unit testing of the script would have caught this bug. I filed:

https://bugs.swift.org/browse/SR-873

lorentey commented Mar 3, 2016

Submitted a fix as PR #1530.

@swift-ci transferred this issue from apple/swift-issues on Apr 25, 2022
This issue was closed.