[SR-4600] Performance comparison should use MEAN and SD for analysis #47177
Labels
bug
performance
Additional Detail from JIRA
md5: 006981c168e9f8486f59f9b3c2dddc48
is blocked by:
Issue Description:
`compare_perf_tests.py` performs statistically questionable analysis to determine which regressions and improvements are significant. This results in high noise in the results, forcing reviewers to make more judgment calls than necessary.

In its current state, `compare_perf_tests.py` plays with MIN and MAX values to find some kind of significant performance change. But this is misguided. We take multiple samples of every performance test in order to eliminate one-off measurement aberrations (MIN, MAX) and get closer to the true value of MEAN. We have to use the standard deviation (SD) to evaluate whether the difference between the new and old MEAN values represents a meaningful change in Swift's performance.

To be fair, the MEAN and SD values were probably ignored because they were incorrectly generated by `Benchmark_Driver`. That is SR-4597.
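As a rough illustration of the proposed approach (not the actual `compare_perf_tests.py` implementation; the function name and the 2-standard-error threshold are assumptions for the sketch), a MEAN/SD-based check could compare the difference of means against the standard error of that difference:

```python
import math

def is_significant_change(old_mean, old_sd, new_mean, new_sd, num_samples):
    """Hypothetical check: flag a change only when the difference of MEANs
    exceeds the combined uncertainty of both runs.

    Standard error of the difference of two independent means:
    SE = sqrt(old_sd**2 / n + new_sd**2 / n). A |difference| beyond ~2 SE
    corresponds roughly to 95% confidence that the change is real.
    """
    se = math.sqrt(old_sd ** 2 / num_samples + new_sd ** 2 / num_samples)
    if se == 0:
        return old_mean != new_mean
    return abs(new_mean - old_mean) > 2 * se

# Stable benchmark: a 2% shift with a tiny spread is a real change.
print(is_significant_change(1000, 5, 1020, 5, 20))   # True
# Noisy benchmark: the same 2% shift is lost in the measurement noise.
print(is_significant_change(1000, 80, 1020, 80, 20)) # False
```

Unlike a MIN/MAX comparison, this naturally suppresses one-off aberrations: a single outlier sample barely moves the MEAN, while it can swing MIN or MAX arbitrarily far.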