[SR-4600] Performance comparison should use MEAN and SD for analysis #47177
Labels
bug
performance
Additional Detail from JIRA
md5: 006981c168e9f8486f59f9b3c2dddc48
is blocked by:
Issue Description:
`compare_perf_tests.py` performs statistically questionable analysis to determine which regressions and improvements are significant. This results in high noise in the results, forcing reviewers to make more judgment calls than necessary.

In its current state, `compare_perf_tests.py` plays with MIN and MAX values to find some kind of significant performance change. But this is misguided. We take multiple samples of every performance test in order to eliminate one-off measurement aberrations (MIN, MAX) and get closer to the true value of MEAN. We have to use the standard deviation (SD) to evaluate whether the difference between the new and old MEAN values represents a meaningful change in Swift's performance.

To be fair, the MEAN and SD values were probably ignored because they were incorrectly generated by `Benchmark_Driver`. That is SR-4597.
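As a rough illustration of the proposed approach (not the actual `compare_perf_tests.py` implementation; the function name and the 2-standard-error threshold are assumptions for the sketch), a MEAN/SD-based check could compare the difference of means against the standard error of that difference:

```python
import math

def is_significant_change(old_mean, old_sd, new_mean, new_sd, num_samples):
    """Hypothetical check: flag a change only when the difference of MEANs
    exceeds the combined uncertainty of both runs.

    Standard error of the difference of two independent means:
    SE = sqrt(old_sd**2 / n + new_sd**2 / n). A |difference| beyond ~2 SE
    corresponds roughly to 95% confidence that the change is real.
    """
    se = math.sqrt(old_sd ** 2 / num_samples + new_sd ** 2 / num_samples)
    if se == 0:
        return old_mean != new_mean
    return abs(new_mean - old_mean) > 2 * se

# Stable benchmark: a 2% shift with a tiny spread is a real change.
print(is_significant_change(1000, 5, 1020, 5, 20))   # True
# Noisy benchmark: the same 2% shift is lost in the measurement noise.
print(is_significant_change(1000, 80, 1020, 80, 20)) # False
```

Unlike a MIN/MAX comparison, this naturally suppresses one-off aberrations: a single outlier sample barely moves the MEAN, while it can swing MIN or MAX arbitrarily far.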