New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SR-10909] Missed optimization for modulo #53299
Comments
My recollection is that _precondition does not imply an assume, but we should also be able to deduce this from the fact that the type is unsigned and earlier we have upperBound > something, IIRC. |
Yes, the whole modulo operation, where this optimization would come in is inside if m.low < upperBound {
…
} |
Would such optimization be in the realm of Swift, or is this already some LLVM stuff? |
It's up to Swift to make sure that the necessary information reaches the LLVM layer. It's up to LLVM to take advantage of it to do the optimization. So, both. |
Could you please ping me on a PR that fixes this? I'd love to learn how that's done. |
Is there a self-contained example that can go in this bug? Neither the precondition nor the `>` check are present in the linked assembly generation. |
The relevant method is in @inlinable
public mutating func next<T: FixedWidthInteger & UnsignedInteger>(
upperBound: T
) -> T {
_precondition(upperBound != 0, "upperBound cannot be zero.")
var random: T = next()
var m = random.multipliedFullWidth(by: upperBound)
if m.low < upperBound {
var t = 0 &- upperBound
if t >= upperBound {
t &-= upperBound
if t >= upperBound {
t %= upperBound
}
}
while m.low < t {
random = next()
m = random.multipliedFullWidth(by: upperBound)
}
}
return m.high
}
} |
Regular |
I suspect I have misinterpreted the situation. The Swift vs. C assembly comparison wasn't fair as I took a too narrow code snippet I suspected to be problematic and took out code that included important hints for the Swift compiler to perform optimization. Further, the original code was C++! I re-did the Godbolt Compiler Explorer comparison again: I believe the generated instructions are largely equivalent. Looking again at the original article by O'Neill, the modulo optimization provides a slight improvement on Large-Shuffle Benchmark, but on Small-Shuffle Benchmark the version with modulo optimization performs the same as Lemire's original version "Debiased Int Mult (t-opt)" in chart. Our benchmarks in SBS most closely correspond to small shuffle, which would explain why I saw no improvement from modulo optimization on our benchmarks. @stephentyrone I guess I should come up with a synthetic benchmark to mimic O'Neill's Large-Shuffle and we could re-introduce the modulo optimization together with your rejection sampling. Could you please review my analysis and close this bug as invalid, if you agree? |
I think the reason it's not optimized is because of the overflow checks. If you dump the llvm-ir, you can see that clang doesn't do this optimization itself, it just produces IR that LLVM can optimize. If you change the example a bit and remove the `>=` checks: https://swift.godbolt.org/z/Psgou5 you get the optimization. |
Additional Detail from JIRA
md5: d8c5b88b3f3e6a6164d9118e82dd5949
Issue Description:
In PR #25286 When applying O'Neill's modulo optimization to Lemire's Nearly Divisionless Random Number Generation, Swift does not produce the tight instructions that make it work in C. The original modulo:
O'Neill's optimization:
See the assembly in Godbolt's Compiler Explorer:
Swift version
the C version
Probable cause preventing better optimization is the guard against division by zero, but given that the method is already guarded by
it should be possible to eliminate it...
Edit: I suspect I have misinterpreted the situation. The above assembly comparison wasn't fair as I took a too narrow code snippet I suspected to be problematic and took out code that included important hints for the Swift compiler to perform optimization. See comments below.
The text was updated successfully, but these errors were encountered: