[SR-4939] Generics code performance #47516
Comments
@bob-wilson, @ematejska, who do you think should look at this?

aschwaighofer@apple.com (JIRA User) could you look into this?
Today this performance is expected. The benchmark calls the generic/non-generic code across a framework boundary, so we don't get the benefit of full specialization. Then the following happens:

1. `IntBitStore.subscript.setter`, the non-generic implementation, is fast. Because we know we are dealing with `Int`s, the code boils down to mutating an `Array<Int>`. The code path is fully inlined (we know we are not dealing with an `NSArray` backing, so that code can be statically compiled out). There is no retain/release traffic, only a call to the `isUniquelyReferenced` runtime function.

2. `BitStore.subscript.setter`, the generic implementation, was missing the `@_specialize` attributes (marked `==>` below):

```swift
public subscript(index: Int) -> Bool {
    // Was missing
==> @_specialize(exported: true, where StoreType: _Trivial(32))
==> @_specialize(exported: true, where StoreType: _Trivial(64))
    set {
        precondition(index >= 0)
        precondition(index < capacity)
        let rawIndex = index / _intBitCapacity
        let current = index - (rawIndex * _intBitCapacity)
        let _buff = rawBuff[rawIndex]
        if newValue {
            rawBuff[rawIndex] = _buff | (1 << current)
        } else {
            rawBuff[rawIndex] = _buff & ~(1 << current)
        }
    }
}
```

After fixing this, performance does not improve even though we call the specialized version for `T` where `T` is `_Trivial(64)`.
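To make the index arithmetic in that setter concrete, here is a minimal, self-contained sketch. The word size of 64 and the variable names mirroring the snippet are assumptions for illustration, not values taken from the attached project:

```swift
// Hypothetical sketch of the subscript's index math, assuming 64-bit words.
let _intBitCapacity = 64                             // bits per storage word (assumed)
let index = 70                                       // example bit index
let rawIndex = index / _intBitCapacity               // which word holds the bit
let current = index - (rawIndex * _intBitCapacity)   // which bit within that word

var word: UInt64 = 0
word |= (1 << current)     // set the bit, as in the newValue == true branch
assert(word == 1 << 6)     // bit index 70 lands in word 1, bit 6
word &= ~(1 << current)    // clear the bit, as in the newValue == false branch
assert(word == 0)
```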
I believe that after moving to an opaque-value SSA representation, specializing the `Array<T>` calls in the implementation, and inlining those specialized calls, we should get performance that is a lot better. But there are limits: getting the bit width will stay a witness-method call.

```
sil @_T08BitStoreAAV9subscriptSbSicfsSbSiAByxGRlze63_s17FixedWidthIntegerRzlItMyyl_Tp5 : $@convention(method) <τ_0_0 where τ_0_0 : _Trivial(64), τ_0_0 : FixedWidthInteger> (Bool, Int, @inout BitStore<τ_0_0>) -> () {
  %127 = witness_method $τ_0_0, #FixedWidthInteger.bitWidth!getter.1 : <Self where Self : FixedWidthInteger> (Self.Type) -> () -> Int : $@convention(witness_method) <τ_0_0 where τ_0_0 : FixedWidthInteger> (@thick τ_0_0.Type) -> Int // users: %281, %266, %242, %197, %168, %148, %128
  %128 = apply %127<τ_0_0>(%48) : $@convention(witness_method) <τ_0_0 where τ_0_0 : FixedWidthInteger> (@thick τ_0_0.Type) -> Int // user: %129
}
```
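A small source-level illustration of that limit: `bitWidth` is a static requirement of `FixedWidthInteger`, so inside a generic function it is reached through the protocol witness table rather than folded to a constant. This is a hedged sketch; `wordBits` is a hypothetical helper, not a function from the project:

```swift
// In unspecialized SIL, T.bitWidth lowers to a witness_method call, even
// when a layout constraint such as _Trivial(64) fixes the type's size.
func wordBits<T: FixedWidthInteger>(_ type: T.Type) -> Int {
    return type.bitWidth
}

assert(wordBits(Int64.self) == 64)
assert(wordBits(Int32.self) == 32)
```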
Thanks aschwaighofer@apple.com (JIRA User), I wasn't actually aware that subscripts could be attributed that way. Using BitStore from within the same module does indeed have performance similar to IntBitStore when built for release.

```swift
var intBinbuff = IntBitStore(capacity: size)
var results: [Double] = []
for _ in 0 ..< 10 {
    let elapsed = Utilities.measureTime {
        for i in 0 ..< size {
            intBinbuff[i] = true
        }
    }
    results.append(elapsed)
}
print(results.reduce(0, +) / 10)
```

I increased the size to 50000000 and ran 10 repeats; for instance, in the single-binary case it took on average 0.6997 sec per run, while using it from a separate framework took on average 1.093 sec.
Very likely, because we inlined the call to `intBinbuff[i]`, we could hoist the uniqueness check of the array buffer in the implementation of this function out of the inner loop. This would explain the 50% performance difference. If you look at the profile, you will see a call to `isUniquelyReferenced` that is not in the inner loop doing the `intBinbuff[i] = true` assignment.
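As a rough, hand-written analogue of that hoisting (this is not the compiler's actual transformation, and `Box` is a hypothetical copy-on-write storage class invented for this sketch):

```swift
// Hypothetical CoW storage class standing in for Array's internal buffer.
final class Box { var words = [Int](repeating: 0, count: 4) }

var storage = Box()

// Without inlining, every subscript mutation conceptually re-checks that
// the buffer is uniquely referenced. Once the setter is inlined, the
// check can be hoisted: establish uniqueness once, then mutate freely.
if !isKnownUniquelyReferenced(&storage) {
    storage = Box()            // copy-on-write: take a unique copy up front
}
for i in 0 ..< storage.words.count {
    storage.words[i] = 1       // no per-iteration uniqueness check needed
}
assert(storage.words == [1, 1, 1, 1])
```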
And as Jordan said: this is an unsupported feature and syntax that might stop working, be removed, or cause some Cat to turn into a Dog at any time.
Ah, makes sense. Kinda tricky for the non-generic IntBitStore though; guess I'd better use everything from within one module next time I have a computation-heavy task. Thanks!
Attachment: Download
Environment
macOS Sierra 10.12.5, MacBook Air '13 2014
Swift 4.0 May 17, 2017 toolchain
Additional Detail from JIRA
md5: 47607273ceb0ee887d1c89ce8dd971a2
Issue Description:
Trying the new improved integer protocols in Swift 4, I wanted to change some code to a generic format; to my surprise, I found there was a huge performance decrease when using the generic implementation.
In the attachment there's a more or less one-to-one implementation of what I called BitStore: a container for binary values encoded on top of an array of integers and their bit values.
In the `BitStorePerf` target there's an example showing the performance difference: the generic implementation using the Int type, compared to the plain Int implementation, is up to 3 times slower in a debug build; when building for release it can be even up to 20 times slower, even though the methods are marked with the `specialize` attribute.
Not sure if that's really a bug or simply a trade-off of using generics, but that's quite a difference, so I thought maybe I should report it.