
[SR-4939] Generics code performance #47516

Open

dmcyk opened this issue May 19, 2017 · 8 comments
Labels
bug (A deviation from expected or documented behavior; also expected but undesirable behavior.) · compiler (The Swift compiler itself) · performance · regression · swift 4.0

Comments

@dmcyk (Contributor)

dmcyk commented May 19, 2017

Previous ID SR-4939
Radar None
Original Reporter @dmcyk
Type Bug

Attachment: Download

Environment

macOS Sierra 10.12.5, MacBook Air '13 2014
Swift 4.0 May 17, 2017 toolchain

Additional Detail from JIRA
Votes 0
Component/s Compiler
Labels Bug, 4.0Regression, Performance
Assignee None
Priority Medium

md5: 47607273ceb0ee887d1c89ce8dd971a2

Issue Description:

Trying out the new, improved integer protocols in Swift 4, I wanted to convert some code to a generic form. To my surprise, I found a huge performance decrease when using the generic implementation.

The attachment contains a more or less one-to-one implementation of what I called BitStore: a container for binary values encoded on top of an array of integers and their bit values.

The `BitStorePerf` target contains an example showing the performance difference: the generic implementation instantiated with Int is up to 3 times slower than the plain Int implementation in a debug build, and up to 20 times slower in a release build, even though the methods are marked with the `specialize` attribute.

Not sure if that's really a bug or simply a trade-off of using generics, but it's quite a difference, so I thought I should report it.
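For context, here is a minimal sketch of the two containers being compared. This is a hypothetical reconstruction, not the attachment itself; the names `IntBitStore`, `BitStore`, and `rawBuff` match the thread, but the bodies are my own guess at the shape of the code:

```swift
// Hypothetical reconstruction of the two containers from the attachment.

// Non-generic variant: storage words are always Int.
struct IntBitStore {
    private var rawBuff: [Int]
    let capacity: Int

    init(capacity: Int) {
        self.capacity = capacity
        rawBuff = Array(repeating: 0,
                        count: (capacity + Int.bitWidth - 1) / Int.bitWidth)
    }

    subscript(index: Int) -> Bool {
        get { return rawBuff[index / Int.bitWidth] & (1 << (index % Int.bitWidth)) != 0 }
        set {
            let word = index / Int.bitWidth
            if newValue { rawBuff[word] |=  (1 << (index % Int.bitWidth)) }
            else        { rawBuff[word] &= ~(1 << (index % Int.bitWidth)) }
        }
    }
}

// Generic variant: any fixed-width integer can back the storage.
struct BitStore<StoreType: FixedWidthInteger> {
    private var rawBuff: [StoreType]
    let capacity: Int

    init(capacity: Int) {
        self.capacity = capacity
        rawBuff = Array(repeating: 0,
                        count: (capacity + StoreType.bitWidth - 1) / StoreType.bitWidth)
    }

    subscript(index: Int) -> Bool {
        get { return rawBuff[index / StoreType.bitWidth] & (1 << (index % StoreType.bitWidth)) != 0 }
        set {
            let word = index / StoreType.bitWidth
            if newValue { rawBuff[word] |=  (1 << (index % StoreType.bitWidth)) }
            else        { rawBuff[word] &= ~(1 << (index % StoreType.bitWidth)) }
        }
    }
}
```

The slowdown described above was observed when calling `BitStore<Int>` across a framework boundary; the discussion below explains why that matters.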

@belkadan (Contributor)

@bob-wilson, @ematejska, who do you think should look at this?

@bob-wilson

aschwaighofer@apple.com (JIRA User) could you look into this?

@aschwaighofer (Member)

Today this performance is expected. The benchmark calls the generic/non-generic code across a framework boundary, so we don't get the benefit of full specialization. Then the following happens:

IntBitStore.subscript.setter (the non-generic implementation) is fast. Because we know we are dealing with Ints, the code boils down to mutating an Array<Int>. The code path is fully inlined (we know we are not dealing with an NSArray backing, so that code can be statically compiled out). There is no retain/release traffic, only a call to the isUniquelyReferenced runtime function.

BitStore.subscript.setter (the generic implementation) has two issues:

1. User error: the setter function was not annotated, so we did not call a specialized entry point (not that this would have helped much; see the other points):

    public subscript(index: Int) -> Bool {
        // These attributes were missing:
        @_specialize(exported: true, where StoreType: _Trivial(32))
        @_specialize(exported: true, where StoreType: _Trivial(64))
        set {
            precondition(index >= 0)
            precondition(index < capacity)
            let rawIndex = index / _intBitCapacity
            let current = index - (rawIndex * _intBitCapacity)
            let _buff = rawBuff[rawIndex]
            if newValue {
                rawBuff[rawIndex] = _buff | (1 << current)
            } else {
                rawBuff[rawIndex] = _buff & ~(1 << current)
            }
        }
    }
2. After fixing this, performance does not improve even though we call the specialized version for T where T is _Trivial(64):

  • We still have our indirect representation of generic types. This means there are witness method calls that the optimizer can't reason about, and therefore retain/release traffic on Array<T>.

  • We call unspecialized Array<T> entry points (more retain/release traffic).

  • The optimizer has not been taught that _swift_isClassOrObjCExistentialType<T>, where T is _Trivial as specified in this context, is known to be false.

I believe that after moving to an opaque-value SSA representation, specializing the Array<T> calls in the implementation, and inlining those specialized calls, we should get much better performance.

But there are limits: getting the bit width will remain a witness method call.

sil @_T08BitStoreAAV9subscriptSbSicfsSbSiAByxGRlze63_s17FixedWidthIntegerRzlItMyyl_Tp5 : $@convention(method) <τ_0_0 where τ_0_0 : _Trivial(64), τ_0_0 : FixedWidthInteger> (Bool, Int, @inout BitStore<τ_0_0>) -> () {

 %127 = witness_method $τ_0_0, #FixedWidthInteger.bitWidth!getter.1 : <Self where Self : FixedWidthInteger> (Self.Type) -> () -> Int : $@convention(witness_method) <τ_0_0 where τ_0_0 : FixedWidthInteger> (@thick τ_0_0.Type) -> Int // users: %281, %266, %242, %197, %168, %148, %128
  %128 = apply %127<τ_0_0>(%48) : $@convention(witness_method) <τ_0_0 where τ_0_0 : FixedWidthInteger> (@thick τ_0_0.Type) -> Int // user: %129

}
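One way around that remaining witness call (a sketch on my part, not something proposed in the thread) is to read `StoreType.bitWidth` once in `init` and keep it in a stored property, so the hot subscript path loads a plain Int instead of calling through the witness table on every access. `CachedBitStore` below is a hypothetical name:

```swift
// Hypothetical variant: cache the bit width at init time.
struct CachedBitStore<StoreType: FixedWidthInteger> {
    private var rawBuff: [StoreType]
    private let wordBits: Int   // witness call happens once, in init
    let capacity: Int

    init(capacity: Int) {
        let bits = StoreType.bitWidth   // the only bitWidth witness call
        wordBits = bits
        self.capacity = capacity
        rawBuff = Array(repeating: 0, count: (capacity + bits - 1) / bits)
    }

    subscript(index: Int) -> Bool {
        get { return rawBuff[index / wordBits] & (1 << (index % wordBits)) != 0 }
        set {
            let word = index / wordBits
            if newValue { rawBuff[word] |=  (1 << (index % wordBits)) }
            else        { rawBuff[word] &= ~(1 << (index % wordBits)) }
        }
    }
}
```

Whether this actually pays off depends on how much of the surrounding code the optimizer can already specialize; it only removes the repeated witness dispatch, not the other generic overheads listed above.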

@belkadan (Contributor)

_specialize isn't something available to users at this point.

@dmcyk (Contributor, Author)

dmcyk commented May 19, 2017

Thanks aschwaighofer@apple.com (JIRA User), I wasn't actually aware that a subscript could be annotated like that.

Using BitStore from within the same module does indeed have performance similar to IntBitStore when built for release.
Having checked that, though, I noticed some odd behaviour: using IntBitStore (non-generic) from within one binary is on average ~50% faster than using it from a separate framework.

var intBinbuff = IntBitStore(capacity: size)

var results: [Double] = []

for _ in 0 ..< 10 {
    let elapsed = Utilities.measureTime {
        for i in 0 ..< size {
            intBinbuff[i] = true
        }
    }
    
    results.append(elapsed)
}
print(results.reduce(0, +) / 10)

I increased the size to 50000000 and ran 10 repeats: within one binary it took on average 0.6997 s per run, while using it from a separate framework took on average 1.093 s.
That's quite a significant difference, isn't it?

@aschwaighofer (Member)

Very likely it's because, having inlined the call to intBinbuff[i], we could hoist the uniqueness check of the array buffer out of the inner loop. That would explain a 50% performance difference. If you look at the profile, you will see a call to isUniquelyReferenced that is not in the inner loop doing the intBinbuff[i] = true assignment.
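One way to get that hoisting back even across a module boundary (my own sketch, not something from this thread) is to expose a bulk operation that runs the whole loop inside a single `withUnsafeMutableBufferPointer` call, so the uniqueness check happens once per call rather than once per iteration. `WordBits` and `setAll(upTo:)` are hypothetical names:

```swift
// Hypothetical bulk-set API: one uniqueness check for the whole range.
struct WordBits {
    var words: [Int]

    init(capacity: Int) {
        words = Array(repeating: 0,
                      count: (capacity + Int.bitWidth - 1) / Int.bitWidth)
    }

    // Sets bits 0 ..< count. withUnsafeMutableBufferPointer ensures unique
    // ownership up front, then the inner loop is plain memory writes.
    mutating func setAll(upTo count: Int) {
        words.withUnsafeMutableBufferPointer { buf in
            for i in 0 ..< count {
                buf[i / Int.bitWidth] |= (1 << (i % Int.bitWidth))
            }
        }
    }
}
```

The same idea applies to the benchmark above: replacing the per-element `intBinbuff[i] = true` loop with one bulk call keeps the copy-on-write check out of the hot loop regardless of inlining.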

@aschwaighofer (Member)

And as Jordan said: this is an unsupported feature and syntax that might stop working, be removed, or cause some Cat to turn into a Dog at any time.

@dmcyk (Contributor, Author)

dmcyk commented May 19, 2017

Ah, makes sense. Kind of tricky for the non-generic IntBitStore, though; I guess I'd better keep everything within one module next time I have a computation-heavy task. Thanks!

@swift-ci swift-ci transferred this issue from apple/swift-issues Apr 25, 2022