
[SR-4939] Generics code performance #47516

Open

dmcyk opened this issue May 19, 2017 · 8 comments
Labels
bug (A deviation from expected or documented behavior; also expected but undesirable behavior.) · compiler (The Swift compiler itself) · performance · regression · swift 4.0

Comments

@dmcyk (Contributor)

dmcyk commented May 19, 2017

Previous ID SR-4939
Radar None
Original Reporter @dmcyk
Type Bug

Attachment: Download

Environment

macOS Sierra 10.12.5, MacBook Air '13 2014
Swift 4.0 May 17, 2017 toolchain

Additional Detail from JIRA
Votes 0
Component/s Compiler
Labels Bug, 4.0Regression, Performance
Assignee None
Priority Medium

md5: 47607273ceb0ee887d1c89ce8dd971a2

Issue Description:

Trying out the new, improved integer protocols in Swift 4, I wanted to convert some code to a generic form. To my surprise, I found a huge performance decrease when using the generic implementation.

The attachment contains a more or less one-to-one implementation of what I called BitStore: a container for binary values encoded on top of an array of integers and their bit values.

The `BitStorePerf` target contains an example showing the performance difference: the generic implementation instantiated with Int is up to 3 times slower than the plain Int implementation in a debug build, and up to 20 times slower in a release build, even though the methods are marked with the `specialize` attribute.

Not sure if that's really a bug or simply a trade-off of using generics, but it's quite a difference, so I thought I should report it.
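For context, here is a minimal sketch of the two containers being compared. This is a hypothetical reconstruction, not the attachment itself; the names `IntBitStore`, `BitStore`, and `rawBuff` match the thread, but the bodies are my own guess at the shape of the code:

```swift
// Hypothetical reconstruction of the two containers from the attachment.

// Non-generic variant: storage words are always Int.
struct IntBitStore {
    private var rawBuff: [Int]
    let capacity: Int

    init(capacity: Int) {
        self.capacity = capacity
        rawBuff = Array(repeating: 0,
                        count: (capacity + Int.bitWidth - 1) / Int.bitWidth)
    }

    subscript(index: Int) -> Bool {
        get { return rawBuff[index / Int.bitWidth] & (1 << (index % Int.bitWidth)) != 0 }
        set {
            let word = index / Int.bitWidth
            if newValue { rawBuff[word] |=  (1 << (index % Int.bitWidth)) }
            else        { rawBuff[word] &= ~(1 << (index % Int.bitWidth)) }
        }
    }
}

// Generic variant: any fixed-width integer can back the storage.
struct BitStore<StoreType: FixedWidthInteger> {
    private var rawBuff: [StoreType]
    let capacity: Int

    init(capacity: Int) {
        self.capacity = capacity
        rawBuff = Array(repeating: 0,
                        count: (capacity + StoreType.bitWidth - 1) / StoreType.bitWidth)
    }

    subscript(index: Int) -> Bool {
        get { return rawBuff[index / StoreType.bitWidth] & (1 << (index % StoreType.bitWidth)) != 0 }
        set {
            let word = index / StoreType.bitWidth
            if newValue { rawBuff[word] |=  (1 << (index % StoreType.bitWidth)) }
            else        { rawBuff[word] &= ~(1 << (index % StoreType.bitWidth)) }
        }
    }
}
```

The slowdown described above was observed when calling `BitStore<Int>` across a framework boundary; the discussion below explains why that matters.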

@belkadan (Contributor)

@bob-wilson, @ematejska, who do you think should look at this?

@bob-wilson

aschwaighofer@apple.com (JIRA User) could you look into this?

@aschwaighofer (Member)

Today this performance is expected. The benchmark calls the generic/non-generic code across a framework boundary, so we don't get the benefit of full specialization. Then the following happens:

IntBitStore.subscript.setter (the non-generic implementation) is fast. Because we know we are dealing with Ints, the code boils down to mutating an Array<Int>. The code path is fully inlined (we know we are not dealing with an NSArray backing, so that code can be statically compiled out). There is no retain/release traffic, only a call to the isUniquelyReferenced runtime function.

BitStore.subscript.setter (the generic implementation) has two issues:

1. User error: the setter function was not annotated, so we did not call a specialized entry point (not that this would have helped much; see the other points):

    public subscript(index: Int) -> Bool {
        // These attributes were missing:
        @_specialize(exported: true, where StoreType: _Trivial(32))
        @_specialize(exported: true, where StoreType: _Trivial(64))
        set {
            precondition(index >= 0)
            precondition(index < capacity)
            let rawIndex = index / _intBitCapacity
            let current = index - (rawIndex * _intBitCapacity)
            let _buff = rawBuff[rawIndex]
            if newValue {
                rawBuff[rawIndex] = _buff | (1 << current)
            } else {
                rawBuff[rawIndex] = _buff & ~(1 << current)
            }
        }
    }
2. After fixing this, performance does not improve even though we call the specialized version for T where T is _Trivial(64):

  • We still have our indirect representation of generic types. This means there are witness method calls that the optimizer can't reason about, and therefore retain/release traffic on Array<T>.

  • We call unspecialized Array<T> entry points (more retain/release traffic).

  • The optimizer has not been taught that _swift_isClassOrObjCExistentialType<T>, where T is _Trivial as specified in this context, is known to be false.

I believe that after moving to an opaque-value SSA representation, specializing the Array<T> calls in the implementation, and inlining those specialized calls, we should get much better performance.

But there are limits: getting the bit width will remain a witness method call.

sil @_T08BitStoreAAV9subscriptSbSicfsSbSiAByxGRlze63_s17FixedWidthIntegerRzlItMyyl_Tp5 : $@convention(method) <τ_0_0 where τ_0_0 : _Trivial(64), τ_0_0 : FixedWidthInteger> (Bool, Int, @inout BitStore<τ_0_0>) -> () {

 %127 = witness_method $τ_0_0, #FixedWidthInteger.bitWidth!getter.1 : <Self where Self : FixedWidthInteger> (Self.Type) -> () -> Int : $@convention(witness_method) <τ_0_0 where τ_0_0 : FixedWidthInteger> (@thick τ_0_0.Type) -> Int // users: %281, %266, %242, %197, %168, %148, %128
  %128 = apply %127<τ_0_0>(%48) : $@convention(witness_method) <τ_0_0 where τ_0_0 : FixedWidthInteger> (@thick τ_0_0.Type) -> Int // user: %129

}
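One way around that remaining witness call (a sketch on my part, not something proposed in the thread) is to read `StoreType.bitWidth` once in `init` and keep it in a stored property, so the hot subscript path loads a plain Int instead of calling through the witness table on every access. `CachedBitStore` below is a hypothetical name:

```swift
// Hypothetical variant: cache the bit width at init time.
struct CachedBitStore<StoreType: FixedWidthInteger> {
    private var rawBuff: [StoreType]
    private let wordBits: Int   // witness call happens once, in init
    let capacity: Int

    init(capacity: Int) {
        let bits = StoreType.bitWidth   // the only bitWidth witness call
        wordBits = bits
        self.capacity = capacity
        rawBuff = Array(repeating: 0, count: (capacity + bits - 1) / bits)
    }

    subscript(index: Int) -> Bool {
        get { return rawBuff[index / wordBits] & (1 << (index % wordBits)) != 0 }
        set {
            let word = index / wordBits
            if newValue { rawBuff[word] |=  (1 << (index % wordBits)) }
            else        { rawBuff[word] &= ~(1 << (index % wordBits)) }
        }
    }
}
```

Whether this actually pays off depends on how much of the surrounding code the optimizer can already specialize; it only removes the repeated witness dispatch, not the other generic overheads listed above.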

@belkadan (Contributor)

_specialize isn't something available to users at this point.

@dmcyk (Contributor, Author)

dmcyk commented May 19, 2017

Thanks aschwaighofer@apple.com (JIRA User), I wasn't actually aware that a subscript could be annotated like that.

Using BitStore from within the same module does indeed have performance similar to IntBitStore when built for release.
Having checked that, though, I noticed some odd behaviour: using IntBitStore (non-generic) from within one binary is on average ~50% faster than using it from a separate framework.

var intBinbuff = IntBitStore(capacity: size)

var results: [Double] = []

for _ in 0 ..< 10 {
    let elapsed = Utilities.measureTime {
        for i in 0 ..< size {
            intBinbuff[i] = true
        }
    }
    
    results.append(elapsed)
}
print(results.reduce(0, +) / 10)

I increased the size to 50000000 and ran 10 repeats: within one binary it took on average 0.6997 s per run, while using it from a separate framework took on average 1.093 s.
That's quite a significant difference, isn't it?

@aschwaighofer (Member)

Very likely it's because, having inlined the call to intBinbuff[i], we could hoist the uniqueness check of the array buffer out of the inner loop. That would explain a 50% performance difference. If you look at the profile, you will see a call to isUniquelyReferenced that is not in the inner loop doing the intBinbuff[i] = true assignment.
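One way to get that hoisting back even across a module boundary (my own sketch, not something from this thread) is to expose a bulk operation that runs the whole loop inside a single `withUnsafeMutableBufferPointer` call, so the uniqueness check happens once per call rather than once per iteration. `WordBits` and `setAll(upTo:)` are hypothetical names:

```swift
// Hypothetical bulk-set API: one uniqueness check for the whole range.
struct WordBits {
    var words: [Int]

    init(capacity: Int) {
        words = Array(repeating: 0,
                      count: (capacity + Int.bitWidth - 1) / Int.bitWidth)
    }

    // Sets bits 0 ..< count. withUnsafeMutableBufferPointer ensures unique
    // ownership up front, then the inner loop is plain memory writes.
    mutating func setAll(upTo count: Int) {
        words.withUnsafeMutableBufferPointer { buf in
            for i in 0 ..< count {
                buf[i / Int.bitWidth] |= (1 << (i % Int.bitWidth))
            }
        }
    }
}
```

The same idea applies to the benchmark above: replacing the per-element `intBinbuff[i] = true` loop with one bulk call keeps the copy-on-write check out of the hot loop regardless of inlining.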

@aschwaighofer (Member)

And as Jordan said: this is an unsupported feature and syntax that might stop working, be removed, or cause some Cat to turn into a Dog at any time.

@dmcyk (Contributor, Author)

dmcyk commented May 19, 2017

Ah, makes sense. Kind of tricky for the non-generic IntBitStore, though; I guess I'd better keep everything within one module next time I have a computation-heavy task. Thanks!

@swift-ci swift-ci transferred this issue from apple/swift-issues Apr 25, 2022