New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SR-7150] Recent dev snapshots make this code 4 times slower #49698
Comments
Here's what I did: Confirmed the regression from my installed Swift 4.1 to a recent build of master. Ran each compiled binary in Instruments to compare. There's nothing Looked at the binaries in Hopper. Identified one of the slow functions by looking at main's disassembly: _$S4test9benchmark6source11destinationyxm_q_mts17FixedWidthIntegerRzsAER_r0_lFs5UInt8V_s5Int64VTg5Tf4dd_n Run swift-demangle ---> function signature specialization <Arg[0] = Dead, Arg[1] = Dead> of generic specialization <Swift.UInt8, Swift.Int64> of test.benchmark<A, B where A: Swift.FixedWidthInteger, B: Swift.FixedWidthInteger>(source: A.Type, destination: B.Type) -> () Yep, that looks like the slow one. Search for that symbol in Hopper. The fast code has a single block vectorized loop, 4 wide. (coincidence?) The slow code has a bunch of compare-and-branches. Create a reduced benchmark, because I don't want to look at a ton of I created a fully specialized version using UnsafeBufferPointer. That I went back to a somewhat generic version that still performs well: @inline(never)
func foo(srcBuf: Buffer<UInt8>, dstBuf: Buffer<Int64>) {
dstBuf.setElements(from: srcBuf, transform: Int64.init(rangeConverted:))
}
func benchmark8To64() {
let num = 100_000_000
print("\nRange converting \(num) values from \(UInt8.self) to \(Int64.self):")
let srcBuf = Buffer<UInt8>(count: num)
let dstBuf = Buffer<Int64>(count: num)
srcBuf.setAllElementsToRandomBitPatterns()
for _ in 0 ..< 10 {
foo(srcBuf: srcBuf, dstBuf: dstBuf)
}
}
benchmark8To64() I grabbed the -frontend command line with
Confirmed that the fast code looks like this (vectorized): vector.body: ; preds = %vector.body, %vector.ph
%index = phi i64 [ 0, %vector.ph ], [ %index.next, %vector.body ]
%36 = getelementptr inbounds %Ts5UInt8V, %Ts5UInt8V* %32, i64 %index, i32 0
%37 = bitcast i8* %36 to <2 x i8>*
%wide.load = load <2 x i8>, <2 x i8>* %37, align 1, !alias.scope !35 Now, disable LLVM optimizations to see what the swift compiler is giving it:
Copy-paste function "foo", which has only the code that I care about,
In the fast code, I can see branch that skips over one of the blocks: ; <label>:75: ; preds = %70
br label %52
; <label>:49: ; preds = %47, %65, %68, %72
%50 = phi i64 [ %74, %72 ], [ %69, %68 ], [ %67, %65 ], [ %48, %47 ]
%51 = or i64 %40, %50
br label %52
; <label>:52: ; preds = %75, %49
%53 = phi i64 [ %51, %49 ], [ %40, %75 ]
%54 = shl i64 %39, 1
%55 = icmp slt i64 %54, 64
br i1 %55, label %56, label %57
; <label>:57: ; preds = %52
%58 = icmp eq i64 %10, 0
br i1 %58, label %89, label %59 In the slow code ; <label>:73: ; preds = %68
br label %49
; <label>:49: ; preds = %47, %63, %66, %73, %70
%50 = phi i64 [ %72, %70 ], [ 0, %73 ], [ %67, %66 ], [ %65, %63 ], [ %48, %47 ]
%51 = or i64 %40, %50
%52 = shl i64 %39, 1
%53 = icmp slt i64 %52, 64
br i1 %53, label %54, label %55
; <label>:55: ; preds = %49
%56 = icmp eq i64 %10, 0
br i1 %56, label %87, label %57 That's really the only difference. It looks like the while shifts < dstBW {
self |= self << shifts
shifts = shifts << 1
} Rerun the It's all coming from
As a workaround, maybe you want to use a masking shift self |= self &<< shifts It would be great to understand why the performance is fragile. It I can't follow the stdlib code, and it would take a lot more debugging public static func _nonMaskingLeftShift(_ lhs: Self, _ rhs: Int) -> Self { The closest commit I can find in the past year is this, but it's just a wild guess: commit a5ff35cd41a8199cd94ab43a4f093a889ba8fdad
Author: Maxim Moiseev <moiseev@apple.com>
Date: Fri Jul 21 14:46:54 2017 -0700
[stdlib] extendingOrTruncating: => truncatingIfNeeded: A separate question is why LLVM was able to vectorize one form of the |
Thanks, very informative and appreciated! Interesting that using &<< and &>> works around the issue. Btw, even though it probably doesn't add any valuable information: I just installed Xcode 9.3 beta 4 and its default toolchain is not reproducing the issue (just like the default toolchains of 9.2 and all previous 9.3 betas). Not sure what version of the compiler went into 9.3 beta 4 but it was released mars 5, and yet I guess it must have some version before dev snapshot 2018-02-25 (which does reproduce the issue). swiftc --version of Xcode 9.3 beta 4's default toolchain:
swiftc --version of dev snapshot 2018-02-25 (earliest snapshot I have access to):
Is there perhaps some way of converting between the two different Swift (build?) version formats (if that's what they are) that are given by Xcode default toolchains and dev snapshots? That is:
and
|
I noticed some other things: I hadn't actually tested this on Xcode 9.2 as I've erroneously written, I thought I had. But the program won't even compile with Xcode 9.2 since the syntax for UnsafeMutableBufferPointer et al has changed since then. When I adjusted the code for this, it turns out that the issue can be reproduced with Xcode 9.2. So it's only the default toolchains of Xcode 9.3 betas that doesn't reproduce the issue. It turns out that while using only masking shifts instead of only smart shifts works around the issue, if I instead use a certain combinations of both masking- and smart shifts, It is possible to reproduce the issue on the otherwise unaffected default toolchain of Xcode 9.3 beta 4 (and probably the earlier betas) too:
|
That's all useful info. It also confirms that this isn't exactly a regression so much as general performance instability. |
There's no direct way to translate an Xcode swift root to a github revision. I can tell you that swiftlang-902.0.43 roughly has everything that went in by Feb 21. |
Attachment: Download
Additional Detail from JIRA
md5: 5caab9c0d3e83a66d11a891bfae1db8b
relates to:
Issue Description:
This is the test.swift program in the attachment:
The text was updated successfully, but these errors were encountered: