Environment
Apple Swift version 5.5.2 (swiftlang-1300.0.47.5 clang-1300.0.29.30)
Target: x86_64-apple-macosx11.0
Xcode 13.2
Additional Detail from JIRA
Votes: 0
Component/s: Compiler
Labels: Bug
Assignee: None
Priority: Medium
md5: 59c0f16facc590c7ededabf32822573e
Issue Description:
I've been using the `@_semantics("optremark")` feature to try to optimise code size. While doing so, I noticed something interesting. Consider these remarks in one function, which operates on generic Collections of UInt8:
As you can see, the optimiser is doing a ton of work, and a huge number of specialisations get generated for all kinds of collection types - slices of ContiguousArray, UnsafeBufferPointer, UnsafeBoundsCheckedBufferPointer, WebURL.UTF8View, etc - and it tries to devirtualise all of these calls.
Now, what is really interesting is that almost none of these collection types actually enter this function. That's because, in the caller, we use withContiguousStorageIfAvailable to collapse all contiguous collections to UnsafeBufferPointer, then wrap that in an UnsafeBoundsCheckedBufferPointer. That's the only contiguous storage type which is ever used by this function.
So why is the optimiser doing all of this work?
Well, it turns out, the caller looks something like this:
So - use the contiguous storage if possible, otherwise fall back to using the collection type itself.
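A minimal sketch of that pattern, not the library's actual code: `BoundsCheckedBuffer` stands in for `UnsafeBoundsCheckedBufferPointer`, and `process` and `entryPoint` are hypothetical names for the generic worker and its caller.

```swift
// Hypothetical stand-in for UnsafeBoundsCheckedBufferPointer:
// a Collection of UInt8 that wraps an UnsafeBufferPointer and checks indices.
struct BoundsCheckedBuffer: Collection {
  let base: UnsafeBufferPointer<UInt8>
  var startIndex: Int { base.startIndex }
  var endIndex: Int { base.endIndex }
  func index(after i: Int) -> Int { i + 1 }
  subscript(i: Int) -> UInt8 {
    precondition(i >= startIndex && i < endIndex, "index out of bounds")
    return base[i]
  }
}

// Hypothetical generic worker: gets specialised for every C that reaches it.
func process<C: Collection>(_ bytes: C) -> Int where C.Element == UInt8 {
  var sum = 0
  for byte in bytes { sum &+= Int(byte) }
  return sum
}

// Hypothetical caller: prefer contiguous storage, else use the collection itself.
func entryPoint<C: Collection>(_ bytes: C) -> Int where C.Element == UInt8 {
  if let result = bytes.withContiguousStorageIfAvailable({ buffer in
    process(BoundsCheckedBuffer(base: buffer))
  }) {
    return result
  }
  // Fallback path: instantiates process<C> for the original collection type,
  // even when C always provides contiguous storage.
  return process(bytes)
}
```

With an `Array<UInt8>` argument the closure runs on the array's buffer. With a `Set<UInt8>`, which has no contiguous storage, `withContiguousStorageIfAvailable` returns nil and the generic fallback is used.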
What I believe is happening is that this function only gets constant-folded after generic specialisation, so the optimiser does a lot of work specialising my entire library, devirtualising calls, calculating whether things should be inlined, and so on, and then just discards it all later.
We can test this. I added a protocol, `KnownContiguousStorage`, which includes a `withUnsafeBufferPointer` method (unlike `withContiguousStorageIfAvailable`, abbreviated wCSIA below, it does not return an optional, because contiguous storage is guaranteed), and conformed the types listed above to that protocol. Next, I added a dynamic downcast before using wCSIA:
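As a rough sketch of the idea (not the original code): only the protocol name and the method name come from the description above. The method signature, the `BoundsCheckedBuffer` stand-in for `UnsafeBoundsCheckedBufferPointer`, and the `process` and `entryPoint` functions are assumptions made for illustration.

```swift
// Assumed shape of the protocol: the result is non-optional because
// conforming types guarantee contiguous storage.
protocol KnownContiguousStorage {
  func withUnsafeBufferPointer<R>(
    _ body: (UnsafeBufferPointer<UInt8>) throws -> R
  ) rethrows -> R
}

// Hypothetical stand-in for UnsafeBoundsCheckedBufferPointer.
struct BoundsCheckedBuffer: Collection {
  let base: UnsafeBufferPointer<UInt8>
  var startIndex: Int { base.startIndex }
  var endIndex: Int { base.endIndex }
  func index(after i: Int) -> Int { i + 1 }
  subscript(i: Int) -> UInt8 {
    precondition(i >= startIndex && i < endIndex, "index out of bounds")
    return base[i]
  }
}

// Conform the types that are known to be contiguous.
extension UnsafeBufferPointer: KnownContiguousStorage where Element == UInt8 {
  func withUnsafeBufferPointer<R>(
    _ body: (UnsafeBufferPointer<UInt8>) throws -> R
  ) rethrows -> R { try body(self) }
}

extension BoundsCheckedBuffer: KnownContiguousStorage {
  func withUnsafeBufferPointer<R>(
    _ body: (UnsafeBufferPointer<UInt8>) throws -> R
  ) rethrows -> R { try body(base) }
}

// Hypothetical generic worker.
func process<C: Collection>(_ bytes: C) -> Int where C.Element == UInt8 {
  var sum = 0
  for byte in bytes { sum &+= Int(byte) }
  return sum
}

// Hypothetical caller with the dynamic downcast placed before wCSIA.
func entryPoint<C: Collection>(_ bytes: C) -> Int where C.Element == UInt8 {
  // Known-contiguous types take exactly one path and return here.
  if let contiguous = bytes as? KnownContiguousStorage {
    return contiguous.withUnsafeBufferPointer {
      process(BoundsCheckedBuffer(base: $0))
    }
  }
  // Everything else keeps the original logic: wCSIA, then the fallback.
  if let result = bytes.withContiguousStorageIfAvailable({
    process(BoundsCheckedBuffer(base: $0))
  }) {
    return result
  }
  return process(bytes)
}
```

In this sketch the downcast is a runtime check, so the idea is that once `entryPoint` is specialised for a conforming type, the optimiser can resolve the cast and drop the later branches for that type.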
Now, let's look at what happens to the function mentioned at the start, at the same location:
Boom. We've gone from 26 remarks at this location down to 9. The optimiser no longer spends time generating specialisations which it will not use.
Let's take a look at that caller, and how its remarks have changed.
Before:
We see that the compiler recognises that it can devirtualise and specialise wCSIA, but still generates the specialisation in case UnsafeBufferPointer is not contiguous. Obviously, this will never be used.
After:
Now the compiler sees that UnsafeBufferPointer only has one path to follow, and that path results in it being wrapped in an UnsafeBoundsCheckedBufferPointer. As we've seen, it is thus able to do much less work.
It's worth pointing out that this doesn't affect binary size (as far as I can tell), although one would expect it to impact compile time. It might be worth trying to make wCSIA `@inline(__always)`, so that the result can be used earlier by the optimiser and it can avoid optimising code that will certainly be culled later.
I think this may be of interest to one or both of you. I don't know whether this can be fixed in the standard library (perhaps by making wCSIA `@inline(__always)`), or whether the optimiser can be a bit smarter about where it spends its time. But there are potentially compile-time savings here.
Thank you @karwa. That's an excellent writeup. I think the optimizer will need to work around this issue since `withContiguousStorageIfAvailable` is central to stdlib collections. It does raise the question of better ways to design customization hooks going forward. I'm not sure what that answer is yet.