[SR-15624] withContiguousStorageIfAvailable is not inlined early #57920

karwa · 2021-12-20T08:07:44Z


Previous ID	SR-15624
Radar	rdar://problem/86754177
Original Reporter	@karwa
Type	Bug

Attachment: Download

Environment

Apple Swift version 5.5.2 (swiftlang-1300.0.47.5 clang-1300.0.29.30)

Target: x86_64-apple-macosx11.0

Xcode 13.2

Additional Detail from JIRA


Votes	0
Component/s	Compiler
Labels	Bug
Assignee	None
Priority	Medium

md5: 59c0f16facc590c7ededabf32822573e

Issue Description:

I've been using the `@_semantics("optremark")` feature to try and optimise code-size. While doing so, I noticed something interesting. Consider these remarks in one function, which operates on generic Collections of UInt8:

As you can see, the optimiser is doing a ton of work, and a huge number of specializations get generated for all kinds of collection types - slices of ContiguousArray, UnsafeBufferPointer, UnsafeBoundsCheckedBufferPointer, WebURL.UTF8View, etc - and it tries to devirtualise all of these calls.

Now, what is really interesting is that almost none of these collection types actually enter this function. That's because, in the caller, we use withContiguousStorageIfAvailable to collapse all contiguous collections to UnsafeBufferPointer, then wrap that in an UnsafeBoundsCheckedBufferPointer. That's the only contiguous storage type which is ever used by this function.

So why is the optimiser doing all of this work?

Well, it turns out, the caller looks something like this:

return inputString.withContiguousStorageIfAvailable {
  _urlFromBytes_impl($0.boundsChecked, baseURL: baseURL, callback: &callback)
} ?? _urlFromBytes_impl(inputString, baseURL: baseURL, callback: &callback)

So - use the contiguous storage if possible, otherwise fall back to using the collection type itself.

What I believe is happening (what it looks like) is that this function gets constant-folded after generic specialisation, so the optimiser does a bunch of work specialising my entire library, devirtualising stuff, calculating whether things should be inlined, etc – and then just discards it all later.

We can test this. I added a protocol, "KnownContiguousStorage", which includes a "withUnsafeBufferPointer" method (unlike wCSIA, it does not return an optional - there is guaranteed contiguous storage), and conformed the types listed above to that protocol. Next, I added a dynamic downcast before using wCSIA:

if let buffer = inputString as? KnownContiguousStorage {
  return buffer._withUnsafeBufferPointer {
    _urlFromBytes_impl($0.boundsChecked, baseURL: baseURL, callback: &callback)
  }
}

return inputString.withContiguousStorageIfAvailable {
  _urlFromBytes_impl($0.boundsChecked, baseURL: baseURL, callback: &callback)
} ?? _urlFromBytes_impl(inputString, baseURL: baseURL, callback: &callback)

Now, let's look at what happens to the function mentioned at the start, at the same location:

Boom. We've gone from 26 remarks at this location down to 9. The optimiser no longer spends time generating specialisations which it will not use.

Let's take a look at that caller, and how its remarks have changed.

Before:

We see that the compiler recognises that it can devirtualise and specialise wCSIA, but still generates the specialisation in case UnsafeBufferPointer is not contiguous. Obviously, this will never be used.

After:

Now the compiler sees that UnsafeBufferPointer only has one path to follow, and that path results in it being wrapped in a UnsafeBoundsCheckedBufferPointer. As we've seen, it is thus able to do much less work.

It's worth pointing out that this doesn't affect binary size (as far as I can tell), although one would expect that it does impact compile time. It might be worth trying to make wCSIA inline(__always), so that the result can be used earlier by the optimiser and it can avoid optimising code that will certainly be culled later.

karwa · 2021-12-20T08:12:22Z

Ping @glessard @atrick

I think this may be of interest to one/both of you. I don't know whether this can be fixed in the standard library (perhaps making wCISA inline(__always)), or whether the optimiser can be a bit smarter about where it spends its time. But there are potentially compile-time savings here.

glessard · 2021-12-20T18:23:11Z

@swift-ci create

atrick · 2021-12-22T03:40:09Z

Thank you @karwa . That's an excellent writeup. I think the optimizer will need to work around this issue since `withContiguousStorage` is central to stdlib collections. It does raise the question of better ways to design customization hooks going forward. I'm not sure what that answer is yet.

swift-ci transferred this issue from apple/swift-issues Apr 25, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[SR-15624] withContiguousStorageIfAvailable is not inlined early #57920

[SR-15624] withContiguousStorageIfAvailable is not inlined early #57920

karwa commented Dec 20, 2021

karwa commented Dec 20, 2021

glessard commented Dec 20, 2021

atrick commented Dec 22, 2021

[SR-15624] withContiguousStorageIfAvailable is not inlined early #57920

[SR-15624] withContiguousStorageIfAvailable is not inlined early #57920

Comments

karwa commented Dec 20, 2021

karwa commented Dec 20, 2021

glessard commented Dec 20, 2021

atrick commented Dec 22, 2021