New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SR-7541] Closure not optimized by closure specializer #50083
Comments
cc @atrick |
It looks to me like an allocation is needed for the This code pattern normally wouldn't be a performance issue, because closure taking |
@atrick I agree with everything you say. My main question was also why it isn't inlined. Oddly enough, if you remove the |
Close'ish ... The reason why the allocation for the context (for the partial_apply) goes away with a class is the closure captures self: bb0(%0 : $Builtin.Word, %1 : $Builtin.Word, %2 : $Builtin.Int8, %3 : $*SadByteBuffer):
%6 = partial_apply [callee_guaranteed] %5(%3) If self is a struct we capture the pointer to the struct. Swift needs to allocate a reference counted heap structure for the closure context to hold that pointer so that it can release and retain the context object. If self is a class reference we capture the class reference. Swift optimizes this case: It can reuse the class reference as the reference counted context instead of allocating a heap object and storing the reference to it. |
aschwaighofer@apple.com (JIRA User) thank you, that makes sense! But shouldn't that go away when `withUTF8Buffer` is inlined? And if yes, why isn't `withUTF8Buffer` inlined? |
Presumably the inliner is conservative because of code size concerns and lack of |
@atrick should we just use this as the bug? I'm happy for you to edit the title and body |
'withUTF8Buffer' is quite big (26 blocks and 180ish instructions) so I would not be surprised if the inliner chose not to inline. However, what I would have expected to be happening is that the closure specializer would create a specialized version of `withUTF8Buffer` with the closure argument inlined. That is also not happening. I am going to take a quick look why. |
aschwaighofer@apple.com (JIRA User) thanks! I ruled out the 'withUTF8Buffer is too big' because if you remove the |
The problem is that there is a re-abstraction thunk closure interposed. The closure specializer will remove the thunk. But then it won't run again on the same function: sil [noinline] @$S19TestClosureInlining13SadByteBufferV3set12staticString2atSis06StaticI0V_SitF : $@convention(method) (StaticString, Int, @inout SadByteBuffer) -> Int {
// %0 // users: %12, %3
// %2 // users: %15, %6, %4
bb0(%0 : $StaticString, %1 : $Int, %2 : $*SadByteBuffer):
debug_value %0 : $StaticString, let, name "string", argno 1 // id: %3
debug_value_addr %2 : $*SadByteBuffer, var, name "self", argno 3 // id: %4
// function_ref closure #​1 in SadByteBuffer.set(staticString:at:)
%5 = function_ref @$S19TestClosureInlining13SadByteBufferV3set12staticString2atSis06StaticI0V_SitFSiSRys5UInt8VGXEfU_ : $@convention(thin) (UnsafeBufferPointer<UInt8>, @inout_aliasable SadByteBuffer) -> Int // user: %6
%6 = partial_apply [callee_guaranteed] %5(%2) : $@convention(thin) (UnsafeBufferPointer<UInt8>, @inout_aliasable SadByteBuffer) -> Int // users: %14, %7
%7 = convert_escape_to_noescape %6 : $@callee_guaranteed (UnsafeBufferPointer<UInt8>) -> Int to $@noescape @callee_guaranteed (UnsafeBufferPointer<UInt8>) -> Int // user: %9
// function_ref thunk for @callee_guaranteed (@unowned UnsafeBufferPointer<UInt8>) -> (@unowned Int)
%8 = function_ref @$SSRys5UInt8VGSiIgyd_ACSiIegyr_TR : $@convention(thin) (UnsafeBufferPointer<UInt8>, @noescape @callee_guaranteed (UnsafeBufferPointer<UInt8>) -> Int) -> @out Int // user: %9
%9 = partial_apply [callee_guaranteed] %8(%7) : $@convention(thin) (UnsafeBufferPointer<UInt8>, @noescape @callee_guaranteed (UnsafeBufferPointer<UInt8>) -> Int) -> @out Int // users: %13, %10
%10 = convert_escape_to_noescape %9 : $@callee_guaranteed (UnsafeBufferPointer<UInt8>) -> @out Int to $@noescape @callee_guaranteed (UnsafeBufferPointer<UInt8>) -> @out Int // user: %12
// function_ref specialized StaticString.withUTF8Buffer<A>(_:)
%11 = function_ref @$Ss12StaticStringV14withUTF8BufferyxxSRys5UInt8VGXElFSi_Tg5 : $@convention(method) (@noescape @callee_guaranteed (UnsafeBufferPointer<UInt8>) -> @out Int, StaticString) -> Int // user: %12
%12 = apply %11(%10, %0) : $@convention(method) (@noescape @callee_guaranteed (UnsafeBufferPointer<UInt8>) -> @out Int, StaticString) -> Int // users: %17, %16
strong_release %9 : $@callee_guaranteed (UnsafeBufferPointer<UInt8>) -> @out Int // id: %13
strong_release %6 : $@callee_guaranteed (UnsafeBufferPointer<UInt8>) -> Int // id: %14
%15 = struct_element_addr %2 : $*SadByteBuffer, #SadByteBuffer.x // user: %16
store %12 to %15 : $*Int // id: %16
return %12 : $Int // id: %17
} *** SIL function after #​5850, stage ClosureSpecialize, pass 5: ClosureSpecializer (closure-specialize)
// SadByteBuffer.set(staticString:at:)
sil [noinline] @$S19TestClosureInlining13SadByteBufferV3set12staticString2atSis06StaticI0V_SitF : $@convention(method) (StaticString, Int, @inout SadByteBuffer) -> Int {
// %0 // users: %11, %3
// %2 // users: %13, %6, %4
bb0(%0 : $StaticString, %1 : $Int, %2 : $*SadByteBuffer):
debug_value %0 : $StaticString, let, name "string", argno 1 // id: %3
debug_value_addr %2 : $*SadByteBuffer, var, name "self", argno 3 // id: %4
// function_ref closure #​1 in SadByteBuffer.set(staticString:at:)
%5 = function_ref @$S19TestClosureInlining13SadByteBufferV3set12staticString2atSis06StaticI0V_SitFSiSRys5UInt8VGXEfU_ : $@convention(thin) (UnsafeBufferPointer<UInt8>, @inout_aliasable SadByteBuffer) -> Int // user: %6
%6 = partial_apply [callee_guaranteed] %5(%2) : $@convention(thin) (UnsafeBufferPointer<UInt8>, @inout_aliasable SadByteBuffer) -> Int // users: %12, %7
%7 = convert_escape_to_noescape %6 : $@callee_guaranteed (UnsafeBufferPointer<UInt8>) -> Int to $@noescape @callee_guaranteed (UnsafeBufferPointer<UInt8>) -> Int // user: %11
// function_ref thunk for @callee_guaranteed (@unowned UnsafeBufferPointer<UInt8>) -> (@unowned Int)
%8 = function_ref @$SSRys5UInt8VGSiIgyd_ACSiIegyr_TR : $@convention(thin) (UnsafeBufferPointer<UInt8>, @noescape @callee_guaranteed (UnsafeBufferPointer<UInt8>) -> Int) -> @out Int
// function_ref specialized StaticString.withUTF8Buffer<A>(_:)
%9 = function_ref @$Ss12StaticStringV14withUTF8BufferyxxSRys5UInt8VGXElFSi_Tg507$SSRys5F21VGSiIgyd_ACSiIegyr_TRAFSiIgyd_Tf1cn_n : $@convention(thin) (StaticString, @noescape @callee_guaranteed (UnsafeBufferPointer<UInt8>) -> Int) -> Int // user: %11
// function_ref specialized StaticString.withUTF8Buffer<A>(_:)
%10 = function_ref @$Ss12StaticStringV14withUTF8BufferyxxSRys5UInt8VGXElFSi_Tg5 : $@convention(method) (@noescape @callee_guaranteed (UnsafeBufferPointer<UInt8>) -> @out Int, StaticString) -> Int
%11 = apply %9(%0, %7) : $@convention(thin) (StaticString, @noescape @callee_guaranteed (UnsafeBufferPointer<UInt8>) -> Int) -> Int // users: %14, %15
strong_release %6 : $@callee_guaranteed (UnsafeBufferPointer<UInt8>) -> Int // id: %12
%13 = struct_element_addr %2 : $*SadByteBuffer, #SadByteBuffer.x // user: %14
store %11 to %13 : $*Int // id: %14
return %11 : $Int // id: %15
} The closure specializer does not handle partial_apply of partial_apply (it does not concatenate closures) . And neither does it have a special handling for this situation (it could just treat the two partial_applies as a unit and inline them together). Furthermore, we only run the ClosureOptimizer once in our pipeline so no fix-point effect either. |
Thanks aschwaighofer@apple.com (JIRA User) |
👍 nice one, thanks aschwaighofer@apple.com (JIRA User) |
Attachment: Download
Additional Detail from JIRA
md5: 1f34b9b67b95bffeb77cf993672a5261
Issue Description:
This program
in my mind shouldn't allocate for each call to
y.set
but it does. Theself
capture withinseems to cause it to allocate.
Oddly enough
withUTF8Buffer
doesn't seem to be inlined intoset
and I don't really understand why. Interestingly, if you make theSadByteBuffer
aclass
the allocation goes away! @kevints suggested it might be an artefact of the exclusivity checking which sounds plausible.Regarding the
@inline(never)
that I have in there: This is just to simulate a cross-module call which is where I actually saw this. But I don't understand why this needs to be inlined, theset
function doesn't use any generics or closures so it should just work fine without.This is the SIL btw:
and here the IR
note the
this is the test I run to show the number of allocations
which outputs
at the end so one allocation there per loop iteration 🙁.
and btw
but behaves the same in 4.1
The text was updated successfully, but these errors were encountered: