Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SR-7541] Closure not optimized by closure specializer #50083

Open
weissi opened this issue Apr 26, 2018 · 13 comments
Open

[SR-7541] Closure not optimized by closure specializer #50083

weissi opened this issue Apr 26, 2018 · 13 comments
Assignees
Labels
bug A deviation from expected or documented behavior. Also: expected but undesirable behavior. compiler The Swift compiler in itself

Comments

@weissi
Copy link
Member

weissi commented Apr 26, 2018

Previous ID SR-7541
Radar None
Original Reporter @weissi
Type Bug

Attachment: Download

Additional Detail from JIRA
Votes 0
Component/s Compiler
Labels Bug
Assignee @aschwaighofer
Priority Medium

md5: 1f34b9b67b95bffeb77cf993672a5261

Issue Description:

This program

public struct SadByteBuffer {
    public init() {}
    public var x: Int = 3

    private func foo() {}

    @inline(never) // commenting out this line stops the allocations from happening
    public mutating func set(staticString string: StaticString, at index: Int) -> Int {
        self.x = string.withUTF8Buffer { ptr -> Int in
            self.foo()
            return 8
        }
        return self.x
    }
}

@inline(never)
func foo(x: StaticString) -> SadByteBuffer {
    var y = SadByteBuffer()
    for _ in 0..<1_000_000 {
        _ = y.set(staticString: x, at: 0)
    }
    return y
}

let y = foo(x: "hello world this is some StaticString")
print(y.x)

in my mind shouldn't allocate for each call to y.set but it does. The self capture within

        self.x = string.withUTF8Buffer { ptr -> Int in
            self.foo()
            return 8
        }

seems to cause it to allocate.

Oddly enough withUTF8Buffer doesn't seem to be inlined into set and I don't really understand why. Interestingly, if you make the SadByteBuffer a class the allocation goes away! @kevints suggested it might be an artefact of the exclusivity checking which sounds plausible.

Regarding the @inline(never) that I have in there: This is just to simulate a cross-module call which is where I actually saw this. But I don't understand why this needs to be inlined, the set function doesn't use any generics or closures so it should just work fine without.

This is the SIL btw:

sil shared [noinline] @function signature specialization <Arg[0] = Exploded, Arg[1] = Dead> of test.SadByteBu
ffer.set(staticString: Swift.StaticString, at: Swift.Int) -> Swift.Int : $@convention(method) (Builtin.Word, 
Builtin.Word, Builtin.Int8, @inout SadByteBuffer) -> Int {
// %0                                             // user: %9
// %1                                             // user: %9
// %2                                             // user: %9
// %3                                             // users: %11, %6, %4
bb0(%0 : $Builtin.Word, %1 : $Builtin.Word, %2 : $Builtin.Int8, %3 : $*SadByteBuffer):
  debug_value_addr %3 : $*SadByteBuffer, var, name "self", argno 3 // id: %4
  // function_ref closure #&#8203;1 in SadByteBuffer.set(staticString:at:)
  %5 = function_ref @closure #&#8203;1 (Swift.UnsafeBufferPointer<Swift.UInt8>) -> Swift.Int in test.SadByteBuffer.set(staticString: Swift.StaticString, at: Swift.Int) -> Swift.Int : $@convention(thin) (UnsafeBufferPointer<UInt8>, @inout_aliasable SadByteBuffer) -> Int // user: %6
  %6 = partial_apply [callee_guaranteed] %5(%3) : $@convention(thin) (UnsafeBufferPointer<UInt8>, @inout_aliasable SadByteBuffer) -> Int // users: %10, %7
  %7 = convert_escape_to_noescape %6 : $@callee_guaranteed (UnsafeBufferPointer<UInt8>) -> Int to $@noescape @callee_guaranteed (UnsafeBufferPointer<UInt8>) -> Int // user: %9
  // function_ref specialized StaticString.withUTF8Buffer<A>(_:)
  %8 = function_ref @function signature specialization <Arg[0] = Exploded> of function signature specialization <Arg[0] = [Closure Propagated : reabstraction thunk helper from @callee_guaranteed (@unowned Swift.UnsafeBufferPointer<Swift.UInt8>) -> (@unowned Swift.Int) to @escaping @callee_guaranteed (@unowned Swift.UnsafeBufferPointer<Swift.UInt8>) -> (@out Swift.Int), Argument Types : [@callee_guaranteed (@unowned Swift.UnsafeBufferPointer<Swift.UInt8>) -> (@unowned Swift.Int)]> of generic specialization <Swift.Int> of Swift.StaticString.withUTF8Buffer<A>((Swift.UnsafeBufferPointer<Swift.UInt8>) -> A) -> A : $@convention(thin) (Builtin.Word, Builtin.Word, Builtin.Int8, @noescape @callee_guaranteed (UnsafeBufferPointer<UInt8>) -> Int) -> Int // user: %9
  %9 = apply %8(%0, %1, %2, %7) : $@convention(thin) (Builtin.Word, Builtin.Word, Builtin.Int8, @noescape @callee_guaranteed (UnsafeBufferPointer<UInt8>) -> Int) -> Int // users: %13, %12
  strong_release %6 : $@callee_guaranteed (UnsafeBufferPointer<UInt8>) -> Int // id: %10
  %11 = struct_element_addr %3 : $*SadByteBuffer, #SadByteBuffer.x // user: %12
  store %9 to %11 : $*Int                         // id: %12
  return %9 : $Int                                // id: %13
} // end sil function 'function signature specialization <Arg[0] = Exploded, Arg[1] = Dead> of test.SadByteBuffer.set(staticString: Swift.StaticString, at: Swift.Int) -> Swift.Int'

and here the IR

; Function Attrs: noinline
define linkonce_odr hidden swiftcc i64 @"function signature specialization <Arg[0] = Exploded, Arg[1] = Dead> of test.SadByteBuffer.set(staticString: Swift.StaticString, at: Swift.Int) -> Swift.Int"(i64, i64, i8, %T4test13SadByteBufferV* nocapture swiftself dereferenceable(8)) local_unnamed_addr #&#8203;3 {
entry:
  %4 = tail call noalias %swift.refcounted* @swift_allocObject(%swift.type* getelementptr inbounds (%swift.full_boxmetadata, %swift.full_boxmetadata* @metadata, i64 0, i32 2), i64 24, i64 7) #&#8203;5
  %5 = getelementptr inbounds %swift.refcounted, %swift.refcounted* %4, i64 1
  %6 = bitcast %swift.refcounted* %5 to %T4test13SadByteBufferV**
  store %T4test13SadByteBufferV* %3, %T4test13SadByteBufferV** %6, align 8
  %7 = bitcast %swift.refcounted* %4 to %swift.opaque*
  %8 = tail call swiftcc i64 @"function signature specialization <Arg[0] = Exploded> of function signature specialization <Arg[0] = [Closure Propagated : reabstraction thunk helper from @callee_guaranteed (@unowned Swift.UnsafeBufferPointer<Swift.UInt8>) -> (@unowned Swift.Int) to @escaping @callee_guaranteed (@unowned Swift.UnsafeBufferPointer<Swift.UInt8>) -> (@out Swift.Int), Argument Types : [@callee_guaranteed (@unowned Swift.UnsafeBufferPointer<Swift.UInt8>) -> (@unowned Swift.Int)]> of generic specialization <Swift.Int> of Swift.StaticString.withUTF8Buffer<A>((Swift.UnsafeBufferPointer<Swift.UInt8>) -> A) -> A"(i64 %0, i64 %1, i8 %2, i8* bitcast (i64 (i64, i64, %swift.refcounted*)* @"partial apply forwarder for closure #&#8203;1 (Swift.UnsafeBufferPointer<Swift.UInt8>) -> Swift.Int in test.SadByteBuffer.set(staticString: Swift.StaticString, at: Swift.Int) -> Swift.Int" to i8*), %swift.opaque* %7)
  tail call void @swift_release(%swift.refcounted* %4) #&#8203;5
  %.x._value = getelementptr inbounds %T4test13SadByteBufferV, %T4test13SadByteBufferV* %3, i64 0, i32 0, i32 0
  store i64 %8, i64* %.x._value, align 8
  ret i64 %8
}

note the

  %4 = tail call noalias %swift.refcounted* @swift_allocObject(%swift.type* getelementptr inbounds (%swift.full_boxmetadata, %swift.full_boxmetadata* @metadata, i64 0, i32 2), i64 24, i64 7) #&#8203;5

this is the test I run to show the number of allocations

swiftc -O test.swift && sudo dtrace -n 'pid$target::malloc:entry { @malloc_calls[ustack()] = count(); } ::END { printa(@malloc_calls); }' -c ./test

which outputs

              libsystem_malloc.dylib`malloc
              libswiftCore.dylib`swift_slowAlloc+0x19
              libswiftCore.dylib`_swift_allocObject_(swift::TargetHeapMetadata<swift::InProcess> const*, unsigned long, unsigned long)+0x14
              test`specialized SadByteBuffersetat(_:_:)+0x2a
              test`specialized testx(_:)+0x3e
              test`main+0x21
              libdyld.dylib`start+0x1
              test`0x1
          1000000

at the end so one allocation there per loop iteration 🙁.

and btw

$ jw-swift-latest swiftc -version
Apple Swift version 4.2-dev (LLVM 44fac26087, Clang 1ecf6042e0, Swift 659e7ceb2d)
Target: x86_64-apple-darwin17.6.0

but behaves the same in 4.1

@AnnaZaks
Copy link
Mannequin

AnnaZaks mannequin commented May 1, 2018

cc @atrick

@atrick
Copy link
Member

atrick commented May 1, 2018

It looks to me like an allocation is needed for the withUTF8Buffer closure. Noescape closure's aren't stack allocated yet, although that's pretty close to being implemented, right aschwaighofer@apple.com (JIRA User)?

This code pattern normally wouldn't be a performance issue, because closure taking withXXX functions are expected to be inlined. For some reason withUTF8Buffer isn't getting inlined, but I don't see any good reason for that.

@weissi
Copy link
Member Author

weissi commented May 1, 2018

@atrick I agree with everything you say. My main question was also why it isn't inlined. Oddly enough, if you remove the @inline(never) the whole thing (including withUTF8Buffer) seems to be inlined into the caller and that doesn't make any sense to me.

@aschwaighofer
Copy link
Member

Close'ish ...

The reason why the allocation for the context (for the partial_apply) goes away with a class is the closure captures self:

bb0(%0 : $Builtin.Word, %1 : $Builtin.Word, %2 : $Builtin.Int8, %3 : $*SadByteBuffer):

  %6 = partial_apply [callee_guaranteed] %5(%3)

If self is a struct we capture the pointer to the struct. Swift needs to allocate a reference counted heap structure for the closure context to hold that pointer so that it can release and retain the context object.

If self is a class reference we capture the class reference. Swift optimizes this case: It can reuse the class reference as the reference counted context instead of allocating a heap object and storing the reference to it.

@weissi
Copy link
Member Author

weissi commented May 1, 2018

aschwaighofer@apple.com (JIRA User) thank you, that makes sense! But shouldn't that go away when `withUTF8Buffer` is inlined? And if yes, why isn't `withUTF8Buffer` inlined?

@atrick
Copy link
Member

atrick commented May 1, 2018

Presumably the inliner is conservative because of code size concerns and lack of @inline(__always) in the stdlib. It's worth having a bug open to investigate.

@weissi
Copy link
Member Author

weissi commented May 1, 2018

@atrick should we just use this as the bug? I'm happy for you to edit the title and body

@aschwaighofer
Copy link
Member

'withUTF8Buffer' is quite big (26 blocks and 180ish instructions) so I would not be surprised if the inliner chose not to inline. However, what I would have expected to be happening is that the closure specializer would create a specialized version of `withUTF8Buffer` with the closure argument inlined. That is also not happening. I am going to take a quick look why.

@weissi
Copy link
Member Author

weissi commented May 1, 2018

aschwaighofer@apple.com (JIRA User) thanks! I ruled out the 'withUTF8Buffer is too big' because if you remove the @inline(never) from public mutating func set, then everything does get inlined. So set and withUTF8Buffer get inlined into foo. And my argument goes: if set + withUTF8Buffer aren't too big, how can withUTF8Buffer alone be too big. But that might just be too naive.

@aschwaighofer
Copy link
Member

The problem is that there is a re-abstraction thunk closure interposed. The closure specializer will remove the thunk. But then it won't run again on the same function:

sil [noinline] @$S19TestClosureInlining13SadByteBufferV3set12staticString2atSis06StaticI0V_SitF : $@convention(method) (StaticString, Int, @inout SadByteBuffer) -> Int {
// %0                                             // users: %12, %3
// %2                                             // users: %15, %6, %4
bb0(%0 : $StaticString, %1 : $Int, %2 : $*SadByteBuffer):
  debug_value %0 : $StaticString, let, name "string", argno 1 // id: %3
  debug_value_addr %2 : $*SadByteBuffer, var, name "self", argno 3 // id: %4
  // function_ref closure #&#8203;1 in SadByteBuffer.set(staticString:at:)
  %5 = function_ref @$S19TestClosureInlining13SadByteBufferV3set12staticString2atSis06StaticI0V_SitFSiSRys5UInt8VGXEfU_ : $@convention(thin) (UnsafeBufferPointer<UInt8>, @inout_aliasable SadByteBuffer) -> Int // user: %6
  %6 = partial_apply [callee_guaranteed] %5(%2) : $@convention(thin) (UnsafeBufferPointer<UInt8>, @inout_aliasable SadByteBuffer) -> Int // users: %14, %7
  %7 = convert_escape_to_noescape %6 : $@callee_guaranteed (UnsafeBufferPointer<UInt8>) -> Int to $@noescape @callee_guaranteed (UnsafeBufferPointer<UInt8>) -> Int // user: %9
  // function_ref thunk for @callee_guaranteed (@unowned UnsafeBufferPointer<UInt8>) -> (@unowned Int)
  %8 = function_ref @$SSRys5UInt8VGSiIgyd_ACSiIegyr_TR : $@convention(thin) (UnsafeBufferPointer<UInt8>, @noescape @callee_guaranteed (UnsafeBufferPointer<UInt8>) -> Int) -> @out Int // user: %9
  %9 = partial_apply [callee_guaranteed] %8(%7) : $@convention(thin) (UnsafeBufferPointer<UInt8>, @noescape @callee_guaranteed (UnsafeBufferPointer<UInt8>) -> Int) -> @out Int // users: %13, %10
  %10 = convert_escape_to_noescape %9 : $@callee_guaranteed (UnsafeBufferPointer<UInt8>) -> @out Int to $@noescape @callee_guaranteed (UnsafeBufferPointer<UInt8>) -> @out Int // user: %12
  // function_ref specialized StaticString.withUTF8Buffer<A>(_:)
  %11 = function_ref @$Ss12StaticStringV14withUTF8BufferyxxSRys5UInt8VGXElFSi_Tg5 : $@convention(method) (@noescape @callee_guaranteed (UnsafeBufferPointer<UInt8>) -> @out Int, StaticString) -> Int // user: %12
  %12 = apply %11(%10, %0) : $@convention(method) (@noescape @callee_guaranteed (UnsafeBufferPointer<UInt8>) -> @out Int, StaticString) -> Int // users: %17, %16
  strong_release %9 : $@callee_guaranteed (UnsafeBufferPointer<UInt8>) -> @out Int // id: %13
  strong_release %6 : $@callee_guaranteed (UnsafeBufferPointer<UInt8>) -> Int // id: %14
  %15 = struct_element_addr %2 : $*SadByteBuffer, #SadByteBuffer.x // user: %16
  store %12 to %15 : $*Int                        // id: %16
  return %12 : $Int                               // id: %17
}
  *** SIL function after  #&#8203;5850, stage ClosureSpecialize, pass 5: ClosureSpecializer (closure-specialize)
// SadByteBuffer.set(staticString:at:)
sil [noinline] @$S19TestClosureInlining13SadByteBufferV3set12staticString2atSis06StaticI0V_SitF : $@convention(method) (StaticString, Int, @inout SadByteBuffer) -> Int {
// %0                                             // users: %11, %3
// %2                                             // users: %13, %6, %4
bb0(%0 : $StaticString, %1 : $Int, %2 : $*SadByteBuffer):
  debug_value %0 : $StaticString, let, name "string", argno 1 // id: %3
  debug_value_addr %2 : $*SadByteBuffer, var, name "self", argno 3 // id: %4
  // function_ref closure #&#8203;1 in SadByteBuffer.set(staticString:at:)
  %5 = function_ref @$S19TestClosureInlining13SadByteBufferV3set12staticString2atSis06StaticI0V_SitFSiSRys5UInt8VGXEfU_ : $@convention(thin) (UnsafeBufferPointer<UInt8>, @inout_aliasable SadByteBuffer) -> Int // user: %6
  %6 = partial_apply [callee_guaranteed] %5(%2) : $@convention(thin) (UnsafeBufferPointer<UInt8>, @inout_aliasable SadByteBuffer) -> Int // users: %12, %7
  %7 = convert_escape_to_noescape %6 : $@callee_guaranteed (UnsafeBufferPointer<UInt8>) -> Int to $@noescape @callee_guaranteed (UnsafeBufferPointer<UInt8>) -> Int // user: %11
  // function_ref thunk for @callee_guaranteed (@unowned UnsafeBufferPointer<UInt8>) -> (@unowned Int)
  %8 = function_ref @$SSRys5UInt8VGSiIgyd_ACSiIegyr_TR : $@convention(thin) (UnsafeBufferPointer<UInt8>, @noescape @callee_guaranteed (UnsafeBufferPointer<UInt8>) -> Int) -> @out Int
  // function_ref specialized StaticString.withUTF8Buffer<A>(_:)
  %9 = function_ref @$Ss12StaticStringV14withUTF8BufferyxxSRys5UInt8VGXElFSi_Tg507$SSRys5F21VGSiIgyd_ACSiIegyr_TRAFSiIgyd_Tf1cn_n : $@convention(thin) (StaticString, @noescape @callee_guaranteed (UnsafeBufferPointer<UInt8>) -> Int) -> Int // user: %11
  // function_ref specialized StaticString.withUTF8Buffer<A>(_:)
  %10 = function_ref @$Ss12StaticStringV14withUTF8BufferyxxSRys5UInt8VGXElFSi_Tg5 : $@convention(method) (@noescape @callee_guaranteed (UnsafeBufferPointer<UInt8>) -> @out Int, StaticString) -> Int
  %11 = apply %9(%0, %7) : $@convention(thin) (StaticString, @noescape @callee_guaranteed (UnsafeBufferPointer<UInt8>) -> Int) -> Int // users: %14, %15
  strong_release %6 : $@callee_guaranteed (UnsafeBufferPointer<UInt8>) -> Int // id: %12
  %13 = struct_element_addr %2 : $*SadByteBuffer, #SadByteBuffer.x // user: %14
  store %11 to %13 : $*Int                        // id: %14
  return %11 : $Int                               // id: %15
} 

The closure specializer does not handle partial_apply of partial_apply (it does not concatenate closures) . And neither does it have a special handling for this situation (it could just treat the two partial_applies as a unit and inline them together).

Furthermore, we only run the ClosureOptimizer once in our pipeline so no fix-point effect either.

@weissi
Copy link
Member Author

weissi commented May 3, 2018

Thanks aschwaighofer@apple.com (JIRA User)

@aschwaighofer
Copy link
Member

#16752

@weissi
Copy link
Member Author

weissi commented May 21, 2018

👍 nice one, thanks aschwaighofer@apple.com (JIRA User)

@swift-ci swift-ci transferred this issue from apple/swift-issues Apr 25, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug A deviation from expected or documented behavior. Also: expected but undesirable behavior. compiler The Swift compiler in itself
Projects
None yet
Development

No branches or pull requests

3 participants