Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SR-5021] Closure not eliminated #47597

Open
dabrahams opened this issue May 25, 2017 · 9 comments
Open

[SR-5021] Closure not eliminated #47597

dabrahams opened this issue May 25, 2017 · 9 comments
Assignees
Labels
bug A deviation from expected or documented behavior. Also: expected but undesirable behavior. compiler The Swift compiler in itself performance

Comments

@dabrahams
Copy link
Collaborator

Previous ID SR-5021
Radar rdar://problem/32410717
Original Reporter @dabrahams
Type Bug
Additional Detail from JIRA
Votes 0
Component/s Compiler
Labels Bug, Performance
Assignee @swiftix
Priority Medium

md5: 608abe7f30675ed1c18101e545e81efa

Issue Description:

The last commit of https://github.com/dabrahams/swift/tree/perf-mystery
produces a big slowdown in the StringMatch benchmark because (we think) closures getting passed to UTF8.encode(_:into: ) (not UTF8.encode(_: )) are not getting eliminated String.CharacterView.subscript.

Here's an excerpt of the generated SIL for the standard library, where the specialization lives.

bb136:                                            // Preds: bb134
  %661 = unchecked_enum_data %657 : $Optional<Unicode.Scalar>, #Optional.some!enumelt.1 // user: %672
  %662 = alloc_stack $UInt64                      // users: %688, %683, %679, %669, %665
  %663 = integer_literal $Builtin.Int64, 0        // user: %664
  %664 = struct $UInt64 (%663 : $Builtin.Int64)   // users: %667, %665
  store %664 to %662 : $*UInt64                   // id: %665
  %666 = alloc_stack $UInt64                      // users: %687, %675, %669, %667
  store %664 to %666 : $*UInt64                   // id: %667
  // function_ref specialized closure #&#8203;1 in Character.init(_:)
  %668 = function_ref @function signature specialization <preserving fragile attribute, Arg[1] = Stack Promoted from Box, Arg[2] = Stack Promoted from Box> of closure #&#8203;1 (Swift.UInt8) -> () in Swift.Character.init(Swift.Unicode.Scalar) -> Swift.Character : $@convention(thin) (UInt8, @inout_aliasable UInt64, @inout_aliasable UInt64) -> () // user: %669
  %669 = partial_apply %668(%662, %666) : $@convention(thin) (UInt8, @inout_aliasable UInt64, @inout_aliasable UInt64) -> () // users: %684, %673, %672, %671
  // function_ref specialized static Unicode.UTF8.encode(_:into:)
  %670 = function_ref @function signature specialization <preserving fragile attribute, Arg[1] = Owned To Guaranteed, Arg[2] = Dead> of static Swift.Unicode.UTF8.encode(Swift.Unicode.Scalar, into: (Swift.UInt8) -> ()) -> () : $@convention(thin) (Unicode.Scalar, @guaranteed @callee_owned (UInt8) -> ()) -> () // user: %672
  strong_retain %669 : $@callee_owned (UInt8) -> () // id: %671
  %672 = apply %670(%661, %669) : $@convention(thin) (Unicode.Scalar, @guaranteed @callee_owned (UInt8) -> ()) -> ()
  strong_release %669 : $@callee_owned (UInt8) -> () // id: %673
  %674 = integer_literal $Builtin.Int64, -1       // user: %678
  %675 = struct_element_addr %666 : $*UInt64, #UInt64._value // user: %676
  %676 = load %675 : $*Builtin.Int64              // user: %677
  %677 = builtin "and_Int64"(%676 : $Builtin.Int64, %563 : $Builtin.Int64) : $Builtin.Int64 // user: %678
  %678 = builtin "shl_Int64"(%674 : $Builtin.Int64, %677 : $Builtin.Int64) : $Builtin.Int64 // user: %681
  %679 = struct_element_addr %662 : $*UInt64, #UInt64._value // user: %680
  %680 = load %679 : $*Builtin.Int64              // user: %681
  %681 = builtin "or_Int64"(%680 : $Builtin.Int64, %678 : $Builtin.Int64) : $Builtin.Int64 // users: %685, %682
  %682 = struct $UInt64 (%681 : $Builtin.Int64)   // user: %683
  store %682 to %662 : $*UInt64                   // id: %683
  strong_release %669 : $@callee_owned (UInt8) -> () // id: %684
  %685 = builtin "trunc_Int64_Int63"(%681 : $Builtin.Int64) : $Builtin.Int63 // user: %686
  %686 = enum $Character.Representation, #Character.Representation.small!enumelt.1, %685 : $Builtin.Int63 // user: %689
  dealloc_stack %666 : $*UInt64                   // id: %687
  dealloc_stack %662 : $*UInt64                   // id: %688
  %689 = struct $Character (%686 : $Character.Representation) // user: %690
  br bb166(%689 : $Character)                     // id: %690
@dabrahams
Copy link
Collaborator Author

@swift-ci create

@aschwaighofer
Copy link
Member

Today, the callee receiving the closure needed to be marked inline_always for this closure to be eliminated (presumably this is what happened in the “before Dave’s changes” state).

The closure specializer could/should handle this:

%1 = alloc_stack
%2 = alloc_stack
%closure = partial_apply %closee(%1, %2)
apply %fun(…, %closure)

=> 

%1 = alloc_stack
%2 = alloc_stack
apply %fun_with_closee(.., %1, %2)

fun_with_closee () {
bb(%1, %2):
  partial_apply closuee(%1, %2)
}

But currently it does not seem to do this.

We can workaround by marking %fun with always_inline (confirmed that this works) or by expressing the code without the closure.

When non-escaping closure become stack allocated the overhead is also going to be much less even if optimization fail.

@swiftix
Copy link
Mannequin

swiftix mannequin commented May 30, 2017

@dabrahams Dave, I merged the PR #9987 to support your use-case. Could you check if it solves the performance issue for your. This PR should enable specialization of closures that take @inout address-arguments (i.e. they mutate the captured vars or e.g. capture an existential) and deliver a much better performance for them. But may be it does not catch all of the test-cases you are interested in.

@swiftix
Copy link
Mannequin

swiftix mannequin commented May 31, 2017

I checked that closures are specialized now and partial_applies are eliminated. But it does not improve the performance of StringMatch

@dabrahams
Copy link
Collaborator Author

That's interesting, because the workaround Arnold gave me, marking the function as @inline(__always), does appear to recover all the lost performance.

@swiftix
Copy link
Mannequin

swiftix mannequin commented May 31, 2017

@dabrahams Which function specifically was marked as @inline(__always)?

I see an issue with this function:

// specialized String.CharacterView.subscript.getter
sil public_external [serialized] @_T0SS13CharacterViewV9subscriptABs5RangeVySS5IndexVGcfgTfq4xn_n : $@convention(method) (Builtin.Int64, Builtin.Int64, @guaranteed String.CharacterView) -> @owned String.CharacterView

This function is very big in SIL and is ... self-recursive on the perf-mistery branch. As a result it is not inlined.

@dabrahams
Copy link
Collaborator Author

UTF8.encode(_:into: ) was marked as @inline(__always)

self-recursive, really!?

@swiftix
Copy link
Mannequin

swiftix mannequin commented May 31, 2017

OK. I checked again. It is not literally self-recursive.

Inside the specialization
function signature specialization <preserving fragile attribute, Arg[0] = Exploded> of Swift.String.CharacterView.subscript.getter : (Swift.String.Index) -> Swift.Character

there is an invocation of a different specialization of the same function but with different generic parameters:
function signature specialization <preserving fragile attribute, Arg[0] = Exploded> of Swift.String.CharacterView.subscript.getter : (Swift.Range<Swift.String.Index>) -> Swift.String.CharacterView

@swiftix
Copy link
Mannequin

swiftix mannequin commented May 31, 2017

I checked again and it seems like my measurements yesterday were bogus. My ClosureSpecializer PR fixes the performance issue!

@swift-ci swift-ci transferred this issue from apple/swift-issues Apr 25, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug A deviation from expected or documented behavior. Also: expected but undesirable behavior. compiler The Swift compiler in itself performance
Projects
None yet
Development

No branches or pull requests

2 participants