New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SR-10683] unsafely getting Swift to trust me that it doesn't need to ARC #53081
Comments
CC @eeckstein/aschwaighofer@apple.com (JIRA User)/@gottesmm |
@swift-ci create |
Taking a quick look at this. |
I can fix this without unsafe guaranteed with semantic arc that is on master. Let me cook something up real quick. |
Also, can you add this as a benchmark to the swift benchmark suite? |
So some interesting stuff. I can get this to work, but I would need to make some changes to SemanticARCOpts that would be slightly larger than I want to make to it right now (even though I can do it, it would just take a little longer). Part of the problem you are running into is that in your API you are not passing Unmanaged things. So to pass it, the compiler will want a safe +0 value, meaning it will insert a copy. If you do like what I did below, you get the behavior you are looking for. That being said, I wonder if we can use a property wrapper to hide the withUnsafeGuaranteed. // MODULE: Framework
import Foundation
public protocol EventHandler: class {
func event(holder: Unmanaged<Holder>)
}
extension EventHandler {
public func event(holder: Unmanaged<Holder>) {
holder._withUnsafeGuaranteedRef { $0.fireEvent() }
}
}
public final class Pipeline {
var head: Holder? = nil
public init() {}
public func addHandler(_ handler: EventHandler) {
if self.head == nil {
self.head = Holder(handler)
return
}
var node = self.head
while node?.next != nil {
node = node?.next
}
node?.nextSafe = Holder(handler)
node?.next = node?.nextSafe
}
public func fireEvent() {
self.head!.invokeEvent()
}
}
public final class Holder {
unowned(unsafe) var next: Holder!
var nextSafe: Holder?
let node: EventHandler
private var unsafeSelf: Unmanaged<Holder> {
return Unmanaged<Holder>.passUnretained(self)
}
init(_ node: EventHandler) {
self.next = nil
self.node = node
}
func invokeEvent() {
self.node.event(holder: unsafeSelf)
}
@inline(never)
public func fireEvent() {
#if FAST
Unmanaged<Holder>.passUnretained(self.next!)._withUnsafeGuaranteedRef { $0.invokeEvent() }
#else
self.unsafeSelf._withUnsafeGuaranteedRef { $0.invokeEvent() }
#endif
}
}
// MODULE: Middleware
//import Framework
public final class NoOpHandler: EventHandler {
public init() {}
}
public final class ConsumingHandler: EventHandler {
var consumed = 0
public init() {}
public func event(holder: Holder) {
self.consumed += 1
}
}
// MODULE: Main
//import Middleware
//import Framework
@inline(never)
func runBench() {
let pipeline = Pipeline()
for _ in 0..<5 {
pipeline.addHandler(NoOpHandler())
}
pipeline.addHandler(ConsumingHandler())
for _ in 0 ..< 30_000_000 {
pipeline.fireEvent()
}
}
runBench() |
Another thing to consider here is that since you made the underling type unsafe guaranteed, I can't use borrows. |
Actually. I figured out how to do this. Took me a second. |
@gottesmm Thanks so much for looking into this, that sounds really promising. I had added the base case (no unmanage/unsafeGuaranteedRef) to the benchmark suite (https://github.com/apple/swift/pull/24765/files). That base case IMHO can't be optimised because the compiler won't be able to proof that I haven't actually added any of the Unmanaged/_withUnsafeGuaranteedRef yet because I didn't know what the "preferred" spelling in actual Swift will be that tell the compiler 'this variable won't be mutated during a call on it' actually is. @gottesmm does that make sense? Is |
That actually is not an issue since ossa can express that property and optimize even so. I have a patch ready that fixes this for ossa. I am just adding some tests. |
For my memory. I think this is fixed by the forwarding ossa optimization that I need to finish. |
I landed the forwarding optimizations on master some time ago. Lets see if the problem goes away. |
Looked at this a bit more. We now are just hitting the strong_retain_* from the use of unsafe (unowned). But beyond that all of the ARC traffic is gone. That being said, I do think we can get rid of all of the ARC traffic with a small source change and a few other changes I have in the pipeline. Consider the following rewrite: public final class Holder {
var next: Holder!
var nextSafe: Holder?
let node: EventHandler
init(_ node: EventHandler) {
self.next = nil
self.node = node
}
func invokeEvent() {
self.node.event(holder: self)
}
@inline(never)
public func fireEvent() {
self.next!.invokeEvent()
var this = self
@_transparent
func inoutWrapper(_ h: inout Holder!) {
h!.invokeEvent()
}
inoutWrapper(&this.next)
}
} The key thing is to note is that I eliminated the usage of unsafe(unowned) to prevent the copy_unmanaged_value from getting emitted (with its semantics of having a retain that can not be eliminated by the optimizer). The other thing to note is that the inoutWrapper function forces SILGen to open an access scope over the entire switch instead of just over the load of the value to be switched on. E.x.: %0 = begin_access
%1 = load [copy] %0 // <---- Can not eliminate this load [copy] b/c of the end_access
end_access %0
switch_enum %1 Until we get true +0 return values and the ability to express "this access scope needs to be opened over multiple statements", we can not optimize the code in this form. That being said, we can use a transparent helper function that takes the ivar inout to expand the access scope of the switch_enum. Example: @inline(never)
public func fireEvent() {
var this = self
@_transparent
func inoutWrapper(_ h: inout Holder!) {
h!.invokeEvent()
}
inoutWrapper(&this.next)
} In this case, we get the lvalue end_access over the entirety of inoutWrapper. This yields the following SIL when we hit SemanticARCOpts with near ToT SIL: // Holder.fireEvent()
sil [noinline] [ossa] @$s7cp_perf6HolderC9fireEventyyF : $@convention(method) (@guaranteed Holder) -> () {
// %0 // users: %3, %1
bb0(%0 : @guaranteed $Holder):
debug_value %0 : $Holder, let, name "self", argno 1 // id: %1
%2 = alloc_stack $Holder, var, name "this" // users: %5, %19, %4
%3 = copy_value %0 : $Holder // user: %4
store %3 to [init] %2 : $*Holder // id: %4
%5 = load [take] %2 : $*Holder // users: %18, %6
%6 = begin_borrow %5 : $Holder // users: %17, %7
%7 = ref_element_addr %6 : $Holder, #Holder.next // user: %8
%8 = begin_access [modify] [dynamic] %7 : $*Optional<Holder> // users: %9, %16
%9 = load [copy] %8 : $*Optional<Holder> // user: %10
switch_enum %9 : $Optional<Holder>, case #Optional.some!enumelt.1: bb1, case #Optional.none!enumelt: bb2 // id: %10
// %11 // users: %14, %13
bb1(%11 : @owned $Holder): // Preds: bb0
// function_ref Holder.invokeEvent()
%12 = function_ref @$s7cp_perf6HolderC11invokeEventyyF : $@convention(method) (@guaranteed Holder) -> () // user: %13
%13 = apply %12(%11) : $@convention(method) (@guaranteed Holder) -> ()
destroy_value %11 : $Holder // id: %14
%15 = tuple ()
end_access %8 : $*Optional<Holder> // id: %16
end_borrow %6 : $Holder // id: %17
destroy_value %5 : $Holder // id: %18
dealloc_stack %2 : $*Holder // id: %19
%20 = tuple () // user: %21
return %20 : $() // id: %21
bb2: // Preds: bb0
%22 = integer_literal $Builtin.Word, 14
...
} // end sil function '$s7cp_perf6HolderC9fireEventyyF' This is a form that we can get to zero rr today with a few patches that I have in stream. Specifically: 1. I have a patch that eliminates the alloc_stack in the example above. It comes from teaching temprvalue opts how to eliminate read only allocations that are initialized by a store. Today we only support copy_addr. That will change the above SIL to the following: // Holder.fireEvent()
sil [noinline] [ossa] @$s7cp_perf6HolderC9fireEventyyF : $@convention(method) (@guaranteed Holder) -> () {
// %0 // users: %3, %1
bb0(%0 : @guaranteed $Holder):
debug_value %0 : $Holder, let, name "self", argno 1 // id: %1
%5 = copy_value %0 : $Holder // user: %4
%6 = begin_borrow %5 : $Holder // users: %17, %7
%7 = ref_element_addr %6 : $Holder, #Holder.next // user: %8
%8 = begin_access [modify] [dynamic] %7 : $*Optional<Holder> // users: %9, %16
%9 = load [copy] %8 : $*Optional<Holder> // user: %10
switch_enum %9 : $Optional<Holder>, case #Optional.some!enumelt.1: bb1, case #Optional.none!enumelt: bb2 // id: %10
// %11 // users: %14, %13
bb1(%11 : @owned $Holder): // Preds: bb0
// function_ref Holder.invokeEvent()
%12 = function_ref @$s7cp_perf6HolderC11invokeEventyyF : $@convention(method) (@guaranteed Holder) -> () // user: %13
%13 = apply %12(%11) : $@convention(method) (@guaranteed Holder) -> ()
destroy_value %11 : $Holder // id: %14
%15 = tuple ()
end_access %8 : $*Optional<Holder> // id: %16
end_borrow %6 : $Holder // id: %17
destroy_value %5 : $Holder // id: %18
%20 = tuple () // user: %21
return %20 : $() // id: %21
bb2: // Preds: bb0
%22 = integer_literal $Builtin.Word, 14
...
} // end sil function '$s7cp_perf6HolderC9fireEventyyF' That will then allow for us to eliminate the copy_value of %5. Yielding: // Holder.fireEvent()
sil [noinline] [ossa] @$s7cp_perf6HolderC9fireEventyyF : $@convention(method) (@guaranteed Holder) -> () {
// %0 // users: %3, %1
bb0(%0 : @guaranteed $Holder):
debug_value %0 : $Holder, let, name "self", argno 1 // id: %1
%7 = ref_element_addr %0 : $Holder, #Holder.next // user: %8
%8 = begin_access [modify] [dynamic] %7 : $*Optional<Holder> // users: %9, %16
%9 = load [copy] %8 : $*Optional<Holder> // user: %10
switch_enum %9 : $Optional<Holder>, case #Optional.some!enumelt.1: bb1, case #Optional.none!enumelt: bb2 // id: %10
// %11 // users: %14, %13
bb1(%11 : @owned $Holder): // Preds: bb0
// function_ref Holder.invokeEvent()
%12 = function_ref @$s7cp_perf6HolderC11invokeEventyyF : $@convention(method) (@guaranteed Holder) -> () // user: %13
%13 = apply %12(%11) : $@convention(method) (@guaranteed Holder) -> ()
destroy_value %11 : $Holder // id: %14
%15 = tuple ()
end_access %8 : $*Optional<Holder> // id: %16
%20 = tuple () // user: %21
return %20 : $() // id: %21
bb2: // Preds: bb0
%22 = integer_literal $Builtin.Word, 14
...
} // end sil function '$s7cp_perf6HolderC9fireEventyyF' 2. Finally, we need to generalize my patch that proves that we can promote inout parameters that are never mutating to also support mutating begin_access that are never actually written to. Should be relatively simple. Once we have that, we should be able to eliminate the copy_value above, yielding what we want: // Holder.fireEvent()
sil [noinline] [ossa] @$s7cp_perf6HolderC9fireEventyyF : $@convention(method) (@guaranteed Holder) -> () {
// %0 // users: %3, %1
bb0(%0 : @guaranteed $Holder):
debug_value %0 : $Holder, let, name "self", argno 1 // id: %1
%7 = ref_element_addr %0 : $Holder, #Holder.next // user: %8
%8 = begin_access [modify] [dynamic] %7 : $*Optional<Holder> // users: %9, %16
%9 = load_borrow %8 : $*Optional<Holder> // user: %10
switch_enum %9 : $Optional<Holder>, case #Optional.some!enumelt.1: bb1, case #Optional.none!enumelt: bb2 // id: %10
// %11 // users: %14, %13
bb1(%11 : @guaranteed $Holder): // Preds: bb0
// function_ref Holder.invokeEvent()
%12 = function_ref @$s7cp_perf6HolderC11invokeEventyyF : $@convention(method) (@guaranteed Holder) -> () // user: %13
%13 = apply %12(%11) : $@convention(method) (@guaranteed Holder) -> ()
end_borrow %11 : $Holder // id: %14
%15 = tuple ()
end_access %8 : $*Optional<Holder> // id: %16
%20 = tuple () // user: %21
return %20 : $() // id: %21
bb2: // Preds: bb0
%22 = integer_literal $Builtin.Word, 14
...
} // end sil function '$s7cp_perf6HolderC9fireEventyyF' |
PR #30225 takes care of part 1. So now all I would need to do is the inout thing I mentioned. Then you can use the pattern I mentioned there. |
That's awesome. Thanks very much @gottesmm, I filed apple/swift-nio#1435 to look into porting this change into the actual NIO codebase 🙂 |
Did the last piece of work here: |
So JW you should be good to go! |
Thanks @gottesmm, this can be closed. If I come across more complicated cases where that I still can’t fix, I’ll file new issues |
Reporter confirmed issue is resolved. |
Attachment: Download
Additional Detail from JIRA
md5: f50041869021cf418f7c6f49e80278d5
Issue Description:
I'm currently working on getting SwiftNIO's ChannelPipeline as fast as Netty's. Obviously, the JVM is garbage-collected so certain micro benchmarks naturally suit that sort of language quite well and one of them is repeatedly accessing pointers that are not changed very often. The GC won't actually have much work to do because there's not much garbage produced so the program can happily chase the pointers without every writing to memory. Swift uses ARC and ARC does (through retain/release) come with constant, very small overheads so in simple pointer chasing benchmarks, ARC will likely look worse than a GC'd language.
Right now, SwiftNIO's ChannelPipeline is about 6x slower than Netty's and that's pretty much exclusively due to ref count operations done whilst walking the pipeline (which happens all the time). Luckily, there's a simple fix but it involves
_withUnsafeGuaranteedRef
and I'm not happy with that.Please find attached a program that is a pretty good approximation of SwiftNIO's
ChannelPipeline
and it demonstrates the problem very well.Naturally, ARC is correct and the retain of
self.next
is necessary becauseself.next
could be reassigned during theself.next!.invokeEvent()
call.However, we could make a very simple change in SwiftNIO that would guarantee that this could never happen but obviously this will be hard to prove to the compiler. I tried expressing this with
unowned(unsafe)
but still, the compiler doesn't trust me that it doesn't need to perform ARC here...I would like a way to express this but apart from
_withUnsafeGuaranteedRef
I don't think Swift has anything like that that isn't underscored...repro
Download the attached program:
run
and then
The interesting thing is that the second version pretty much exactly matches Netty's speed 🙂
The text was updated successfully, but these errors were encountered: