Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SR-11284] Possible compiler optimization: turning off retain/release for immortal class instances #53685

Open
matt-curtis opened this issue Aug 9, 2019 · 11 comments
Labels
compiler The Swift compiler in itself improvement

Comments

@matt-curtis
Copy link

Previous ID SR-11284
Radar None
Original Reporter @matt-curtis
Type Improvement
Additional Detail from JIRA
Votes 0
Component/s Compiler
Labels Improvement
Assignee None
Priority Medium

md5: cd9f5e51a970cc328b625c01b49c04c8

Issue Description:

In order to support Automatic Reference Counting, Swift's compiler currently automatically inserts retain/release calls that increment and deincrement an instance's retain count, both of which incur their own performance penalties. However, some instances can be effectively considered "immortal", and expected to live throughout a program's execution, and could possibly be exempt from this requirement. One such example is top-level constants:

// SomeCode.swift

class MyClass { }

let myInstance = MyClass()

Were I writing a loop over several of these (pretend for the sake of demonstration that it's more than 3 instances), there's now a completely unneccessary retain/release dance on every single iteration...

// SomeCode.swift

class MyClass {

    // some properties...

}

let a = MyClass()
let b = MyClass()
let c = MyClass()

for instance in [ a, b, c ] {
    //  automatic retain of instance

    //  do something with instance

    //  automatic release of instance
}

...making it much slower than a parallel implementation using a struct instead:

// SomeCode.swift

struct MyStruct {

    // some properties...

}

let a = MyStruct()
let b = MyStruct()
let c = MyStruct()

for value in [ a, b, c ] {
    //  do something with value
}
@belkadan
Copy link
Contributor

cc @gottesmm

@gottesmm
Copy link
Member

A few things:

1. I assume that MyStruct is meant to be a struct with only trivial data in it (in the example it says class).

2. Semantically you are using top level globals. These have more restrictions in general (since they are globals) around what we can optimize. For test cases like this, it makes more sense to put it inside a single function [I would argue that the top level code issue w/globals is a programming model issue where the programming model is confusing to people).

3. Semantically, when you form that array you are storing a, b, c into a new array meaning the array needs to take ownership of a, b, c meaning a retain count is needed unless we completely eliminate the array and unroll the array.

4. That being said, I wonder if there is something down the line we could do in this case. Specifically, what you really want here is some sort of iterator that doesn't actually take ownership of the underlying values. The natural way to express this is IMO a +0 return value from a coroutine like thing. But I am not sure how to express that in the language today.

@matt-curtis
Copy link
Author

In reference to 3 — for the sake of the loop, could Swift wrap those instances in the equivalent of:

struct ObjectReference {

unowned let instance = someGlobal

}

As a sidenote, Swift having the ability to detect instances like these would also possibly make it possible to detect for-in loops over immutable immortals and turn:

for object in [ a, b, c ] {
something(with: object)
}

into:

something(with: a)
something(with: b)
something(with: c)

This only matters in that currently the first example is much slower than the second, especially in performance-sensitive code.

@gottesmm
Copy link
Member

@matt-curtis, I will look at your response later, but was I correct about the typo in your example?

@matt-curtis
Copy link
Author

About my accidentally declaring what was supposed to be a struct as a class — yes, that was a typo. I've edited and fixed it.

@gottesmm
Copy link
Member

A few more thoughts:

1. I think whatever would be needed to make this work consistently would require some sort of language design or a different way of holding the language. We have ways of returning +0 return values via coroutines and I think that there may be ways to get that method to work.

2. Another way of course is I think we do have some notion of an immortal bit and we /could/ set that on let objects (potentially). That would still have the overhead of the call, but we would be able to bail early.

3. We do unroll like you mentioned in certain cases like this to hit this (I just think it is a hacky way to work around it). I imagine since you are using a global and doing this at global scope you are missing that optimization.

@gottesmm
Copy link
Member

For instance, look at the output from this swift code:

final class Klass {
   @inline(never) func doSomething() {
       print("yes")
   }
}

func foo(a: Klass, b: Klass, c: Klass) {
for instance in [ a, b, c ] {
   instance.doSomething()
}
}

on godbolt. You will see we eliminate the array completely

@gottesmm
Copy link
Member

In fact, even if I do that with globals it works. This Swift:

final class Klass {
   @inline(never) func doSomething() {
       print("yes")
   }
}

let a = Klass()
let b = Klass()
let c = Klass()

func foo() {
for instance in [ a, b, c ] {
   instance.doSomething()
}
}

gets converted to the following sil:

sil hidden @$s6output3fooyyF : $@convention(thin) () -> () {
bb0:
  %0 = global_addr @$s6output1cAA5KlassCvp : $*Klass // user: %5
  %1 = global_addr @$s6output1bAA5KlassCvp : $*Klass // user: %4
  %2 = global_addr @$s6output1aAA5KlassCvp : $*Klass // user: %3
  %3 = load %2 : $*Klass                          // users: %16, %8, %6
  %4 = load %1 : $*Klass                          // users: %17, %9, %12
  %5 = load %0 : $*Klass                          // users: %18, %10, %14
  debug_value %3 : $Klass, let, name "instance"   // id: %6
  %7 = function_ref @$s6output5KlassC11doSomethingyyFTf4d_n : $@convention(thin) () -> () // users: %15, %13, %11
  strong_retain %3 : $Klass                       // id: %8
  strong_retain %4 : $Klass                       // id: %9
  strong_retain %5 : $Klass                       // id: %10
  %11 = apply %7() : $@convention(thin) () -> ()
  debug_value %4 : $Klass, let, name "instance"   // id: %12
  %13 = apply %7() : $@convention(thin) () -> ()
  debug_value %5 : $Klass, let, name "instance"   // id: %14
  %15 = apply %7() : $@convention(thin) () -> ()
  strong_release %3 : $Klass                      // id: %16
  strong_release %4 : $Klass                      // id: %17
  strong_release %5 : $Klass                      // id: %18
  %19 = tuple ()                                  // user: %20
  return %19 : $()                                // id: %20
} // end sil function '$s6output3fooyyF'

@gottesmm
Copy link
Member

NOTE: There is a missing semantic arc optimization here that would let us get rid of even more of this overhead. I expect with time (in a future release) I can hit that.

@gottesmm
Copy link
Member

Looking at the actual case with the for loop in the global level, the SIL looks slightly different, but overall same thing. The big issue is going to be about when we can move semantic sil back far enough to hit this.

@matt-curtis
Copy link
Author

Hm, I'll have to do a comparison when I get the chance. In my tests the loop was slower than the manual equivalent, even with an array of structs

@swift-ci swift-ci transferred this issue from apple/swift-issues Apr 25, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
compiler The Swift compiler in itself improvement
Projects
None yet
Development

No branches or pull requests

3 participants