Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SR-10346] excessive reference counting, leading to CoW copy for every heap insert in NIO :| #52746

Closed
weissi opened this issue Apr 9, 2019 · 12 comments
Assignees
Labels
ARC Feature: automatic reference counting bug A deviation from expected or documented behavior. Also: expected but undesirable behavior. compiler The Swift compiler in itself optimized only Flag: An issue whose reproduction requires optimized compilation

Comments

@weissi
Copy link
Member

weissi commented Apr 9, 2019

Previous ID SR-10346
Radar rdar://problem/52529035
Original Reporter @weissi
Type Bug
Status Resolved
Resolution Done

Attachment: Download

Additional Detail from JIRA
Votes 0
Component/s Compiler
Labels Bug, ARC, OptimizedOnly
Assignee @gottesmm
Priority Medium

md5: c52a6805e968aded3891a4cd5ceb1cf7

relates to:

  • SR-10709 Swift needs allocation counter tests

Issue Description:

@Lukasa & I were just debugging weird allocations in NIO's slightly odd heap allocation. Here what we could reduce it to:

The following code (two modules):

main module

import OtherModule

var heap = Foo<Int>()
for _ in 0..<10000 {
    heap.append(1)
}

OtherModule

// MODULE: OtherModule
public struct Foo<T: Comparable> {
    let comparator: (T, T) -> Bool = (<)
    private var storage: [T] = .init()

    public init() {
        self.storage.reserveCapacity(10000)
    }

    static func realAppend(_ array: inout [T], _ t: T, comparator: (T, T) -> Bool) {
        array.append(t)
        _ = comparator(array[0] , t)
    }

    public mutating func append(_ t: T) {
        // BAD
        Foo.realAppend(&self.storage, t, comparator: self.comparator)

        // GOOD
        // let comparator = self.comparator
        // Foo.realAppend(&self.storage, t, comparator: comparator)
    }
}

should almost not allocate at all. However it allocates twice per loop iteration. It seems to CoW-copy the underlying array twice per iteration.

              libsystem_malloc.dylib`malloc
              libswiftCore.dylib`swift_slowAlloc+0x19
              libswiftCore.dylib`_swift_allocObject_(swift::TargetHeapMetadata<swift::InProcess> const*, unsigned long, unsigned long)+0x14
              libswiftCore.dylib`_ArrayBufferProtocol._forceCreateUniqueMutableBufferImpl(countForBuffer:minNewCapacity:requiredCapacity:)+0xef
              libswiftCore.dylib`_ArrayBufferProtocol._forceCreateUniqueMutableBuffer(countForNewBuffer:minNewCapacity:)+0x15
              libswiftCore.dylib`Array._copyToNewBuffer(oldCount:)+0x7a
              libswiftCore.dylib`Array._makeUniqueAndReserveCapacityIfNotUnique()+0xc7
              TestApp`static Foo.realAppend(_:_:comparator:)+0x5e
              TestApp`Foo.append(_:)+0x4e
              TestApp`main+0x88
              libdyld.dylib`start+0x1
              TestApp`0x1
            10000

              libsystem_malloc.dylib`malloc
              libswiftCore.dylib`swift_slowAlloc+0x19
              libswiftCore.dylib`_swift_allocObject_(swift::TargetHeapMetadata<swift::InProcess> const*, unsigned long, unsigned long)+0x14
              libswiftCore.dylib`default argument 3 of _ArrayBufferProtocol._arrayOutOfPlaceUpdate(_:_:_:_:)+0x23
              libswiftCore.dylib`Array._copyToNewBuffer(oldCount:)+0x94
              libswiftCore.dylib`Array._makeUniqueAndReserveCapacityIfNotUnique()+0xc7
              TestApp`static Foo.realAppend(_:_:comparator:)+0x5e
              TestApp`Foo.append(_:)+0x4e
              TestApp`main+0x88
              libdyld.dylib`start+0x1
              TestApp`0x1
            10000

oddly enough, if you change this line

        Foo.realAppend(&self.storage, t, comparator: self.comparator)

to

        let comparator = self.comparator
        Foo.realAppend(&self.storage, t, comparator: comparator)

it's just fine and it doesn't allocate & cow-copy anymore. The attached screenshot shows the only bit of diff in the SIL between GOOD and BAD.

repro

Unpack the attached tarball and run

swift run -c release && sudo dtrace -n 'pid$target::malloc:entry, pid$target::posix_memalign:entry, pid$target::realloc:entry, pid$target::reallocf:entry, pid$target::calloc:entry, pid$target::valloc:entry, pid$target::posix_memalign:entry { @malloc_calls[ustack()] = count(); } ::END { printa(@malloc_calls); }' -c .build/release/TestApp

other info

the relevant SIL

Foo.append

// Foo.append(_:)
sil @OtherModule.Foo.append(A) -> () : $@convention(method) <T where T : Comparable> (@in_guaranteed T, @inout Foo<T>) -> () {
// %0                                             // users: %16, %2
// %1                                             // users: %7, %5, %3
bb0(%0 : $*T, %1 : $*Foo<T>):
  debug_value_addr %0 : $*T, let, name "t", argno 1 // id: %2
  debug_value_addr %1 : $*Foo<T>, var, name "self", argno 2 // id: %3
  %4 = metatype $@thin Foo<T>.Type                // user: %16
  %5 = struct_element_addr %1 : $*Foo<T>, #Foo.comparator // user: %6
  %6 = load %5 : $*@callee_guaranteed (@in_guaranteed T, @in_guaranteed T) -> Bool // users: %17, %14, %12
  %7 = struct_element_addr %1 : $*Foo<T>, #Foo.storage // users: %16, %8
  %8 = struct_element_addr %7 : $*Array<T>, #Array._buffer // user: %9
  %9 = struct_element_addr %8 : $*_ArrayBuffer<T>, #_ArrayBuffer._storage // user: %10
  %10 = struct_element_addr %9 : $*_BridgeStorage<__ContiguousArrayStorageBase>, #_BridgeStorage.rawValue // user: %11
  %11 = load %10 : $*Builtin.BridgeObject         // users: %18, %15
  %12 = convert_escape_to_noescape %6 : $@callee_guaranteed (@in_guaranteed T, @in_guaranteed T) -> Bool to $@noescape @callee_guaranteed (@in_guaranteed T, @in_guaranteed T) -> Bool // user: %16
  // function_ref static Foo.realAppend(_:_:comparator:)
  %13 = function_ref @static OtherModule.Foo.realAppend(_: inout [A], _: A, comparator: (A, A) -> Swift.Bool) -> () : $@convention(method) <τ_0_0 where τ_0_0 : Comparable> (@inout Array<τ_0_0>, @in_guaranteed τ_0_0, @noescape @callee_guaranteed (@in_guaranteed τ_0_0, @in_guaranteed τ_0_0) -> Bool, @thin Foo<τ_0_0>.Type) -> () // user: %16
  strong_retain %6 : $@callee_guaranteed (@in_guaranteed T, @in_guaranteed T) -> Bool // id: %14
  strong_retain %11 : $Builtin.BridgeObject       // id: %15
  %16 = apply %13<T>(%7, %0, %12, %4) : $@convention(method) <τ_0_0 where τ_0_0 : Comparable> (@inout Array<τ_0_0>, @in_guaranteed τ_0_0, @noescape @callee_guaranteed (@in_guaranteed τ_0_0, @in_guaranteed τ_0_0) -> Bool, @thin Foo<τ_0_0>.Type) -> ()
  strong_release %6 : $@callee_guaranteed (@in_guaranteed T, @in_guaranteed T) -> Bool // id: %17
  strong_release %11 : $Builtin.BridgeObject      // id: %18
  %19 = tuple ()                                  // user: %20
  return %19 : $()                                // id: %20
} // end sil function 'OtherModule.Foo.append(A) -> ()'

Foo.realAppend

// static Foo.realAppend(_:_:comparator:)
sil hidden @static OtherModule.Foo.realAppend(_: inout [A], _: A, comparator: (A, A) -> Swift.Bool) -> () : $@convention(method) <T where T : Comparable> (@inout Array<T>, @in_guaranteed T, @noescape @callee_guaranteed
 (@in_guaranteed T, @in_guaranteed T) -> Bool, @thin Foo<T>.Type) -> () {
// %0                                             // users: %10, %15, %4
// %1                                             // users: %19, %8, %5
// %2                                             // users: %19, %6
bb0(%0 : $*Array<T>, %1 : $*T, %2 : $@noescape @callee_guaranteed (@in_guaranteed T, @in_guaranteed T) -> Bool, %3 : $@thin Foo<T>.Type):
  debug_value_addr %0 : $*Array<T>, var, name "array", argno 1 // id: %4
  debug_value_addr %1 : $*T, let, name "t", argno 2 // id: %5
  debug_value %2 : $@noescape @callee_guaranteed (@in_guaranteed T, @in_guaranteed T) -> Bool, let, name "comparator", argno 3 // id: %6
  %7 = alloc_stack $T                             // users: %11, %10, %12, %8
  copy_addr %1 to [initialization] %7 : $*T       // id: %8
  // function_ref specialized Array.append(_:)
  %9 = function_ref @function signature specialization <Arg[0] = Owned To Guaranteed> of Swift.Array.append(__owned A) -> () : $@convention(method) <τ_0_0> (@in_guaranteed τ_0_0, @inout Array<τ_0_0>) -> () // user: %10
  %10 = apply %9<T>(%7, %0) : $@convention(method) <τ_0_0> (@in_guaranteed τ_0_0, @inout Array<τ_0_0>) -> ()
  destroy_addr %7 : $*T                           // id: %11
  dealloc_stack %7 : $*T                          // id: %12
  %13 = integer_literal $Builtin.Int64, 0         // user: %14
  %14 = struct $Int (%13 : $Builtin.Int64)        // user: %18
  %15 = load %0 : $*Array<T>                      // user: %18
  %16 = alloc_stack $T                            // users: %21, %20, %19, %18
  // function_ref Array.subscript.getter
  %17 = function_ref @Swift.Array.subscript.getter : (Swift.Int) -> A : $@convention(method) <τ_0_0> (Int, @guaranteed Array<τ_0_0>) -> @out τ_0_0 // user: %18
  %18 = apply %17<T>(%16, %14, %15) : $@convention(method) <τ_0_0> (Int, @guaranteed Array<τ_0_0>) -> @out τ_0_0
  %19 = apply %2(%16, %1) : $@noescape @callee_guaranteed (@in_guaranteed T, @in_guaranteed T) -> Bool
  destroy_addr %16 : $*T                          // id: %20
  dealloc_stack %16 : $*T                         // id: %21
  %22 = tuple ()                                  // user: %23
  return %22 : $()                                // id: %23
} // end sil function 'static OtherModule.Foo.realAppend(_: inout [A], _: A, comparator: (A, A) -> Swift.Bool) -> ()'

so the problem is that BAD does an extra, unnecessary retain/release on BridgeObject

@weissi
Copy link
Member Author

weissi commented Apr 9, 2019

Also, turns out there's a nice Swift 5.0 regression in there as well:

johannes:/tmp/repro
$ jw-swift-4.2 make-single-file-spm /tmp/test.swift mallocs
all extra modules: OtherModule
SwiftPM package in:
  /tmp/.gen-swift-perf-test_qzI3Jb
  /tmp/.gen-swift-perf-test_qzI3Jb/.build/release/TestApp
Compile Swift Module 'OtherModule' (1 sources)
Compile Swift Module 'TestApp' (1 sources)
Linking ./.build/x86_64-apple-macosx10.10/release/TestApp
dtrace: description 'pid$target::malloc:entry ' matched 5 probes
dtrace: pid 76753 has exited
CPU     ID                    FUNCTION:NAME
  2      2                             :END 
            10029


johannes:/tmp/repro
$ jw-swift-5.0 make-single-file-spm /tmp/test.swift mallocs
all extra modules: OtherModule
SwiftPM package in:
  /tmp/.gen-swift-perf-test_31vesz
  /tmp/.gen-swift-perf-test_31vesz/.build/release/TestApp
[3/3] Linking ./.build/x86_64-apple-macosx/release/TestApp
dtrace: description 'pid$target::malloc:entry ' matched 5 probes
dtrace: pid 76828 has exited
CPU     ID                    FUNCTION:NAME
  2      2                             :END 
            20100


johannes:/tmp/repro
$ jw-swift-4.2 swift -version
Apple Swift version 4.2 (swift-4.2-RELEASE)
Target: x86_64-apple-darwin18.5.0
johannes:/tmp/repro
$ jw-swift-5.0 swift -version
Apple Swift version 5.0 (swift-5.0-RELEASE)
Target: x86_64-apple-darwin18.5.0
johannes:/tmp/repro

now the whole thing should pretty much not allocate at all, but Swift 4.2 does 10k allocs and Swift 5.0 does 20k allocs, not cool 😉

@weissi
Copy link
Member Author

weissi commented Apr 9, 2019

so, I could pin-point the Swift 5 regression a little more: If we remove all the generics from the above program and make it

import OtherModule

var heap = Foo()
for _ in 0..<10000 {
    heap.append(1)
}

// MODULE: OtherModule
public struct Foo {
    let comparator: (Int, Int) -> Bool = (<)
    private var storage: Array<Int> = .init()

    public init() {
        self.storage.reserveCapacity(10000)
    }

    static func realAppend(_ array: inout Array<Int>, _ t: Int, comparator: (Int, Int) -> Bool) {
        array.append(t)
        _ = comparator(array[0] , t)
    }

    public mutating func append(_ t: Int) {
        // BAD
        Foo.realAppend(&self.storage, t, comparator: self.comparator)

        // GOOD
        // let comparator = self.comparator
        // Foo.realAppend(&self.storage, t, comparator: comparator)
    }
}

and then run it, then Swift 4.2 and Swift 5.0 perform pretty much the same:

$ jw-swift-5.0 make-single-file-spm /tmp/test.swift mallocs
all extra modules: OtherModule
SwiftPM package in:
  /tmp/.gen-swift-perf-test_cIdRJF
  /tmp/.gen-swift-perf-test_cIdRJF/.build/release/TestApp
[3/3] Linking ./.build/x86_64-apple-macosx/release/TestApp
dtrace: description 'pid$target::malloc:entry ' matched 5 probes
dtrace: pid 77125 has exited
CPU     ID                    FUNCTION:NAME
  4      2                             :END 
            10044


johannes:/tmp/repro
$ jw-swift-4.2 make-single-file-spm /tmp/test.swift mallocs
all extra modules: OtherModule
SwiftPM package in:
  /tmp/.gen-swift-perf-test_Apx1ja
  /tmp/.gen-swift-perf-test_Apx1ja/.build/release/TestApp
Compile Swift Module 'OtherModule' (1 sources)
Compile Swift Module 'TestApp' (1 sources)
Linking ./.build/x86_64-apple-macosx10.10/release/TestApp
dtrace: description 'pid$target::malloc:entry ' matched 5 probes
dtrace: pid 77205 has exited
CPU     ID                    FUNCTION:NAME
  1      2                             :END 
            10018

@weissi
Copy link
Member Author

weissi commented Apr 9, 2019

CC @eeckstein/@gottesmm

@weissi
Copy link
Member Author

weissi commented Apr 9, 2019

the great news is that it's fixed with the dev-snapshot from 7th April 2019:

$ jw-swift-latest swiftc -version
Apple Swift version 5.0-dev (LLVM 61e235ed1e, Swift e0f79dcba2)
Target: x86_64-apple-darwin18.5.0

$ jw-swift-latest make-single-file-spm /tmp/test.swift mallocs
all extra modules: OtherModule
SwiftPM package in:
  /tmp/.gen-swift-perf-test_CdYMrt
  /tmp/.gen-swift-perf-test_CdYMrt/.build/release/TestApp
[3/3] Linking TestApp
dtrace: description 'pid$target::malloc:entry ' matched 5 probes

dtrace: pid 78216 has exited
CPU     ID                    FUNCTION:NAME
  6      2                             :END 
              121

@belkadan
Copy link
Contributor

belkadan commented Apr 9, 2019

I suspect this is because arguments are evaluated left-to-right, and so it needs to save the old value of self to do self.comparator…except that inout arguments are not supposed to follow that order strictly (IIRC). cc @atrick

@gottesmm
Copy link
Member

gottesmm commented Apr 9, 2019

Just to fill in a bit of context: I am not surprised that on master this is not happening anymore since I enabled semantic arc by default.

@Lukasa
Copy link
Contributor

Lukasa commented Apr 9, 2019

I can confirm that the original repro on the actual NIO repository is also fixed in current master. While this specific repro scenario is fixed by semantic ARC (semantic ARC FTW, this issue by itself justifies the work done on it), it may be nice to investigate whether the compiler could have avoided this issue even without semantic ARC, rather than relying on it.

Totally up to @belkadan though, I'm certainly not going to make you folks do work you don't feel you need to!

@belkadan
Copy link
Contributor

belkadan commented Apr 9, 2019

Not me, I'm deferring to those with more SIL expertise, like Michael, Erik, and Andy.

@Lukasa
Copy link
Contributor

Lukasa commented Apr 9, 2019

Whomever is making the decision, I’m sure that we can all agree that it isn’t me, and be glad of that fact.

@Lukasa
Copy link
Contributor

Lukasa commented Apr 10, 2019

Out of curiosity I grabbed the NIOHTTP2 load tester and ran it with the most recent snapshot (e.g. with semantic ARC). We saw a 10% performance boost with zero code changes on our end, which I'm calling a big win. Looking forward to seeing it land in a productised release!

@weissi
Copy link
Member Author

weissi commented Jul 2, 2019

@swift-ci create

@gottesmm
Copy link
Member

gottesmm commented Jul 8, 2019

I talked with Johannes. He agreed that since this is solved by semantic arc, we can close this.

@swift-ci swift-ci transferred this issue from apple/swift-issues Apr 25, 2022
This issue was closed.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
ARC Feature: automatic reference counting bug A deviation from expected or documented behavior. Also: expected but undesirable behavior. compiler The Swift compiler in itself optimized only Flag: An issue whose reproduction requires optimized compilation
Projects
None yet
Development

No branches or pull requests

4 participants