Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SR-10604] dictionary modify accessor allocates every single time #53004

Open
weissi opened this issue May 2, 2019 · 7 comments
Open

[SR-10604] dictionary modify accessor allocates every single time #53004

weissi opened this issue May 2, 2019 · 7 comments
Labels
bug A deviation from expected or documented behavior. Also: expected but undesirable behavior. compiler The Swift compiler in itself performance standard library Area: Standard library umbrella

Comments

@weissi
Copy link
Member

weissi commented May 2, 2019

Previous ID SR-10604
Radar rdar://problem/52529030
Original Reporter @weissi
Type Bug
Additional Detail from JIRA
Votes 1
Component/s Compiler, Standard Library
Labels Bug, Performance
Assignee None
Priority Medium

md5: 4ba5c6b107c61899a4524c1db392517b

relates to:

  • SR-10709 Swift needs allocation counter tests

Issue Description:

@Lukasa just showed me some of his HTTP/2 code that allocates a lot. He saw it allocating in presumably here:

self = .init(dummy: ())

But the whole complexity of the http/2 code isn't actually necessary, even the most simple use of the modify accessor triggers the allocations:

extension Optional {
    mutating func apply(_ body: (Wrapped) -> Wrapped?) {
        guard let real = self else {
            return
        }

        self = nil
        self = body(real)
    }
}

@inline(never)
func repro() {
    var dict: [Int: Int] = [:]
    dict[1] = 1
    for _ in 0..<10_000 {
        dict[1].apply { _ in 1 }
    }
    print(dict)
}

repro()

ran, produces under dtrace that traces every malloc:

              libsystem_malloc.dylib`malloc
              TestApp`specialized _NativeDictionary.subscript.modify+0x24
              TestApp`repro()+0x8c
              TestApp`main+0x9
              libdyld.dylib`start+0x1
              TestApp`0x1
            10000

That means for 10,000 calls to the modify accessor, we allocated 10,000 times :'(.

We can reduce the code further to

extension Optional where Wrapped == Int {
    mutating func apply() {
        self = 1
    }
}

@inline(never)
func repro() {
    var dict: [Int: Int] = [:]
    dict[1] = 1
    for _ in 0..<10_000 {
        dict[1].apply()
    }
    print(dict)
}

repro()
@belkadan
Copy link
Contributor

belkadan commented May 2, 2019

cc @rjmccall, @eeckstein

@weissi
Copy link
Member Author

weissi commented May 2, 2019

   Apple Swift version 5.0-dev (LLVM 082dec2e22, Swift 4e86c8ff52)
Target: x86_64-apple-darwin18.5.0

(same for Swift 5.0.1) ran

jw-swift-latest swift -version && jw-swift-latest swiftc -O test.swift && sudo ~/devel/swift-nio/dev/malloc-aggregation.d -c ./test

@weissi
Copy link
Member Author

weissi commented May 2, 2019

better stack for the allocation:

(lldb) bt
* thread #&#8203;1, queue = 'com.apple.main-thread', stop reason = breakpoint 1.1
  * frame #&#8203;0: 0x00007fff6532ab69 libsystem_malloc.dylib`malloc
    frame #&#8203;1: 0x0000000100000e14 test`specialized _NativeDictionary.subscript.modify at <compiler-generated>:0 [opt]
    frame #&#8203;2: 0x0000000100000cf9 test`repro() [inlined] generic specialization <Swift.Int, Swift.Int> of Swift.Dictionary._Variant.subscript.modify : (A) -> Swift.Optional<B> at <compiler-generated>:0 [opt]
    frame #&#8203;3: 0x0000000100000ce0 test`repro() [inlined] generic specialization <Swift.Int, Swift.Int> of Swift.Dictionary.subscript.modify : (A) -> Swift.Optional<B> at <compiler-generated>:0 [opt]
    frame #&#8203;4: 0x0000000100000ce0 test`repro() at test.swift:12 [opt]
    frame #&#8203;5: 0x0000000100000c49 test`main at test.swift:17:1 [opt]
    frame #&#8203;6: 0x00007fff651803d5 libdyld.dylib`start + 1

the assembly confirms that it's allocating unconditionally pretty much before doing anything else

(lldb) frame select 1
test was compiled with optimization - stepping may behave oddly; variables may not be available.
frame #&#8203;1: 0x0000000100000e14 test`specialized _NativeDictionary.subscript.modify at <compiler-generated>:0 [opt]
(lldb) dis
test`specialized _NativeDictionary.subscript.modify:
    0x100000df0 <+0>:   pushq  %rbp
    0x100000df1 <+1>:   movq   %rsp, %rbp
    0x100000df4 <+4>:   pushq  %r15
    0x100000df6 <+6>:   pushq  %r14
    0x100000df8 <+8>:   pushq  %r13
    0x100000dfa <+10>:  pushq  %r12
    0x100000dfc <+12>:  pushq  %rbx
    0x100000dfd <+13>:  subq   $0x18, %rsp
    0x100000e01 <+17>:  movl   %edx, -0x3c(%rbp)
    0x100000e04 <+20>:  movq   %rsi, %r15
    0x100000e07 <+23>:  movq   %rdi, %rbx
    0x100000e0a <+26>:  movl   $0x40, %edi
    0x100000e0f <+31>:  callq  0x100001e82               ; symbol stub for: malloc
->  0x100000e14 <+36>:  movq   %rax, %r14
    0x100000e17 <+39>:  movq   %rax, (%rbx)
    0x100000e1a <+42>:  movq   %r13, 0x8(%rax)
    0x100000e1e <+46>:  movq   %r15, (%rax)
    0x100000e21 <+49>:  movq   %r13, -0x38(%rbp)
    0x100000e25 <+53>:  movq   (%r13), %r12
    0x100000e29 <+57>:  movq   0x28(%r12), %rdi
    0x100000e2e <+62>:  movq   %r15, %rsi
    0x100000e31 <+65>:  callq  0x100001e70               ; symbol stub for: static Swift.Hasher._hash(seed: Swift.Int, _: Swift.UInt64) -> Swift.Int
[...]

The first BB in the SIL looks like this

// specialized _NativeDictionary.subscript.modify
sil shared [always_inline] @$ss17_NativeDictionaryV_8isUniqueq_Sgx_SbtciMSi_SiTg5 : $@yield_once @convention(method) (Int, Bool, @inout _NativeDictionary<Int, Int>) -> @yields @inout Optional<Int> {
// %0                                             // users: %173, %183, %3
// %1                                             // users: %97, %105
// %2                                             // users: %181, %171, %154, %137, %301, %298, %189, %192, %110, %4
bb0(%0 : $Int, %1 : $Bool, %2 : $*_NativeDictionary<Int, Int>):
  %3 = struct_extract %0 : $Int, #Int._value      // users: %52, %232, %273, %315, %9
  %4 = load %2 : $*_NativeDictionary<Int, Int>    // users: %337, %317, %51, %5
  %5 = struct_extract %4 : $_NativeDictionary<Int, Int>, #_NativeDictionary._storage // users: %305, %89, %75, %43, %14, %12, %6
  %6 = ref_element_addr %5 : $__RawDictionaryStorage, #__RawDictionaryStorage._seed // user: %7
  %7 = load %6 : $*Int                            // user: %11
  %8 = metatype $@thin Hasher.Type                // users: %196, %11
  %9 = struct $UInt64 (%3 : $Builtin.Int64)       // users: %196, %11
  // function_ref static Hasher._hash(seed:_:)
  %10 = function_ref @$ss6HasherV5_hash4seed_S2i_s6UInt64VtFZ : $@convention(method) (Int, UInt64, @thin Hasher.Type) -> Int // users: %196, %11
  %11 = apply %10(%7, %9, %8) : $@convention(method) (Int, UInt64, @thin Hasher.Type) -> Int // user: %24
  %12 = ref_tail_addr %5 : $__RawDictionaryStorage, $_UnsafeBitset.Word // users: %62, %326, %32
  %13 = integer_literal $Builtin.Int64, 1         // users: %78, %68, %55, %248, %235, %332, %319, %290, %277, %218, %204, %203, %38, %22, %20
  %14 = ref_element_addr %5 : $__RawDictionaryStorage, #__RawDictionaryStorage._scale // user: %15
  %15 = struct_element_addr %14 : $*Int8, #Int8._value // user: %16
  %16 = load %15 : $*Builtin.Int8                 // user: %17
  %17 = builtin "sextOrBitCast_Int8_Int64"(%16 : $Builtin.Int8) : $Builtin.Int64 // user: %19
  %18 = integer_literal $Builtin.Int64, 63        // users: %67, %247, %331, %289, %217, %202, %37, %19
  %19 = builtin "and_Int64"(%17 : $Builtin.Int64, %18 : $Builtin.Int64) : $Builtin.Int64 // user: %20
  %20 = builtin "shl_Int64"(%13 : $Builtin.Int64, %19 : $Builtin.Int64) : $Builtin.Int64 // user: %22
  %21 = integer_literal $Builtin.Int1, 0          // users: %78, %55, %235, %319, %277, %204, %95, %22
  %22 = builtin "ssub_with_overflow_Int64"(%20 : $Builtin.Int64, %13 : $Builtin.Int64, %21 : $Builtin.Int1) : $(Builtin.Int64, Builtin.Int1) // user: %23
  %23 = tuple_extract %22 : $(Builtin.Int64, Builtin.Int1), 0 // users: %57, %321, %25
  %24 = struct_extract %11 : $Int, #Int._value    // user: %25
  %25 = builtin "and_Int64"(%24 : $Builtin.Int64, %23 : $Builtin.Int64) : $Builtin.Int64 // users: %55, %46, %36, %30, %26
  %26 = struct $Int (%25 : $Builtin.Int64)        // user: %27
  %27 = struct $_HashTable.Bucket (%26 : $Int)    // users: %54, %42
  %28 = integer_literal $Builtin.Int64, 64        // users: %66, %60, %246, %240, %330, %324, %288, %282, %216, %210, %36, %30
  %29 = integer_literal $Builtin.Int64, 0         // users: %308, %70, %250, %334, %292, %220, %40
  %30 = builtin "udiv_Int64"(%25 : $Builtin.Int64, %28 : $Builtin.Int64) : $Builtin.Int64 // user: %31
  %31 = builtin "truncOrBitCast_Int64_Word"(%30 : $Builtin.Int64) : $Builtin.Word // user: %32
  %32 = index_addr %12 : $*_UnsafeBitset.Word, %31 : $Builtin.Word // user: %33
  %33 = struct_element_addr %32 : $*_UnsafeBitset.Word, #_UnsafeBitset.Word.value // user: %34
  %34 = struct_element_addr %33 : $*UInt, #UInt._value // user: %35
  %35 = load %34 : $*Builtin.Int64                // user: %39
  %36 = builtin "urem_Int64"(%25 : $Builtin.Int64, %28 : $Builtin.Int64) : $Builtin.Int64 // user: %37
  %37 = builtin "and_Int64"(%36 : $Builtin.Int64, %18 : $Builtin.Int64) : $Builtin.Int64 // user: %38
  %38 = builtin "shl_Int64"(%13 : $Builtin.Int64, %37 : $Builtin.Int64) : $Builtin.Int64 // user: %39
  %39 = builtin "and_Int64"(%35 : $Builtin.Int64, %38 : $Builtin.Int64) : $Builtin.Int64 // user: %40
  %40 = builtin "cmp_eq_Int64"(%39 : $Builtin.Int64, %29 : $Builtin.Int64) : $Builtin.Int1 // user: %41
  cond_br %40, bb1, bb2                           // id: %41

@rjmccall
Copy link
Member

rjmccall commented May 2, 2019

We know there's work to do to make the coroutine lowering passes more aggressive about not heap-allocating the stack frame if e.g. there are multiple resume points. This SR will serve as a fine anchor for tracking that work.

@weissi
Copy link
Member Author

weissi commented Jul 2, 2019

@swift-ci create

@lorentey
Copy link
Member

lorentey commented Aug 8, 2019

I submitted a stdlib-side workaround:

#26549

This can be reverted when the underlying optimization issue is resolved.

@Lukasa
Copy link
Contributor

Lukasa commented Sep 18, 2020

Recent testing with Swift 5.3 seems to show allocations back in the dictionary modify accessor, in the exact same place. @lorentey did we regress this in any way?

@swift-ci swift-ci transferred this issue from apple/swift-issues Apr 25, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug A deviation from expected or documented behavior. Also: expected but undesirable behavior. compiler The Swift compiler in itself performance standard library Area: Standard library umbrella
Projects
None yet
Development

No branches or pull requests

5 participants