[SR-8017] Non-recoverable error while stress-testing URLSession #3675

Open
swift-ci opened this issue Jun 16, 2018 · 12 comments
@swift-ci
Contributor

Previous ID SR-8017
Radar None
Original Reporter lwalkin (JIRA User)
Type Bug
Status In Progress
Resolution
Environment

Linux ubuntu-xenial 4.4.0-128-generic #154-Ubuntu SMP Fri May 25 14:15:18 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux
swift -version
Swift version 4.1-dev (LLVM 1c8b50929b, Clang 420ae40df6, Swift b2d7fb958c)
Target: x86_64-unknown-linux-gnu

Additional Detail from JIRA
Votes 2
Component/s Foundation
Labels Bug
Assignee @ahti
Priority Medium

md5: 53f199ac413a677a7f4337be7215113a

Issue Description:

Summary:

A non-recoverable fatal error is produced when operating URLSession under stress.

Steps To Reproduce:

The code is along the lines of:

func testURLGetStressTest() throws {
    let testData: [UInt8] = [1, 2, 3]
    let numObjects = 10
    let objects: [[UInt8]] = try (0 ..< numObjects).map { _ in
        let size = random() % 64
        let data = (0 ..< size).map { _ in UInt8(random() & 0xFF) }
        return try CODE_THAT_USES_URLSESSION
    }

    // Dispatch a bunch of random requests.
    DispatchQueue.concurrentPerform(iterations: numObjects * 10) { _ in
        let object = objects[random() % objects.count]
        let (statusCode, response) = try! getContents(url: "http://127.0.0.1:\(port)/\(object)")
    }
}

Results:

Fatal error: Trying to access a behaviour for a task that in not in the registry.: file Foundation/URLSession/TaskRegistry.swift, line 105

Regardless of how this code path is triggered, or how easily it can be reproduced, it shouldn't produce a fatal error in the first place. It should be something the application code can recover from.
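
To illustrate the difference between a trap and a recoverable failure, here is a minimal sketch of a lookup that throws instead of calling fatalError. It is not Foundation's TaskRegistry: the names SketchRegistry, RegistryError, and the integer task IDs are hypothetical and exist only for this example.

```swift
// Hypothetical sketch, NOT the Foundation implementation.
enum RegistryError: Error, Equatable {
    case taskNotRegistered(id: Int)
}

struct SketchRegistry<Behaviour> {
    private var behaviours: [Int: Behaviour] = [:]

    mutating func add(_ behaviour: Behaviour, forTaskID id: Int) {
        behaviours[id] = behaviour
    }

    // Throws when the task is unknown, so the caller can decide what to do,
    // instead of terminating the process.
    func behaviour(forTaskID id: Int) throws -> Behaviour {
        guard let behaviour = behaviours[id] else {
            throw RegistryError.taskNotRegistered(id: id)
        }
        return behaviour
    }
}

var registry = SketchRegistry<String>()
registry.add("dataCompletionHandler", forTaskID: 1)

do {
    _ = try registry.behaviour(forTaskID: 2)
} catch {
    // The application keeps running and can report or retry.
    print("recoverable: \(error)")
}
```

With this shape, a missing entry becomes an error value the application can log or retry on, which is the behaviour the report asks for.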

Notes:

The test deliberately places the system under stress, whether in terms of file descriptors, DispatchQueue threads, or something similar.

@ahti
Contributor

ahti commented Jul 11, 2018

I'm experiencing the same issue.

My project repeatedly requests the same URL. The server delays the response for up to 50 seconds until new data is available. In the completion handler I re-run the request. By no means a heavy load.

To reproduce, place this snippet in a file, start a web server (preferably one that fails fast, as the response doesn't seem to be significant; I used darkhttpd), and run it with swift run:

import Foundation

let session = URLSession(configuration: .default)

var counter = 0

func makeRequest() {
    let req = URLRequest(url: URL(string: "http://localhost:9999/lol")!)
    session.dataTask(with: req) { d, r, e in
        print("rollin \(counter)")
        counter += 1
        makeRequest()
    }.resume()
}

makeRequest()

dispatchMain()

On my Linux machine, with a Swift version built from the current master, it crashes randomly, most times within the first few thousand requests.

This is the stack trace that I get:

Fatal error: Trying to access a behaviour for a task that in not in the registry.: file Foundation/URLSession/TaskRegistry.swift, line 105
#0 0x0000559254fd28d4 PrintStackTraceSignalHandler(void*) (/usr/bin/swift+0x3f778d4)
#1 0x0000559254fd06ec llvm::sys::RunSignalHandlers() (/usr/bin/swift+0x3f756ec)
#2 0x0000559254fd2a92 SignalHandler(int) (/usr/bin/swift+0x3f77a92)
#3 0x00007fe6b6156a80 __restore_rt (/usr/lib/libpthread.so.0+0x11a80)
#4 0x00007fe6b18e8433 $Ss17_assertionFailure__4file4line5flagss5NeverOs12StaticStringV_SSAHSus6UInt32VtFTf4nxnnn_n (/usr/lib/swift/linux/libswiftCore.so+0x367433)
#5 0x00007fe6b1703a69 (/usr/lib/swift/linux/libswiftCore.so+0x182a69)
#6 0x00007fe6b09d24bb $S10Foundation10URLSessionC13_TaskRegistryC9behaviour3forAE10_BehaviourOAA0bC0C_tF (/usr/lib/swift/linux/libFoundation.so+0x6ee4bb)
#7 0x00007fe6b09bc29a $S10Foundation10URLSessionC9behaviour3forAC14_TaskBehaviourOAA0bE0C_tF (/usr/lib/swift/linux/libFoundation.so+0x6d829a)
#8 0x00007fe6b09df795 $S10Foundation15_NativeProtocolC27createTransferBodyDataDrain33_EACA7312FD098F51ABB0E9FB36E86156LLAC01_gH0OyF (/usr/lib/swift/linux/libFoundation.so+0x6fb795)
#9 0x00007fe6b09df9a4 $S10Foundation15_NativeProtocolC19createTransferState3url9workQueueAC01_eF0VAA3URLV_8Dispatch0kI0CtF (/usr/lib/swift/linux/libFoundation.so+0x6fb9a4)
#10 0x00007fe6b09e6186 $S10Foundation15_NativeProtocolC16startNewTransfer4withyAA10URLRequestV_tFTf4xn_n (/usr/lib/swift/linux/libFoundation.so+0x702186)
#11 0x00007fe6b09dfcc9 $S10Foundation15_NativeProtocolC16startNewTransfer4withyAA10URLRequestV_tF (/usr/lib/swift/linux/libFoundation.so+0x6fbcc9)
#12 0x00007fe6b09dfe76 $S10Foundation15_NativeProtocolC6resumeyyF (/usr/lib/swift/linux/libFoundation.so+0x6fbe76)
#13 0x00007fe6b09d937a $S10Foundation15_NativeProtocolC12startLoadingyyF (/usr/lib/swift/linux/libFoundation.so+0x6f537a)
#14 0x00007fe6b09d0019 $S10Foundation14URLSessionTaskC6resumeyyFyyXEfU_yycfU_TA (/usr/lib/swift/linux/libFoundation.so+0x6ec019)
#15 0x00007fe6b087b1d0 $SIeg_IeyB_TR (/usr/lib/swift/linux/libFoundation.so+0x5971d0)
#16 0x00007fe6aa0a2ff7 _dispatch_call_block_and_release (/usr/lib/swift/linux/libdispatch.so+0x4aff7)
#17 0x00007fe6aa0b09bc _dispatch_queue_serial_drain (/usr/lib/swift/linux/libdispatch.so+0x589bc)
#18 0x00007fe6aa0b122f _dispatch_queue_invoke (/usr/lib/swift/linux/libdispatch.so+0x5922f)
#19 0x00007fe6aa0b0880 _dispatch_queue_serial_drain (/usr/lib/swift/linux/libdispatch.so+0x58880)
#20 0x00007fe6aa0b122f _dispatch_queue_invoke (/usr/lib/swift/linux/libdispatch.so+0x5922f)
#21 0x00007fe6aa0b3b1a _dispatch_worker_thread (/usr/lib/swift/linux/libdispatch.so+0x5bb1a)
#22 0x00007fe6b614c075 start_thread (/usr/lib/libpthread.so.0+0x7075)
#23 0x00007fe6b467453f __GI___clone (/usr/lib/libc.so.6+0xf853f)
fish: “swift test.swift” terminated by signal SIGILL (Illegal instruction)

Wrapping the recursive call to makeRequest() in DispatchQueue.main.async leads to several different stack traces. The first:

Fatal error: Trying to remove task, but it's not in the registry.: file Foundation/URLSession/TaskRegistry.swift, line 76
#0 0x000056316bd1d8d4 PrintStackTraceSignalHandler(void*) (/usr/bin/swift+0x3f778d4)
#1 0x000056316bd1b6ec llvm::sys::RunSignalHandlers() (/usr/bin/swift+0x3f756ec)
#2 0x000056316bd1da92 SignalHandler(int) (/usr/bin/swift+0x3f77a92)
#3 0x00007f502a06ca80 __restore_rt (/usr/lib/libpthread.so.0+0x11a80)
#4 0x00007f50257fe433 $Ss17_assertionFailure__4file4line5flagss5NeverOs12StaticStringV_SSAHSus6UInt32VtFTf4nxnnn_n (/usr/lib/swift/linux/libswiftCore.so+0x367433)
#5 0x00007f5025619a69 (/usr/lib/swift/linux/libswiftCore.so+0x182a69)
#6 0x00007f50248e80bb $S10Foundation10URLSessionC13_TaskRegistryC6removeyyAA0bC0CF (/usr/lib/swift/linux/libFoundation.so+0x6ee0bb)
#7 0x00007f50248e487d $S10Foundation15_ProtocolClientC03urlB16DidFinishLoadingyyAA11URLProtocolCFyycfU1_TA (/usr/lib/swift/linux/libFoundation.so+0x6ea87d)
#8 0x00007f5024792b9f $S10Foundation14BlockOperationC4mainyyF (/usr/lib/swift/linux/libFoundation.so+0x598b9f)
#9 0x00007f5024790f36 $S10Foundation9OperationC5startyyF (/usr/lib/swift/linux/libFoundation.so+0x596f36)
#10 0x00007f50247945eb $S10Foundation14OperationQueueC04_runB0yyF (/usr/lib/swift/linux/libFoundation.so+0x59a5eb)
#11 0x00007f502479c247 $S10Foundation14OperationQueueC13addOperations_17waitUntilFinishedySayAA0B0CG_SbtFyAGXEfU0_yycfU_TA (/usr/lib/swift/linux/libFoundation.so+0x5a2247)
#12 0x00007f50247911d0 $SIeg_IeyB_TR (/usr/lib/swift/linux/libFoundation.so+0x5971d0)
#13 0x00007f5024197d25 _dispatch_block_async_invoke2 (/usr/lib/swift/linux/libdispatch.so+0x5ad25)
#14 0x00007f5024195a9e _dispatch_queue_serial_drain (/usr/lib/swift/linux/libdispatch.so+0x58a9e)
#15 0x00007f502419622f _dispatch_queue_invoke (/usr/lib/swift/linux/libdispatch.so+0x5922f)
#16 0x00007f5024198b1a _dispatch_worker_thread (/usr/lib/swift/linux/libdispatch.so+0x5bb1a)
#17 0x00007f502a062075 start_thread (/usr/lib/libpthread.so.0+0x7075)
#18 0x00007f502858a53f __GI___clone (/usr/lib/libc.so.6+0xf853f)
fish: “swift test.swift” terminated by signal SIGILL (Illegal instruction)

A very similar second one:

#0 0x0000560e0ad478d4 PrintStackTraceSignalHandler(void*) (/usr/bin/swift+0x3f778d4)
#1 0x0000560e0ad456ec llvm::sys::RunSignalHandlers() (/usr/bin/swift+0x3f756ec)
#2 0x0000560e0ad47a92 SignalHandler(int) (/usr/bin/swift+0x3f77a92)
#3 0x00007fd131ce8a80 __restore_rt (/usr/lib/libpthread.so.0+0x11a80)
#4 0x00007fd126135b43 $Ss23_NativeDictionaryBufferV12assertingGetyx3key_q_5valuets01_aB5IndexVyxq_GFSi_10Foundation10URLSessionC13_TaskRegistryC10_BehaviourOTg5 (/usr/lib/swift/linux/libFoundation.so+0x6f2b43)
#5 0x00007fd12613499c $Ss24_VariantDictionaryBufferO12nativeRemove2atx3key_q_5valuets07_NativeB5IndexVyxq_G_tFSi_10Foundation10URLSessionC13_TaskRegistryC10_BehaviourOTg5 (/usr/lib/swift/linux/libFoundation.so+0x6f199c)
#6 0x00007fd1261310c7 $S10Foundation10URLSessionC13_TaskRegistryC6removeyyAA0bC0CF (/usr/lib/swift/linux/libFoundation.so+0x6ee0c7)
#7 0x00007fd12612d87d $S10Foundation15_ProtocolClientC03urlB16DidFinishLoadingyyAA11URLProtocolCFyycfU1_TA (/usr/lib/swift/linux/libFoundation.so+0x6ea87d)
#8 0x00007fd125fdbb9f $S10Foundation14BlockOperationC4mainyyF (/usr/lib/swift/linux/libFoundation.so+0x598b9f)
#9 0x00007fd125fd9f36 $S10Foundation9OperationC5startyyF (/usr/lib/swift/linux/libFoundation.so+0x596f36)
#10 0x00007fd125fdd5eb $S10Foundation14OperationQueueC04_runB0yyF (/usr/lib/swift/linux/libFoundation.so+0x59a5eb)
#11 0x00007fd125fe5247 $S10Foundation14OperationQueueC13addOperations_17waitUntilFinishedySayAA0B0CG_SbtFyAGXEfU0_yycfU_TA (/usr/lib/swift/linux/libFoundation.so+0x5a2247)
#12 0x00007fd125fda1d0 $SIeg_IeyB_TR (/usr/lib/swift/linux/libFoundation.so+0x5971d0)
#13 0x00007fd12c247d25 _dispatch_block_async_invoke2 (/usr/lib/swift/linux/libdispatch.so+0x5ad25)
#14 0x00007fd12c245a9e _dispatch_queue_serial_drain (/usr/lib/swift/linux/libdispatch.so+0x58a9e)
#15 0x00007fd12c24622f _dispatch_queue_invoke (/usr/lib/swift/linux/libdispatch.so+0x5922f)
#16 0x00007fd12c248b1a _dispatch_worker_thread (/usr/lib/swift/linux/libdispatch.so+0x5bb1a)
#17 0x00007fd131cde075 start_thread (/usr/lib/libpthread.so.0+0x7075)
#18 0x00007fd13020653f __GI___clone (/usr/lib/libc.so.6+0xf853f)
fish: “swift test.swift” terminated by signal SIGILL (Illegal instruction)

And a third, more rare one:

Fatal error: file /home/lukas/code/swift/swift/stdlib/public/core/Dictionary.swift, line 2372
#0 0x00005622e83ca8d4 PrintStackTraceSignalHandler(void*) (/usr/bin/swift+0x3f778d4)
#1 0x00005622e83c86ec llvm::sys::RunSignalHandlers() (/usr/bin/swift+0x3f756ec)
#2 0x00005622e83caa92 SignalHandler(int) (/usr/bin/swift+0x3f77a92)
#3 0x00007f74b49c8a80 __restore_rt (/usr/lib/libpthread.so.0+0x11a80)
#4 0x00007f74abbb1cbc $Ss18_fatalErrorMessage__4file4line5flagss5NeverOs12StaticStringV_A2HSus6UInt32VtF (/usr/lib/swift/linux/libswiftCore.so+0x182cbc)
#5 0x00007f74a92fe77c $Ss24_VariantDictionaryBufferO12nativeDelete_11idealBucket6offsetys07_NativebC0Vyxq_G_S2itFSi_10Foundation14URLSessionTaskCTg5 (/usr/lib/swift/linux/libFoundation.so+0x6f177c)
#6 0x00007f74a92feaf3 $Ss24_VariantDictionaryBufferO12nativeRemove2atx3key_q_5valuets07_NativeB5IndexVyxq_G_tFSi_10Foundation14URLSessionTaskCTg5 (/usr/lib/swift/linux/libFoundation.so+0x6f1af3)
#7 0x00007f74a92faf77 $S10Foundation10URLSessionC13_TaskRegistryC6removeyyAA0bC0CF (/usr/lib/swift/linux/libFoundation.so+0x6edf77)
#8 0x00007f74a92f787d $S10Foundation15_ProtocolClientC03urlB16DidFinishLoadingyyAA11URLProtocolCFyycfU1_TA (/usr/lib/swift/linux/libFoundation.so+0x6ea87d)
#9 0x00007f74a91a5b9f $S10Foundation14BlockOperationC4mainyyF (/usr/lib/swift/linux/libFoundation.so+0x598b9f)
#10 0x00007f74a91a3f36 $S10Foundation9OperationC5startyyF (/usr/lib/swift/linux/libFoundation.so+0x596f36)
#11 0x00007f74a91a75eb $S10Foundation14OperationQueueC04_runB0yyF (/usr/lib/swift/linux/libFoundation.so+0x59a5eb)
#12 0x00007f74a91af247 $S10Foundation14OperationQueueC13addOperations_17waitUntilFinishedySayAA0B0CG_SbtFyAGXEfU0_yycfU_TA (/usr/lib/swift/linux/libFoundation.so+0x5a2247)
#13 0x00007f74a91a41d0 $SIeg_IeyB_TR (/usr/lib/swift/linux/libFoundation.so+0x5971d0)
#14 0x00007f74b00d6d25 _dispatch_block_async_invoke2 (/usr/lib/swift/linux/libdispatch.so+0x5ad25)
#15 0x00007f74b00d4a9e _dispatch_queue_serial_drain (/usr/lib/swift/linux/libdispatch.so+0x58a9e)
#16 0x00007f74b00d522f _dispatch_queue_invoke (/usr/lib/swift/linux/libdispatch.so+0x5922f)
#17 0x00007f74b00d7b1a _dispatch_worker_thread (/usr/lib/swift/linux/libdispatch.so+0x5bb1a)
#18 0x00007f74b49be075 start_thread (/usr/lib/libpthread.so.0+0x7075)
#19 0x00007f74b2ee653f __GI___clone (/usr/lib/libc.so.6+0xf853f)
fish: “swift test.swift” terminated by signal SIGILL (Illegal instruction)

Is there maybe some concurrent access to the task registry's underlying dictionary causing this?
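
For context, a Swift Dictionary is not safe to mutate from several threads at once, and unsynchronized mutation can corrupt its internal storage in ways consistent with the Dictionary.swift trap above. One common way to rule this out is to funnel every access through a serial DispatchQueue. The sketch below assumes that approach; SynchronizedRegistry and its methods are hypothetical names, and this thread does not state how Foundation's registry is actually protected.

```swift
import Dispatch

// Hypothetical sketch: all names are illustrative, not Foundation's.
final class SynchronizedRegistry<Value> {
    private var storage: [Int: Value] = [:]
    // Serial queue: every read and write of `storage` goes through it,
    // so no two threads touch the dictionary at the same time.
    private let queue = DispatchQueue(label: "sketch.registry")

    func set(_ value: Value, forID id: Int) {
        queue.sync { storage[id] = value }
    }

    func remove(id: Int) -> Value? {
        return queue.sync { storage.removeValue(forKey: id) }
    }

    var count: Int {
        return queue.sync { storage.count }
    }
}

let shared = SynchronizedRegistry<String>()

// Hammer the registry from many threads at once. Each iteration adds and
// then removes its own entry, so the registry ends up empty.
DispatchQueue.concurrentPerform(iterations: 1_000) { i in
    shared.set("task \(i)", forID: i)
    _ = shared.remove(id: i)
}
print(shared.count)
```

Removing the queue and mutating storage directly from concurrentPerform would be a data race, which is the kind of access pattern the question above is asking about.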

@ahti
Contributor

ahti commented Jul 11, 2018

I created a PR which resolves the crashes for me: #1625

@ahti
Contributor

ahti commented Jul 17, 2018

My PR has been merged. lwalkin (JIRA User), once the new (>= 2018-07-17) nightly is out on https://swift.org/download/#releases, could you try it and see if it's fixed for you as well?

@swift-ci
Contributor Author

Comment by Lev Walkin (JIRA)

@ahti, I will try, thank you very much!

@ahti
Contributor

ahti commented Jul 18, 2018

Oh, looks like it didn't make this nightly, so I guess it'll be in tomorrow's.

@weissi
Member

weissi commented Mar 31, 2019

This seems to be the bug that @mxcl hit: https://twitter.com/mxcl/status/1111073495017029635

The fix mentioned above was merged July 2018 so it should deffo be in Swift 5, any ideas @pushkarnk/@millenomi/@spevans?

@mxcl

mxcl commented Mar 31, 2019

To confirm, I am absolutely certain I'm running 5.0.0 GM. I’ve only seen this crash once since I rebuilt the code on the server, and it’s been running constantly in that time (almost a week).

@pushkarnk
Collaborator

We seem to be seeing this in KituraKit too: Kitura/KituraKit#45

This might not be a race condition when removing a task (which is what the PR raised for this particular issue solves); we might actually be trying to remove the same task twice.
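
To make the double-removal hypothesis concrete, the sketch below shows a removal that tolerates being called twice for the same task and reports whether anything was removed. It is an illustration only: IdempotentRegistry, the integer task IDs, and the string descriptions are hypothetical and do not mirror Foundation's types.

```swift
// Hypothetical sketch; `IdempotentRegistry` is not a Foundation type.
struct IdempotentRegistry {
    private var tasks: [Int: String] = [:]

    mutating func add(taskID: Int, description: String) {
        tasks[taskID] = description
    }

    // Returns true if the task was present and has been removed, false if
    // it was already gone. A second removal is reported, not trapped.
    @discardableResult
    mutating func remove(taskID: Int) -> Bool {
        return tasks.removeValue(forKey: taskID) != nil
    }
}

var pending = IdempotentRegistry()
pending.add(taskID: 7, description: "GET /lol")

let first = pending.remove(taskID: 7)   // the task is present
let second = pending.remove(taskID: 7)  // the task is already gone
print(first, second)
```

A false return from the second call is a cheap place to log the caller, which could help find where a task is being completed twice.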

@pushkarnk
Collaborator

saiHema (JIRA User) found this issue in URLSession, which has an identical symptom.

https://bugs.swift.org/browse/SR-10281

@ahti
Contributor

ahti commented Apr 16, 2020

Since the linked bug has been resolved and there haven't been more reports of the same symptoms, I think this one can be closed as well. Any objections? 🙂

@swift-ci
Contributor Author

Comment by Lev Walkin (JIRA)

OK by me.

@spevans
Collaborator

spevans commented Apr 19, 2020

URLSession does still crash under stress testing. The last time I checked, it was due to not closing its sockets correctly and then running out of file descriptors, so it's worth keeping this open until that issue is fixed so that it's not forgotten about.
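
One way to watch for descriptor exhaustion during a stress run is to count the process's open file descriptors before and after a batch of requests. The sketch below assumes Linux with /proc mounted; openFileDescriptorCount is a hypothetical helper name, and it returns nil on platforms without /proc/self/fd.

```swift
import Foundation

// Linux-only diagnostic sketch: /proc/self/fd contains one entry per open
// descriptor of the current process. Returns nil where that directory is
// unavailable (for example on macOS).
func openFileDescriptorCount() -> Int? {
    guard let entries = try? FileManager.default.contentsOfDirectory(atPath: "/proc/self/fd") else {
        return nil
    }
    return entries.count
}

if let count = openFileDescriptorCount() {
    print("open file descriptors: \(count)")
} else {
    print("/proc/self/fd is not available on this platform")
}
```

If the count keeps growing across batches and never drops back, sockets are probably not being released, which would match the behaviour described above.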

@swift-ci swift-ci transferred this issue from apple/swift-issues Apr 25, 2022
@shahmishal shahmishal transferred this issue from apple/swift May 5, 2022