New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SR-11557] URLSession closing unrelated file descriptors #3286
Comments
CC @millenomi/@spevans do we know what changed in URLSession from 5.0.x to 5.1? |
@swift-ci create |
also CC tannernelson (JIRA User)/ianpartridge (JIRA User)/@pushkarnk/@Lukasa/ldewailly (JIRA User): We all must stop using For SwiftNIO users, in debug builds, SwiftNIO will always crash if it detects
or similar. |
Probably caused by 02656bf libcurl ignores the socket close callback in certain uses due to the fall-through in the "secondary" case in https://github.com/curl/curl/blob/5050edb12484551f4c666212409c117cf38f9784/lib/connect.c#L1414-L1446, but I'm not clear on whether that would matter for CFURLSession's usage. I tend to doubt it, really. If anything, it looks like the sockets are being closed too quickly, which is ironic considering the commit was intended to avoid exactly that issue. The commit should probably be amended to ensure the curl handle has been fully torn down first if that is not already happening (I didn't look closely enough to be sure, so apologies if that's a red herring). |
@swift-ci create |
1 similar comment
@swift-ci create |
When I read this issue in the forums I was afraid it was going to be those changes in Foundation/cURL. We still don't understand why the double closing is happening, but Gwynne clues might be true. I will try to explain the situation before the patch was applied.
What the changes tries to avoid was that inversion of the order, and doing the things in the right order: tell cURL that we will close the sockets, and tear down the dispatch sources when the socket was not longer used by cURL. However, as Gwynne points out, that special branch in cURL about secondary sockets would have never been triggered in the previous patch, and it is being triggered now. Or maybe calling I would like to say, however, that the patch was trying to solve a problem that might have left a lot of epoll resources blocked forever in the case our analysis was right. |
Revert for now. |
@weissi: sorry, I forgot to ask before. I don't know that much about Swift-NIO, but why the changes in Foundation will affect the usage of writev in Swift-NIO? Or you mean that indirectly, a process that uses both Swift-NIO and Foundation independently will see those failures? |
@drodriguez SwiftNIO has a number of blacklisted errnos that must never happen and if they do, SwiftNIO will (in debug mode) crash the program. EBADF is one of our blacklisted assertions. This bug leads to double closes which might close random file descriptors owned by SwiftNIO. So if you use this broken version of URLSession and SwiftNIO in the same process, it’s highly likely that sooner or later your program will crash with the above error. I added this to the description here in case someone googles for it, we’ve had multiple reports of this crash already, all due to the bug described here. In fact, that’s how we first found it 🙂 |
Thanks for the explanation @weissi! I have been looking into this a little bit more. I split the repro into a Server and a Client to discard possible interaction, and it seems to be completely driven by FoundationNetworking. From what I can find out in the cases I have captured, it seems that the socket closing works correctly. Example of strace with my comments on the side. We are following socket 13. // I think this is cURL checking if there's something to read in the socket (https://github.com/curl/curl/blob/curl-7_58_0/lib/select.c#L217)
[pid 318] poll([{fd=13, events=POLLIN|POLLPRI|POLLRDNORM|POLLRDBAND}], 1, 0 <unfinished ...>
...
// This is cURL reading from the socket (probably https://github.com/curl/curl/blob/curl-7_58_0/lib/sendf.c#L668, but not sure)
[pid 318] recvfrom(13, <unfinished ...>
...
[pid 318] <... recvfrom resumed> "HTTP/1.1 200 OK\r\nContent-Type: 8"..., 16384, 0, NULL, NULL) = 9052
...
// I think this libdispatch, removing the fd from the epoll used in Linux (there are several calls to epoll_ctl with EPOLL_CTL_DEL in https://github.com/apple/swift-corelibs-libdispatch/blob/master/src/event/event_epoll.c)
[pid 320] epoll_ctl(31, EPOLL_CTL_DEL, 13, NULL <unfinished ...>
...
[pid 320] <... epoll_ctl resumed> ) = 0
...
// Since above it seemed to be removed from dispatch, I think this is the cancel callback from the event source in FoundationNetworking (https://github.com/apple/swift-corelibs-foundation/blob/feebc938e69c8f2ce36ef297ea6c95e03c4d08c4/Foundation/URLSession/libcurl/MultiHandle.swift#L455)
[pid 318] close(13 <unfinished ...>
...
// One of the dup(1) is started from the main thread
[pid 315] dup(1 <unfinished ...>
...
[pid 318] <... close resumed> ) = 0
// And dup receives the recently closed 13
[pid 315] <... dup resumed> ) = 13
...
// And then cURL seems to be polling the socket again
[pid 322] poll([{fd=13, events=POLLIN|POLLPRI|POLLRDNORM|POLLRDBAND}], 1, 0 <unfinished ...>
...
[pid 322] <... poll resumed> ) = 0 (Timeout)
// And then the close from the main thread happens (it can also be seen that 39 is closed before, and 40 is closed after, which was the same order that the dups above were returning descriptors)
[pid 315] close(13 <unfinished ...>
...
// Finally cURL tries to create a request in that socket that was closed
[pid 322] sendto(13, "GET / HTTP/1.1\r\nHost: localhost:"..., 199, MSG_NOSIGNAL, NULL, 0 <unfinished ...>
...
// which obviosuly fails
[pid 322] <... sendto resumed> ) = -1 EBADF (Bad file descriptor) My problem is now finding why that second I want to see if I have some time to check without the patch, and see if the problem reproduces or not. I really don't see why letting cURL close the socket is that different than delaying closing the socket after the cancellation of the dispatch sources (which is the correct thing to do according to the docs, and seems the correct thing to do if you don't want crashes in some OS). |
Attachment: Download
Additional Detail from JIRA
md5: 2a28abe2dfef5c54655ad2a7c6ac999c
Issue Description:
This is a very serious issue where in Swift 5.1, corelibs-foundation's URLSession is closing unrelated file descriptors, leading to arbitrary undefined behaviour. Swift 5.0.x do not seem to have this issue.
The attached (single file but depending on SwiftNIO to have a web server) SwiftPM project uses URLSession to continuously make requests to the web server that the projects itself spawns too.
A successful run should look like
which it does on macOS (as well as on 5.0.3 on Linux).
On Linux however, the program usually gets stuck but arbitrary bad things can happen. The problem is best seen when started as
which greps for syscalls returning
EBADF
(which should never happen).Running it however leads to:
which shows severe file descriptor mismanagement in many cases.
One line repro from macOS:
Many thanks to jordane (JIRA User) who first reported the issue on the Vapor Discord.
The text was updated successfully, but these errors were encountered: