[SR-9033] Dispatch spins in a tight loop when receiving EPOLLHUP #644

Closed
weissi opened this issue Oct 18, 2018 · 3 comments
weissi commented Oct 18, 2018

Previous ID SR-9033
Radar rdar://problem/45369001
Original Reporter @weissi
Type Bug
Status Resolved
Resolution Done

Attachment: Download

Additional Detail from JIRA
Votes 0
Component/s libdispatch
Labels Bug, Linux
Assignee None
Priority Medium

md5: 61275b614c6118d52abb224da518046b

is duplicated by:

  • SR-5773 DispatchIO.read on Linux inconsistent with macOS

Issue Description:

Dispatch internally assumes that epoll_wait only ever reports events that Dispatch subscribed to. That is not true, however: EPOLLHUP cannot be masked, so every registered file descriptor is always subscribed to it. As a result, epoll_wait returns with EPOLLHUP and Dispatch immediately goes back into epoll_wait, which then returns again, and so on --> 100% spin.

man epoll_ctl says:

EPOLLHUP
Hang up happened on the associated file descriptor.
epoll_wait(2) will always wait for this event; it is not
necessary to set it in events.

Note that when reading from a channel such as a pipe or a
stream socket, this event merely indicates that the peer
closed its end of the channel. Subsequent reads from the
channel will return 0 (end of file) only after all outstanding
data in the channel has been consumed.
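
The man page statement can be checked in isolation, without Dispatch. The following is a minimal sketch, assuming a Linux host where Glibc exposes the epoll API to Swift; the function name eventsAfterWriterCloses is made up for this illustration. It subscribes the read end of a pipe to EPOLLIN only, closes the write end, and returns the event mask that epoll_wait reports.

```swift
#if os(Linux)
import Glibc

/// Hypothetical helper: registers the read end of a fresh pipe for EPOLLIN
/// only, closes the write end, and returns the mask epoll_wait reports
/// (0 if nothing was reported within one second).
func eventsAfterWriterCloses() -> UInt32 {
    var fds: [Int32] = [-1, -1]
    precondition(pipe(&fds) == 0)
    let (readFD, writeFD) = (fds[0], fds[1])

    let epfd = epoll_create1(0)
    precondition(epfd >= 0)

    // Subscribe to EPOLLIN only. EPOLLHUP is deliberately NOT requested.
    var ev = epoll_event()
    ev.events = EPOLLIN.rawValue
    ev.data.fd = readFD
    precondition(epoll_ctl(epfd, EPOLL_CTL_ADD, readFD, &ev) == 0)

    // Hang up the other end of the pipe without writing anything.
    close(writeFD)

    var out = epoll_event()
    let n = epoll_wait(epfd, &out, 1, 1000)

    close(readFD)
    close(epfd)
    return n == 1 ? out.events : 0
}

let events = eventsAfterWriterCloses()
print("EPOLLHUP delivered:", (events & EPOLLHUP.rawValue) != 0)
#else
print("epoll is only available on Linux")
#endif
```

If the man page excerpt holds, the returned mask contains EPOLLHUP even though only EPOLLIN was requested.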

EPOLLHUP seems to occur in the following situations:

  • the other end of a FIFO hung up

  • TCP resets

  • the other end of a UNIX domain socket hung up

This simple demo program demonstrates that:

import Dispatch
#if os(macOS)
import Darwin
#else
import Glibc
#endif

func withPipe(_ body: (CInt, CInt) -> Void) -> Void {
    var fds: [Int32] = [-1, -1]
    fds.withUnsafeMutableBufferPointer { ptr in
        let err = pipe(ptr.baseAddress!)
        precondition(err == 0)
    }
    body(fds[0], fds[1])
}

withPipe { readFD, writeFD in
    print("readFD=\(readFD), writeFD=\(writeFD)")
    let q = DispatchQueue(label: "q")
    let io = DispatchIO(type: .stream, fileDescriptor: readFD, queue: q) { err in
        print("cleanup, err=\(err)")
        close(readFD)
        print("all done")
        exit(0)
    }
    io.setLimit(lowWater: 0)
    io.read(offset: 0, length: .max, queue: q) { done, data, err in
        print("read: \(done), \((data?.count).debugDescription), \(err)")
        if let data = data, data.count > 0 {
            // will only happen once
            print("closing writeFD")
            close(writeFD)
            q.asyncAfter(deadline: .now() + 1) {
                io.close()
            }
        }
    }
    io.resume()
    print("writing")
    write(writeFD, "x", 1)
    print("wrtten")
    dispatchMain()
}

On Darwin, running it looks like this:

readFD=3, writeFD=4
writing
wrtten
read: false, Optional(1), 0
closing writeFD
read: true, Optional(0), 0
cleanup, err=0
all done
Program ended with exit code: 0

On Linux, however, we get:

readFD=3, writeFD=4
writing
wrtten
read: false, Optional(1), 0
closing writeFD
[hang with 100% CPU spin...]

Running the program under strace looks like this:

 strace -f -e trace=epoll_ctl,epoll_wait ./main
readFD=3, writeFD=4
Process 103 attached
writing
wrtten
Process 104 attached
Process 105 attached
[pid   105] epoll_ctl(5, EPOLL_CTL_ADD, 6, {EPOLLIN|0x4000, {u32=1, u64=1}}) = 0
Process 106 attached
[pid   106] epoll_wait(5, {{EPOLLIN, {u32=1, u64=1}}}, 16, 0) = 1
[pid   106] epoll_ctl(5, EPOLL_CTL_ADD, 7, {EPOLLIN|EPOLLONESHOT, {u32=3, u64=3}}read: false, Optional(1), 0
) = 0
closing writeFD
[pid   106] epoll_wait(5, {}, 16, 0)    = 0
[pid   106] epoll_ctl(5, EPOLL_CTL_ADD, 3, {EPOLLIN|EPOLLONESHOT|0x4000, {u32=1006635200, u64=139961105909952}}) = 0
[pid   106] epoll_wait(5, {{EPOLLIN, {u32=1, u64=1}}, {EPOLLHUP, {u32=1006635200, u64=139961105909952}}}, 16, -1) = 2
[pid   106] epoll_ctl(5, EPOLL_CTL_MOD, 3, {EPOLLIN|EPOLLONESHOT|0x4000, {u32=1006635200, u64=139961105909952}}) = 0
[pid   106] epoll_wait(5, {{EPOLLHUP, {u32=1006635200, u64=139961105909952}}}, 16, -1) = 1
[pid   106] epoll_ctl(5, EPOLL_CTL_MOD, 3, {EPOLLIN|EPOLLONESHOT|0x4000, {u32=1006635200, u64=139961105909952}}) = 0
[pid   106] epoll_wait(5, {{EPOLLHUP, {u32=1006635200, u64=139961105909952}}}, 16, -1) = 1
[continued forever...]

FWIW, SwiftNIO used to have the same bug: apple/swift-nio@b109389?diff=unified
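
The trace above shows the registration being re-armed with EPOLLIN|EPOLLONESHOT while the kernel keeps reporting only EPOLLHUP, so nothing ever matches the subscribed mask. One common way for an event loop to break such a cycle is to treat EPOLLHUP (and EPOLLERR) as a wake-up for whatever I/O interest is registered, so that the pending read runs and observes end of file. The sketch below models that idea in plain Swift. It is an illustration under stated assumptions, not the change made in libdispatch or SwiftNIO: EpollEvents and interestsToWake are hypothetical names, and only the numeric flag values are taken from the Linux epoll header.

```swift
// Hypothetical model of epoll event masks. The raw values mirror the
// constants in <sys/epoll.h> on Linux.
struct EpollEvents: OptionSet {
    let rawValue: UInt32
    static let input   = EpollEvents(rawValue: 0x001)   // EPOLLIN
    static let output  = EpollEvents(rawValue: 0x004)   // EPOLLOUT
    static let error   = EpollEvents(rawValue: 0x008)   // EPOLLERR
    static let hangup  = EpollEvents(rawValue: 0x010)   // EPOLLHUP
    static let oneShot = EpollEvents(rawValue: 1 << 30) // EPOLLONESHOT
}

/// Hypothetical policy: returns the I/O interests that should be woken up
/// for a delivered event mask.
func interestsToWake(subscribed: EpollEvents, delivered: EpollEvents) -> EpollEvents {
    let io: EpollEvents = [.input, .output]
    // Start with the I/O events that were both requested and delivered.
    var wake = delivered.intersection(subscribed).intersection(io)
    // EPOLLHUP and EPOLLERR arrive regardless of the subscription, so wake
    // every registered I/O interest and let read/write report EOF or the error.
    if !delivered.isDisjoint(with: [.hangup, .error]) {
        wake.formUnion(subscribed.intersection(io))
    }
    return wake
}

// The situation from the trace: subscribed to EPOLLIN|EPOLLONESHOT,
// but only EPOLLHUP is delivered.
let subscribed: EpollEvents = [.input, .oneShot]
let naive = EpollEvents.hangup.intersection(subscribed)
let wake = interestsToWake(subscribed: subscribed, delivered: .hangup)
print("naive match is empty:", naive.isEmpty)
print("wakes the reader:", wake == [.input])
```

With a plain intersection the delivered mask matches nothing, which corresponds to the re-arm-and-wait loop in the trace; with the policy above the reader is woken once and can see the end of file.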

weissi commented Oct 18, 2018

@swift-ci create

weissi commented May 3, 2019

I provided a (failing on Linux) test case here: #476

weissi commented May 7, 2019

Fixed by #478. Thanks @adierking!

@swift-ci swift-ci transferred this issue from apple/swift-issues Apr 25, 2022
@shahmishal shahmishal transferred this issue from apple/swift May 5, 2022
This issue was closed.