Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SR-5773] DispatchIO.read on Linux inconsistent with macOS #681

Closed
swift-ci opened this issue Aug 26, 2017 · 12 comments
Closed

[SR-5773] DispatchIO.read on Linux inconsistent with macOS #681

swift-ci opened this issue Aug 26, 2017 · 12 comments

Comments

@swift-ci
Copy link

Previous ID SR-5773
Radar None
Original Reporter nuclearace (JIRA User)
Type Bug
Status Resolved
Resolution Done

Attachment: Download

Environment

Swift version 3.1.1 (swift-3.1.1-RELEASE)
Target: x86_64-unknown-linux-gnu

Ubuntu 16.04

and

Apple Swift version 3.1 (swift-3.1-RELEASE)

Target: x86_64-apple-macosx10.9

macOS 10.12.5

Additional Detail from JIRA
Votes 1
Component/s libdispatch
Labels Bug
Assignee rlovelett (JIRA)
Priority Medium

md5: 89a90c899eae85ceb8bc7c56f5960b81

duplicates:

  • SR-9033 Dispatch spins in a tight loop when receiving EPOLLHUP

Issue Description:

It seems that when using a pipe and a DispatchIO channel linked to that pipe's file handle for reading, DispatchIO on Linux displays behavior inconsistent with macOS.

See https://gist.github.com/nuclearace/d67fba7c8f94c36d91e31e82863ec4ae for sample code that displays the issue. On macOS this consistently reads until EOF, while on Linux it either hangs while reading after the writer is done, or fails to fall into the EOF branch.

@belkadan
Copy link

ktopley-apple (JIRA User), dgrove-oss (JIRA User)

@swift-ci
Copy link
Author

Comment by Erik Little (JIRA)

It should be noted that someone running on Ubuntu 16.10 was unable to reproduce this inconsistency. I haven't tested on anything other than 16.04, so I can't verify that it might be specific to that Ubuntu version.

@swift-ci
Copy link
Author

Comment by Ryan Lovelett (JIRA)

I am also experiencing what I believe is this exact issue in Swift 4.2

The code I'm using to reproduce can be found in this forum thread.

@swift-ci
Copy link
Author

Comment by Ryan Lovelett (JIRA)

Okay I've made a small test case that exemplifies the bug I am seeing.

import Dispatch
import Foundation

let pipe1 = Pipe()
let pipe2 = Pipe()

let task1 = Process()
task1.launchPath = "/bin/cat"
task1.arguments = ["/dev/zero"]
task1.standardOutput = pipe1

let task2 = Process()
task2.launchPath = "/usr/bin/head"
task2.arguments = ["-c", "10000"]
task2.standardInput = pipe1
task2.standardOutput = pipe2

let global = DispatchQueue.global(qos: .background)
let channel = DispatchIO(type: .stream, fileDescriptor: pipe2.fileHandleForReading.fileDescriptor, queue: global) { (int) in
    DispatchQueue.main.async {
        print("Clean-up Handler: \(int)")
        exit(EXIT_SUCCESS)
    }
}

channel.setLimit(highWater: Int(PAGE_SIZE))

var bytes = 0
channel.read(offset: 0, length: Int.max, queue: global) { (closed, dispatchData, error) in
    if let data = dispatchData, !data.isEmpty {
        DispatchQueue.main.async {
            bytes += data.count
            print("Got data! \(data.count) of \(bytes)")
        }
    }

    if closed {
        channel.close()
    }
}

task1.launch()
task2.launch()

RunLoop.main.run()

It definitely is indeterminate when it will occur. I've taken to running it in a loop and eventually the loop will hang.
To debug it I have compiled libdispatch in debug mode --debug-libdispatch and --libdispatch-cmake-options=-DDISPATCH_ENABLE_ASSERTS=1 -DDISPATCH_DEBUG=1 -DDISPATCH_IO_DEBUG=1.

#!/usr/bin/env bash
rm test
rm run-*.log
~/Source/swift-source/rlovelett-installation/usr/bin/swiftc -g test.swift

for run in {001..100}
do
echo "Run ${run}"
LIBDISPATCH_LOG=stderr ./test 2> run-${run}.log
done

I'm still trying to make sense of the libdispatch logs to see if they shed any additional information.

@swift-ci
Copy link
Author

Comment by Ryan Lovelett (JIRA)

I think I have figured out what is causing this bug. Though I'm still not 100% what is the best way to fix the problem.

Basically, when Dispatch goes to process the file descriptor it gets a EAGAIN signal from read instead of the expected EOF. The EAGAIN is ostensibly something the code is designed to handle (as it does handle this when the file descriptor is first opened). The bug is that even though the read is queued for another attempt at processing it never actually gets triggered. This next part is the part I'm most murky about. I think the read is never actually triggered again because of this if condition.

Right now my very bad work-around is basically to call the read in a tight while loop with a sleep so long as it gets a EAGAIN. Before this patch the code would hang on roughly 3 in 100 runs. Now it hangs on roughly 0 in 10,000. Unfortunately, it makes the code unbearably slow. So while it "works" it is not a real solution.

@swift-ci
Copy link
Author

Comment by Ryan Lovelett (JIRA)

I've attached a proposed fix 0001-Requeue-the-stream-handler-if-the-read-would-block.patch to this issue. I'm still going to keep testing but I think this resolves the bug. This is not the same while loop patch mentioned above.

The real question is does this need a test? I guess I'll make a PR and ask that of the maintainers.

@swift-ci
Copy link
Author

Comment by Ryan Lovelett (JIRA)

Added link to pull request.

@swift-ci
Copy link
Author

Comment by Erik Little (JIRA)

rlovelett (JIRA User) thanks for looking at this!

@weissi
Copy link
Member

weissi commented May 3, 2019

this bug is actually the same as https://bugs.swift.org/browse/SR-9033 . The underlying issue here is that Dispatch doesn't handle EPOLLHUP but it has to. Proof: Running the attached repro with Swift 4.2.1 and then strace'ing a hanging instance gives you:

$ sudo strace -ff -p 300 
strace: Process 300 attached with 4 threads
[pid   306] epoll_ctl(14, EPOLL_CTL_MOD, 6, {EPOLLIN|EPOLLONESHOT|0x4000, {u32=1073744064, u64=139682000144576}} <unfinished ...>
[pid   307] select(64, [19], [], NULL, NULL <unfinished ...>
[pid   306] <... epoll_ctl resumed> )   = 0
[pid   302] ppoll([{fd=12, events=POLLIN}], 1, NULL, NULL, 8 <unfinished ...>
[pid   306] epoll_wait(14, [{EPOLLHUP, {u32=1073744064, u64=139682000144576}}], 16, -1) = 1
[pid   300] ppoll([{fd=9, events=POLLIN}], 1, NULL, NULL, 8 <unfinished ...>
[pid   306] epoll_ctl(14, EPOLL_CTL_MOD, 6, {EPOLLIN|EPOLLONESHOT|0x4000, {u32=1073744064, u64=139682000144576}}) = 0
[pid   306] epoll_wait(14, [{EPOLLHUP, {u32=1073744064, u64=139682000144576}}], 16, -1) = 1
[pid   306] epoll_ctl(14, EPOLL_CTL_MOD, 6, {EPOLLIN|EPOLLONESHOT|0x4000, {u32=1073744064, u64=139682000144576}}) = 0
[pid   306] epoll_wait(14, [{EPOLLHUP, {u32=1073744064, u64=139682000144576}}], 16, -1) = 1
[pid   306] epoll_ctl(14, EPOLL_CTL_MOD, 6, {EPOLLIN|EPOLLONESHOT|0x4000, {u32=1073744064, u64=139682000144576}}) = 0
[pid   306] epoll_wait(14, [{EPOLLHUP, {u32=1073744064, u64=139682000144576}}], 16, -1) = 1
[pid   306] epoll_ctl(14, EPOLL_CTL_MOD, 6, {EPOLLIN|EPOLLONESHOT|0x4000, {u32=1073744064, u64=139682000144576}}) = 0
[pid   306] epoll_wait(14, [{EPOLLHUP, {u32=1073744064, u64=139682000144576}}], 16, -1) = 1
[pid   306] epoll_ctl(14, EPOLL_CTL_MOD, 6, {EPOLLIN|EPOLLONESHOT|0x4000, {u32=1073744064, u64=139682000144576}}) = 0
[pid   306] epoll_wait(14, [{EPOLLHUP, {u32=1073744064, u64=139682000144576}}], 16, -1) = 1
[pid   306] epoll_ctl(14, EPOLL_CTL_MOD, 6, {EPOLLIN|EPOLLONESHOT|0x4000, {u32=1073744064, u64=139682000144576}}) = 0
[pid   306] epoll_wait(14, [{EPOLLHUP, {u32=1073744064, u64=139682000144576}}], 16, -1) = 1
[pid   306] epoll_ctl(14, EPOLL_CTL_MOD, 6, {EPOLLIN|EPOLLONESHOT|0x4000, {u32=1073744064, u64=139682000144576}}) = 0
[pid   306] epoll_wait(14, [{EPOLLHUP, {u32=1073744064, u64=139682000144576}}], 16, -1) = 1
[pid   306] epoll_ctl(14, EPOLL_CTL_MOD, 6, {EPOLLIN|EPOLLONESHOT|0x4000, {u32=1073744064, u64=139682000144576}}) = 0
[pid   306] epoll_wait(14, [{EPOLLHUP, {u32=1073744064, u64=139682000144576}}], 16, -1) = 1
[pid   306] epoll_ctl(14, EPOLL_CTL_MOD, 6, {EPOLLIN|EPOLLONESHOT|0x4000, {u32=1073744064, u64=139682000144576}}) = 0
[pid   306] epoll_wait(14, [{EPOLLHUP, {u32=1073744064, u64=139682000144576}}], 16, -1) = 1
[pid   306] epoll_ctl(14, EPOLL_CTL_MOD, 6, {EPOLLIN|EPOLLONESHOT|0x4000, {u32=1073744064, u64=139682000144576}}) = 0
[pid   306] epoll_wait(14, [{EPOLLHUP, {u32=1073744064, u64=139682000144576}}], 16, -1) = 1
[pid   306] epoll_ctl(14, EPOLL_CTL_MOD, 6, {EPOLLIN|EPOLLONESHOT|0x4000, {u32=1073744064, u64=139682000144576}}) = 0
[pid   306] epoll_wait(14, [{EPOLLHUP, {u32=1073744064, u64=139682000144576}}], 16, -1) = 1
[pid   306] epoll_ctl(14, EPOLL_CTL_MOD, 6, {EPOLLIN|EPOLLONESHOT|0x4000, {u32=1073744064, u64=139682000144576}}) = 0
[pid   306] epoll_wait(14, [{EPOLLHUP, {u32=1073744064, u64=139682000144576}}], 16, -1) = 1
[pid   306] epoll_ctl(14, EPOLL_CTL_MOD, 6, {EPOLLIN|EPOLLONESHOT|0x4000, {u32=1073744064, u64=139682000144576}}) = 0
[pid   306] epoll_wait(14, [{EPOLLHUP, {u32=1073744064, u64=139682000144576}}], 16, -1) = 1
[pid   306] epoll_ctl(14, EPOLL_CTL_MOD, 6, {EPOLLIN|EPOLLONESHOT|0x4000, {u32=1073744064, u64=139682000144576}}) = 0
[pid   306] epoll_wait(14, [{EPOLLHUP, {u32=1073744064, u64=139682000144576}}], 16, -1) = 1
[pid   306] epoll_ctl(14, EPOLL_CTL_MOD, 6, {EPOLLIN|EPOLLONESHOT|0x4000, {u32=1073744064, u64=139682000144576}}) = 0
[pid   306] epoll_wait(14, [{EPOLLHUP, {u32=1073744064, u64=139682000144576}}], 16, -1) = 1
[pid   306] epoll_ctl(14, EPOLL_CTL_MOD, 6, {EPOLLIN|EPOLLONESHOT|0x4000, {u32=1073744064, u64=139682000144576}}) = 0
[pid   306] epoll_wait(14, [{EPOLLHUP, {u32=1073744064, u64=139682000144576}}], 16, -1) = 1
[pid   306] epoll_ctl(14, EPOLL_CTL_MOD, 6, {EPOLLIN|EPOLLONESHOT|0x4000, {u32=1073744064, u64=139682000144576}}) = 0
[pid   306] epoll_wait(14, [{EPOLLHUP, {u32=1073744064, u64=139682000144576}}], 16, -1) = 1
[pid   306] epoll_ctl(14, EPOLL_CTL_MOD, 6, {EPOLLIN|EPOLLONESHOT|0x4000, {u32=1073744064, u64=139682000144576}}) = 0
[pid   306] epoll_wait(14, [{EPOLLHUP, {u32=1073744064, u64=139682000144576}}], 16, -1) = 1
[pid   306] epoll_ctl(14, EPOLL_CTL_MOD, 6, {EPOLLIN|EPOLLONESHOT|0x4000, {u32=1073744064, u64=139682000144576}}) = 0
[pid   306] epoll_wait(14, [{EPOLLHUP, {u32=1073744064, u64=139682000144576}}], 16, -1) = 1
[pid   306] epoll_ctl(14, EPOLL_CTL_MOD, 6, {EPOLLIN|EPOLLONESHOT|0x4000, {u32=1073744064, u64=139682000144576}}) = 0
[pid   306] epoll_wait(14, [{EPOLLHUP, {u32=1073744064, u64=139682000144576}}], 16, -1) = 1
[pid   306] epoll_ctl(14, EPOLL_CTL_MOD, 6, {EPOLLIN|EPOLLONESHOT|0x4000, {u32=1073744064, u64=139682000144576}}) = 0
[pid   306] epoll_wait(14, [{EPOLLHUP, {u32=1073744064, u64=139682000144576}}], 16, -1) = 1
[pid   306] epoll_ctl(14, EPOLL_CTL_MOD, 6, {EPOLLIN|EPOLLONESHOT|0x4000, {u32=1073744064, u64=139682000144576}}) = 0
[pid   306] epoll_wait(14, [{EPOLLHUP, {u32=1073744064, u64=139682000144576}}], 16, -1) = 1
[pid   306] epoll_ctl(14, EPOLL_CTL_MOD, 6, {EPOLLIN|EPOLLONESHOT|0x4000, {u32=1073744064, u64=139682000144576}}) = 0
[pid   306] epoll_wait(14, [{EPOLLHUP, {u32=1073744064, u64=139682000144576}}], 16, -1) = 1
[pid   306] epoll_ctl(14, EPOLL_CTL_MOD, 6, {EPOLLIN|EPOLLONESHOT|0x4000, {u32=1073744064, u64=139682000144576}}) = 0
[pid   306] epoll_wait(14, [{EPOLLHUP, {u32=1073744064, u64=139682000144576}}], 16, -1) = 1
[pid   306] epoll_ctl(14, EPOLL_CTL_MOD, 6, {EPOLLIN|EPOLLONESHOT|0x4000, {u32=1073744064, u64=139682000144576}}) = 0
[pid   306] epoll_wait(14, [{EPOLLHUP, {u32=1073744064, u64=139682000144576}}], 16, -1) = 1
[pid   306] epoll_ctl(14, EPOLL_CTL_MOD, 6, {EPOLLIN|EPOLLONESHOT|0x4000, {u32=1073744064, u64=139682000144576}}) = 0
[pid   306] epoll_wait(14, [{EPOLLHUP, {u32=1073744064, u64=139682000144576}}], 16, -1) = 1
[pid   306] epoll_ctl(14, EPOLL_CTL_MOD, 6, {EPOLLIN|EPOLLONESHOT|0x4000, {u32=1073744064, u64=139682000144576}}) = 0

@weissi
Copy link
Member

weissi commented May 3, 2019

the underlying issue is that Dispatch on Linux (and other systems that use epoll and not kqueue) doesn't handle EPOLLHUP.

@weissi
Copy link
Member

weissi commented May 7, 2019

this should be fixed on master by #478

@swift-ci
Copy link
Author

swift-ci commented Jun 5, 2019

Comment by Ryan Lovelett (JIRA)

Resolved by @weissi in #478.

@swift-ci swift-ci transferred this issue from apple/swift-issues Apr 25, 2022
@shahmishal shahmishal transferred this issue from apple/swift May 5, 2022
This issue was closed.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants