
[SR-2905] Linux libpwq: Dispatch queues + blocking tasks = poor performance #719

Open
swift-ci opened this issue Oct 10, 2016 · 7 comments

Comments

@swift-ci

Previous ID SR-2905
Radar None
Original Reporter fumoboy007 (JIRA User)
Type Bug
Environment

Linux, Swift 3.0

Additional Detail from JIRA
Votes 1
Component/s libdispatch
Labels Bug
Assignee dgrove-oss (JIRA)
Priority Medium

md5: ef7acc516d0911e996748b91cd397e7a

Issue Description:

Consider this code:

import Dispatch
import Glibc  // for sleep(_:) on Linux

for i in 1...100 {
    DispatchQueue.global().async {
        print(i)
        sleep(1000)  // block the worker thread for a long time
    }
}
dispatchMain()

Environment: Linux, 8 cores

Expected outcome: The first 8 work items start running. When the threads block on the sleep system call, the manager creates additional threads so that the number of active threads is always 8.

Actual outcome: The first 8 work items start running. Then one new work item runs every second. It takes 93 seconds for all of the print statements to run.

There are actually two issues here.

The first issue is that, on every run, the manager creates only 1 additional worker thread even though all of the existing threads are blocked. Instead, it should create min(num_blocked_threads, cpu_count) worker threads so that we always have cpu_count active threads (see the sketch below).
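
For illustration, a minimal Swift sketch of that rule (the function and parameter names are hypothetical, not libpwq API):

import Foundation

// Hypothetical sketch of the proposed policy: on each manager pass,
// create enough new worker threads to keep cpu_count threads active.
func workersToCreate(numBlockedThreads: Int) -> Int {
    let cpuCount = ProcessInfo.processInfo.activeProcessorCount
    return min(numBlockedThreads, cpuCount)
}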

The second issue is that the manager only runs once per second. I assume this is to prevent the manager thread from taking significant CPU time away from other threads. Is it possible to achieve the same result using thread priorities? In other words, can we set the manager thread's priority to a special value such that the thread scheduler will only schedule it when a CPU is idle?
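
Linux does offer such a policy: SCHED_IDLE schedules a thread only when a CPU would otherwise be idle. A minimal sketch of applying it to the calling thread (illustrative only, and assuming the Swift Glibc overlay imports sched_param with a sched_priority field):

import Glibc

// Put the calling thread under Linux's SCHED_IDLE policy so the kernel
// runs it only when a CPU would otherwise be idle.
// SCHED_IDLE is 5 in <sched.h>; declared here in case the Glibc
// overlay does not export the macro.
let SCHED_IDLE: Int32 = 5

var param = sched_param()
param.sched_priority = 0  // SCHED_IDLE requires a static priority of 0
let rc = pthread_setschedparam(pthread_self(), SCHED_IDLE, &param)
if rc != 0 {
    print("pthread_setschedparam failed with error \(rc)")
}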

@swift-ci

Comment by Daniel A. Steffen (JIRA)

These all sound like limitations of the Linux libpwq userspace emulation of the Darwin kernel workqueue that can likely only be fully solved with in-kernel detection of blocked threads.

In particular, I don't think libpwq manager prioritization would really help here (e.g. you can be starved on one of the higher-priority global queues while the lower-priority global queues make progress, preventing a lowest-priority libpwq manager from running).

We have discussed possible future Linux kernel approaches to this overall issue with dgrove-oss (JIRA User) and frankeh (JIRA User), I'll let them speak to that in detail.

@swift-ci

Comment by Darren Mo (JIRA)

If thread scheduling were per-process, then lowering the priority of the manager thread would work, because there would be free CPU cores whenever worker threads are idle/blocked. (I assume we only want to run the manager when there are idle/blocked workers.)

However, Linux schedules threads system-wide, so I guess the lower-priority manager thread could be prevented from running by higher-priority threads in other processes?

@swift-ci

Comment by David Grove (JIRA)

As das (JIRA User) said, these are symptoms of a lack of integration between libpwq and the underlying Linux scheduler. Because libpwq has incomplete information, it is very conservative in how it manages the pool of worker threads.

We have experimented with a kernel module that hooks the Linux scheduler's process change handler and provides rapid feedback to the libpwq manager thread when the number of runnable worker threads is below or above the desired range. The manager is then extended to process the stream of events from the kernel module to add/remove worker threads. This easily handles programs like the testcase above. However, since the approach relies on installation of a kernel module, it is not clear whether or not this is actually a useful solution to the problem. We speculate that many Linux users will be running Swift in managed/cloud environments where they will either not be able to or will not want to load such a kernel module.
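
A hypothetical sketch of such an event-driven manager loop (the event type and function names are mine, not the experimental module's interface):

// Process a stream of feedback events from the kernel module instead
// of polling on a fixed interval.
enum PoolEvent {
    case runnableBelowRange(deficit: Int)
    case runnableAboveRange(surplus: Int)
}

func runManager(events: AnyIterator<PoolEvent>,
                spawnWorker: () -> Void,
                retireWorker: () -> Void) {
    for event in events {
        switch event {
        case .runnableBelowRange(let deficit):
            for _ in 0..<deficit { spawnWorker() }
        case .runnableAboveRange(let surplus):
            for _ in 0..<surplus { retireWorker() }
        }
    }
}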

For some (many? most?) workloads, it may be sufficient to modestly increase the frequency of the manager's execution (by shortening the sleep interval in manager_main) and/or to intentionally over-subscribe the system and let the Linux scheduler manage the competing threads (for example, by attempting to maintain 2x or 4x as many worker threads as CPUs to account for threads in short/medium-term blocking operations).
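
As an illustration of those two knobs (the names and numbers here are examples, not the actual manager_main code):

import Foundation

// Hypothetical tuning sketch: wake the manager more often and aim for
// an over-subscribed worker pool, leaving arbitration to the kernel.
let cpuCount = ProcessInfo.processInfo.activeProcessorCount
let overcommitFactor = 4                      // 2x-4x, per the discussion above
let targetPoolSize = overcommitFactor * cpuCount
let managerSleepInterval: TimeInterval = 0.1  // instead of once per second

func managerPass(currentPoolSize: Int, spawnWorker: () -> Void) {
    // Top the pool up to the over-subscribed target in a single pass.
    guard currentPoolSize < targetPoolSize else { return }
    for _ in currentPoolSize..<targetPoolSize {
        spawnWorker()
    }
}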

I'd be interested in details about the real workload you are trying to run to see how these various approaches might work.

@swift-ci

Comment by Darren Mo (JIRA)

I work for a large company that runs thousands of server applications, some of which receive hundreds of thousands of requests per second. Requests are independent of each other. Generally, a request involves multiple disk and network I/O calls, often with some in parallel. Let me know what other specifics of the workload you are interested in.

We recently moved to Go, which has an “optimal” threading model. If you are not familiar with it, Go has a concept called “goroutines”. Conceptually, goroutines are like GCD work items in that they are tasks scheduled onto a pool of kernel threads. The major difference is that system calls in Go first go through the Go runtime, so the Go scheduler can create a new thread just before the current thread blocks.

In this regard, Go is great because developers can write synchronous (straightforward) code without any performance tradeoffs. This is a major reason that we chose Go as our primary language.

However, other aspects of the Go language design have serious deficiencies. On the flip side, Swift excels at language design, but currently all multithreaded code must be written asynchronously to achieve maximum performance. If we could solve this problem, Swift would be the best of both worlds. This could have a huge impact!

We manage our own machines, so a kernel module is feasible for us. For others, some quick searches tell me that many cloud providers allow you to use your own kernel and/or install kernel modules. I think a kernel module would be the cleanest and most flexible approach.

@swift-ci

Comment by Darren Mo (JIRA)

By the way, the behaviour on macOS (Sierra) is interesting. It creates 64 threads right away and then stops. Is this by design?

@swift-ci

Comment by David Grove (JIRA)

Thanks for the description of the workload. It's good feedback to know that you would be able to deploy a kernel module. When we get something that looks solid, I'll circle back to this issue to see if you'd be able to give it a try on your workloads.

@swift-ci

Comment by Darren Mo (JIRA)

Oh, I forgot to mention that we run our applications inside Docker containers. I'm not sure how that would affect the kernel module.

@swift-ci swift-ci transferred this issue from apple/swift-issues Apr 25, 2022
@shahmishal shahmishal transferred this issue from apple/swift May 5, 2022