
[SR-2905] Linux libpwq: Dispatch queues + blocking tasks = poor performance #719

Open
swift-ci opened this issue Oct 10, 2016 · 7 comments

Comments

@swift-ci

Previous ID SR-2905
Radar None
Original Reporter fumoboy007 (JIRA User)
Type Bug
Environment

Linux, Swift 3.0

Additional Detail from JIRA
Votes 1
Component/s libdispatch
Labels Bug
Assignee dgrove-oss (JIRA)
Priority Medium

md5: ef7acc516d0911e996748b91cd397e7a

Issue Description:

Consider this code:

import Dispatch
import Glibc  // for sleep(_:) on Linux

for i in 1...100 {
    DispatchQueue.global().async {
        print(i)
        sleep(1000)  // block the worker thread for a long time
    }
}
dispatchMain()

Environment: Linux, 8 cores

Expected outcome: The first 8 work items start running. When the threads block on the sleep system call, the manager creates additional threads so that the number of active threads is always 8.

Actual outcome: The first 8 work items start running. Then one new work item runs every second. It takes 93 seconds for all of the print statements to run.

There are actually two issues here.

The first issue is that, on every run, the manager creates only 1 additional worker thread even though all of the existing threads are blocked. Instead, it should create min(num_blocked_threads, cpu_count) worker threads so that we always have cpu_count active threads (see the sketch below).
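
For illustration, a minimal Swift sketch of that rule (the function and parameter names are hypothetical, not libpwq API):

import Foundation

// Hypothetical sketch of the proposed policy: on each manager pass,
// create enough new worker threads to keep cpu_count threads active.
func workersToCreate(numBlockedThreads: Int) -> Int {
    let cpuCount = ProcessInfo.processInfo.activeProcessorCount
    return min(numBlockedThreads, cpuCount)
}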

The second issue is that the manager only runs once per second. I assume this is to prevent the manager thread from taking significant CPU time away from other threads. Is it possible to achieve the same result using thread priorities? In other words, can we set the manager thread's priority to a special value such that the thread scheduler will only schedule it when a CPU is idle?
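
Linux does offer such a policy: SCHED_IDLE schedules a thread only when a CPU would otherwise be idle. A minimal sketch of applying it to the calling thread (illustrative only, and assuming the Swift Glibc overlay imports sched_param with a sched_priority field):

import Glibc

// Put the calling thread under Linux's SCHED_IDLE policy so the kernel
// runs it only when a CPU would otherwise be idle.
// SCHED_IDLE is 5 in <sched.h>; declared here in case the Glibc
// overlay does not export the macro.
let SCHED_IDLE: Int32 = 5

var param = sched_param()
param.sched_priority = 0  // SCHED_IDLE requires a static priority of 0
let rc = pthread_setschedparam(pthread_self(), SCHED_IDLE, &param)
if rc != 0 {
    print("pthread_setschedparam failed with error \(rc)")
}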

@swift-ci

Comment by Daniel A. Steffen (JIRA)

These all sound like limitations of the Linux libpwq userspace emulation of the Darwin kernel workqueue that can likely only be fully solved with in-kernel detection of blocked threads.

In particular, I don't think libpwq manager prioritization would really help here (e.g. you can be starved on one of the higher-priority global queues while the lower-priority global queues make progress, preventing a lowest-priority libpwq manager from running).

We have discussed possible future Linux kernel approaches to this overall issue with dgrove-oss (JIRA User) and frankeh (JIRA User), I'll let them speak to that in detail.

@swift-ci

Comment by Darren Mo (JIRA)

If thread scheduling were per-process, then lowering the priority of the manager thread would work, because there would be free CPU cores whenever worker threads are idle/blocked. (I assume we only want to run the manager when there are idle/blocked workers.)

However, Linux schedules threads system-wide, so I guess the lower-priority manager thread could be prevented from running by higher-priority threads in other processes?

@swift-ci

Comment by David Grove (JIRA)

As das (JIRA User) said, these are symptoms of a lack of integration between libpwq and the underlying Linux scheduler. Because libpwq has incomplete information, it is very conservative in how it manages the pool of worker threads.

We have experimented with a kernel module that hooks the Linux scheduler's process change handler and provides rapid feedback to the libpwq manager thread when the number of runnable worker threads is below or above the desired range. The manager is then extended to process the stream of events from the kernel module to add/remove worker threads. This easily handles programs like the testcase above. However, since the approach relies on installation of a kernel module, it is not clear whether or not this is actually a useful solution to the problem. We speculate that many Linux users will be running Swift in managed/cloud environments where they will either not be able to or will not want to load such a kernel module.
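
A hypothetical sketch of such an event-driven manager loop (the event type and function names are mine, not the experimental module's interface):

// Process a stream of feedback events from the kernel module instead
// of polling on a fixed interval.
enum PoolEvent {
    case runnableBelowRange(deficit: Int)
    case runnableAboveRange(surplus: Int)
}

func runManager(events: AnyIterator<PoolEvent>,
                spawnWorker: () -> Void,
                retireWorker: () -> Void) {
    for event in events {
        switch event {
        case .runnableBelowRange(let deficit):
            for _ in 0..<deficit { spawnWorker() }
        case .runnableAboveRange(let surplus):
            for _ in 0..<surplus { retireWorker() }
        }
    }
}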

For some (many? most?) workloads, it may be sufficient to modestly increase the frequency of the manager's execution (by shortening the sleep interval in manager_main) and/or to intentionally over-subscribe the system and let the Linux scheduler manage the competing threads (for example, by attempting to maintain 2x or 4x as many worker threads as CPUs to account for threads in short/medium-term blocking operations).
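
As an illustration of those two knobs (the names and numbers here are examples, not the actual manager_main code):

import Foundation

// Hypothetical tuning sketch: wake the manager more often and aim for
// an over-subscribed worker pool, leaving arbitration to the kernel.
let cpuCount = ProcessInfo.processInfo.activeProcessorCount
let overcommitFactor = 4                      // 2x-4x, per the discussion above
let targetPoolSize = overcommitFactor * cpuCount
let managerSleepInterval: TimeInterval = 0.1  // instead of once per second

func managerPass(currentPoolSize: Int, spawnWorker: () -> Void) {
    // Top the pool up to the over-subscribed target in a single pass.
    guard currentPoolSize < targetPoolSize else { return }
    for _ in currentPoolSize..<targetPoolSize {
        spawnWorker()
    }
}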

I'd be interested in details about the real workload you are trying to run to see how these various approaches might work.

@swift-ci

Comment by Darren Mo (JIRA)

I work for a large company that runs thousands of server applications, some of which receive hundreds of thousands of requests per second. Requests are independent of each other. Generally, a request involves multiple disk and network I/O calls, often with some in parallel. Let me know what other specifics of the workload you are interested in.

We recently moved to Go, which has an “optimal” threading model. If you are not familiar with it, Go has a concept called “goroutines”. Conceptually, goroutines are like GCD work items in that they are tasks scheduled onto a pool of kernel threads. The major difference is that system calls in Go first go through the Go runtime, so the Go scheduler can create a new thread just before the current thread blocks.

In this regard, Go is great because developers can write synchronous (straightforward) code without any performance tradeoffs. This is a major reason that we chose Go as our primary language.

However, other aspects of the Go language design have serious deficiencies. On the flip side, Swift excels at language design, but currently all multithreaded code must be written asynchronously to achieve maximum performance. If we could solve this problem, Swift would be the best of both worlds. This could have a huge impact!

We manage our own machines, so a kernel module is feasible for us. For others, some quick searches tell me that many cloud providers allow you to use your own kernel and/or install kernel modules. I think a kernel module would be the cleanest and most flexible approach.

@swift-ci

Comment by Darren Mo (JIRA)

By the way, the behaviour on macOS (Sierra) is interesting. It creates 64 threads right away and then stops. Is this by design?

@swift-ci

Comment by David Grove (JIRA)

Thanks for the description of the workload. It's good feedback to know that you would be able to deploy a kernel module. When we get something that looks solid, I'll circle back to this issue to see if you'd be able to give it a try on your workloads.

@swift-ci

Comment by Darren Mo (JIRA)

Oh, I forgot to mention that we run our applications inside Docker containers. I'm not sure how that would affect the kernel module.

@swift-ci swift-ci transferred this issue from apple/swift-issues Apr 25, 2022
@shahmishal shahmishal transferred this issue from apple/swift May 5, 2022