[SR-4142] Compilation gets slower when allowed more concurrent jobs #46725
Comments
Comment by Graydon Hoare (JIRA) Three things would be useful for diagnosing this:
Feel free to email such information privately; the last point in particular will reveal filenames and produce quite a lot of output on a large project.
Comment by Jacek (JIRA) graydon (JIRA User) Thanks for the response. I did include detailed model information in the Environment section of the ticket; please check it out. I will provide more details on the phase/file compilation offline.
Comment by Graydon Hoare (JIRA) Also, given that you're testing on It may also be worth checking to confirm that you're seeing the same overall number of jobs running as you scale the concurrent-job count. This would rule out the possibility of the driver running the same job more than once. To check that, it'd be sufficient to run your workload twice with the instrumentation flags above, once at (say)
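The job-count check suggested in this comment could be scripted once the two runs are captured. A minimal Swift sketch, assuming you have already extracted one identifier per compile job from each run into a string array (how to pull those out of the instrumentation output is not covered here); duplicateJobs and compareRuns are hypothetical helpers, not part of Xcode or the Swift toolchain:

```swift
/// Hypothetical diagnostic for comparing two instrumented builds.
/// Assumption: each array holds one identifier per job the driver ran,
/// e.g. the primary source file of each compile job.

/// Returns the jobs that appear more than once, with how often each ran.
func duplicateJobs(_ jobs: [String]) -> [String: Int] {
    var counts: [String: Int] = [:]
    for job in jobs {
        counts[job, default: 0] += 1
    }
    return counts.filter { $0.value > 1 }
}

/// Compares a low-concurrency run against a high-concurrency run:
/// whether the total job counts match, and which jobs repeated at high -j.
func compareRuns(low: [String], high: [String]) -> (sameTotal: Bool, repeatedAtHigh: [String: Int]) {
    return (low.count == high.count, duplicateJobs(high))
}
```

Matching totals with no repeats would point away from the driver doing redundant work and back toward per-job slowdown; using file names as keys only works if each file is compiled once per build, which is an assumption.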
Comment by Graydon Hoare (JIRA) Ah, apologies, I didn't expand the environment section to see that you'd described the model number variation. I think (hope you concur) that the straight-line perf variation is explained by the model and clock variation among the units, and there's only a question of concurrency to address here. A further question worth probing: given the large quantity of ObjC in these projects, I wonder if you've tried the new
Comment by Jacek (JIRA) Our code base isn't ready for the 8.3 beta, and the migration apparently isn't trivial to do in a short time. Once we've done it we'll see if there is any improvement. We'll run the tests you've suggested and share the results.
Comment by Jacek (JIRA) To confirm: yes, we understand the concurrency issue is what we need addressed.
Comment by Graydon Hoare (JIRA) I've reproduced this to some extent: running a concurrent Swift build with -j 12 or -j 24 does appear to slow down noticeably (roughly a 20% increase in cumulative user-mode time) relative to -j 4 on a 12-way Xeon Mac Pro. Wall time is still faster here, and system time only goes up to 100% of user time rather than the 500% (and slower wall time) you're seeing, but the phenomenon is occurring and is worrying even if less dramatic than in your case. I can also confirm the odd observation that the slowdown does not occur on an i7 part like a MacBook. The cause is unclear at present. Will follow up when we know more.
@swift-ci create
> Our code base is not ready for 8.3 beta, and apparently the migration is not trivial to do in a short time.
Suliga (JIRA User) Can you file separate bugs about this? 8.3 is not supposed to have any intentional source-breaking changes except for a few corner cases where the old compiler allowed unsound code resulting in undefined behavior.
Comment by Jacek (JIRA) @slavapestov We did log bugs for issues we found so far:
Suliga (JIRA User) Thanks! Looks like the 'initialize' removal was intentional, and all but one of the rest have been fixed in swift-3.1-branch. Can you try a recent development snapshot?
Comment by Jacek (JIRA) @slavapestov One more we logged, rdar://30871790, still reproduces with the 3/19 snapshot.
Comment by Graydon Hoare (JIRA) Just to update here: we've discussed in some detail with Jacek and looked at 3 main categories of mitigation.
I believe at this point the "slower with more jobs" angle is... broadly under control, and the 12-core machine is at least pulling its weight (though it is still clock-for-clock slower than a newer 4 GHz iMac). Jacek, can you confirm this impression on your end? Squeezing more parallelism out of the compiler may entail somewhat more involved changes, so I'm also curious to know what the core-utilization figures on your machines are at present (in whichever configuration is fastest). Are you still seeing much idle time?
Comment by Jacek (JIRA) Indeed, between disabling a kernel extension we had enabled and the WMO concurrency fixes made in 8.3.2, we no longer see the issue where using more cores hurts build performance. It is still a problem with Xcode 8.2.1, so it's a combination of factors, not just one. As you noted, a quad-core 4 GHz iMac is still faster than the 12-core 2.7 GHz MacPro. I'll let ob (JIRA User) add idle data from our side. We appreciate the on-site profiling and getting the fixes in so fast! Also, let me know if you'd like us to capture and share profiling data with the improved setup to see what else might be affecting build performance.
Comment by Oscar Bonilla (JIRA) graydon (JIRA User) What I've been doing is watching the core utilization using What I'm doing is isolating those commands to profile them individually. What kind of information would be useful? Is there some debugging option to swift that can tell you where it's spending its time?
Comment by Oscar Bonilla (JIRA) FWIW, the swift command I'm looking at is one of those where turning on WMO makes it compile faster and with more CPU utilization than with it off. Without WMO, I see very little CPU utilization, with occasionally one core at 100%:
With WMO, I see one core at 100% utilization most of the time:
Xcode version is 8.3.2 from Apple.
Comment by Oscar Bonilla (JIRA) This is after adding
Comment by Anton Iakimov (JIRA) It was solved for me in Xcode 9. Does anyone still have this issue?
According to graydon (JIRA User), this was fixed.
Comment by John Grange (JIRA) I am still seeing very bad performance on a 12-core Mac Pro with Xcode 9.0.1. Which version of Xcode 9 was this fixed in?
grangej (JIRA User) What OS are you running? The problem was a combination of kernel issues exacerbated by syscall patterns executed from Clang's VFS layer and poorly-behaved antivirus stuff. The bug here was specifically that more parallelism was causing less throughput due to the aforementioned issues. Do you observe something similar, that you get better build times if you reduce the number of jobs?
Comment by John Grange (JIRA) @jckarter I am on macOS 10.12.6, with a 12-core Mac Pro and Xcode 9.0.1. Running the following command seems to help, but my MacBook Pro still appears to be faster:
defaults write com.apple.dt.Xcode IDEBuildOperationMaxNumberOfConcurrentCompileTasks 5
Comment by John Grange (JIRA) @jckarter I also tried 9.2 and am experiencing similar issues. Should I try upgrading to macOS 10.13?
Environment
Xcode/Swift versions
8.2.1 (8C1002)
Apple Swift version 3.0.2 (swiftlang-800.0.63 clang-800.0.42.1)
Code base: Swift 3.0
OS versions tried:
El Capitan 10.11.6
Sierra 10.12.3
Hardware used:
4-core 2.6 GHz Intel Core i7, 16 GB 1600 MHz DDR3
12-core 2.7 GHz Intel Xeon E5, 32 GB 1866 MHz DDR3 ECC
2-core 3 GHz Intel Core i7, 16 GB 1600 MHz DDR3
4-core 4 GHz Intel Core i7, 32 GB 1867 MHz DDR3
Additional Detail from JIRA
Issue Description:
We noticed a concerning issue where building our primarily Swift projects would take significantly longer on hardware with more cores available. Building a project on a 4-core MacBookPro would be about 33% faster than building the same project on a powerful 12-core MacPro tower.
Further investigation showed that, counterintuitively, reducing the number of concurrent jobs for xcodebuild on the MacPro improved build times. On the 12-core machine, going from the default setting of 24 jobs down to only 5 jobs improved compilation time by 23%.
For most of the hardware we tested, 4-6 concurrent jobs seems to be optimal for build speed, and allowing more actually hurts build times. The number of jobs was controlled using both the IDEBuildOperationMaxNumberOfConcurrentCompileTasks setting and the -jobs command-line option for xcodebuild.
We've observed this pattern both on a sizable project (5K Swift files containing 540K lines of code, plus 2.5K ObjC files containing 320K lines of code) and on a smaller one (500 Swift files containing 36K lines of code, plus 1.5K ObjC files containing 150K lines of code).
Initially we thought this could be related to SSD/filesystem inefficiencies, so we tried building on a RAM disk (no difference) and also tried APFS on macOS Sierra, with the same results. We have observed the issue with Whole Module Optimization both on and off, for both debug and release builds.
See the attached chart, which compares build times for the 4 hardware configurations against the number of concurrent jobs requested. Notice how for all machines the fastest builds are at around 4-6 jobs, after which build times regress, most prominently on the MacPro. Even a two-core MacMini builds faster than the 12-core MacPro!
We also observed that the CPUs were heavily under-utilized during builds: on average the cores were roughly 40% idle, even when building the 0.5M-lines-of-code project.
See the attached text file with elapsed build times for the MacPro, broken down into system vs. user time. The numbers from 3 to 24 indicate the number of concurrent jobs requested. Notice how the build time goes from 32 minutes (3 jobs) up to almost an hour (24 jobs).
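Comparing runs like these by hand is error-prone, so a small helper can pick the fastest job count and express system time relative to user time. A minimal Swift sketch; BuildSample, fastest, and percentImprovement are hypothetical names, and the inputs are whatever wall, user, and system times you measured (no figures from the attachment are reproduced here):

```swift
/// One timed build at a given concurrency level.
/// Hypothetical record type; field names are illustrative only.
struct BuildSample {
    let jobs: Int              // concurrency requested (e.g. the -jobs value)
    let wallSeconds: Double    // elapsed real time
    let userSeconds: Double    // cumulative user-mode CPU time
    let systemSeconds: Double  // cumulative kernel-mode CPU time

    /// System time as a fraction of user time (1.0 means 100%).
    var systemToUserRatio: Double { systemSeconds / userSeconds }
}

/// The sample with the lowest wall-clock time, or nil if there are none.
func fastest(_ samples: [BuildSample]) -> BuildSample? {
    return samples.min { $0.wallSeconds < $1.wallSeconds }
}

/// Percentage reduction in wall time going from `baseline` to `candidate`
/// (positive means `candidate` is faster).
func percentImprovement(from baseline: BuildSample, to candidate: BuildSample) -> Double {
    return (baseline.wallSeconds - candidate.wallSeconds) / baseline.wallSeconds * 100
}
```

Wall time is used to pick the winner because, as noted in the thread, cumulative user time can rise with more jobs even while wall time still falls; the system-to-user ratio is the number to watch for the kernel-side overhead discussed there.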
Please advise as to why we're seeing such concerning behavior. Last year (around Swift 1.2-2.0) we performed a similar hardware analysis, and at that time 12-core MacPro machines built our projects 2-3x faster than MacBookPros and 4-5x faster than MacMinis, which was expected given the number of cores used for the builds.