Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SR-292] Debug compiler: SIL Optimization takes 65% of compilation time for StdlibUnittest #42914

Open
belkadan opened this issue Dec 17, 2015 · 16 comments
Assignees
Labels
bug A deviation from expected or documented behavior. Also: expected but undesirable behavior. compiler The Swift compiler in itself performance

Comments

@belkadan
Copy link
Contributor

Previous ID SR-292
Radar None
Original Reporter @belkadan
Type Bug
Additional Detail from JIRA
Votes 0
Component/s Compiler
Labels Bug, Performance
Assignee @bob-wilson
Priority Medium

md5: 7fe4167166d7837ebe7280951902658f

Issue Description:

Within the last two months the compile time for StdlibUnittest under a debug compiler has jumped quite a bit. With the new frontend flag -debug-time-compilation, I found that SIL optimization seems to be the culprit.

===-------------------------------------------------------------------------===
                               Swift compilation
===-------------------------------------------------------------------------===
  Total Execution Time: 849.5641 seconds (851.0508 wall clock)

   ---User Time---   --System Time--   --User+System--   ---Wall Time---  --- Name ---
  556.7332 ( 65.8%)   1.7051 ( 52.8%)  558.4382 ( 65.7%)  558.7890 ( 65.7%)  SIL optimization
  98.1063 ( 11.6%)   0.6438 ( 19.9%)  98.7501 ( 11.6%)  99.1275 ( 11.6%)  LLVM optimization
  70.1294 (  8.3%)   0.1772 (  5.5%)  70.3066 (  8.3%)  70.7880 (  8.3%)  Type checking / Semantic analysis
  45.8265 (  5.4%)   0.2249 (  7.0%)  46.0514 (  5.4%)  46.2519 (  5.4%)  LLVM output
  37.0902 (  4.4%)   0.3492 ( 10.8%)  37.4394 (  4.4%)  37.4759 (  4.4%)  IRGen
  15.6720 (  1.9%)   0.0720 (  2.2%)  15.7440 (  1.9%)  15.7567 (  1.9%)  SILGen
   9.3491 (  1.1%)   0.0151 (  0.5%)   9.3641 (  1.1%)   9.3649 (  1.1%)  SIL verification (post-optimization)
   7.0676 (  0.8%)   0.0141 (  0.4%)   7.0816 (  0.8%)   7.0843 (  0.8%)  AST verification
   5.3563 (  0.6%)   0.0148 (  0.5%)   5.3711 (  0.6%)   5.3715 (  0.6%)  SIL verification (pre-optimization)
   0.5715 (  0.1%)   0.0072 (  0.2%)   0.5786 (  0.1%)   0.5885 (  0.1%)  Serialization (swiftmodule)
   0.3793 (  0.0%)   0.0062 (  0.2%)   0.3854 (  0.0%)   0.3992 (  0.0%)  Parsing
   0.0530 (  0.0%)   0.0003 (  0.0%)   0.0533 (  0.0%)   0.0533 (  0.0%)  Serialization (swiftdoc)
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)  Name binding
  846.3343 (100.0%)   3.2298 (100.0%)  849.5641 (100.0%)  851.0508 (100.0%)  Total

Now, StdlibUnittest is a bit large and unusual in several metrics, but still.

Admittedly, that was a debug compiler, with LLVM compiled as RelWithDebInfo and swiftc compiled as Debug. Under a release build (without assertions), the numbers are a lot better...

===-------------------------------------------------------------------------===
                               Swift compilation
===-------------------------------------------------------------------------===
  Total Execution Time: 401.3502 seconds (401.6033 wall clock)

   ---User Time---   --System Time--   --User+System--   ---Wall Time---  --- Name ---
  162.3543 ( 41.0%)   1.6402 ( 31.8%)  163.9946 ( 40.9%)  164.0571 ( 40.9%)  LLVM optimization
  119.7621 ( 30.2%)   2.5938 ( 50.3%)  122.3559 ( 30.5%)  122.3938 ( 30.5%)  SIL optimization
  83.9661 ( 21.2%)   0.5395 ( 10.5%)  84.5056 ( 21.1%)  84.6360 ( 21.1%)  LLVM output
  15.4938 (  3.9%)   0.2337 (  4.5%)  15.7275 (  3.9%)  15.7461 (  3.9%)  IRGen
   7.7852 (  2.0%)   0.0487 (  0.9%)   7.8339 (  2.0%)   7.8356 (  2.0%)  Type checking / Semantic analysis
   3.2124 (  0.8%)   0.0204 (  0.4%)   3.2328 (  0.8%)   3.2342 (  0.8%)  SIL verification (post-optimization)
   1.8400 (  0.5%)   0.0497 (  1.0%)   1.8897 (  0.5%)   1.8899 (  0.5%)  SILGen
   0.8468 (  0.2%)   0.0016 (  0.0%)   0.8484 (  0.2%)   0.8486 (  0.2%)  AST verification
   0.6459 (  0.2%)   0.0106 (  0.2%)   0.6564 (  0.2%)   0.6567 (  0.2%)  SIL verification (pre-optimization)
   0.2309 (  0.1%)   0.0133 (  0.3%)   0.2442 (  0.1%)   0.2443 (  0.1%)  Serialization (swiftmodule)
   0.0411 (  0.0%)   0.0013 (  0.0%)   0.0424 (  0.0%)   0.0424 (  0.0%)  Parsing
   0.0185 (  0.0%)   0.0003 (  0.0%)   0.0188 (  0.0%)   0.0188 (  0.0%)  Serialization (swiftdoc)
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)  Name binding
  396.1972 (100.0%)   5.1531 (100.0%)  401.3502 (100.0%)  401.6033 (100.0%)  Total

...so I wonder if we have some really expensive assertions turned on.

@belkadan
Copy link
Contributor Author

cc @swiftix

@gottesmm
Copy link
Member

Is this with optimization?

EDIT: Nm I read the full thing.

@belkadan
Copy link
Contributor Author

Added more configuration information. Now trying a Release+Asserts build of all components.

@belkadan
Copy link
Contributor Author

Okay, a Release+Asserts build looks more like the Release build than the Debug build. What's going on here?

===-------------------------------------------------------------------------===
                               Swift compilation
===-------------------------------------------------------------------------===
  Total Execution Time: 224.8902 seconds (225.0464 wall clock)

   ---User Time---   --System Time--   --User+System--   ---Wall Time---  --- Name ---
  92.1323 ( 41.4%)   0.6140 ( 24.7%)  92.7464 ( 41.2%)  92.7973 ( 41.2%)  LLVM optimization
  60.6715 ( 27.3%)   1.2473 ( 50.1%)  61.9189 ( 27.5%)  61.9402 ( 27.5%)  SIL optimization
  45.0392 ( 20.3%)   0.2393 (  9.6%)  45.2784 ( 20.1%)  45.3518 ( 20.2%)  LLVM output
  12.3687 (  5.6%)   0.2708 ( 10.9%)  12.6394 (  5.6%)  12.6487 (  5.6%)  IRGen
   7.7395 (  3.5%)   0.0344 (  1.4%)   7.7739 (  3.5%)   7.7746 (  3.5%)  Type checking / Semantic analysis
   1.8397 (  0.8%)   0.0533 (  2.1%)   1.8930 (  0.8%)   1.8932 (  0.8%)  SILGen
   1.0223 (  0.5%)   0.0086 (  0.3%)   1.0309 (  0.5%)   1.0311 (  0.5%)  SIL verification (post-optimization)
   0.8511 (  0.4%)   0.0006 (  0.0%)   0.8517 (  0.4%)   0.8518 (  0.4%)  AST verification
   0.5987 (  0.3%)   0.0097 (  0.4%)   0.6084 (  0.3%)   0.6084 (  0.3%)  SIL verification (pre-optimization)
   0.0786 (  0.0%)   0.0077 (  0.3%)   0.0863 (  0.0%)   0.0863 (  0.0%)  Serialization (swiftmodule)
   0.0424 (  0.0%)   0.0020 (  0.1%)   0.0444 (  0.0%)   0.0444 (  0.0%)  Parsing
   0.0184 (  0.0%)   0.0002 (  0.0%)   0.0186 (  0.0%)   0.0186 (  0.0%)  Serialization (swiftdoc)
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)  Name binding
  222.4024 (100.0%)   2.4878 (100.0%)  224.8902 (100.0%)  225.0464 (100.0%)  Total

@swift-ci
Copy link
Collaborator

Comment by Xin Tong (JIRA)

Can we get a breakdown of the optimization passes responsible in DEBUG build? Roman told me ~2 weeks ago that DSE and RLE are taking quite a bit of time in StdlibUnitTest and I have not got the time to optimize the 2 passes for the it. (I did optimize the 2 passes for the stdlib quite a bit).

I will be getting to the task soon though.

@belkadan
Copy link
Contributor Author

I accidentally hung Instruments, but I'll run the experiment again tomorrow. I think DSE was pretty high up there, along with ARC opts.

@swift-ci
Copy link
Collaborator

Comment by Xin Tong (JIRA)

I have committed something (4491d86 ) that will reduce the compilation time of DSE on the stdlibunittest quite a bit. I have observed DSE drops from 1.8% to 1.4% of the overall compilation time on compiling stdlib -O.

@belkadan
Copy link
Contributor Author

Thanks, Xin! I'll make sure to grab your changes before retesting.

@belkadan
Copy link
Contributor Author

Xin's change helped a lot. Here are the passes that take the most time in a Debug build now:

Running Time            Self (ms)          Symbol Name
85674.0ms   11.4%   9.0            (anonymous namespace)::ARCSequenceOpts::run()
61178.0ms    8.2%   4.0            (anonymous namespace)::SILCombine::run()
52346.0ms    7.0%   5.0            (anonymous namespace)::RedundantLoadElimination::run()
50140.0ms    6.7%   1.0            (anonymous namespace)::DeadStoreElimination::run()
36925.0ms    4.9%   11.0               (anonymous namespace)::SILCSE::run()
32228.0ms    4.3%   6.0            (anonymous namespace)::SimplifyCFGPass::run()

Nearly all of the ARC time is spent in LoopARCPairingContext::runOnLoop. In SILCombine, BuiltinInst::getBuiltinInfo() takes a surprising amount of time (because of the DenseMap), and cacheDebugLoc() isn't cheap either. Alias analysis slows down RLE and DSE with unoptimized DenseMaps. SILCSE uses ScopedHashTableScopes, which are expensive. And SimplifyCFG spends time calculating dominance.

Basically, DenseMaps are silly-expensive in Debug modes, likely because none of the DenseMapInfo is getting inlined. I wish we could say "okay, optimize these functions" even in a "Debug" build.

@swift-ci
Copy link
Collaborator

Comment by Xin Tong (JIRA)

what is the total amount of time spent in optimizer now ?

@belkadan
Copy link
Contributor Author

It's still a good 7 minutes in debug builds. My LLVM is Release+Asserts, though, so it's not exactly a fair comparison.

@gottesmm
Copy link
Member

Jordan, part of the problem in ARC is that we are visiting the CFG too many times. I am going to be fixing this in the spring.

With respect to the DenseMap issue, just like we use Release+Asserts LLVM, perhaps it would make sense to try having a special library that is compiled in release+asserts mode that has explicit template instantiations of the various DenseMaps that we use. Then everything else will use external templates of the DenseMaps.

I imagine many of these cases involve data types that are trivially copyable (i.e. pointers, ints, etc).

@swift-ci
Copy link
Collaborator

Comment by Xin Tong (JIRA)

And for what it is worth. This is where we currently stand for compiling stdlibunit test -O with a release with assertion for llvm and swift compiler.

===-------------------------------------------------------------------------===
                               Swift compilation
===-------------------------------------------------------------------------===
Running Time    Self (ms)               Symbol Name
23346.0ms   21.6%       53.0                    swift::SILPassManager::runPassesOnFunction(llvm::ArrayRef<swift::SILFunctionTransform*>, swift::SILFunction*)
6801.0ms    6.3%        8.0                      (anonymous namespace)::ARCSequenceOpts::run()
3293.0ms    3.0%        81.0                     (anonymous namespace)::DeadStoreElimination::run()
3037.0ms    2.8%        8.0                      (anonymous namespace)::SILCombine::run()
2521.0ms    2.3%        2.0                      (anonymous namespace)::SimplifyCFGPass::run()
2250.0ms    2.0%        449.0                    (anonymous namespace)::RedundantLoadElimination::run()
1972.0ms    1.8%        110.0                    (anonymous namespace)::SILCSE::run()
755.0ms    0.7% 177.0                             (anonymous namespace)::DCE::run()
474.0ms    0.4% 6.0                                 (anonymous namespace)::SILCodeMotion::run()
414.0ms    0.3% 0.0                                 (anonymous namespace)::StackPromotion::run()
255.0ms    0.2% 6.0                                 (anonymous namespace)::ABCOpt::run()
233.0ms    0.2% 1.0                                 (anonymous namespace)::ConstantPropagation::run()
199.0ms    0.1% 0.0                                 (anonymous namespace)::SILMem2Reg::run()
183.0ms    0.1% 63.0                               (anonymous namespace)::SILLowerAggregate::run()
155.0ms    0.1% 38.0                               (anonymous namespace)::SILSROA::run()
114.0ms    0.1% 14.0                               (anonymous namespace)::LICM::run()

@swift-ci
Copy link
Collaborator

Comment by Xin Tong (JIRA)

Mark, I know our compilation time is much better now. Can you please post the numbers and close this SR.

@rudkx
Copy link
Member

rudkx commented Apr 21, 2016

Compilation time is generally better now, but we're still much slower in the debug build than release build.

I don't think this is due to any expensive checks, but rather that the unoptimized compiler is just much much slower.

I spent some time this morning building compiler variants and timing them while building StdlibUnitTest. Here's what I see now for the SIL optimizer time.

In each case LLVM/clang is release, no-asserts. For swift, we have

  ---User Time---   --System Time--   --User+System--   ---Wall Time---  --- Name ---
   5.6017 ( 32.5%)   0.1291 ( 70.5%)   5.7308 ( 32.9%)   5.7346 ( 32.9%)  SIL optimization - RELEASE
   6.4858 ( 34.5%)   0.1490 ( 62.6%)   6.6349 ( 34.8%)   6.6467 ( 34.5%)  SIL optimization - RELEASE+ASSERT
  55.7352 ( 66.8%)   0.1773 ( 66.7%)  55.9124 ( 66.8%)  55.9345 ( 66.8%)  SIL optimization - DEBUG+ASSERT

So the DEBUG+ASSERT time is around 8.5x what the RELEASE+ASSERT time is.

Note that I believe StdlibUnitTest was broken up into pieces since this was opened, so this is very different than the situation we were in four months ago.

In any event, I'm not sure that there's really anything actionable here. We could consider various build changes to ease the pain of some developers (e.g. compile the SIL optimizer with optimizations enabled in some configuration so that the people who don't need to debug it aren't spending time waiting for it to execute during builds).

Since I'm not one of the people who would use that mode, I'm not especially inclined to add more complexity to our build system, though.

@rudkx
Copy link
Member

rudkx commented Sep 22, 2016

Bob - not sure if this is worth keeping open or not, but it probably shouldn't be on me.

@swift-ci swift-ci transferred this issue from apple/swift-issues Apr 25, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug A deviation from expected or documented behavior. Also: expected but undesirable behavior. compiler The Swift compiler in itself performance
Projects
None yet
Development

No branches or pull requests

4 participants