[SR-15166] Crash in _dispatch_wait_for_enqueuer on Android armeabi-v7a #603

Closed
triplef opened this issue Sep 7, 2021 · 5 comments

Comments

@triplef
Contributor

triplef commented Sep 7, 2021

Previous ID SR-15166
Radar None
Original Reporter @triplef
Type Bug
Status Resolved
Resolution Done

Attachment: Download

Environment
  • Android 9 and 10
  • ABI: armeabi-v7a/NEON
  • Various devices, mostly using the Cortex-A53 CPU

Additional Detail from JIRA
Votes 0
Component/s libdispatch
Labels Bug
Assignee None
Priority Medium

md5: 4bd80c1ec0b98960f2ded3ecd9341aa4

Issue Description:

We’re seeing the following crash in libdispatch _dispatch_wait_for_enqueuer() on Android armeabi-v7a due to __builtin_arm_wfe() causing a SIGILL:

Exception Type: Unknown (SIGILL)

Application Specific Information:
IllegalInstruction

Thread 0 Crashed:
0 libdispatch.so  +0x0027374  _dispatch_wait_for_enqueuer (yield.c:47)
1 libdispatch.so  +0x001e7d2  [inlined] _dispatch_main_queue_drain (queue.c:6797)
2 libdispatch.so  +0x001e7d2  _dispatch_main_queue_callback_4CF (queue.c:6960)
... (application-specific runloop)

Below is the disassembly of the library around the crash address 0x0027374. The compiler appears to have unrolled the loop, and only the second WFE instruction seems to crash.

             _dispatch_wait_for_enqueuer:
0002735c         ldrex      r1, [r0]            ; DATA XREF=dword_129c4
00027360         cbz        r1, loc_2736a

             loc_27362:
00027362         mov        r0, r1              ; CODE XREF=_dispatch_wait_for_enqueuer+22,
00027364         clrex
00027368         bx         lr
                        ; endp

             loc_2736a:
0002736a         wfe                            ; CODE XREF=_dispatch_wait_for_enqueuer+4
0002736c         ldrexhs    r1, [r0]            ; DATA XREF=dword_129bc
00027370         cmp        r1, #0x0
00027372         bne        loc_27362

00027374         wfe                            <<<<<< !!!!!!!!! CRASH !!!!!!!!!
00027376         ldrexhs    r1, [r0]
0002737a         cmp        r1, #0x0
0002737c         bne        loc_27362

0002737e         wfe
00027380         ldrexhs    r1, [r0]
00027384         cmp        r1, #0x0
00027386         bne        loc_27362

00027388         wfe
0002738a         ldrexhs    r1, [r0]
0002738e         cmp        r1, #0x0
00027390         bne        loc_27362

00027392         wfe
00027394         ldrexhs    r1, [r0]
00027398         cmp        r1, #0x0
0002739a         bne        loc_27362

0002739c         wfe
0002739e         ldrexhs    r1, [r0]
000273a2         cmp        r1, #0x0
000273a4         bne        loc_27362

000273a6         wfe
000273a8         ldrexhs    r1, [r0]
000273ac         cmp        r1, #0x0
000273ae         bne        loc_27362

000273b0         wfe
000273b2         ldrexhs    r1, [r0]
000273b6         cmp        r1, #0x0
000273b8         bne        loc_27362

000273ba         wfe
000273bc         ldrexhs    r1, [r0]
000273c0         cmp        r1, #0x0
000273c2         bne        loc_27362
@typesanitizer

cc @compnerd

@triplef
Contributor Author

triplef commented Oct 14, 2021

@buttaface I saw that you’ve been doing some work with libdispatch on Android – any thoughts on this? This is our most frequent crash on Android, but we’re unsure what to do here.

@finagolfin
Contributor

I've only been adding build tweaks to keep it running, so I'm not familiar with how libdispatch works internally, nor have I ever heard of this wfe instruction. A search turned up this GitHub issue where RocksDB switched from wfe to yield, which performed much better. You could try the same by replacing __builtin_arm_wfe() with dispatch_hardware_pause(), which that function already uses for other architectures further down, or simply comment out the builtin, then rebuild libdispatch to see if that helps.
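
For anyone following along, the suggested change amounts to roughly the sketch below. It is not the actual libdispatch source: hardware_pause() and wait_for_enqueuer() are hypothetical stand-ins for dispatch_hardware_pause() and _dispatch_wait_for_enqueuer(), the real function's spin limits and fallback path are left out, and the inline assembly assumes a GCC- or Clang-compatible compiler.

```c
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-in for libdispatch's dispatch_hardware_pause().
 * It emits a spin-wait hint (YIELD on ARM, PAUSE on x86) instead of WFE.
 * The real macro in libdispatch may be defined differently. */
static inline void hardware_pause(void) {
#if defined(__aarch64__) || (defined(__arm__) && defined(__ARM_ARCH) && __ARM_ARCH >= 7)
    __asm__ __volatile__("yield" ::: "memory");
#elif defined(__x86_64__) || defined(__i386__)
    __asm__ __volatile__("pause" ::: "memory");
#else
    /* No known hint for this architecture: only act as a compiler barrier. */
    __asm__ __volatile__("" ::: "memory");
#endif
}

/* Hypothetical simplified wait loop: spin until *slot becomes non-NULL
 * and return the value that was stored there. */
static void *wait_for_enqueuer(_Atomic(void *) *slot) {
    void *value;
    while ((value = atomic_load_explicit(slot, memory_order_relaxed)) == NULL) {
        hardware_pause(); /* was: __builtin_arm_wfe() */
    }
    return value;
}
```

The trade-off, as far as I understand it, is that a yield-style hint keeps the core spinning rather than parking it until an event arrives, so it may cost a little more power while waiting but avoids depending on WFE behaving correctly on the affected devices.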

@triplef
Contributor Author

triplef commented Dec 10, 2021

Thanks for the suggestion @buttaface! We replaced __builtin_arm_wfe() with dispatch_hardware_pause() and have not observed any more of these crashes in production since.

I opened a pull request with the change:
#590

@triplef
Contributor Author

triplef commented Feb 21, 2022

Resolved via:
#590

@swift-ci swift-ci transferred this issue from apple/swift-issues Apr 25, 2022
@shahmishal shahmishal transferred this issue from apple/swift May 5, 2022
This issue was closed.