Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SR-10124] String.UTF16View.distance crashes on using misaligned String.Index with native String #52526

Closed
norio-nomura opened this issue Mar 17, 2019 · 8 comments
Assignees
Labels
bug A deviation from expected or documented behavior. Also: expected but undesirable behavior. standard library Area: Standard library umbrella

Comments

@norio-nomura
Copy link
Contributor

Previous ID SR-10124
Radar None
Original Reporter @norio-nomura
Type Bug
Status Resolved
Resolution Done
Environment

Xcode 10.2 beta 4
swift-5.0-DEVELOPMENT-SNAPSHOT-2019-03-10-a

Additional Detail from JIRA
Votes 0
Component/s Standard Library
Labels Bug
Assignee @milseman
Priority Medium

md5: cdb7f07a9521853bd56a4e8007b5404c

Sub-Tasks:

  • SR-10451 Slice contents before asking ICU for grapheme breaking
  • SR-10452 Scalar align all the indices
  • SR-10453 Scalar alignment fast-path

Issue Description:

It does not crash with NSString based String.

Reproducing code:

import Foundation

func useMisalignedRange(in string: String) {
    let utf8Location = 0
    let utf8Length = 7
    let start = dump(string.utf8.index(string.utf8.startIndex, offsetBy: utf8Location))
    let end = dump(string.utf8.index(start, offsetBy: utf8Length)) // misaligned
    print(string[start..<end])
    _ = dump(string.utf16.distance(from: string.utf16.startIndex, to: start))
    _ = dump(string.utf16.distance(from: start, to: end)) // crashes with native `String` on Swift 5
}

print("-- Use NSString")
let nsstring: NSString = "один"
useMisalignedRange(in: nsstring as String)
print("-- Use String")
let string = "один"
useMisalignedRange(in: string)

log:

$ pbpaste|swift -
-- Use NSStringSwift.String.Index
  - _rawBits: 0Swift.String.Index
  - _rawBits: 212992
оди?
- 0
- 3
-- Use StringSwift.String.Index
  - _rawBits: 0Swift.String.Index
  - _rawBits: 458752
оди?
- 0
Fatal error: 
Stack dump:
0.  Program arguments: /Applications/Xcode-beta.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/swift -frontend -interpret - -enable-objc-interop -sdk /Applications/Xcode-beta.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX10.14.sdk -color-diagnostics -module-name main 
0  swift                    0x000000011395feb3 PrintStackTraceSignalHandler(void*) + 51
1  swift                    0x000000011395f68c SignalHandler(int) + 348
2  libsystem_platform.dylib 0x00007fff667cfb5d _sigtramp + 29
3  libsystem_platform.dylib 0x6469752e72657375 _sigtramp + 199784501
4  libswiftCore.dylib       0x00007fff65f6eff1 $sSKsE9_distance4from2toSi5IndexQz_AEtFSS9UTF16ViewV_Tg5Tf4nnx_n + 1089
5  libswiftCore.dylib       0x00007fff65f6f5cf $sSS9UTF16ViewV16_nativeGetOffset3forSiSS5IndexV_tFTf4nx_n + 143
6  libswiftCore.dylib       0x00007fff65eda057 $sSS9UTF16ViewV8distance4from2toSiSS5IndexV_AGtF + 55
7  libswiftCore.dylib       0x00000001190286bc $sSS9UTF16ViewV8distance4from2toSiSS5IndexV_AGtF + 3004491420
8  libswiftCore.dylib       0x0000000119028215 $sSS9UTF16ViewV8distance4from2toSiSS5IndexV_AGtF + 3004490229
9  swift                    0x00000001101f758d llvm::MCJIT::runFunction(llvm::Function*, llvm::ArrayRef<llvm::GenericValue>) + 365
10 swift                    0x00000001101fd962 llvm::ExecutionEngine::runFunctionAsMain(llvm::Function*, std::__1::vector<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, std::__1::allocator<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > > > const&, char const* const*) + 1090
11 swift                    0x000000010f7c6b8a performCompile(swift::CompilerInstance&, swift::CompilerInvocation&, llvm::ArrayRef<char const*>, int&, swift::FrontendObserver*, swift::UnifiedStatsReporter*) + 58906
12 swift                    0x000000010f7b4c9e swift::performFrontend(llvm::ArrayRef<char const*>, char const*, void*, swift::FrontendObserver*) + 6862
13 swift                    0x000000010f752dae main + 1246
14 libdyld.dylib            0x00007fff665ea3d5 start + 1
[1]    21112 done                          pbpaste | 
       21113 illegal hardware instruction  swift -
@belkadan
Copy link
Contributor

cc @milseman

@milseman
Copy link
Mannequin

milseman mannequin commented Apr 5, 2019

Prior to this crash in asserts configurations, we seem to be failing the invariant check that the Substring's indices are scalar-aligned. Given that we maintain the invariant elsewhere that Substrings are scalar-aligned, this is surprising. I'll look further.

edit: I am mistaken, scalar-alignment was done for the failable inits on String from a Substring's code unit views. Looking for the best way to get this fix out there.

@milseman
Copy link
Mannequin

milseman mannequin commented Apr 5, 2019

Ok, sorry for the brain-fart, I originally thought this through but there is a bug in a fast-path for the String(Substring) initializer.

According to SE-0180:
When two indices being compared correspond to positions that are valid in any single String view, comparison semantics are already fully specified by the Collection requirements. The other cases occur when indices fall between Unicode scalar boundaries in views having distinct encodings... The proposed rule is that such indices are compared by comparing their encodedOffsets. Such index values are not totally ordered but do satisfy strict weak ordering requirements, which is sufficient for algorithms such as sort to exhibit sensible behavior.

This means that comparison involving sub-scalar indices might scalar align first, and so Substring has a offsetRange property for this purpose. However, that was being misused as part of a fast-path by String's initializer from Substring: we should be comparing the whole indices not the encoded offsets.

This is more fall-out from SE-0180. I don't think we should have allowed Substrings that fall on sub-scalar boundaries, we should at least rounded to the nearest scalar. This is a definite bug here, but I'll try and see if there's more from this example or elsewhere.

@milseman
Copy link
Mannequin

milseman mannequin commented Apr 6, 2019

Interesting, fixing this crash means that one of our expected to crash tests no longer crashes.

My targeted fix involves scalar aligning UTF8 indices inside UTF16View's getNativeOffset for an index. I think we need another targeted fix that scalar aligns indices when forming a Substring, because there's a lot of wacky invariants that will be violated if we don't and follow SE-0180's reasoning. More updates to come

edit: JIRA doesn't support it if I try to put the code in here, probably because there's an emoji or something. Neither code nor noformat blocks work...

StringTraps.test("String.Index.utf16Offset(in:)/subscalarUTF8")

@milseman
Copy link
Mannequin

milseman mannequin commented Apr 6, 2019

Here's another crasher, because contiguous substrings keep encoding validity assumptions: https://gist.github.com/milseman/f32a81878efd2977508bab3c6fea5e32

@milseman
Copy link
Mannequin

milseman mannequin commented Apr 6, 2019

#23834

I am not looking forward to those benchmark results. Not overturning SE-0180 is my biggest ABI regret.

@milseman
Copy link
Mannequin

milseman mannequin commented Apr 10, 2019

Exhaustively testing this had uncovered a whole slew of related bugs and a need for more performance work. I'm using this as the parent bug for those.

@milseman
Copy link
Mannequin

milseman mannequin commented Jun 27, 2019

#23834

@swift-ci swift-ci transferred this issue from apple/swift-issues Apr 25, 2022
This issue was closed.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug A deviation from expected or documented behavior. Also: expected but undesirable behavior. standard library Area: Standard library umbrella
Projects
None yet
Development

No branches or pull requests

2 participants