[SR-5992] String.Index.init?(_:within:) sometimes succeeds even if not grapheme-aligned #48549
Labels
bug — A deviation from expected or documented behavior. Also: expected but undesirable behavior.
standard library — Area: Standard library umbrella
Environment
Swift 4.0, Xcode 9.0, macOS 10.13 GM (17A362a)
Additional Detail from JIRA
md5: 16ba556673580da05fa7c50fda72c6fb
Issue Description:
The documentation for
String.Index.init?(_ sourcePosition: String.Index, within target: String)
says:

But the initializer sometimes wrongly succeeds when passed an index (say, from the UTF-16 or UTF-8 view) that is not aligned with the start of a grapheme cluster, in particular when the input string is an emoji ZWJ sequence.
Example:
This prints in Swift 4.0 on macOS 10.13:
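The code sample did not survive in this copy. A minimal sketch that reproduces the scenario might look like the following; the family emoji 👨‍👩‍👧‍👦 (four scalars joined by ZWJs, 11 UTF-16 code units) is my assumption for the test string, chosen so that the offsets 0–10 and 3/6/9 discussed below line up:

```swift
let family = "👨‍👩‍👧‍👦" // assumed test string: a single grapheme cluster of 11 UTF-16 code units

for offset in 0...10 {
    // Build an index into the UTF-16 view at the given code unit offset...
    let utf16Index = family.utf16.index(family.utf16.startIndex, offsetBy: offset)
    // ...and try to convert it to a grapheme-aligned String.Index.
    let charIndex = String.Index(utf16Index, within: family)
    print(offset, charIndex != nil)
}
```

On the affected Swift 4.0 toolchain this reportedly prints true for offsets 0, 3, 6, and 9 rather than only for offset 0.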
I'd expect only offset 0 to return a valid String.Index; all offsets from 1 to 10 should result in nil.

The problem seems to be that the initializer only checks whether the passed-in index constitutes a valid start of a grapheme cluster; it doesn't backtrack to check whether there is a valid grapheme cluster boundary between the index position and the code point before it. In this example, the indices at offsets 3, 6, and 9 should see the respective ZWJ code points that precede them and then return nil, because UAX #29 includes the rule "Do not break before extending characters or ZWJ."

Unfortunately, doing this correctly has an impact on performance.
Related links: This report was triggered by a Stack Overflow question and this Twitter discussion.
Note: an answer to the Stack Overflow question points out that the Foundation method
rangeOfComposedCharacterSequence(at:)
has the correct behavior and can be used as a workaround.