[SR-6077] Character ends up accepting invalid emoji flag sequences. #48632

Open
YOCKOW opened this issue Oct 6, 2017 · 5 comments
Labels
bug A deviation from expected or documented behavior. Also: expected but undesirable behavior. Character Area → standard library: The `Character` type literals Feature → expressions: Literals such as an integer or string literal standard library Area: Standard library umbrella String Area → standard library: The `String` type unexpected behavior Bug: Unexpected behavior or incorrect output

Comments

@YOCKOW
Collaborator

YOCKOW commented Oct 6, 2017

Previous ID SR-6077
Radar None
Original Reporter @YOCKOW
Type Bug
Status Resolved
Resolution Won't Do
Environment
  • Swift 4.0

  • OS: macOS, Ubuntu 16.04

Additional Detail from JIRA
Votes 0
Component/s Standard Library
Labels Bug
Assignee None
Priority Medium

md5: c82e0168fd3a0f487ee734ba7518dfc4

relates to:

  • SR-9206 Should literal Characters precondition on single-grapheme check?

Issue Description:

The Swift code below compiles without any errors, although it does not conform to the Unicode text segmentation algorithm.

let a: Character = "\u{1F1E6}" // REGIONAL INDICATOR SYMBOL LETTER A
let abc: Character = "\u{1F1E6}\u{1F1E7}\u{1F1E8}" // REGIONAL INDICATOR SYMBOL LETTERS A, B, C

Expected Result
An error is raised such as error: cannot convert value of type 'String' to specified type 'Character'.

References

  • UAX #29
    > Do not break within emoji flag sequences. That is, do not break between regional indicator (RI) symbols if there is an odd number of RI characters before the break point.
    > GB12 sot (RI RI)* RI × RI
    > GB13 [^RI] (RI RI)* RI × RI

  • UTS #51
    > emoji flag sequence — A sequence of two Regional Indicator characters, where the corresponding ASCII characters are valid region sequences
    > A singleton Regional Indicator character is called an ill-formed emoji flag sequence.
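As a rough illustration of the pairing rule in GB12/GB13, the sketch below builds strings of n regional indicator scalars and prints how many grapheme clusters the runtime reports. The helper name `regionalIndicators` is hypothetical, and the sketch assumes a toolchain whose grapheme breaking follows the rules quoted above, in which case n indicators should yield (n + 1) / 2 clusters.

```swift
// Hypothetical helper: a string made of `n` regional indicator scalars,
// cycling through U+1F1E6 (letter A) ... U+1F1FF (letter Z).
func regionalIndicators(_ n: Int) -> String {
    var result = ""
    for i in 0..<n {
        // Every value in this range is a valid Unicode scalar.
        let scalar = Unicode.Scalar(UInt32(0x1F1E6 + (i % 26)))!
        result.unicodeScalars.append(scalar)
    }
    return result
}

// Compare the scalar count with the grapheme cluster count.
for n in 1...5 {
    let s = regionalIndicators(n)
    print("scalars: \(s.unicodeScalars.count), characters: \(s.count)")
}
```

Under the stated assumption, indicators are consumed two at a time from the start of the run, so an odd run ends in a lone indicator that forms its own cluster.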

@belkadan
Contributor

belkadan commented Oct 6, 2017

I suspect we're not going to do this simply because the Unicode standard changes and we wouldn't want existing code to stop compiling. This would have been invalid for a different reason in Unicode 9.

cc @airspeedswift

@airspeedswift
Member

Right, we generally consider the earlier attempt to validate graphemes at compile time to have been a bit of a misadventure, one that leads to odd behavior when the compiler and OS differ. So we now only do it on a best-effort basis (similar to how we catch some literal overflow issues but not others).

@YOCKOW
Collaborator Author

YOCKOW commented Oct 7, 2017

Thank you for your responses.
I was just concerned about the inconsistency with let rnr: Character = "\r\n\r", which fails to compile.
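For comparison, the "\r\n\r" case can be inspected at run time as a String. This is only a sketch of what the grapheme count looks like; it says nothing about how the compiler performs its own literal check.

```swift
// CR LF is kept together as one grapheme cluster; the trailing CR stands alone.
let crlf = "\r\n"
let rnr = "\r\n\r"

print(crlf.count)                // number of grapheme clusters in "\r\n"
print(rnr.count)                 // number of grapheme clusters in "\r\n\r"
print(rnr.unicodeScalars.count)  // number of Unicode scalars in "\r\n\r"
```

Since "\r\n\r" contains more than one grapheme cluster, it cannot be a single Character, which matches the compile-time rejection mentioned above.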

@swift-ci swift-ci transferred this issue from apple/swift-issues Apr 25, 2022
@lorentey lorentey reopened this Feb 23, 2023
@lorentey
Member

lorentey commented Feb 23, 2023

let a: Character = "\u{1F1E6}"

This one is OK. "\u{1F1E6}" is a string of count 1. (Ill-formed emojis still have well-defined grapheme breaks around them.)

let abc: Character = "\u{1F1E6}\u{1F1E7}\u{1F1E8}"

This one is most definitely not okay. "\u{1F1E6}\u{1F1E7}\u{1F1E8}" is a string of count 2, so it cannot be considered a Character.

> I suspect we're not going to do this simply because the Unicode standard changes and we wouldn't want existing code to stop compiling.

Character has the invariant that it must not contain a grapheme break. This invariant must not be violated.
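One way to guard this invariant at run time, when the input is not known in advance, is to go through String and check the grapheme count before taking the first element. The function name `singleCharacter` is hypothetical and is not part of the standard library; this is a minimal sketch, not a proposed fix for the literal check.

```swift
// Hypothetical helper: returns a Character only when `s` holds exactly one
// grapheme cluster, and nil otherwise (including for the empty string).
func singleCharacter(_ s: String) -> Character? {
    guard s.count == 1, let c = s.first else { return nil }
    return c
}

let lone = singleCharacter("\u{1F1E6}")                      // one indicator
let pair = singleCharacter("\u{1F1E6}\u{1F1E7}")             // one flag pair
let triple = singleCharacter("\u{1F1E6}\u{1F1E7}\u{1F1E8}")  // pair plus one

print(lone != nil, pair != nil, triple != nil)
```

With grapheme breaking that follows GB12/GB13, the three-indicator string splits into two clusters, so the helper rejects it while accepting the lone indicator and the pair.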

@lorentey
Member

lorentey commented Feb 23, 2023

Evidently there is currently no limit on how many Characters a Character may contain.

let lovely: Character = "\u{1F1E6}\u{1F1E7}\u{1F1E8}\u{1F1E9}\u{1F1EA}\u{1F1E6}\u{1F1E7}\u{1F1E8}\u{1F1E9}\u{1F1EA}\u{1F1E6}\u{1F1E7}\u{1F1E8}\u{1F1E9}\u{1F1EA}\u{1F1E6}\u{1F1E7}\u{1F1E8}\u{1F1E9}\u{1F1EA}\u{1F1E6}\u{1F1E7}\u{1F1E8}\u{1F1E9}\u{1F1EA}"
print(String(lovely).count) // 13

@AnthonyLatsis AnthonyLatsis added String Area → standard library: The `String` type literals Feature → expressions: Literals such as an integer or string literal Character Area → standard library: The `Character` type unexpected behavior Bug: Unexpected behavior or incorrect output Unicode Area → standard library: Unicode processing APIs and removed Unicode Area → standard library: Unicode processing APIs labels Feb 24, 2023