[SR-375] Counting emoji that recently added to iOS and OS X. #42992

Closed
norio-nomura opened this issue Dec 25, 2015 · 7 comments
Labels
bug: A deviation from expected or documented behavior. Also: expected but undesirable behavior.
not a bug: Resolution → not a bug: Reported as a bug but turned out to be expected behavior or programmer error
standard library: Area: Standard library umbrella

Comments

@norio-nomura
Contributor

Previous ID SR-375
Radar None
Original Reporter @norio-nomura
Type Bug
Status Closed
Resolution Won't Do

Attachment: Download

Environment

OS X 10.11.2 (15C50)
Apple Swift version 2.2-dev (LLVM 3ebdbb2c7e, Clang f66c5bb67b, Swift 17fe37d)
Target: x86_64-apple-macosx10.9

Additional Detail from JIRA
Votes 0
Component/s Standard Library
Labels Bug, Runtime
Assignee None
Priority Medium

md5: 2cba750f1160e4f7e09133b0a53d5513

Issue Description:

See the screenshot for the rendered emoji.

// Are the following results expected?

"\u{0001F468}\u{200D}\u{0001F469}\u{200D}\u{0001F467}\u{200D}\u{0001F467}".characters.count // 4
"\u{0001F44D}\u{0001F3FB}".characters.count // 2

How do they differ from the following?

"\u{0001F1EF}\u{0001F1F5}".characters.count // 1
@lilyball
Mannequin

lilyball mannequin commented Jan 3, 2016

I'm pretty sure this is correct.

String.characters splits the string into characters based on Unicode Standard Annex #29, "Unicode Text Segmentation". This document defines the rules for segmenting a sequence of Unicode scalars into grapheme clusters. These rules include explicit handling for the Regional Indicator symbols, which covers your "\u{0001F1EF}\u{0001F1F5}" string (according to the rules, there cannot be a grapheme cluster break between two regional indicator symbols; if you have several flags in a row you're supposed to put something like U+200B ZERO WIDTH SPACE in between them to allow a grapheme cluster break). But they do not cover extended emoji modifiers/sequences. UAX #29 does allow for tailoring the rules (e.g. for specific locales), but it is entirely appropriate for Swift to use the default rules without any tailoring.
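As a rough illustration of the distinction described above, the sketch below compares the number of Unicode scalars in each string with the number of grapheme clusters Swift reports. It is written in current Swift syntax, where `count` on a `String` replaces the `characters.count` spelling used in this thread. The helper `segmentation(of:)` is a hypothetical name introduced here. The grapheme cluster counts for the emoji sequences depend on the segmentation rules built into the toolchain, so they may differ from the values reported in the issue; no expected output is claimed for them.

```swift
// The three strings from the issue description.
let family = "\u{1F468}\u{200D}\u{1F469}\u{200D}\u{1F467}\u{200D}\u{1F467}"
let thumbsUp = "\u{1F44D}\u{1F3FB}"
let flag = "\u{1F1EF}\u{1F1F5}"

// Hypothetical helper: returns the number of Unicode scalars and the number
// of grapheme clusters (Swift `Character` values) in a string.
func segmentation(of text: String) -> (scalars: Int, characters: Int) {
    return (text.unicodeScalars.count, text.count)
}

for (name, text) in [("family", family), ("thumbsUp", thumbsUp), ("flag", flag)] {
    let result = segmentation(of: text)
    print("\(name): \(result.scalars) scalars, \(result.characters) characters")
}

// Two flags separated by U+200B ZERO WIDTH SPACE, as suggested above,
// so that a grapheme cluster break is allowed between them.
let twoFlags = flag + "\u{200B}" + "\u{1F1FA}\u{1F1F8}"
print(segmentation(of: twoFlags))
```

The scalar counts are fixed by the string literals themselves, while the character counts reflect whichever version of the segmentation rules the standard library implements.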

UTR #51: Unicode Emoji defines how emoji modifiers and emoji sequences work. An important takeaway from this definition is that systems that do not have a glyph to represent a given sequence may fall back to showing each component emoji separately, which is likely why the UAX #29 definition was not changed to include emoji modifiers and sequences. It does say, however, that editors should attempt to treat an emoji sequence as a single grapheme cluster (e.g. hitting delete at the end should delete the whole sequence instead of just one character).

Incidentally, String.rangeOfComposedCharacterSequenceAtIndex splits based on how an editor would do it rather than on UAX #29 (this behavior is inherited from NSString), as does enumerateSubstringsInRange when using NSStringEnumerationOptions.ByComposedCharacterSequences. The documentation doesn't specify the rules it uses for that, though the String Programming Guide does reference UAX #29. It seems plausible that NSString's implementation of composed character sequence splitting is basically UAX #29 tailored with additional rules for splitting between adjacent regional indicator pairs and for combining certain emoji sequences.
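To see what NSString itself reports, a minimal sketch along these lines walks a string one composed character sequence at a time. It uses the current spelling of the method, `rangeOfComposedCharacterSequence(at:)`, and assumes Foundation is available. The variable names are illustrative only. Since the rules NSString applies are not documented, the number of pieces is printed rather than assumed, and it may vary by platform and OS version.

```swift
import Foundation

// The family sequence from the issue: four emoji joined by U+200D.
let familyText = "\u{1F468}\u{200D}\u{1F469}\u{200D}\u{1F467}\u{200D}\u{1F467}"
let nsFamily = NSString(string: familyText)

// Range (in UTF-16 code units) of the composed character sequence at index 0.
let firstRange = nsFamily.rangeOfComposedCharacterSequence(at: 0)
print("UTF-16 length: \(nsFamily.length)")
print("first sequence: location \(firstRange.location), length \(firstRange.length)")

// Count how many composed character sequences NSString finds in total.
var pieces = 0
var index = 0
while index < nsFamily.length {
    let range = nsFamily.rangeOfComposedCharacterSequence(at: index)
    pieces += 1
    // Always move forward by at least one code unit so the loop terminates.
    index = max(index + 1, range.location + range.length)
}
print("composed character sequences: \(pieces)")
```

Comparing `pieces` with `familyText.count` on a given system shows whether Foundation and the Swift standard library agree on where the boundaries fall.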

@norio-nomura
Contributor Author

Thank you for the explanation.

Now I understand that counting emoji with Fitzpatrick and multi-person groupings should be changed by platforms.
Also, I found that CoreFoundation has some code for counting them in CFString.

It would be helpful if Strings and Characters of The Swift Programming Language or Using Swift with Cocoa and Objective-C would contains some note that there are the differences between the OS X/iOS platform depended feature and the standard depended feature Swift referring.

@lilyball
Mannequin

lilyball mannequin commented Jan 4, 2016

Now I understand that counting emoji with Fitzpatrick and multi-person groupings should be changed by platforms.

What? No, that's backwards. The behavior shouldn't change for each platform. That's why UAX #29 doesn't include emoji modifiers, because each implementation is free to decide whether to show the emoji modifier sequence as a single icon or as multiple icons. The fact that NSString does treat an emoji modifier sequence as a single "character cluster" doesn't really mean anything at all for Swift, especially because NSString doesn't actually document the rules it uses anyway.

[…] there are the differences between the OS X/iOS platform depended feature and the standard depended feature Swift referring.

I have no idea how to parse that sentence.

@norio-nomura
Contributor Author

Sorry for my bad English. 🙁

I meant that counting them should be changed by platforms in a layer other than the Swift Standard Library.

@norio-nomura
Contributor Author

So this is not a bug, and I am closing it.

@norio-nomura
Contributor Author

This is not a bug.

@swift-ci swift-ci transferred this issue from apple/swift-issues Apr 25, 2022
@thailocnexle

Why is this (☹️) icon not correct?

@AnthonyLatsis added the "not a bug" and "bug" labels and removed the "runtime" and "bug" labels on Nov 8, 2022
This issue was closed.