[SR-11936] String.split(separatedBy:) splits incorrectly for "\r" or "\n" if string contains "\r\n" #54355

Closed
swift-ci opened this issue Dec 10, 2019 · 3 comments
Labels
bug A deviation from expected or documented behavior. Also: expected but undesirable behavior.

Comments

@swift-ci
Collaborator

Previous ID SR-11936
Radar rdar://problem/57824036
Original Reporter Krutsinger (JIRA User)
Type Bug
Status Closed
Resolution Invalid
Additional Detail from JIRA
Votes 0
Component/s
Labels Bug
Assignee None
Priority Medium

md5: f4e7dd9b7b0d01ebe35857f4997f78ac

Issue Description:

To demonstrate the issue, this code:

print("split using \\r: ", "test\r\n123\r\nhello\r\n".split(separator: "\r", omittingEmptySubsequences: true)) 

prints:

split using \r:  ["test\r\n123\r\nhello\r\n"]

but should print:

split using \r:  ["test", "\n123", "\nhello", "\n"] 

Similarly,

print("split using \\n: ", "test\r\n123\r\nhello\r\n".split(separator: "\n"))

prints:

 split using \n:  ["test\r\n123\r\nhello\r\n"]  

but should print:

split using \n:  ["test\r", "123\r", "hello\r", ""] 

Easily reproduced in a playground.

@typesanitizer

@swift-ci create

@swift-ci
Collaborator Author

Comment by Kyle Macomber (JIRA)

This is because “\r\n” is a single grapheme cluster:

1> "\r\n"
$R0: String = "\r\n"
2> "\r\n".count
$R1: Int = 1

“\r” is a different single grapheme cluster:

3> "\r\n".contains("\r")
$R2: Bool = false

To get the behavior you’re looking for, use the `unicodeScalars` view on String, which treats both “\r” and “\n” as elements in the collection:

4> "\r\n".unicodeScalars.contains("\r")
$R3: Bool = true
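
Applied to the split from the original report, the same idea can be sketched as follows. This is a minimal illustration rather than an official recommendation; the names `text`, `byCharacter`, and `byScalar` are made up for the example.

```swift
let text = "test\r\n123\r\nhello\r\n"

// Character view: each "\r\n" is one Character, so the separator "\r" never matches.
let byCharacter = text.split(separator: "\r")

// Unicode scalar view: "\r" and "\n" are separate elements, so "\r" matches.
let byScalar: [String] = text.unicodeScalars
    .split(separator: "\r")
    .map { String($0) }

print(byCharacter.count)  // 1
print(byScalar)           // ["test", "\n123", "\nhello", "\n"]
```

The scalar-view result matches the output the reporter expected for the "\r" case.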

@swift-ci
Collaborator Author

Comment by Charles (JIRA)

Thank you for clarifying. The documentation for split() does not mention any special handling of grapheme clusters, so perhaps this is more of a documentation issue. That said, I am not trying to check whether the string contains a character; I am trying to split it. I found that this accomplishes what I was expecting:

"abc\r\n123\r\nxyz".unicodeScalars.split(whereSeparator: {CharacterSet(charactersIn: "\r\n").contains($0)}).map(String.init)

Bottom line: if this behavior is as designed, then close the issue. You may want to forward this to whoever maintains the documentation.
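
If the goal is simply to break the string into lines, a Foundation-free alternative to the workaround above is to stay in the Character view and split on `Character.isNewline`, which is true for the combined "\r\n" Character as well as for a lone "\r" or "\n". A minimal sketch, assuming Swift 5 or later; `text` and `lines` are illustrative names:

```swift
let text = "abc\r\n123\r\nxyz"

// Character.isNewline is true for "\n", "\r", and the combined "\r\n" Character.
let lines = text.split(whereSeparator: { $0.isNewline }).map { String($0) }

print(lines)  // ["abc", "123", "xyz"]
```

Note that `isNewline` also matches other newline characters such as U+2028, so it is slightly broader than a check against only "\r" and "\n".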

@swift-ci swift-ci transferred this issue from apple/swift-issues Apr 25, 2022
This issue was closed.