[SR-6920] Unicode combining scalars interact wrongly with the start of a string literal. #49469

Open
Dante-Broggi opened this issue Feb 4, 2018 · 3 comments
Labels
bug A deviation from expected or documented behavior. Also: expected but undesirable behavior. compiler The Swift compiler in itself parser Area → compiler: The legacy C++ parser

Comments

@Dante-Broggi
Contributor

Previous ID SR-6920
Radar None
Original Reporter @Dante-Broggi
Type Bug
Additional Detail from JIRA
Votes 0
Component/s Compiler
Labels Bug, Parser
Assignee None
Priority Medium

md5: 493f3295b28807d82094414006ece57b

Issue Description:

This code should not compile, because the string literal has no real opening quote character. The `"` is immediately followed by a combining scalar, so the two form a single grapheme cluster:

```swift

let quote = "̠"

```

`{LATIN SMALL LETTER L}{LATIN SMALL LETTER E}{LATIN SMALL LETTER T}{SPACE}{LATIN SMALL LETTER Q}{LATIN SMALL LETTER U}{LATIN SMALL LETTER O}{LATIN SMALL LETTER T}{LATIN SMALL LETTER E}{SPACE}{EQUALS SIGN}{SPACE}{QUOTATION MARK}{COMBINING MINUS SIGN BELOW}{QUOTATION MARK}`

To verify this, let us ask what Swift's `String` thinks:

```swift

let quoteCount = """
let quote = "̠"
""".filter { $0 == "\"" }.count

```

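The above evaluates to 1: `String` sees only one `"` Character, because the opening quote and the combining scalar are treated as a single grapheme cluster. A literal with only one quote character cannot be a valid string literal, yet the compiler accepts it.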
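The mismatch can also be seen by counting quotes in both views of the same text. This is a minimal sketch, not part of the original report: the source line is rebuilt with a `\u{0320}` escape (COMBINING MINUS SIGN BELOW) so the snippet itself is unambiguous, and the names `source`, `scalarQuotes` and `characterQuotes` are illustrative.

```swift
// The line from the report, with the combining scalar written as an escape.
let source = "let quote = \"\u{0320}\""

// Scalar view: roughly what a lexer that ignores combining characters sees.
let scalarQuotes = source.unicodeScalars.filter { $0 == "\"" }.count

// Character view: extended grapheme clusters, as String iterates by default.
let characterQuotes = source.filter { $0 == "\"" }.count

print(scalarQuotes, characterQuotes) // prints "2 1"
```

At the scalar level there are two quotation marks, so a literal can be delimited. At the Character level the first one is fused with U+0320, leaving only the closing quote.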

@belkadan
Contributor

belkadan commented Feb 5, 2018

The Swift parser doesn't currently take combining characters into account.

@belkadan
Contributor

belkadan commented Feb 5, 2018

@akyrtzi, @rintaro, what do you think we should do here? I feel like we should probably just close this. Handling it properly would mean consulting a Unicode table in the OS, which would slow down lexing.

@rintaro
Mannequin

rintaro mannequin commented Feb 21, 2018

Disclaimer:
We have to measure the performance impact before we adopt this.
And the priority is probably low.


We already have https://github.com/apple/swift/blob/master/include/swift/Basic/Unicode.h
At least, we can try:

#include <cstdint>
#include "swift/Basic/Unicode.h"

using namespace swift::unicode;

/// Returns true if a quote followed by 'scalarAfterQuote' can be used as a quote.
bool isValidQuote(uint32_t scalarAfterQuote) {
  // Fast path: an ASCII scalar never combines with the preceding quote.
  if (scalarAfterQuote < 0x80)
    return true;

  auto GCBForQuoteMark = getGraphemeClusterBreakProperty('"');
  auto GCBForFirstScalar = getGraphemeClusterBreakProperty(scalarAfterQuote);
  // If there is an EGC boundary before 'scalarAfterQuote', it's a valid quote.
  return isExtendedGraphemeClusterBoundary(GCBForQuoteMark, GCBForFirstScalar);
}

// Check isValidQuote() for each quote in a string literal (including the opening quote).

If we implement this, the following code should become valid, because the quote followed by the combining scalar would no longer terminate the literal:
let a = "foo "̠ bar"
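
For experimenting outside the compiler, the same boundary test can be approximated in Swift. This is only a rough sketch under stated assumptions: it relies on `String`'s built-in grapheme breaking instead of the tables in `swift/Basic/Unicode.h`, and `quoteStandsAlone(before:)` is a hypothetical helper name, not an existing API.

```swift
/// Hypothetical helper: true if a quotation mark followed by `scalar`
/// remains its own extended grapheme cluster.
func quoteStandsAlone(before scalar: Unicode.Scalar) -> Bool {
    // ASCII fast path, mirroring the `< 0x80` check in the C++ sketch.
    if scalar.isASCII { return true }

    // Build the two-scalar string: quote followed by the scalar.
    var pair = "\""
    pair.unicodeScalars.append(scalar)

    // Two Characters means there is a grapheme boundary after the quote.
    return pair.count == 2
}

print(quoteStandsAlone(before: "a"))        // true
print(quoteStandsAlone(before: "\u{00E9}")) // true: precomposed letter, not combining
print(quoteStandsAlone(before: "\u{0320}")) // false: COMBINING MINUS SIGN BELOW
```

The ASCII fast path is what would keep the common case cheap in a lexer. Only a quote followed by a non-ASCII scalar needs the grapheme lookup.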

@swift-ci swift-ci transferred this issue from apple/swift-issues Apr 25, 2022