Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SR-2442] Need better diagnostics for multi-value assignment with missing parens #45047

Open
jtbandes opened this issue Aug 20, 2016 · 12 comments
Labels
bug A deviation from expected or documented behavior. Also: expected but undesirable behavior. compiler The Swift compiler in itself diagnostics QoI Bug: Diagnostics Quality of Implementation parser Area → compiler: The legacy C++ parser

Comments

@jtbandes
Copy link
Collaborator

Previous ID SR-2442
Radar rdar://problem/72776083
Original Reporter @jtbandes
Type Bug
Additional Detail from JIRA
Votes 0
Component/s Compiler
Labels Bug, DiagnosticsQoI, Parser
Assignee None
Priority Medium

md5: 0eb677a1105f260bac228341f2e71d33

Issue Description:

Given var a = 1, b = 2, in some other languages you can swap them with

@modocache
Copy link
Mannequin

modocache mannequin commented Dec 10, 2016

I've confirmed that this is still an issue on the release version of Swift 3.0.1. Assuming it still happens on master, I'd like to try my hand at this. If I'm not done by January someone feel free to steal it back from me. 🙂

@modocache
Copy link
Mannequin

modocache mannequin commented Dec 14, 2016

Hello again! This is my first foray into the parser, and I wouldn't mind a little help. 🙂

I've found two spots where the parser could potentially recover, but I'd appreciate feedback on how I'm approaching the problem/solution.

Reducing the problem a bit, I noticed that swiftc -dump-parse - <<< "a, b" produces the same error mentioned in the task description. swiftc -dump-parse - <<< "(a, b)", on the other hand, is parsed successfully. So my approach here is:

  1. Identify invalid code like a, b

  2. Backtrack, emit diagnostics, and instead parse it as (a, b)

I'm not sure this approach makes sense, because I'm still learning how the parser maintains state in the first place.

To get into specifics, my understanding of the parsing occurring here is:

  1. parseTopLevelparseBraceItems → Parses top-level code in an loop.

  2. In the first pass of the loop, parseExprOrStmt calls parseExprPostfix, which in turn calls parseExprIdentifier. a is parsed as an identifier, the token is consumed, and the lexer primes the next token, ,.

  3. Because the previous expression a did not end with a semicolon, and because the current token , does not appear on a new line, a diagnostic is emitted: "Consecutive statements on a line must be separated by ';'".

  4. parseExprOrStmt calls parseExprPostfix. The current , token doesn't match any of the cases, so a diagnostic is output: "expected expression".

  5. Since the previous call to parseExprOrStmt resulted in an error, the parser attempts to skip tokens until the current brace is closed. Since our file contains no other braces, the parser reaches EOF and stops.

I can see two potential spots to add better diagnostics:

  1. In step (2) above, after parseExprIdentifier is called, the parser checks whether the identifier that was just parsed is followed by a ( token. Similarly, I could add some code to check whether it is followed by a , token. If it is, I could "backtrack", to instead parse the identifier as the first element of a tuple.

  2. In step (4) above, I could add some code to parseExprPostfix that handles a , token. I could then check whether the last token was an identifier, and if so "backtrack" to instead parse it as the first element of a tuple.

The part that I'm currently stuck on is: in both of these approaches, once I've noticed that I'm parsing what should be a tuple, how do I "backtrack?"

I noticed some code that calls Lexer::backtrackToState. My current intuition is to use this method. For example, in step (2) above, once I've parsed a and notice that it's immediately followed by ,, I could backtrack to the start location of a, "insert" a (, and begin parsing it as a tuple.

That's about as far as I've gotten so far. I have some questions:

  • Am I on the right track?

  • Is it possible to backtrack, then re-parse what I thought was an identifier as a tuple instead? Or is that not the correct approach?

  • Does it always make sense to parse <identifier>, <identifier> as (<identifier>, <identifier>)? Or should I be concentrating on the problem as stated, a, b = b, a?

@modocache
Copy link
Mannequin

modocache mannequin commented Dec 19, 2016

Friendly ping! /cc @jtbandes

As someone new to the parser, I'm not entirely sure if this is achievable. Specifically, it seems hard to determine where to place (). For example, take the following:

a, b, c = c, a, b

To a human being, it seems obvious that this should be (a, b, c) = (c, a, b). But from the parser's perspective, (a, b, c = c, a, b) is also valid. Looking through the parsing code, I didn't find any straightforward way to, after seeing a,, start parsing this as (a,, and then intuit that the parentheses should be placed at (a, b, c), instead of (a, b),, or (a, b c = c).

Then again, this is my first venture into the parsing code at all, so I absolutely expect to be missing something. Any advice, @jtbandes?

@modocache
Copy link
Mannequin

modocache mannequin commented Dec 20, 2016

Also +cc other parser aficionados, like @rintaro and CodaFi (JIRA User). Any tips, or even just a "this is possible to implement" stamp of approval?

@rintaro
Copy link
Mannequin

rintaro mannequin commented Dec 20, 2016

First of all, this is not a StarterBug 🙂
But, I do think this is achievable.

(a, b, c = c, a, b) is also valid.

Yeah, that is the most difficult problem in this issue.
In other languages that allow a, b = b, a constructs, , has higher precedence than =.
In Swift, from the parser's perspective, every operators including as or is has the same precedence, except for ,.
, is so special and has lowest precedence in the parser.

Because of this, the diagnostics would definitely be highly speculative,
but better than consecutive statements ... anyway 🙂

Also, we must take care of other constructs such as:

if <expr>, <expr> { ... }
let cb: (Int, Int) -> Void = { [x = <expr>, y = <expr>] a, b -> Void in ... }

I think the simplest solution would be modifying parseExprSequence.
In it, you can pretend that tok::comma has higher precedence than tok::equal
only if we are parsing a expression as a statement (you can use StructureMarkers).

Add case tok::comma in the main switch in parseExprSequence
Add some logic to case tok::equal.
Somehow, <expr1>, <expr2> = <expr3> + <expr4> - <expr5>, <expr6> ?? <expr7> should results:

(sequence_expr
  (tuple_expr implicit
    (<expr1>)  // ,
    (<expr2>))
  (assign_expr) // =
  (tuple_expr implicit
    (sequence_expr
      (<expr3>)
      (unresolved_decr_ref_expr name=+) // +
      (<expr4>)
      (unresolved_decr_ref_expr name=-) // -
      (<expr5>)) // ,
    (sequence_expr
      (<expr6>)
      (unresolved_decr_ref_expr name=??) // ??
      (<expr7>))))

You can emit error diagnostics with fix-it when constructing implicit TupleExpr.

I think, you can ignore "mutability" of LHS(s), or difference of num of elements between LHS and RHS.
Sema can diagnose them.

Disclaimer: I'm not 100% sure about this strategy 😛

@jtbandes
Copy link
Collaborator Author

You could also consider parseStmt as the entry point for diagnosing this issue. Looking around I also notice Token::isAtStartOfLine() which may or may not be helpful in deciding whether to consider the possibility of the missing-parens diagnosis. (BTW, I believe what it should produce is a TupleShuffleExpr.)

@jtbandes
Copy link
Collaborator Author

Ah, Rintaro is right, I guess the TupleShuffle is only produced during type checking. It's just a SequenceExpr during parsing.

@modocache
Copy link
Mannequin

modocache mannequin commented Dec 20, 2016

Awesome, thanks so much for the explanations. This definitely seems like more than I can accomplish in the next few weeks, so I'll put it back up for grabs. I'll try to build up my skills with other, smaller parser tasks first, then revisit this later.

I also removed the "StarterBug" label – although it was a very good prompt to explore a lot of the parser code, so: thank you! (And this definitely seems like a really great diagnostic to have, I'm excited to see someone take it on.)

@jtbandes
Copy link
Collaborator Author

Re: backtracking, you can also find some places where BacktrackingScope is used. It looks like that object also takes care of hiding diagnostics emitted while it's active (via DiagnosticTransaction). I can't make any claims as to what is the "best" approach for trying multiple parses like this, I would just recommend looking around to find other cases which do the same thing. There seems to be some precedent for separating diagnosis like this into separate functions, e.g. bool diagnoseFoo(...); I'm sure you can pick the appropriate level of abstraction as you work on it.

I think you're on the right track in general 🙂 I definitely had to struggle through skimming lots of parser/typechecker code when I began working on these sorts of bugs (and I still do, every time — it's such a large component). From your analysis it sounds like you've figured out the general workflow of tracing through where various expressions are first created / the decisions being made by the parser along the way.

Test cases will be interesting too. (a, b, c = c, a, b) being valid is kinda funny — I imagine with the right variable types, (a, b = c, d) = (a = d, c, b) would be perfectly valid code. To keep this simple you'd probably want to restrict it to when there's just one = in the whole sequence (or generally, when there's no ambiguity in where to split it).

@jtbandes
Copy link
Collaborator Author

You may have an easier time fixing bugs to start with, rather than adding "features" like new diagnostics 😉

@rintaro
Copy link
Mannequin

rintaro mannequin commented Dec 20, 2016

Even a = (b, c) = d can be valid.

var a: ()
var b: Int
var c: Int
let d: (Int, Int) = (12, 34)

a = (b, c) = d

Hmm, a = b, c = d doesn't look like a = (b, c) = d 🙁

@typesanitizer
Copy link

@swift-ci create

@swift-ci swift-ci transferred this issue from apple/swift-issues Apr 25, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug A deviation from expected or documented behavior. Also: expected but undesirable behavior. compiler The Swift compiler in itself diagnostics QoI Bug: Diagnostics Quality of Implementation parser Area → compiler: The legacy C++ parser
Projects
None yet
Development

No branches or pull requests

2 participants