[SR-2442] Need better diagnostics for multi-value assignment with missing parens #45047

jtbandes · 2016-08-20T21:44:14Z


Previous ID	SR-2442
Radar	rdar://problem/72776083
Original Reporter	@jtbandes
Type	Bug

Additional Detail from JIRA


Votes	0
Component/s	Compiler
Labels	Bug, DiagnosticsQoI, Parser
Assignee	None
Priority	Medium

md5: 0eb677a1105f260bac228341f2e71d33

Issue Description:

Given var a = 1, b = 2, in some other languages you can swap them with

The text was updated successfully, but these errors were encountered:

modocache · 2016-12-10T07:13:02Z

I've confirmed that this is still an issue on the release version of Swift 3.0.1. Assuming it still happens on master, I'd like to try my hand at this. If I'm not done by January someone feel free to steal it back from me. 🙂

modocache · 2016-12-14T09:31:26Z

Hello again! This is my first foray into the parser, and I wouldn't mind a little help. 🙂

I've found two spots where the parser could potentially recover, but I'd appreciate feedback on how I'm approaching the problem/solution.

Reducing the problem a bit, I noticed that swiftc -dump-parse - <<< "a, b" produces the same error mentioned in the task description. swiftc -dump-parse - <<< "(a, b)", on the other hand, is parsed successfully. So my approach here is:

Identify invalid code like a, b
Backtrack, emit diagnostics, and instead parse it as (a, b)

I'm not sure this approach makes sense, because I'm still learning how the parser maintains state in the first place.

To get into specifics, my understanding of the parsing occurring here is:

parseTopLevel → parseBraceItems → Parses top-level code in an loop.
In the first pass of the loop, parseExprOrStmt calls parseExprPostfix, which in turn calls parseExprIdentifier. a is parsed as an identifier, the token is consumed, and the lexer primes the next token, ,.
Because the previous expression a did not end with a semicolon, and because the current token , does not appear on a new line, a diagnostic is emitted: "Consecutive statements on a line must be separated by ';'".
parseExprOrStmt calls parseExprPostfix. The current , token doesn't match any of the cases, so a diagnostic is output: "expected expression".
Since the previous call to parseExprOrStmt resulted in an error, the parser attempts to skip tokens until the current brace is closed. Since our file contains no other braces, the parser reaches EOF and stops.

I can see two potential spots to add better diagnostics:

In step (2) above, after parseExprIdentifier is called, the parser checks whether the identifier that was just parsed is followed by a ( token. Similarly, I could add some code to check whether it is followed by a , token. If it is, I could "backtrack", to instead parse the identifier as the first element of a tuple.
In step (4) above, I could add some code to parseExprPostfix that handles a , token. I could then check whether the last token was an identifier, and if so "backtrack" to instead parse it as the first element of a tuple.

The part that I'm currently stuck on is: in both of these approaches, once I've noticed that I'm parsing what should be a tuple, how do I "backtrack?"

I noticed some code that calls Lexer::backtrackToState. My current intuition is to use this method. For example, in step (2) above, once I've parsed a and notice that it's immediately followed by ,, I could backtrack to the start location of a, "insert" a (, and begin parsing it as a tuple.

That's about as far as I've gotten so far. I have some questions:

Am I on the right track?
Is it possible to backtrack, then re-parse what I thought was an identifier as a tuple instead? Or is that not the correct approach?
Does it always make sense to parse <identifier>, <identifier> as (<identifier>, <identifier>)? Or should I be concentrating on the problem as stated, a, b = b, a?

modocache · 2016-12-19T20:30:40Z

Friendly ping! /cc @jtbandes

As someone new to the parser, I'm not entirely sure if this is achievable. Specifically, it seems hard to determine where to place (). For example, take the following:

a, b, c = c, a, b

To a human being, it seems obvious that this should be (a, b, c) = (c, a, b). But from the parser's perspective, (a, b, c = c, a, b) is also valid. Looking through the parsing code, I didn't find any straightforward way to, after seeing a,, start parsing this as (a,, and then intuit that the parentheses should be placed at (a, b, c), instead of (a, b),, or (a, b c = c).

Then again, this is my first venture into the parsing code at all, so I absolutely expect to be missing something. Any advice, @jtbandes?

modocache · 2016-12-20T02:20:59Z

Also +cc other parser aficionados, like @rintaro and CodaFi (JIRA User). Any tips, or even just a "this is possible to implement" stamp of approval?

rintaro · 2016-12-20T06:48:30Z

First of all, this is not a StarterBug 🙂
But, I do think this is achievable.

(a, b, c = c, a, b) is also valid.

Yeah, that is the most difficult problem in this issue.
In other languages that allow a, b = b, a constructs, , has higher precedence than =.
In Swift, from the parser's perspective, every operators including as or is has the same precedence, except for ,.
, is so special and has lowest precedence in the parser.

Because of this, the diagnostics would definitely be highly speculative,
but better than consecutive statements ... anyway 🙂

Also, we must take care of other constructs such as:

if <expr>, <expr> { ... }
let cb: (Int, Int) -> Void = { [x = <expr>, y = <expr>] a, b -> Void in ... }

I think the simplest solution would be modifying parseExprSequence.
In it, you can pretend that tok::comma has higher precedence than tok::equal
only if we are parsing a expression as a statement (you can use StructureMarkers).

Add case tok::comma in the main switch in parseExprSequence
Add some logic to case tok::equal.
Somehow, <expr1>, <expr2> = <expr3> + <expr4> - <expr5>, <expr6> ?? <expr7> should results:

(sequence_expr
  (tuple_expr implicit
    (<expr1>)  // ,
    (<expr2>))
  (assign_expr) // =
  (tuple_expr implicit
    (sequence_expr
      (<expr3>)
      (unresolved_decr_ref_expr name=+) // +
      (<expr4>)
      (unresolved_decr_ref_expr name=-) // -
      (<expr5>)) // ,
    (sequence_expr
      (<expr6>)
      (unresolved_decr_ref_expr name=??) // ??
      (<expr7>))))

You can emit error diagnostics with fix-it when constructing implicit TupleExpr.

I think, you can ignore "mutability" of LHS(s), or difference of num of elements between LHS and RHS.
Sema can diagnose them.

Disclaimer: I'm not 100% sure about this strategy 😛

jtbandes · 2016-12-20T06:58:00Z

You could also consider parseStmt as the entry point for diagnosing this issue. Looking around I also notice Token::isAtStartOfLine() which may or may not be helpful in deciding whether to consider the possibility of the missing-parens diagnosis. (BTW, I believe what it should produce is a TupleShuffleExpr.)

jtbandes · 2016-12-20T07:01:37Z

Ah, Rintaro is right, I guess the TupleShuffle is only produced during type checking. It's just a SequenceExpr during parsing.

modocache · 2016-12-20T07:13:27Z

Awesome, thanks so much for the explanations. This definitely seems like more than I can accomplish in the next few weeks, so I'll put it back up for grabs. I'll try to build up my skills with other, smaller parser tasks first, then revisit this later.

I also removed the "StarterBug" label – although it was a very good prompt to explore a lot of the parser code, so: thank you! (And this definitely seems like a really great diagnostic to have, I'm excited to see someone take it on.)

jtbandes · 2016-12-20T07:14:33Z

Re: backtracking, you can also find some places where BacktrackingScope is used. It looks like that object also takes care of hiding diagnostics emitted while it's active (via DiagnosticTransaction). I can't make any claims as to what is the "best" approach for trying multiple parses like this, I would just recommend looking around to find other cases which do the same thing. There seems to be some precedent for separating diagnosis like this into separate functions, e.g. bool diagnoseFoo(...); I'm sure you can pick the appropriate level of abstraction as you work on it.

I think you're on the right track in general 🙂 I definitely had to struggle through skimming lots of parser/typechecker code when I began working on these sorts of bugs (and I still do, every time — it's such a large component). From your analysis it sounds like you've figured out the general workflow of tracing through where various expressions are first created / the decisions being made by the parser along the way.

Test cases will be interesting too. (a, b, c = c, a, b) being valid is kinda funny — I imagine with the right variable types, (a, b = c, d) = (a = d, c, b) would be perfectly valid code. To keep this simple you'd probably want to restrict it to when there's just one = in the whole sequence (or generally, when there's no ambiguity in where to split it).

jtbandes · 2016-12-20T07:37:33Z

You may have an easier time fixing bugs to start with, rather than adding "features" like new diagnostics 😉

rintaro · 2016-12-20T07:37:37Z

Even a = (b, c) = d can be valid.

var a: ()
var b: Int
var c: Int
let d: (Int, Int) = (12, 34)

a = (b, c) = d

Hmm, a = b, c = d doesn't look like a = (b, c) = d 🙁

typesanitizer · 2020-09-24T17:47:31Z

@swift-ci create

swift-ci transferred this issue from apple/swift-issues Apr 25, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[SR-2442] Need better diagnostics for multi-value assignment with missing parens #45047

[SR-2442] Need better diagnostics for multi-value assignment with missing parens #45047

jtbandes commented Aug 20, 2016

modocache mannequin commented Dec 10, 2016

modocache mannequin commented Dec 14, 2016

modocache mannequin commented Dec 19, 2016

modocache mannequin commented Dec 20, 2016

rintaro mannequin commented Dec 20, 2016

jtbandes commented Dec 20, 2016

jtbandes commented Dec 20, 2016

modocache mannequin commented Dec 20, 2016

jtbandes commented Dec 20, 2016

jtbandes commented Dec 20, 2016

rintaro mannequin commented Dec 20, 2016

typesanitizer commented Sep 24, 2020

[SR-2442] Need better diagnostics for multi-value assignment with missing parens #45047

[SR-2442] Need better diagnostics for multi-value assignment with missing parens #45047

Comments

jtbandes commented Aug 20, 2016

modocache mannequin commented Dec 10, 2016

modocache mannequin commented Dec 14, 2016

modocache mannequin commented Dec 19, 2016

modocache mannequin commented Dec 20, 2016

rintaro mannequin commented Dec 20, 2016

jtbandes commented Dec 20, 2016

jtbandes commented Dec 20, 2016

modocache mannequin commented Dec 20, 2016

jtbandes commented Dec 20, 2016

jtbandes commented Dec 20, 2016

rintaro mannequin commented Dec 20, 2016

typesanitizer commented Sep 24, 2020