Oneline whiteblock parsing fixes #27

wcjohnson · 2017-05-13T17:44:01Z

Allow parsing of single statements in oneline whiteblocks
Change if and arrow parsing to call new subroutine
Allow ASI after existing semicolons and in other edge cases
Add class to startsExpr so ClassExpressions don’t get parsed as stmts
Added fixtures

wcjohnson · 2017-05-13T18:21:00Z

Hm, what should be done about this:

f() ->
  a = if x: y else: return z

function f() {
  const a = x ? y : function () {
    return z;
  }();
  return a;
}

This is not technically a parsing problem but the output is semantically incorrect. There is a similar issue in coffeescript which they resolve by raising a syntax error when you put a return inside an expression context.

Should we disallow returns in here at parse time, or defer to compile time?

rattrayalex · 2017-05-13T19:21:30Z

Great question. If we can do at parse time that's better, compile time is ok too.

My guess is that we could do this by setting this.state.inFunction (or w/e) to false within such contexts... but there are a number of multiline-expression contexts in LightScript, and unifying them might make sense.

rattrayalex

haven't had the chance to look through tests yet...

rattrayalex · 2017-05-13T19:22:50Z

src/parser/util.js

+      this.match(tt.parenR) ||
+      // Technically it is legal to insert a ; after a ;.
+      // Allows -> throw new Error; f()
+      this.state.tokens[this.state.tokens.length - 1].type === tt.semi


hmm... this hints at something being wrong elsewhere...

The ; after ; thing is the result of going down a pretty deep rabbit hole. In JS, (x) => y; z is legal (compiles to a Body of two ExpressionStatements) and I wanted to maintain consistency.

So that would mean (x) => throw new Error; z should compile to a Body of two ExpressionStatements as well. (In Coffeescript, that actually compiles to one function with two statements in the body)

Ultimately the thing that was preventing that was the absence of this line of code here, so I added it, and it had the result I wanted without breaking JS, so I went with it. Not sure about better options.

This really looks wrong to me. None of these things should be here; they belong elsewhere at a call site or something.

I think this should be able to go away once you rebase on master, does that sound right? (guessing you just haven't gotten to that yet)

It's not clear to me how that would happen. How is (-> throw new Error)() going to compile? It calls pp.semicolon() after parsing the throw, but there isn't a semicolon, EOL, or brace there...

Unless I misunderstand the ASI portion of the parser in some fundamental way, I believe this change is necessary.

ah, gotcha. Wasn't clear to me earlier that that was the problem.

Do you know off the top of your head if it's just throw or if this problem exists across many desirably legal statements?

One possibility would be to modify parseThrowStatement to be "softer" than just this.semicolon().

Another would be to parse throw as an expression generally, which I think could make sense as a looser lightscript semantic (would just need to ensure it's sometimes wrapped in an iife in compiler).

The last thing on my mind here is that the cases you've added here do indeed make sense in this context and quite probably others – and I'm thinking about whether they should be enabled conditionally and at what layer of abstraction.

Just want to be thoughtful about this. I'm hopeful to be able to play around with it myself later today, and there's a good chance I'll end up in solid agreement that the current implementation is the best path forward.

rattrayalex · 2017-05-13T19:31:10Z

src/plugins/lightscript.js

  if (isExpression) {
    this.expect(tt.colon);
-    if (!this.isLineBreak()) return this.parseMaybeAssign();
+    if (!this.isLineBreak()) return this.parseWhiteBlockBody(node, indentLevel, true, false);


I think a bunch of the above stuff could be replaced with:

if (!this.isLineBreak()) { const stmt = this.parseStatement(false); return stmt.type === "ExpressionStatement" ? stmt.expression : stmt; }

Right? Would this obviate the need for couldBeginStatement and parseWhiteBlockBody?

I will try that, I don't like couldBeginStatement. Not sure if there are any additional edge cases there but I can't think of any off hand.

One possibility is that it will allow things like break and continue where they don't make any sense, but maybe that is not a parser issue?

Yeah, I think that's best handled elsewhere since they can happen within a block anyway:

x = if true: 1 + 1 break return else: foo() continue

etc

though actually I think break and continue might be fine in there...

If I understand correctly, the compiler is going to turn that into a ternary, and the break will get wrapped into an iife by replaceExpressionWithStatements -- and a break inside an iife by itself I think might be illegal JS. Will test and see what happens.

while true: x = if true: break

while (true) { const x = true ? function () { break; }() : null; }

Chrome says: Uncaught SyntaxError: Illegal break statement to that, so it is illegal js

wcjohnson · 2017-05-14T00:34:48Z

Trying to use parseStatement as suggested breaks a number of JS fixtures. Most are in the following vein:

  index › es2017/async functions/10
  Error: /Users/wcj/dev/lightscript-devkit/packages/babylon-lightscript/test/fixtures/es2017/async-functions/10/actual.js: 26 !== 27 (end/expression/1/body/program)
    runTest (test/utils/runFixtureTests.js:169:13)
    Test.fn (test/utils/runFixtureTests.js:107:20)

The issue is that parseExpressionStatement is scooping up the ; into the expression node somehow (which is probably incorrect)

There are a select few stranger ones that seem like legitimate breakages, though:

f(a, async(x, y) => await [x, y], b)

  index › es2017/async functions/26
  Error: /Users/wcj/dev/lightscript-devkit/packages/babylon-lightscript/test/fixtures/es2017/async-functions/26/actual.js: array length mismatch 3 != 2 (arguments/expression/0/body/program)
    runTest (test/utils/runFixtureTests.js:169:13)
    Test.fn (test/utils/runFixtureTests.js:107:20)

The issue here is that lambdas in JS are parsed with parseMaybeAssign -- switching to parseStatement here is calling parseExpression rather than parseMaybeAssign -- parseExpression can parse sequence expressions -- so finally it interprets that inner function as a lambda returning a SequenceExpression as opposed to the outer function call with 3 arguments, which it is in JS.

This seems like quite a can of worms, and my original approach doesn't break any of these things. I'm inclined to stick with it unless you can see another way through.

wcjohnson · 2017-05-14T03:14:54Z

OK, I think this pr is in much better shape now and the case of illegal return etc. inside expressions should be handled better. Should be good to review more closely.

rattrayalex · 2017-05-14T15:42:54Z

src/parser/statement.js

  switch (starttype) {
-    case tt._break: case tt._continue: return this.parseBreakContinueStatement(node, starttype.keyword);
+    case tt._break:
+      if (this.hasPlugin("lightscript") && isExprContext) this.unexpected(null, "break is illegal inside LightScript expressions");


Unsyntactic break is the error babel throws when you try to do what this compiles to, so if anything let's keep that

rattrayalex · 2017-05-14T15:43:33Z

src/parser/statement.js

 // does not help.

-pp.parseStatement = function (declaration, topLevel) {
+pp.parseStatement = function (declaration, topLevel, isExprContext) {


I really don't like adding parameters to preexisting babylon functions...

rattrayalex · 2017-05-14T15:46:48Z

src/plugins/lightscript.js

+pp.parseInlineWhiteBlock = function(node, isExprContext, allowEmptyBody) {
+  if (this.state.type.startsExpr) return this.parseMaybeAssign();
+  // oneline statement case
+  node.body = this.couldBeginStatement() ? [this.parseStatement(true, false, isExprContext)] : [];


why is this here? why not just this.parseStatement?

Have you tried the approach I outlined in gitter? Just parseStatement and extract the expression (if relevant) and/or raise on illegal things thereafter?

I haven't fiddled, so I'm not sure what problems it might cause, but my hunch is it would reduce the LOC diff of this PR by a factor of 3-5...

Yes, I did try that approach as you suggested. I left an extensive comment about it which I assume got eaten or something?

It causes several dozen JS fixtures to break. The most interesting breakages involve ambiguities between SequenceExpressions and function argument lists. The breakages are caused by parseStatement going down a deeper parsing pathway than parseMaybeAssign, along which comma-separated expressions get turned into SequenceExpressions instead of unwinding the recursive descent.

rattrayalex · 2017-05-14T16:05:07Z

src/tokenizer/types.js

  "this": new KeywordTokenType("this", { startsExpr }),
  "super": new KeywordTokenType("super", { startsExpr }),
-  "class": new KeywordTokenType("class"),
+  "class": new KeywordTokenType("class", { startsExpr }),


This should be in a separate PR (seems like something that should go upstream? would first want to understand why it wasn't here before...)

eh, this is fine in here. I think it makes sense

rattrayalex · 2017-05-14T16:50:35Z

src/plugins/lightscript.js


-  node.body = this.parseWhiteBlock(true);
+  const indentLevel = this.state.indentLevel;
+  node.body = this.startNode();


unnecessary

I'd much prefer to keep a single call here as before...

nevermind about the single call, but node.body = this.startNode(); here can definitely be removed

rattrayalex · 2017-05-14T16:53:49Z

src/parser/util.js

+      this.match(tt.parenR) ||
+      // Technically it is legal to insert a ; after a ;.
+      // Allows -> throw new Error; f()
+      this.state.tokens[this.state.tokens.length - 1].type === tt.semi


This really looks wrong to me. None of these things should be here; they belong elsewhere at a call site or something.

rattrayalex · 2017-05-14T16:55:47Z

src/parser/statement.js

-    case tt._return: return this.parseReturnStatement(node);
+
+    case tt._return:
+      if (this.hasPlugin("lightscript") && isExprContext) this.unexpected(null, "return is illegal inside LightScript expressions");


Let's instead use a this.state.inExpressionBlock or something, and inside this.parseReturnStatement / this.parseBreakContinueStatement do the check.

Then we won't need to pass around isExprContext like this or modify this.parseStatement itself at all.

rattrayalex · 2017-05-14T16:59:14Z

src/plugins/lightscript.js

  return this.finishNode(node, "BlockStatement");
 };

+pp.parseInlineWhiteBlock = function(node, isExprContext, allowEmptyBody) {


mixed feelings about breaking these out in general, but I see the appeal... should probably be called Oneline not Inline though...

wcjohnson · 2017-05-15T00:48:08Z

Sounds like you may have some better ideas than I do about how to do this, so I think I'll kick it back your way. Branch will be open if you want any of the code.

rattrayalex · 2017-05-15T00:49:08Z

Hey, sorry I didn't see that earlier comment. Thanks for taking the time to explain

wcjohnson · 2017-05-15T17:01:20Z

@rattrayalex

I applied the fixes for empty whiteblocks to this branch. Because of the divergences, the changes were basically not directly mergeable.

I think you should consider reverting the merge from last night and merging this branch instead (or just resolve conflicts with my side)

rattrayalex · 2017-05-15T17:41:00Z

Cool, I can try to resolve the conflicts

rattrayalex · 2017-05-15T17:50:40Z

src/plugins/lightscript.js


-// c/p parseBlock
+pp.finishWhiteBlock = function(node) {
+  if (!node.body.length) {


I think this check can now be moved to parseMultilineWhiteBlock

... in which case I'm not sure this method is necessary (up to you)

rattrayalex · 2017-05-15T17:50:47Z

src/plugins/lightscript.js

+pp.parseInlineWhiteBlock = function(node) {
+  if (this.state.type.startsExpr) return this.parseMaybeAssign();
+  // oneline statement case
+  node.body = [this.parseStatement(true)];


shouldn't be in a list

Are you sure about this? The node gets turned into a BlockStatement by finishWhiteBlock, and babel-types says those have to have bodies that are arrays:

defineType("BlockStatement", { builder: ["body", "directives"], visitor: ["directives", "body"], fields: { directives: { validate: chain(assertValueType("array"), assertEach(assertNodeType("Directive"))), default: [], }, body: { validate: chain(assertValueType("array"), assertEach(assertNodeType("Statement"))), }, }, aliases: ["Scopable", "BlockParent", "Block", "Statement"], });

or I guess you're saying just return the parsed statement and don't wrap it at all?

I tried this change and there is a problem. parseArrowFunctionBody is doing a fixup here: https://github.com/wcjohnson/babylon-lightscript/blob/upstream/oneline-whiteblocks/src/plugins/lightscript.js#L328

Because the parser is no longer producing BlockStatements after this change, functions are getting false-flagged as expression=true when they aren't expression bodies.

rattrayalex · 2017-05-15T17:52:38Z

src/plugins/lightscript.js


-  node.body = this.parseWhiteBlock(true);
+  const indentLevel = this.state.indentLevel;
+  node.body = this.startNode();


nevermind about the single call, but node.body = this.startNode(); here can definitely be removed

rattrayalex · 2017-05-15T18:20:19Z

src/tokenizer/types.js

  "this": new KeywordTokenType("this", { startsExpr }),
  "super": new KeywordTokenType("super", { startsExpr }),
-  "class": new KeywordTokenType("class"),
+  "class": new KeywordTokenType("class", { startsExpr }),


eh, this is fine in here. I think it makes sense

rattrayalex · 2017-05-15T19:13:06Z

@wcjohnson would you be willing to squash commits?

wcjohnson · 2017-05-15T19:14:08Z

@rattrayalex Yes, when I get a moment I will clean up the comments you left and then I can squash after a final go ahead

wcjohnson · 2017-05-15T21:11:46Z

@rattrayalex

Upon testing, removing that node.body = this.startNode() broke things and was incorrect. We need to start a node there because the whiteblock body parser uses that node as the place to insert the body, and that location includes the arrow token.

I clarified the code a little bit but semantically it's the same.

wcjohnson · 2017-05-15T21:23:11Z

@rattrayalex Comments have been addressed, let me know if anything else needs doing before the squash

wcjohnson · 2017-05-15T21:27:49Z

@rattrayalex
I think one of my comments got eaten. Re your comments on:

  // oneline statement case
  node.body = [this.parseStatement(true)];
  node.directives = [];
  this.addExtra(node, "curly", false);
  return this.finishNode(node, "BlockStatement");

You suggested dropping the array there. I'm not sure that's correct. According to babel-types, a BlockStatement requires an array as its body. We could cut out the BlockStatement and just return parseStatement but this breaks two things:

There is a fixup later down the line in parseArrowBody that turns anything not a BlockStatement as an expression. Returning a vanilla statement here breaks that check and breaks a lot of function body fixtures by falsely flagging the bodies as expressions.
Calling parseStatement there causes the arrow token not to be included in the arrow body's source location. I am not sure whether that's right or not, but heretofore, we have been parsing the arrow token as being part of the arrow body when it's a whiteblock.

rattrayalex · 2017-05-15T23:32:29Z

Cool, thanks for explaining! That definitely makes sense. I'll give another look but anticipate being 👍

rattrayalex

This looks good, ready for squash! Thanks!

I do hope to cleanup the canInsertSemicolon bit in the future...

rattrayalex · 2017-05-15T23:34:23Z

src/plugins/lightscript.js


-  node.body = this.parseWhiteBlock(true);
+  const indentLevel = this.state.indentLevel;
+  const nodeAtArrow = this.startNode();


ah, cool, thanks – that is clearer

rattrayalex · 2017-05-15T23:36:55Z

test/fixtures/lightscript/arrow-methods/obj-skinny-oneline-stmt/actual.js

@@ -0,0 +1,3 @@
+x = {
+  method() -> for idx i in Array(10): i


jw was there a special case that methods hit?

I don't think so; this is probably a redundant fixture.

wcjohnson · 2017-05-16T01:00:19Z

Squashing this is actually a little tricky because it straddles the merge and unmerge of babylon 7. Let me see what I can do.

- Allow parsing of single statements in oneline whiteblocks - Change if and arrow parsing to call new subroutine - Allow ASI after existing semicolons and in other edge cases - Add class to startsExpr so `ClassExpression`s don’t get parsed as stmts - Added fixtures Block parsing cleanup - Move special parsing case for arrows into parseArrowBody - Break up parseWhiteBlockBody Whiteblocks don’t allow directives Revert changes adding `exprContext` to parsing functions Disallow empty whiteblocks (removes need for `couldBeginStatement`) - Disallow empty whiteblocks - Update fixtures - Remove some obsolete fixtures - Use idiomatic `-> return` in fixtures

wcjohnson · 2017-05-16T01:40:29Z

@rattrayalex OK, that was an ugly rebase but I think I got it right, should be squashed now.

rattrayalex · 2017-05-16T22:52:52Z

Woohoo! This was a heavy lift. Great work @wcjohnson and thanks for putting up with all my (sometimes ignorant) asks.

rattrayalex reviewed May 13, 2017

View reviewed changes

wcjohnson changed the title ~~Oneline whiteblock parsing fixes~~ [WIP] Oneline whiteblock parsing fixes May 14, 2017

wcjohnson changed the title ~~[WIP] Oneline whiteblock parsing fixes~~ Oneline whiteblock parsing fixes May 14, 2017

rattrayalex reviewed May 14, 2017

View reviewed changes

rattrayalex reviewed May 15, 2017

View reviewed changes

rattrayalex approved these changes May 15, 2017

View reviewed changes

wcjohnson force-pushed the upstream/oneline-whiteblocks branch from ca19543 to ba10bc8 Compare May 16, 2017 00:57

wcjohnson force-pushed the upstream/oneline-whiteblocks branch from ba10bc8 to c9526a8 Compare May 16, 2017 01:39

rattrayalex merged commit c09bdf9 into lightscript:lightscript May 16, 2017

wcjohnson deleted the upstream/oneline-whiteblocks branch May 31, 2017 02:35

Oneline whiteblock parsing fixes #27

Oneline whiteblock parsing fixes #27

Uh oh!

Conversation

wcjohnson commented May 13, 2017

Uh oh!

wcjohnson commented May 13, 2017 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

rattrayalex commented May 13, 2017

Uh oh!

rattrayalex left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

wcjohnson May 15, 2017 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

wcjohnson May 13, 2017 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

wcjohnson commented May 14, 2017 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

wcjohnson commented May 14, 2017

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

wcjohnson commented May 15, 2017

Uh oh!

rattrayalex commented May 15, 2017

wcjohnson commented May 13, 2017 •

edited

Loading

wcjohnson May 15, 2017 •

edited

Loading

wcjohnson May 13, 2017 •

edited

Loading

wcjohnson commented May 14, 2017 •

edited

Loading

wcjohnson commented May 15, 2017 •

edited

Loading