Skip to content

Conversation

wcjohnson
Copy link

Fixes lightscript/lightscript#32

  • Allow parsing of single statements in oneline whiteblocks
  • Change if and arrow parsing to call new subroutine
  • Allow ASI after existing semicolons and in other edge cases
  • Add class to startsExpr so ClassExpressions don’t get parsed as stmts
  • Added fixtures

@wcjohnson
Copy link
Author

wcjohnson commented May 13, 2017

Hm, what should be done about this:

f() ->
  a = if x: y else: return z
function f() {
  const a = x ? y : function () {
    return z;
  }();
  return a;
}

This is not technically a parsing problem but the output is semantically incorrect. There is a similar issue in coffeescript which they resolve by raising a syntax error when you put a return inside an expression context.

Should we disallow returns in here at parse time, or defer to compile time?

@rattrayalex
Copy link

Great question. If we can do at parse time that's better, compile time is ok too.

My guess is that we could do this by setting this.state.inFunction (or w/e) to false within such contexts... but there are a number of multiline-expression contexts in LightScript, and unifying them might make sense.

Copy link

@rattrayalex rattrayalex left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

haven't had the chance to look through tests yet...

this.match(tt.parenR) ||
// Technically it is legal to insert a ; after a ;.
// Allows -> throw new Error; f()
this.state.tokens[this.state.tokens.length - 1].type === tt.semi

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hmm... this hints at something being wrong elsewhere...

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The ; after ; thing is the result of going down a pretty deep rabbit hole. In JS, (x) => y; z is legal (compiles to a Body of two ExpressionStatements) and I wanted to maintain consistency.

So that would mean (x) => throw new Error; z should compile to a Body of two ExpressionStatements as well. (In Coffeescript, that actually compiles to one function with two statements in the body)

Ultimately the thing that was preventing that was the absence of this line of code here, so I added it, and it had the result I wanted without breaking JS, so I went with it. Not sure about better options.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This really looks wrong to me. None of these things should be here; they belong elsewhere at a call site or something.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this should be able to go away once you rebase on master, does that sound right? (guessing you just haven't gotten to that yet)

Copy link
Author

@wcjohnson wcjohnson May 15, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not clear to me how that would happen. How is (-> throw new Error)() going to compile? It calls pp.semicolon() after parsing the throw, but there isn't a semicolon, EOL, or brace there...

Unless I misunderstand the ASI portion of the parser in some fundamental way, I believe this change is necessary.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ah, gotcha. Wasn't clear to me earlier that that was the problem.

Do you know off the top of your head if it's just throw or if this problem exists across many desirably legal statements?

One possibility would be to modify parseThrowStatement to be "softer" than just this.semicolon().

Another would be to parse throw as an expression generally, which I think could make sense as a looser lightscript semantic (would just need to ensure it's sometimes wrapped in an iife in compiler).

The last thing on my mind here is that the cases you've added here do indeed make sense in this context and quite probably others – and I'm thinking about whether they should be enabled conditionally and at what layer of abstraction.

Just want to be thoughtful about this. I'm hopeful to be able to play around with it myself later today, and there's a good chance I'll end up in solid agreement that the current implementation is the best path forward.

if (isExpression) {
this.expect(tt.colon);
if (!this.isLineBreak()) return this.parseMaybeAssign();
if (!this.isLineBreak()) return this.parseWhiteBlockBody(node, indentLevel, true, false);

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think a bunch of the above stuff could be replaced with:

if (!this.isLineBreak()) {
  const stmt = this.parseStatement(false);
  return stmt.type === "ExpressionStatement" ? stmt.expression : stmt;
}

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right? Would this obviate the need for couldBeginStatement and parseWhiteBlockBody?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will try that, I don't like couldBeginStatement. Not sure if there are any additional edge cases there but I can't think of any off hand.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One possibility is that it will allow things like break and continue where they don't make any sense, but maybe that is not a parser issue?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I think that's best handled elsewhere since they can happen within a block anyway:

x = if true:
  1 + 1
  break
  return
else:
  foo()
  continue

etc

though actually I think break and continue might be fine in there...

Copy link
Author

@wcjohnson wcjohnson May 13, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I understand correctly, the compiler is going to turn that into a ternary, and the break will get wrapped into an iife by replaceExpressionWithStatements -- and a break inside an iife by itself I think might be illegal JS. Will test and see what happens.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

while true:
  x = if true:
    break
while (true) {
  const x = true ? function () {
    break;
  }() : null;
}

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Chrome says: Uncaught SyntaxError: Illegal break statement to that, so it is illegal js

@wcjohnson
Copy link
Author

wcjohnson commented May 14, 2017

Trying to use parseStatement as suggested breaks a number of JS fixtures. Most are in the following vein:

  index › es2017/async functions/10
  Error: /Users/wcj/dev/lightscript-devkit/packages/babylon-lightscript/test/fixtures/es2017/async-functions/10/actual.js: 26 !== 27 (end/expression/1/body/program)
    runTest (test/utils/runFixtureTests.js:169:13)
    Test.fn (test/utils/runFixtureTests.js:107:20)

The issue is that parseExpressionStatement is scooping up the ; into the expression node somehow (which is probably incorrect)

There are a select few stranger ones that seem like legitimate breakages, though:

f(a, async(x, y) => await [x, y], b)
  index › es2017/async functions/26
  Error: /Users/wcj/dev/lightscript-devkit/packages/babylon-lightscript/test/fixtures/es2017/async-functions/26/actual.js: array length mismatch 3 != 2 (arguments/expression/0/body/program)
    runTest (test/utils/runFixtureTests.js:169:13)
    Test.fn (test/utils/runFixtureTests.js:107:20)

The issue here is that lambdas in JS are parsed with parseMaybeAssign -- switching to parseStatement here is calling parseExpression rather than parseMaybeAssign -- parseExpression can parse sequence expressions -- so finally it interprets that inner function as a lambda returning a SequenceExpression as opposed to the outer function call with 3 arguments, which it is in JS.

This seems like quite a can of worms, and my original approach doesn't break any of these things. I'm inclined to stick with it unless you can see another way through.

@wcjohnson wcjohnson changed the title Oneline whiteblock parsing fixes [WIP] Oneline whiteblock parsing fixes May 14, 2017
@wcjohnson wcjohnson changed the title [WIP] Oneline whiteblock parsing fixes Oneline whiteblock parsing fixes May 14, 2017
@wcjohnson
Copy link
Author

OK, I think this pr is in much better shape now and the case of illegal return etc. inside expressions should be handled better. Should be good to review more closely.

switch (starttype) {
case tt._break: case tt._continue: return this.parseBreakContinueStatement(node, starttype.keyword);
case tt._break:
if (this.hasPlugin("lightscript") && isExprContext) this.unexpected(null, "break is illegal inside LightScript expressions");

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unsyntactic break is the error babel throws when you try to do what this compiles to, so if anything let's keep that

// does not help.

pp.parseStatement = function (declaration, topLevel) {
pp.parseStatement = function (declaration, topLevel, isExprContext) {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I really don't like adding parameters to preexisting babylon functions...

pp.parseInlineWhiteBlock = function(node, isExprContext, allowEmptyBody) {
if (this.state.type.startsExpr) return this.parseMaybeAssign();
// oneline statement case
node.body = this.couldBeginStatement() ? [this.parseStatement(true, false, isExprContext)] : [];

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why is this here? why not just this.parseStatement?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Have you tried the approach I outlined in gitter? Just parseStatement and extract the expression (if relevant) and/or raise on illegal things thereafter?

I haven't fiddled, so I'm not sure what problems it might cause, but my hunch is it would reduce the LOC diff of this PR by a factor of 3-5...

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I did try that approach as you suggested. I left an extensive comment about it which I assume got eaten or something?

It causes several dozen JS fixtures to break. The most interesting breakages involve ambiguities between SequenceExpressions and function argument lists. The breakages are caused by parseStatement going down a deeper parsing pathway than parseMaybeAssign, along which comma-separated expressions get turned into SequenceExpressions instead of unwinding the recursive descent.

"this": new KeywordTokenType("this", { startsExpr }),
"super": new KeywordTokenType("super", { startsExpr }),
"class": new KeywordTokenType("class"),
"class": new KeywordTokenType("class", { startsExpr }),

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be in a separate PR (seems like something that should go upstream? would first want to understand why it wasn't here before...)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

eh, this is fine in here. I think it makes sense


node.body = this.parseWhiteBlock(true);
const indentLevel = this.state.indentLevel;
node.body = this.startNode();

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

unnecessary

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd much prefer to keep a single call here as before...

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nevermind about the single call, but node.body = this.startNode(); here can definitely be removed

this.match(tt.parenR) ||
// Technically it is legal to insert a ; after a ;.
// Allows -> throw new Error; f()
this.state.tokens[this.state.tokens.length - 1].type === tt.semi

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This really looks wrong to me. None of these things should be here; they belong elsewhere at a call site or something.

case tt._return: return this.parseReturnStatement(node);

case tt._return:
if (this.hasPlugin("lightscript") && isExprContext) this.unexpected(null, "return is illegal inside LightScript expressions");

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's instead use a this.state.inExpressionBlock or something, and inside this.parseReturnStatement / this.parseBreakContinueStatement do the check.

Then we won't need to pass around isExprContext like this or modify this.parseStatement itself at all.

return this.finishNode(node, "BlockStatement");
};

pp.parseInlineWhiteBlock = function(node, isExprContext, allowEmptyBody) {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

mixed feelings about breaking these out in general, but I see the appeal... should probably be called Oneline not Inline though...

@wcjohnson
Copy link
Author

Sounds like you may have some better ideas than I do about how to do this, so I think I'll kick it back your way. Branch will be open if you want any of the code.

@rattrayalex
Copy link

Hey, sorry I didn't see that earlier comment. Thanks for taking the time to explain

@wcjohnson
Copy link
Author

@rattrayalex

I applied the fixes for empty whiteblocks to this branch. Because of the divergences, the changes were basically not directly mergeable.

I think you should consider reverting the merge from last night and merging this branch instead (or just resolve conflicts with my side)

@rattrayalex
Copy link

Cool, I can try to resolve the conflicts


// c/p parseBlock
pp.finishWhiteBlock = function(node) {
if (!node.body.length) {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this check can now be moved to parseMultilineWhiteBlock

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

... in which case I'm not sure this method is necessary (up to you)

pp.parseInlineWhiteBlock = function(node) {
if (this.state.type.startsExpr) return this.parseMaybeAssign();
// oneline statement case
node.body = [this.parseStatement(true)];

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

shouldn't be in a list

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are you sure about this? The node gets turned into a BlockStatement by finishWhiteBlock, and babel-types says those have to have bodies that are arrays:

defineType("BlockStatement", {
  builder: ["body", "directives"],
  visitor: ["directives", "body"],
  fields: {
    directives: {
      validate: chain(assertValueType("array"), assertEach(assertNodeType("Directive"))),
      default: [],
    },
    body: {
      validate: chain(assertValueType("array"), assertEach(assertNodeType("Statement"))),
    },
  },
  aliases: ["Scopable", "BlockParent", "Block", "Statement"],
});

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

or I guess you're saying just return the parsed statement and don't wrap it at all?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried this change and there is a problem. parseArrowFunctionBody is doing a fixup here: https://github.com/wcjohnson/babylon-lightscript/blob/upstream/oneline-whiteblocks/src/plugins/lightscript.js#L328

Because the parser is no longer producing BlockStatements after this change, functions are getting false-flagged as expression=true when they aren't expression bodies.


node.body = this.parseWhiteBlock(true);
const indentLevel = this.state.indentLevel;
node.body = this.startNode();

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nevermind about the single call, but node.body = this.startNode(); here can definitely be removed

"this": new KeywordTokenType("this", { startsExpr }),
"super": new KeywordTokenType("super", { startsExpr }),
"class": new KeywordTokenType("class"),
"class": new KeywordTokenType("class", { startsExpr }),

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

eh, this is fine in here. I think it makes sense

@rattrayalex
Copy link

@wcjohnson would you be willing to squash commits?

@wcjohnson
Copy link
Author

@rattrayalex Yes, when I get a moment I will clean up the comments you left and then I can squash after a final go ahead

@wcjohnson
Copy link
Author

@rattrayalex

Upon testing, removing that node.body = this.startNode() broke things and was incorrect. We need to start a node there because the whiteblock body parser uses that node as the place to insert the body, and that location includes the arrow token.

I clarified the code a little bit but semantically it's the same.

@wcjohnson
Copy link
Author

@rattrayalex Comments have been addressed, let me know if anything else needs doing before the squash

@wcjohnson
Copy link
Author

wcjohnson commented May 15, 2017

@rattrayalex
I think one of my comments got eaten. Re your comments on:

  // oneline statement case
  node.body = [this.parseStatement(true)];
  node.directives = [];
  this.addExtra(node, "curly", false);
  return this.finishNode(node, "BlockStatement");

You suggested dropping the array there. I'm not sure that's correct. According to babel-types, a BlockStatement requires an array as its body. We could cut out the BlockStatement and just return parseStatement but this breaks two things:

  1. There is a fixup later down the line in parseArrowBody that turns anything not a BlockStatement as an expression. Returning a vanilla statement here breaks that check and breaks a lot of function body fixtures by falsely flagging the bodies as expressions.

  2. Calling parseStatement there causes the arrow token not to be included in the arrow body's source location. I am not sure whether that's right or not, but heretofore, we have been parsing the arrow token as being part of the arrow body when it's a whiteblock.

@rattrayalex
Copy link

Cool, thanks for explaining! That definitely makes sense. I'll give another look but anticipate being 👍

Copy link

@rattrayalex rattrayalex left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good, ready for squash! Thanks!

I do hope to cleanup the canInsertSemicolon bit in the future...


node.body = this.parseWhiteBlock(true);
const indentLevel = this.state.indentLevel;
const nodeAtArrow = this.startNode();

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ah, cool, thanks – that is clearer

@@ -0,0 +1,3 @@
x = {
method() -> for idx i in Array(10): i

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

jw was there a special case that methods hit?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think so; this is probably a redundant fixture.

@wcjohnson wcjohnson force-pushed the upstream/oneline-whiteblocks branch from ca19543 to ba10bc8 Compare May 16, 2017 00:57
@wcjohnson
Copy link
Author

Squashing this is actually a little tricky because it straddles the merge and unmerge of babylon 7. Let me see what I can do.

- Allow parsing of single statements in oneline whiteblocks
- Change if and arrow parsing to call new subroutine
- Allow ASI after existing semicolons and in other edge cases
- Add class to startsExpr so `ClassExpression`s don’t get parsed as stmts
- Added fixtures

Block parsing cleanup

- Move special parsing case for arrows into parseArrowBody
- Break up parseWhiteBlockBody

Whiteblocks don’t allow directives

Revert changes adding `exprContext` to parsing functions

Disallow empty whiteblocks (removes need for `couldBeginStatement`)

- Disallow empty whiteblocks
- Update fixtures
- Remove some obsolete fixtures
- Use idiomatic `-> return` in fixtures
@wcjohnson wcjohnson force-pushed the upstream/oneline-whiteblocks branch from ba10bc8 to c9526a8 Compare May 16, 2017 01:39
@wcjohnson
Copy link
Author

@rattrayalex OK, that was an ugly rebase but I think I got it right, should be squashed now.

@rattrayalex
Copy link

Woohoo! This was a heavy lift. Great work @wcjohnson and thanks for putting up with all my (sometimes ignorant) asks.

@rattrayalex rattrayalex merged commit c09bdf9 into lightscript:lightscript May 16, 2017
@wcjohnson wcjohnson deleted the upstream/oneline-whiteblocks branch May 31, 2017 02:35
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants