-
Notifications
You must be signed in to change notification settings - Fork 5
Oneline whiteblock parsing fixes #27
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Oneline whiteblock parsing fixes #27
Conversation
Hm, what should be done about this: f() ->
a = if x: y else: return z function f() {
const a = x ? y : function () {
return z;
}();
return a;
} This is not technically a parsing problem but the output is semantically incorrect. There is a similar issue in coffeescript which they resolve by raising a syntax error when you put a return inside an expression context. Should we disallow returns in here at parse time, or defer to compile time? |
Great question. If we can do at parse time that's better, compile time is ok too. My guess is that we could do this by setting |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
haven't had the chance to look through tests yet...
this.match(tt.parenR) || | ||
// Technically it is legal to insert a ; after a ;. | ||
// Allows -> throw new Error; f() | ||
this.state.tokens[this.state.tokens.length - 1].type === tt.semi |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hmm... this hints at something being wrong elsewhere...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The ;
after ;
thing is the result of going down a pretty deep rabbit hole. In JS, (x) => y; z
is legal (compiles to a Body
of two ExpressionStatement
s) and I wanted to maintain consistency.
So that would mean (x) => throw new Error; z
should compile to a Body
of two ExpressionStatements
as well. (In Coffeescript, that actually compiles to one function with two statements in the body)
Ultimately the thing that was preventing that was the absence of this line of code here, so I added it, and it had the result I wanted without breaking JS, so I went with it. Not sure about better options.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This really looks wrong to me. None of these things should be here; they belong elsewhere at a call site or something.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this should be able to go away once you rebase on master, does that sound right? (guessing you just haven't gotten to that yet)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's not clear to me how that would happen. How is (-> throw new Error)()
going to compile? It calls pp.semicolon()
after parsing the throw, but there isn't a semicolon, EOL, or brace there...
Unless I misunderstand the ASI portion of the parser in some fundamental way, I believe this change is necessary.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ah, gotcha. Wasn't clear to me earlier that that was the problem.
Do you know off the top of your head if it's just throw
or if this problem exists across many desirably legal statements?
One possibility would be to modify parseThrowStatement
to be "softer" than just this.semicolon()
.
Another would be to parse throw
as an expression generally, which I think could make sense as a looser lightscript semantic (would just need to ensure it's sometimes wrapped in an iife in compiler).
The last thing on my mind here is that the cases you've added here do indeed make sense in this context and quite probably others – and I'm thinking about whether they should be enabled conditionally and at what layer of abstraction.
Just want to be thoughtful about this. I'm hopeful to be able to play around with it myself later today, and there's a good chance I'll end up in solid agreement that the current implementation is the best path forward.
src/plugins/lightscript.js
Outdated
if (isExpression) { | ||
this.expect(tt.colon); | ||
if (!this.isLineBreak()) return this.parseMaybeAssign(); | ||
if (!this.isLineBreak()) return this.parseWhiteBlockBody(node, indentLevel, true, false); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think a bunch of the above stuff could be replaced with:
if (!this.isLineBreak()) {
const stmt = this.parseStatement(false);
return stmt.type === "ExpressionStatement" ? stmt.expression : stmt;
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right? Would this obviate the need for couldBeginStatement
and parseWhiteBlockBody
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I will try that, I don't like couldBeginStatement
. Not sure if there are any additional edge cases there but I can't think of any off hand.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One possibility is that it will allow things like break
and continue
where they don't make any sense, but maybe that is not a parser issue?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I think that's best handled elsewhere since they can happen within a block anyway:
x = if true:
1 + 1
break
return
else:
foo()
continue
etc
though actually I think break and continue might be fine in there...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If I understand correctly, the compiler is going to turn that into a ternary, and the break will get wrapped into an iife by replaceExpressionWithStatements
-- and a break inside an iife by itself I think might be illegal JS. Will test and see what happens.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
while true:
x = if true:
break
while (true) {
const x = true ? function () {
break;
}() : null;
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Chrome says: Uncaught SyntaxError: Illegal break statement
to that, so it is illegal js
Trying to use
The issue is that There are a select few stranger ones that seem like legitimate breakages, though: f(a, async(x, y) => await [x, y], b)
The issue here is that lambdas in JS are parsed with This seems like quite a can of worms, and my original approach doesn't break any of these things. I'm inclined to stick with it unless you can see another way through. |
OK, I think this pr is in much better shape now and the case of illegal return etc. inside expressions should be handled better. Should be good to review more closely. |
src/parser/statement.js
Outdated
switch (starttype) { | ||
case tt._break: case tt._continue: return this.parseBreakContinueStatement(node, starttype.keyword); | ||
case tt._break: | ||
if (this.hasPlugin("lightscript") && isExprContext) this.unexpected(null, "break is illegal inside LightScript expressions"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unsyntactic break
is the error babel throws when you try to do what this compiles to, so if anything let's keep that
src/parser/statement.js
Outdated
// does not help. | ||
|
||
pp.parseStatement = function (declaration, topLevel) { | ||
pp.parseStatement = function (declaration, topLevel, isExprContext) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I really don't like adding parameters to preexisting babylon functions...
src/plugins/lightscript.js
Outdated
pp.parseInlineWhiteBlock = function(node, isExprContext, allowEmptyBody) { | ||
if (this.state.type.startsExpr) return this.parseMaybeAssign(); | ||
// oneline statement case | ||
node.body = this.couldBeginStatement() ? [this.parseStatement(true, false, isExprContext)] : []; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why is this here? why not just this.parseStatement
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Have you tried the approach I outlined in gitter? Just parseStatement
and extract the expression (if relevant) and/or raise on illegal things thereafter?
I haven't fiddled, so I'm not sure what problems it might cause, but my hunch is it would reduce the LOC diff of this PR by a factor of 3-5...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I did try that approach as you suggested. I left an extensive comment about it which I assume got eaten or something?
It causes several dozen JS fixtures to break. The most interesting breakages involve ambiguities between SequenceExpression
s and function argument lists. The breakages are caused by parseStatement
going down a deeper parsing pathway than parseMaybeAssign
, along which comma-separated expressions get turned into SequenceExpression
s instead of unwinding the recursive descent.
"this": new KeywordTokenType("this", { startsExpr }), | ||
"super": new KeywordTokenType("super", { startsExpr }), | ||
"class": new KeywordTokenType("class"), | ||
"class": new KeywordTokenType("class", { startsExpr }), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should be in a separate PR (seems like something that should go upstream? would first want to understand why it wasn't here before...)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
eh, this is fine in here. I think it makes sense
src/plugins/lightscript.js
Outdated
|
||
node.body = this.parseWhiteBlock(true); | ||
const indentLevel = this.state.indentLevel; | ||
node.body = this.startNode(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
unnecessary
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd much prefer to keep a single call here as before...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nevermind about the single call, but node.body = this.startNode();
here can definitely be removed
this.match(tt.parenR) || | ||
// Technically it is legal to insert a ; after a ;. | ||
// Allows -> throw new Error; f() | ||
this.state.tokens[this.state.tokens.length - 1].type === tt.semi |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This really looks wrong to me. None of these things should be here; they belong elsewhere at a call site or something.
src/parser/statement.js
Outdated
case tt._return: return this.parseReturnStatement(node); | ||
|
||
case tt._return: | ||
if (this.hasPlugin("lightscript") && isExprContext) this.unexpected(null, "return is illegal inside LightScript expressions"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's instead use a this.state.inExpressionBlock
or something, and inside this.parseReturnStatement
/ this.parseBreakContinueStatement
do the check.
Then we won't need to pass around isExprContext
like this or modify this.parseStatement
itself at all.
src/plugins/lightscript.js
Outdated
return this.finishNode(node, "BlockStatement"); | ||
}; | ||
|
||
pp.parseInlineWhiteBlock = function(node, isExprContext, allowEmptyBody) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
mixed feelings about breaking these out in general, but I see the appeal... should probably be called Oneline not Inline though...
Sounds like you may have some better ideas than I do about how to do this, so I think I'll kick it back your way. Branch will be open if you want any of the code. |
Hey, sorry I didn't see that earlier comment. Thanks for taking the time to explain |
I applied the fixes for empty whiteblocks to this branch. Because of the divergences, the changes were basically not directly mergeable. I think you should consider reverting the merge from last night and merging this branch instead (or just resolve conflicts with my side) |
Cool, I can try to resolve the conflicts |
src/plugins/lightscript.js
Outdated
|
||
// c/p parseBlock | ||
pp.finishWhiteBlock = function(node) { | ||
if (!node.body.length) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this check can now be moved to parseMultilineWhiteBlock
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
... in which case I'm not sure this method is necessary (up to you)
pp.parseInlineWhiteBlock = function(node) { | ||
if (this.state.type.startsExpr) return this.parseMaybeAssign(); | ||
// oneline statement case | ||
node.body = [this.parseStatement(true)]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
shouldn't be in a list
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are you sure about this? The node gets turned into a BlockStatement
by finishWhiteBlock
, and babel-types says those have to have bodies that are arrays:
defineType("BlockStatement", {
builder: ["body", "directives"],
visitor: ["directives", "body"],
fields: {
directives: {
validate: chain(assertValueType("array"), assertEach(assertNodeType("Directive"))),
default: [],
},
body: {
validate: chain(assertValueType("array"), assertEach(assertNodeType("Statement"))),
},
},
aliases: ["Scopable", "BlockParent", "Block", "Statement"],
});
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
or I guess you're saying just return the parsed statement and don't wrap it at all?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tried this change and there is a problem. parseArrowFunctionBody
is doing a fixup here: https://github.com/wcjohnson/babylon-lightscript/blob/upstream/oneline-whiteblocks/src/plugins/lightscript.js#L328
Because the parser is no longer producing BlockStatement
s after this change, functions are getting false-flagged as expression=true
when they aren't expression bodies.
src/plugins/lightscript.js
Outdated
|
||
node.body = this.parseWhiteBlock(true); | ||
const indentLevel = this.state.indentLevel; | ||
node.body = this.startNode(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nevermind about the single call, but node.body = this.startNode();
here can definitely be removed
"this": new KeywordTokenType("this", { startsExpr }), | ||
"super": new KeywordTokenType("super", { startsExpr }), | ||
"class": new KeywordTokenType("class"), | ||
"class": new KeywordTokenType("class", { startsExpr }), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
eh, this is fine in here. I think it makes sense
@wcjohnson would you be willing to squash commits? |
@rattrayalex Yes, when I get a moment I will clean up the comments you left and then I can squash after a final go ahead |
Upon testing, removing that I clarified the code a little bit but semantically it's the same. |
@rattrayalex Comments have been addressed, let me know if anything else needs doing before the squash |
@rattrayalex // oneline statement case
node.body = [this.parseStatement(true)];
node.directives = [];
this.addExtra(node, "curly", false);
return this.finishNode(node, "BlockStatement"); You suggested dropping the array there. I'm not sure that's correct. According to
|
Cool, thanks for explaining! That definitely makes sense. I'll give another look but anticipate being 👍 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks good, ready for squash! Thanks!
I do hope to cleanup the canInsertSemicolon
bit in the future...
|
||
node.body = this.parseWhiteBlock(true); | ||
const indentLevel = this.state.indentLevel; | ||
const nodeAtArrow = this.startNode(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ah, cool, thanks – that is clearer
@@ -0,0 +1,3 @@ | |||
x = { | |||
method() -> for idx i in Array(10): i |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
jw was there a special case that methods hit?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think so; this is probably a redundant fixture.
ca19543
to
ba10bc8
Compare
Squashing this is actually a little tricky because it straddles the merge and unmerge of babylon 7. Let me see what I can do. |
- Allow parsing of single statements in oneline whiteblocks - Change if and arrow parsing to call new subroutine - Allow ASI after existing semicolons and in other edge cases - Add class to startsExpr so `ClassExpression`s don’t get parsed as stmts - Added fixtures Block parsing cleanup - Move special parsing case for arrows into parseArrowBody - Break up parseWhiteBlockBody Whiteblocks don’t allow directives Revert changes adding `exprContext` to parsing functions Disallow empty whiteblocks (removes need for `couldBeginStatement`) - Disallow empty whiteblocks - Update fixtures - Remove some obsolete fixtures - Use idiomatic `-> return` in fixtures
ba10bc8
to
c9526a8
Compare
@rattrayalex OK, that was an ugly rebase but I think I got it right, should be squashed now. |
Woohoo! This was a heavy lift. Great work @wcjohnson and thanks for putting up with all my (sometimes ignorant) asks. |
Fixes lightscript/lightscript#32
ClassExpression
s don’t get parsed as stmts