New parser built in Winnow (#731)

* New parser built with Winnow

This new parser uses [winnow](docs.rs/winnow) to replace the handwritten recursive parser.

## Differences

I think the Winnow parser is more readable than handwritten one, due to reusing standard combinators. If you have a parsre like `p` or `q` you can combine them with standard functions like `repeat(0..4, p)`, `opt(p)`, `alt((p, q))` and `separated1(p, ", ")`. This IMO makes it more readable once you know what those standard functions do.

It's also more accurate now -- e.g. the parser no longer swallows whitespace between comments, or inserts it where there was none before. It no longer changes // comments to /* comments depending on the surrounding whitespace. 

Primary form of testing was running the same KCL program through both the old and new parsers and asserting that both parsers produce the same AST. See the test `parser::parser_impl::tests::check_parsers_work_the_same`. But occasionally the new and old parsers disagree. This is either:

- Innocuous (e.g. disagreeing on whether a comment starts at the preceding whitespace or at the //)
- Helpful (e.g. new parser recognizes comments more accurately, preserving the difference between // and /* comments)
- Acceptably bad (e.g. new parser sometimes outputs worse error messages, TODO in #784)

so those KCL programs have their own unit tests in `parser_impl.rs` demonstrating the behaviour.

If you'd like to review this PR, it's arguably more important to review changes to the existing unit tests rather than the new parser itself. Because changes to the unit tests show where my parser changes behaviour -- usually for the better, occasionally for the worse (e.g. a worse error message than before). I think overall the improvements are worth it that I'd like to merge it without spending another week fixing it up -- we can fix the error messages in a follow-up PR.

## Performance

| Benchmark | Old parser (this branch) | New parser (this branch) | Speedup |
| ------------- | ------------- | ------------- | ------------- |
| Pipes on pipes  | 922 ms  | 42 ms | 21x |
| Kitt SVG  | 148 ms  | 7 ms | 21x |

There's definitely still room to improve performance:

- https://github.com/KittyCAD/modeling-app/issues/839
- https://github.com/KittyCAD/modeling-app/issues/840

## Winnow

Y'all know I love [Nom](docs.rs/nom) and I've blogged about it a lot. But I'm very happy using Winnow, a fork. It's got some really nice usability improvements. While writing this PR I found some bugs or unclear docs in Winnow:

- https://github.com/winnow-rs/winnow/issues/339
- https://github.com/winnow-rs/winnow/issues/341
- https://github.com/winnow-rs/winnow/issues/342
- https://github.com/winnow-rs/winnow/issues/344

The maintainer was quick to close them and release new versions within a few hours, so I feel very confident building the parser on this library. It's a clear improvement over Nom and it's used in toml-edit (and therefore within Cargo) and Gitoxide, so it's becoming a staple of the Rust ecosystem, which adds confidence.

Closes #716 
Closes #815 
Closes #599
This commit is contained in:
Adam Chalmers
2023-10-12 09:42:37 -05:00
committed by GitHub
parent 5b90686e5e
commit 2a02f6e039
17 changed files with 2475 additions and 195 deletions

View File

@ -141,42 +141,6 @@ const newVar = myVar + 1
})
describe('testing function declaration', () => {
test('fn funcN = () => {}', () => {
const { body } = parse('fn funcN = () => {}')
delete (body[0] as any).declarations[0].init.body.nonCodeMeta
expect(body).toEqual([
{
type: 'VariableDeclaration',
start: 0,
end: 19,
kind: 'fn',
declarations: [
{
type: 'VariableDeclarator',
start: 3,
end: 19,
id: {
type: 'Identifier',
start: 3,
end: 8,
name: 'funcN',
},
init: {
type: 'FunctionExpression',
start: 11,
end: 19,
params: [],
body: {
start: 17,
end: 19,
body: [],
},
},
},
],
},
])
})
test('fn funcN = (a, b) => {return a + b}', () => {
const { body } = parse(
['fn funcN = (a, b) => {', ' return a + b', '}'].join('\n')
@ -1513,22 +1477,23 @@ const key = 'c'`
const nonCodeMetaInstance = {
type: 'NonCodeNode',
start: code.indexOf('\n// this is a comment'),
end: code.indexOf('const key'),
end: code.indexOf('const key') - 1,
value: {
type: 'blockComment',
style: 'line',
value: 'this is a comment',
},
}
const { nonCodeMeta } = parse(code)
expect(nonCodeMeta.nonCodeNodes[0]).toEqual(nonCodeMetaInstance)
expect(nonCodeMeta.nonCodeNodes[0][0]).toEqual(nonCodeMetaInstance)
// extra whitespace won't change it's position (0) or value (NB the start end would have changed though)
const codeWithExtraStartWhitespace = '\n\n\n' + code
const { nonCodeMeta: nonCodeMeta2 } = parse(codeWithExtraStartWhitespace)
expect(nonCodeMeta2.nonCodeNodes[0].value).toStrictEqual(
expect(nonCodeMeta2.nonCodeNodes[0][0].value).toStrictEqual(
nonCodeMetaInstance.value
)
expect(nonCodeMeta2.nonCodeNodes[0].start).not.toBe(
expect(nonCodeMeta2.nonCodeNodes[0][0].start).not.toBe(
nonCodeMetaInstance.start
)
})
@ -1546,12 +1511,13 @@ const key = 'c'`
const indexOfSecondLineToExpression = 2
const sketchNonCodeMeta = (body as any)[0].declarations[0].init.nonCodeMeta
.nonCodeNodes
expect(sketchNonCodeMeta[indexOfSecondLineToExpression]).toEqual({
expect(sketchNonCodeMeta[indexOfSecondLineToExpression][0]).toEqual({
type: 'NonCodeNode',
start: 106,
end: 166,
end: 163,
value: {
type: 'blockComment',
type: 'inlineComment',
style: 'block',
value: 'this is\n a comment\n spanning a few lines',
},
})
@ -1568,14 +1534,15 @@ const key = 'c'`
const { body } = parse(code)
const sketchNonCodeMeta = (body[0] as any).declarations[0].init.nonCodeMeta
.nonCodeNodes
expect(sketchNonCodeMeta[3]).toEqual({
.nonCodeNodes[3][0]
expect(sketchNonCodeMeta).toEqual({
type: 'NonCodeNode',
start: 125,
end: 141,
end: 138,
value: {
type: 'blockComment',
value: 'a comment',
style: 'line',
},
})
})
@ -1693,11 +1660,7 @@ describe('parsing errors', () => {
}
const theError = _theError as any
expect(theError).toEqual(
new KCLError(
'unexpected',
'Unexpected token Token { token_type: Brace, start: 29, end: 30, value: "}" }',
[[29, 30]]
)
new KCLError('syntax', 'Unexpected token', [[27, 28]])
)
})
})