Check the winnow ParseError before indexing (#2491)

* Check the winnow ParseError before indexing

From the winnow docs[1]:

> The location in ParseError::input where parsing failed
>
> Note: This is an offset, not an index, and may point to the end of input
> (input.len()) on eof errors.

This bounds-checks the offset before indexing into the `input` vec, and
returns an EOF error rather than an unknown-token error when the offset
is at or past the end of the input.

[1]: https://docs.rs/winnow/latest/winnow/error/struct.ParseError.html#method.offset
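The check can be sketched in isolation as below. This is a minimal, self-contained illustration, not the KCL code: `LexError` and `classify` are hypothetical names standing in for `KclError` and the `From` impl, and the sketch uses `slice::get`, which returns `None` for any index at or past the end, in place of the explicit `offset >= input.len()` comparison.

```rust
/// Hypothetical, simplified stand-in for the two outcomes the commit
/// distinguishes: an unexpected EOF vs. an unknown token.
#[derive(Debug, PartialEq)]
enum LexError {
    UnexpectedEof { offset: usize },
    UnknownToken { offset: usize, token: char },
}

/// Bounds-check `offset` before indexing into `input`.
/// winnow documents that the offset may equal `input.len()` on EOF
/// errors, so that value must never be used as an index.
fn classify(input: &[char], offset: usize) -> LexError {
    match input.get(offset) {
        // `get` returns None for any offset at or past the end.
        None => LexError::UnexpectedEof { offset },
        // Otherwise the offset names the offending character.
        Some(&token) => LexError::UnknownToken { offset, token },
    }
}
```

With `input` built from `"abc"`, an offset of 1 yields `UnknownToken` for `'b'`, while an offset of 3 (equal to `input.len()`, the case the winnow docs call out) yields `UnexpectedEof` instead of panicking.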

I have a hunch that something is mixing up bytes and chars (more
specifically, codepoints or graphemes): if one side reports byte
offsets while the other indexes by char/codepoint, the offset can run
past the end of the list.
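To make the byte-versus-char suspicion concrete, the sketch below compares byte length and char count for a string containing one two-byte character. It assumes the failing offset is a byte offset, which this commit does not confirm, and the helper names are hypothetical. Both regression inputs contain non-ASCII characters, and every non-ASCII character takes at least two bytes in UTF-8, so byte offsets and char indexes diverge for them.

```rust
/// Returns (byte length, char count) for `s`; the two differ as soon
/// as `s` contains any non-ASCII character.
fn byte_and_char_len(s: &str) -> (usize, usize) {
    (s.len(), s.chars().count())
}

/// Hypothetical helper: converts a byte offset into a char index.
/// Returns `None` if the offset is past the end or falls inside a
/// multi-byte character, since such an offset has no char index.
fn byte_offset_to_char_index(s: &str, byte_offset: usize) -> Option<usize> {
    // `is_char_boundary` is false for offsets > s.len() and for offsets
    // that split a character, so the slice below cannot panic.
    if !s.is_char_boundary(byte_offset) {
        return None;
    }
    Some(s[..byte_offset].chars().count())
}
```

For `"a\u{e9}b"` (4 bytes, 3 chars), the end-of-input byte offset 4 converts to char index 3, which equals the length of the char vec and so still has to be treated as EOF. Indexing that char vec directly with the raw byte offset 4 would panic.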

For now this fixes the crash, but in some cases the EOF error may be
masking an actual bad-token error. Our code looks right, so I'm not
quite sure what is going on in the winnow internals here.

Signed-off-by: Paul R. Tagliamonte <paul@kittycad.io>
Paul Tagliamonte
2024-05-23 16:27:54 -04:00
committed by GitHub
parent 00a8273173
commit a69d7d03d0
3 changed files with 30 additions and 0 deletions

@ -5,6 +5,7 @@ use crate::{
    token::{Token, TokenType},
};
mod bad_inputs;
mod math;
pub(crate) mod parser_impl;

@ -0,0 +1,17 @@
#[cfg(test)]
mod tests {
    macro_rules! parse_and_lex {
        ($func_name:ident, $test_kcl_program:expr) => {
            #[test]
            fn $func_name() {
                if let Ok(v) = $crate::token::lexer($test_kcl_program) {
                    let _ = $crate::parser::Parser::new(v).ast();
                }
            }
        };
    }
    parse_and_lex!(crash_eof_1, "{\"ގގ\0\0\0\"\".");
    parse_and_lex!(crash_eof_2, "(/=e\"\u{616}ݝ\"\"");
}

@ -23,6 +23,18 @@ impl From<ParseError<Located<&str>, winnow::error::ContextError>> for KclError {
    fn from(err: ParseError<Located<&str>, winnow::error::ContextError>) -> Self {
        let (input, offset): (Vec<char>, usize) = (err.input().chars().collect(), err.offset());
        if offset >= input.len() {
            // From the winnow docs:
            //
            // This is an offset, not an index, and may point to
            // the end of input (input.len()) on eof errors.
            return KclError::Lexical(KclErrorDetails {
                source_ranges: vec![SourceRange([offset, offset])],
                message: "unexpected EOF while parsing".to_string(),
            });
        }
        // TODO: Add the Winnow tokenizer context to the error.
        // See https://github.com/KittyCAD/modeling-app/issues/784
        let bad_token = &input[offset];