UTF-8 vs UTF-16 mismatch in KCL diagnostic SourceRanges (#7537)
# Background

The KCL interpreter (written in Rust) reports errors, lints, and other diagnostics as UTF-8 source range offsets. JavaScript, including CodeMirror, represents source code as UTF-16 strings.

# Problem

The UTF-8 source ranges sent from Rust don't correspond to the same text in the JS UTF-16 GUI. At best, the source ranges highlight the wrong text when the code contains non-ASCII characters. At worst, they can cause exceptions by trying to highlight a range that doesn't exist at all.

Here's the problem, on main:

<img width="178" alt="Screenshot 2025-06-20 at 11 52 03 AM" src="https://github.com/user-attachments/assets/9a4e75bf-965f-49d8-b238-8868b35912a1" />

# Solution

We define a KCL SourceRange as _always_ being UTF-8. If a source range is constructed in JS, it should be converted to UTF-8, and when these ranges are converted into CodeMirror diagnostics, they should be converted to UTF-16.

Here's the same code as above, but on this branch:

<img width="170" alt="Screenshot 2025-06-20 at 11 50 55 AM" src="https://github.com/user-attachments/assets/a5b971c0-0b02-4acd-8fcf-5a133331682b" />

Closes https://github.com/KittyCAD/modeling-app/issues/4327
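To illustrate the conversion the solution above implies, here is a minimal TypeScript sketch of mapping a UTF-8 byte offset from the interpreter to a UTF-16 code-unit offset for CodeMirror, and back. This is not the code from this PR; the helper names are hypothetical, and only the standard `TextEncoder` API is used.

```ts
// Hypothetical helpers sketching the offset conversion; the actual
// implementation in the app may differ.

/** Map a UTF-8 byte offset into `text` to a UTF-16 code-unit offset. */
function utf8ToUtf16Offset(text: string, utf8Offset: number): number {
  const encoder = new TextEncoder()
  let bytes = 0
  let units = 0
  for (const ch of text) {
    // Stop once we've consumed `utf8Offset` bytes of the UTF-8 encoding.
    if (bytes >= utf8Offset) break
    bytes += encoder.encode(ch).length // 1-4 UTF-8 bytes per code point
    units += ch.length // 1-2 UTF-16 code units per code point
  }
  return units
}

/** Map a UTF-16 code-unit offset into `text` to a UTF-8 byte offset. */
function utf16ToUtf8Offset(text: string, utf16Offset: number): number {
  // Assumes the offset lands on a code-point boundary, as editor
  // positions normally do; count the UTF-8 bytes of the prefix.
  return new TextEncoder().encode(text.slice(0, utf16Offset)).length
}
```

For example, in the source `é = 1`, the `=` sits at UTF-8 byte offset 3 but UTF-16 offset 2, so `utf8ToUtf16Offset('é = 1', 3)` returns 2 and the highlight lands on the right character. A range constructed in the editor would go through the reverse mapping before being handed to the interpreter.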
@@ -111,7 +111,8 @@ commaSep1NoTrailingComma<term> { term ("," term)* }
 PipeSubstitution { "%" }
 
-identifier { (@asciiLetter | "_") (@asciiLetter | @digit | "_")* }
+// Includes non-whitespace unicode characters.
+identifier { $[a-zA-Z_\u{a1}-\u{167f}\u{1681}-\u{1fff}\u{200e}-\u{2027}\u{202a}-\u{202e}\u{2030}-\u{205e}\u{2061}-\u{2fff}\u{3001}-\u{fefe}\u{ff00}-\u{10ffff}] $[a-zA-Z0-9_\u{a1}-\u{167f}\u{1681}-\u{1fff}\u{200e}-\u{2027}\u{202a}-\u{202e}\u{2030}-\u{205e}\u{2061}-\u{2fff}\u{3001}-\u{fefe}\u{ff00}-\u{10ffff}]* }
 
 AnnotationName { "@" identifier? }
 PropertyName { identifier }
 TagDeclarator { "$" identifier }