UTF-8 vs UTF-16 mismatch in KCL diagnostic SourceRanges (#7537)

# Background

The KCL interpreter (written in Rust) reports errors, lints and other diagnostics as UTF-8 source range offsets. JavaScript, including CodeMirror, represents source code as UTF-16 strings. 

# Problem

This means the UTF-8 source ranges sent from Rust don't correspond to the same text in the JS UTF-16 GUI. At best, this means the source ranges highlight the wrong thing if you use non-ASCII characters. At worst, it can cause exceptions by trying to highlight a range that doesn't even exist.

Here's the problem, on main: 

<img width="178" alt="Screenshot 2025-06-20 at 11 52 03 AM" src="https://github.com/user-attachments/assets/9a4e75bf-965f-49d8-b238-8868b35912a1" />

# Solution

We define a KCL SourceRange as _always_ being UTF-8. This means if a source range is constructed in JS, it should be converted to UTF-8. When these ranges are converted into CodeMirror diagnostics, they should be converted to UTF-16.

Here's the same code as above, but on this branch:

<img width="170" alt="Screenshot 2025-06-20 at 11 50 55 AM" src="https://github.com/user-attachments/assets/a5b971c0-0b02-4acd-8fcf-5a133331682b" />

Closes https://github.com/KittyCAD/modeling-app/issues/4327
This commit is contained in:
Adam Chalmers
2025-06-20 14:04:49 -05:00
committed by GitHub
parent 416d0b37a2
commit 53d6613d0d
4 changed files with 105 additions and 33 deletions

View File

@ -111,7 +111,8 @@ commaSep1NoTrailingComma<term> { term ("," term)* }
PipeSubstitution { "%" }
identifier { (@asciiLetter | "_") (@asciiLetter | @digit | "_")* }
// Includes non-whitespace unicode characters.
identifier { $[a-zA-Z_\u{a1}-\u{167f}\u{1681}-\u{1fff}\u{200e}-\u{2027}\u{202a}-\u{202e}\u{2030}-\u{205e}\u{2061}-\u{2fff}\u{3001}-\u{fefe}\u{ff00}-\u{10ffff}] $[a-zA-Z0-9_\u{a1}-\u{167f}\u{1681}-\u{1fff}\u{200e}-\u{2027}\u{202a}-\u{202e}\u{2030}-\u{205e}\u{2061}-\u{2fff}\u{3001}-\u{fefe}\u{ff00}-\u{10ffff}]* }
AnnotationName { "@" identifier? }
PropertyName { identifier }
TagDeclarator { "$" identifier }