Fix search highlight for non-unicode chars

The `matching_bytes` function takes a `&Token` now and:
- gets the number of bytes to highlight (unchanged).
- uses `Token.num_graphemes_from_bytes` to get the number of grapheme clusters to highlight.

In essence, the `matching_bytes` function returns the number of matching grapheme clusters instead of bytes. Should this function be renamed then?

Added proper highlighting in the HTTP UI:
- requires dependency on `unicode-segmentation` to extract grapheme clusters from tokens
- `<mark>` tag is put around only the matched part
  - before this change, the entire word was highlighted even if only a part of it matched
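
To illustrate the idea of wrapping only the matched part in `<mark>`, here is a minimal sketch using only the standard library. It is not the code from this commit: `num_chars_from_bytes` and `highlight_prefix` are hypothetical names, and the sketch counts Unicode scalar values (`char`s) as a stand-in for grapheme clusters, which the actual change obtains through `unicode-segmentation`. The two differ for text with combining marks.

```rust
/// Hypothetical helper: how many `char`s start within the first
/// `num_bytes` bytes of `word`. This approximates grapheme clusters
/// with Unicode scalar values (an assumption of this sketch).
fn num_chars_from_bytes(word: &str, num_bytes: usize) -> usize {
    word.char_indices()
        .take_while(|(idx, _)| *idx < num_bytes)
        .count()
}

/// Hypothetical helper: wrap only the matched prefix of `word` in `<mark>`.
/// `matched_bytes` is the length of the match in bytes.
fn highlight_prefix(word: &str, matched_bytes: usize) -> String {
    let n = num_chars_from_bytes(word, matched_bytes);
    // Byte offset just past the n-th char; always a valid char boundary,
    // so `split_at` cannot panic even if `matched_bytes` falls mid-char.
    let split = word
        .char_indices()
        .nth(n)
        .map(|(idx, _)| idx)
        .unwrap_or(word.len());
    let (head, tail) = word.split_at(split);
    if head.is_empty() {
        // Nothing matched: return the word unchanged.
        return word.to_string();
    }
    // Note: no HTML escaping is done here; a real UI would need it.
    format!("<mark>{}</mark>{}", head, tail)
}
```

With the word `véhicule` (precomposed `é`, two bytes in UTF-8) and a 3-byte match, the split lands after `vé` rather than inside the `é`, so only that prefix is marked and the rest of the word stays unhighlighted.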
2025-09-18 02:36:24 +00:00 · 2021-12-17 22:53:34 +05:30
parent 559e019de1
commit 30247d70cd
3 changed files with 26 additions and 12 deletions
--- a/http-ui/Cargo.toml
+++ b/http-ui/Cargo.toml
@@ -17,6 +17,7 @@ once_cell = "1.5.2"
 rayon = "1.5.0"
 structopt = { version = "0.3.21", default-features = false, features = ["wrap_help"] }
 tempfile = "3.2.0"
+unicode-segmentation = "1.6.0"

 # http server
 askama = "0.10.5"