Merge remote-tracking branch 'origin/release-v1.3.1' into japanese-docker-image

Merge #3976
3976: Fix the get stats method r=ManyTheFish a=irevoire

# Pull Request

- The get stats method of the index-scheduler ignored the processing tasks entirely, so it returned a wrong number of enqueued tasks and 0 processing tasks.
- Added a test
- Currently this method is **ONLY** used to compute the `meilisearch_nb_tasks` field of the **experimental feature** metrics.

## Related issue

Fixes https://github.com/meilisearch/meilisearch/issues/3972

Co-authored-by: Tamo <tamo@meilisearch.com>
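
To illustrate the kind of miscount described in this pull request, here is a minimal, self-contained sketch. It is not the index-scheduler code: the `Scheduler` struct, its fields, and both method names are hypothetical, and it assumes that a task being processed is still persisted as enqueued while its ID is tracked in a separate in-memory set.

```rust
use std::collections::{BTreeMap, BTreeSet};

#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
enum Status {
    Enqueued,
    Processing,
    Succeeded,
}

/// Hypothetical stand-in for the scheduler state (not the real type).
struct Scheduler {
    /// Status as persisted. Assumption: a task being processed is still stored as `Enqueued`.
    stored: BTreeMap<u32, Status>,
    /// IDs of the tasks currently being processed, tracked in memory only.
    processing: BTreeSet<u32>,
}

impl Scheduler {
    /// Counts only the persisted statuses, so in-flight tasks are reported as enqueued.
    fn stats_ignoring_processing(&self) -> BTreeMap<Status, usize> {
        let mut counts = BTreeMap::from([
            (Status::Enqueued, 0),
            (Status::Processing, 0),
            (Status::Succeeded, 0),
        ]);
        for status in self.stored.values() {
            *counts.entry(*status).or_insert(0) += 1;
        }
        counts
    }

    /// Moves the in-flight tasks from the enqueued count to the processing count.
    fn stats(&self) -> BTreeMap<Status, usize> {
        let mut counts = self.stats_ignoring_processing();
        // Only reclassify IDs that are persisted as enqueued, so the subtraction cannot underflow.
        let in_flight = self
            .processing
            .iter()
            .filter(|id| self.stored.get(*id) == Some(&Status::Enqueued))
            .count();
        *counts.entry(Status::Enqueued).or_insert(0) -= in_flight;
        *counts.entry(Status::Processing).or_insert(0) += in_flight;
        counts
    }
}

/// Four tasks: one finished, three persisted as enqueued, one of which is in flight.
fn sample_scheduler() -> Scheduler {
    Scheduler {
        stored: BTreeMap::from([
            (0, Status::Succeeded),
            (1, Status::Enqueued),
            (2, Status::Enqueued),
            (3, Status::Enqueued),
        ]),
        processing: BTreeSet::from([1]),
    }
}

fn main() {
    let scheduler = sample_scheduler();
    println!("ignoring processing: {:?}", scheduler.stats_ignoring_processing());
    println!("with processing:     {:?}", scheduler.stats());
}
```

In this sketch the first method over-reports enqueued tasks and always reports zero processing tasks, which matches the symptom described above; the second method reclassifies the in-flight IDs before returning the counts.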
2025-12-04 11:45:44 +00:00 · 2023-08-10 13:59:33 +02:00 · 2023-08-10 10:55:50 +00:00 · 2023-08-09 07:58:15 +00:00 · 2023-08-08 15:07:07 +00:00 · 2023-08-08 16:58:14 +02:00
65 changed files with 642 additions and 2033 deletions
--- a/.github/workflows/publish-apt-brew-pkg.yml
+++ b/.github/workflows/publish-apt-brew-pkg.yml
@@ -35,7 +35,7 @@ jobs:
    - name: Build deb package
      run: cargo deb -p meilisearch -o target/debian/meilisearch.deb
    - name: Upload debian pkg to release
-      uses: svenstaro/upload-release-action@2.7.0
+      uses: svenstaro/upload-release-action@2.6.1
      with:
        repo_token: ${{ secrets.MEILI_BOT_GH_PAT }}
        file: target/debian/meilisearch.deb
--- a/.github/workflows/publish-binaries.yml
+++ b/.github/workflows/publish-binaries.yml
@@ -54,7 +54,7 @@ jobs:
    # No need to upload binaries for dry run (cron)
    - name: Upload binaries to release
      if: github.event_name == 'release'
-      uses: svenstaro/upload-release-action@2.7.0
+      uses: svenstaro/upload-release-action@2.6.1
      with:
        repo_token: ${{ secrets.MEILI_BOT_GH_PAT }}
        file: target/release/meilisearch
@@ -87,7 +87,7 @@ jobs:
    # No need to upload binaries for dry run (cron)
    - name: Upload binaries to release
      if: github.event_name == 'release'
-      uses: svenstaro/upload-release-action@2.7.0
+      uses: svenstaro/upload-release-action@2.6.1
      with:
        repo_token: ${{ secrets.MEILI_BOT_GH_PAT }}
        file: target/release/${{ matrix.artifact_name }}
@@ -121,7 +121,7 @@ jobs:
      - name: Upload the binary to release
        # No need to upload binaries for dry run (cron)
        if: github.event_name == 'release'
-        uses: svenstaro/upload-release-action@2.7.0
+        uses: svenstaro/upload-release-action@2.6.1
        with:
          repo_token: ${{ secrets.MEILI_BOT_GH_PAT }}
          file: target/${{ matrix.target }}/release/meilisearch
@@ -183,7 +183,7 @@ jobs:
      - name: Upload the binary to release
        # No need to upload binaries for dry run (cron)
        if: github.event_name == 'release'
-        uses: svenstaro/upload-release-action@2.7.0
+        uses: svenstaro/upload-release-action@2.6.1
        with:
          repo_token: ${{ secrets.MEILI_BOT_GH_PAT }}
          file: target/${{ matrix.target }}/release/meilisearch
--- a/.github/workflows/test-suite.yml
+++ b/.github/workflows/test-suite.yml
@@ -30,20 +30,20 @@ jobs:
        run: |
          apt-get update && apt-get install -y curl
          apt-get install build-essential -y
-      - name: Setup test with Rust stable
+      - name: Run test with Rust stable
        if: github.event_name != 'schedule'
        uses: actions-rs/toolchain@v1
        with:
          toolchain: stable
          override: true
-      - name: Setup test with Rust nightly
+      - name: Run test with Rust nightly
        if: github.event_name == 'schedule'
        uses: actions-rs/toolchain@v1
        with:
          toolchain: nightly
          override: true
      - name: Cache dependencies
-        uses: Swatinem/rust-cache@v2.5.1
+        uses: Swatinem/rust-cache@v2.4.0
      - name: Run cargo check without any default features
        uses: actions-rs/cargo@v1
        with:
@@ -65,7 +65,7 @@ jobs:
    steps:
      - uses: actions/checkout@v3
      - name: Cache dependencies
-        uses: Swatinem/rust-cache@v2.5.1
+        uses: Swatinem/rust-cache@v2.4.0
      - name: Run cargo check without any default features
        uses: actions-rs/cargo@v1
        with:
@@ -117,17 +117,17 @@ jobs:
        run: |
          apt-get update
          apt-get install --assume-yes build-essential curl
-      - uses: actions-rs/toolchain@v1
-        with:
-          toolchain: stable
-          override: true
+      - uses: actions-rs/toolchain@v1 
+        with: 
+          toolchain: stable 
+          override: true 
      - name: Run cargo tree without default features and check lindera is not present
        run: |
          cargo tree -f '{p} {f}' -e normal --no-default-features | grep lindera -vqz
      - name: Run cargo tree with default features and check lindera is pressent
        run: |
          cargo tree -f '{p} {f}' -e normal | grep lindera -qz
-
+                
  # We run tests in debug also, to make sure that the debug_assertions are hit
  test-debug:
    name: Run tests in debug
@@ -146,7 +146,7 @@ jobs:
          toolchain: stable
          override: true
      - name: Cache dependencies
-        uses: Swatinem/rust-cache@v2.5.1
+        uses: Swatinem/rust-cache@v2.4.0
      - name: Run tests in debug
        uses: actions-rs/cargo@v1
        with:
@@ -165,7 +165,7 @@ jobs:
          override: true
          components: clippy
      - name: Cache dependencies
-        uses: Swatinem/rust-cache@v2.5.1
+        uses: Swatinem/rust-cache@v2.4.0
      - name: Run cargo clippy
        uses: actions-rs/cargo@v1
        with:
@@ -184,7 +184,7 @@ jobs:
          override: true
          components: rustfmt
      - name: Cache dependencies
-        uses: Swatinem/rust-cache@v2.5.1
+        uses: Swatinem/rust-cache@v2.4.0
      - name: Run cargo fmt
        # Since we never ran the `build.rs` script in the benchmark directory we are missing one auto-generated import file.
        # Since we want to trigger (and fail) this action as fast as possible, instead of building the benchmark crate
--- a/Cargo.lock
+++ b/Cargo.lock
@@ -405,7 +405,7 @@ checksum = "16e62a023e7c117e27523144c5d2459f4397fcc3cab0085af8e2224f643a0193"
 dependencies = [
 "proc-macro2",
 "quote",
- "syn 2.0.28",
+ "syn 2.0.26",
 ]

 [[package]]
@@ -416,7 +416,7 @@ checksum = "b9ccdd8f2a161be9bd5c023df56f1b2a0bd1d83872ae53b71a84a12c9bf6e842"
 dependencies = [
 "proc-macro2",
 "quote",
- "syn 2.0.28",
+ "syn 2.0.26",
 ]

 [[package]]
@@ -469,7 +469,7 @@ checksum = "8c3c1a368f70d6cf7302d78f8f7093da241fb8e8807c05cc9e51a125895a6d5b"

 [[package]]
 name = "benchmarks"
-version = "1.3.0"
+version = "1.3.1"
 dependencies = [
 "anyhow",
 "bytes",
@@ -603,7 +603,7 @@ checksum = "fdde5c9cd29ebd706ce1b35600920a33550e402fc998a2e53ad3b42c3c47a192"
 dependencies = [
 "proc-macro2",
 "quote",
- "syn 2.0.28",
+ "syn 2.0.26",
 ]

 [[package]]
@@ -794,7 +794,7 @@ dependencies = [
 "heck",
 "proc-macro2",
 "quote",
- "syn 2.0.28",
+ "syn 2.0.26",
 ]

 [[package]]
@@ -1199,7 +1199,7 @@ dependencies = [

 [[package]]
 name = "dump"
-version = "1.3.0"
+version = "1.3.1"
 dependencies = [
 "anyhow",
 "big_s",
@@ -1336,7 +1336,7 @@ checksum = "eecf8589574ce9b895052fa12d69af7a233f99e6107f5cb8dd1044f2a17bfdcb"
 dependencies = [
 "proc-macro2",
 "quote",
- "syn 2.0.28",
+ "syn 2.0.26",
 ]

 [[package]]
@@ -1413,7 +1413,7 @@ dependencies = [

 [[package]]
 name = "file-store"
-version = "1.3.0"
+version = "1.3.1"
 dependencies = [
 "faux",
 "tempfile",
@@ -1435,7 +1435,7 @@ dependencies = [

 [[package]]
 name = "filter-parser"
-version = "1.3.0"
+version = "1.3.1"
 dependencies = [
 "insta",
 "nom",
@@ -1454,7 +1454,7 @@ dependencies = [

 [[package]]
 name = "flatten-serde-json"
-version = "1.3.0"
+version = "1.3.1"
 dependencies = [
 "criterion",
 "serde_json",
@@ -1537,7 +1537,7 @@ checksum = "89ca545a94061b6365f2c7355b4b32bd20df3ff95f02da9329b34ccc3bd6ee72"
 dependencies = [
 "proc-macro2",
 "quote",
- "syn 2.0.28",
+ "syn 2.0.26",
 ]

 [[package]]
@@ -1572,7 +1572,7 @@ dependencies = [

 [[package]]
 name = "fuzzers"
-version = "1.3.0"
+version = "1.3.1"
 dependencies = [
 "arbitrary",
 "clap",
@@ -1894,7 +1894,7 @@ dependencies = [

 [[package]]
 name = "index-scheduler"
-version = "1.3.0"
+version = "1.3.1"
 dependencies = [
 "anyhow",
 "big_s",
@@ -1912,7 +1912,6 @@ dependencies = [
 "meilisearch-types",
 "nelson",
 "page_size 0.5.0",
- "puffin",
 "roaring",
 "serde",
 "serde_json",
@@ -2023,12 +2022,12 @@ dependencies = [

 [[package]]
 name = "is-terminal"
-version = "0.4.9"
+version = "0.4.8"
 source = "registry+https://github.com/rust-lang/crates.io-index"
-checksum = "cb0889898416213fab133e1d33a0e5858a48177452750691bde3666d0fdbaf8b"
+checksum = "24fddda5af7e54bf7da53067d6e802dbcc381d0a8eef629df528e3ebf68755cb"
 dependencies = [
 "hermit-abi 0.3.1",
- "rustix 0.38.4",
+ "rustix 0.38.2",
 "windows-sys 0.48.0",
 ]

@@ -2082,7 +2081,7 @@ dependencies = [

 [[package]]
 name = "json-depth-checker"
-version = "1.3.0"
+version = "1.3.1"
 dependencies = [
 "criterion",
 "serde_json",
@@ -2398,9 +2397,9 @@ checksum = "ef53942eb7bf7ff43a617b3e2c1c4a5ecf5944a7c1bc12d7ee39bbb15e5c1519"

 [[package]]
 name = "linux-raw-sys"
-version = "0.4.5"
+version = "0.4.3"
 source = "registry+https://github.com/rust-lang/crates.io-index"
-checksum = "57bcfdad1b858c2db7c38303a6d2ad4dfaf5eb53dfeb0910128b2c26d6158503"
+checksum = "09fc20d2ca12cb9f044c93e3bd6d32d523e6e2ec3db4f7b2939cd99026ecd3f0"

 [[package]]
 name = "lmdb-rkv-sys"
@@ -2468,12 +2467,6 @@ dependencies = [
 "syn 1.0.109",
 ]

-[[package]]
-name = "lz4_flex"
-version = "0.10.0"
-source = "registry+https://github.com/rust-lang/crates.io-index"
-checksum = "8b8c72594ac26bfd34f2d99dfced2edfaddfe8a476e3ff2ca0eb293d925c4f83"
-
 [[package]]
 name = "manifest-dir-macros"
 version = "0.1.17"
@@ -2483,7 +2476,7 @@ dependencies = [
 "once_cell",
 "proc-macro2",
 "quote",
- "syn 2.0.28",
+ "syn 2.0.26",
 ]

 [[package]]
@@ -2500,7 +2493,7 @@ checksum = "490cc448043f947bae3cbee9c203358d62dbee0db12107a74be5c30ccfd09771"

 [[package]]
 name = "meili-snap"
-version = "1.3.0"
+version = "1.3.1"
 dependencies = [
 "insta",
 "md5",
@@ -2509,7 +2502,7 @@ dependencies = [

 [[package]]
 name = "meilisearch"
-version = "1.3.0"
+version = "1.3.1"
 dependencies = [
 "actix-cors",
 "actix-http",
@@ -2563,8 +2556,6 @@ dependencies = [
 "pin-project-lite",
 "platform-dirs",
 "prometheus",
- "puffin",
- "puffin_http",
 "rand",
 "rayon",
 "regex",
@@ -2600,7 +2591,7 @@ dependencies = [

 [[package]]
 name = "meilisearch-auth"
-version = "1.3.0"
+version = "1.3.1"
 dependencies = [
 "base64 0.21.2",
 "enum-iterator",
@@ -2619,7 +2610,7 @@ dependencies = [

 [[package]]
 name = "meilisearch-types"
-version = "1.3.0"
+version = "1.3.1"
 dependencies = [
 "actix-web",
 "anyhow",
@@ -2673,7 +2664,7 @@ dependencies = [

 [[package]]
 name = "milli"
-version = "1.3.0"
+version = "1.3.1"
 dependencies = [
 "big_s",
 "bimap",
@@ -2709,7 +2700,6 @@ dependencies = [
 "obkv",
 "once_cell",
 "ordered-float",
- "puffin",
 "rand",
 "rand_pcg",
 "rayon",
@@ -3004,7 +2994,7 @@ checksum = "478c572c3d73181ff3c2539045f6eb99e5491218eae919370993b890cdbdd98e"

 [[package]]
 name = "permissive-json-pointer"
-version = "1.3.0"
+version = "1.3.1"
 dependencies = [
 "big_s",
 "serde_json",
@@ -3040,7 +3030,7 @@ dependencies = [
 "pest_meta",
 "proc-macro2",
 "quote",
- "syn 2.0.28",
+ "syn 2.0.26",
 ]

 [[package]]
@@ -3185,9 +3175,9 @@ dependencies = [

 [[package]]
 name = "proc-macro2"
-version = "1.0.64"
+version = "1.0.66"
 source = "registry+https://github.com/rust-lang/crates.io-index"
-checksum = "78803b62cbf1f46fde80d7c0e803111524b9877184cfe7c3033659490ac7a7da"
+checksum = "18fb31db3f9bddb2ea821cde30a9f70117e3f119938b5ee630b7403aa6e2ead9"
 dependencies = [
 "unicode-ident",
 ]
@@ -3228,40 +3218,11 @@ version = "2.28.0"
 source = "registry+https://github.com/rust-lang/crates.io-index"
 checksum = "106dd99e98437432fed6519dedecfade6a06a73bb7b2a1e019fdd2bee5778d94"

-[[package]]
-name = "puffin"
-version = "0.16.0"
-source = "registry+https://github.com/rust-lang/crates.io-index"
-checksum = "76425abd4e1a0ad4bd6995dd974b52f414fca9974171df8e3708b3e660d05a21"
-dependencies = [
- "anyhow",
- "bincode",
- "byteorder",
- "cfg-if",
- "instant",
- "lz4_flex",
- "once_cell",
- "parking_lot",
- "serde",
-]
-
-[[package]]
-name = "puffin_http"
-version = "0.13.0"
-source = "registry+https://github.com/rust-lang/crates.io-index"
-checksum = "13bffc600c35913d282ae1e96a6ffcdf36dc7a7cdb9310e0ba15914d258c8193"
-dependencies = [
- "anyhow",
- "crossbeam-channel",
- "log",
- "puffin",
-]
-
 [[package]]
 name = "quote"
-version = "1.0.30"
+version = "1.0.31"
 source = "registry+https://github.com/rust-lang/crates.io-index"
-checksum = "5907a1b7c277254a8b15170f6e7c97cfa60ee7872a3217663bb81151e48184bb"
+checksum = "5fe8a65d69dd0808184ebb5f836ab526bb259db23c657efa38711b1072ee47f0"
 dependencies = [
 "proc-macro2",
 ]
@@ -3509,14 +3470,14 @@ dependencies = [

 [[package]]
 name = "rustix"
-version = "0.38.4"
+version = "0.38.2"
 source = "registry+https://github.com/rust-lang/crates.io-index"
-checksum = "0a962918ea88d644592894bc6dc55acc6c0956488adcebbfb6e273506b7fd6e5"
+checksum = "aabcb0461ebd01d6b79945797c27f8529082226cb630a9865a71870ff63532a4"
 dependencies = [
 "bitflags 2.3.3",
 "errno",
 "libc",
- "linux-raw-sys 0.4.5",
+ "linux-raw-sys 0.4.3",
 "windows-sys 0.48.0",
 ]

@@ -3622,9 +3583,9 @@ checksum = "bebd363326d05ec3e2f532ab7660680f3b02130d780c299bca73469d521bc0ed"

 [[package]]
 name = "serde"
-version = "1.0.180"
+version = "1.0.171"
 source = "registry+https://github.com/rust-lang/crates.io-index"
-checksum = "0ea67f183f058fe88a4e3ec6e2788e003840893b91bac4559cabedd00863b3ed"
+checksum = "30e27d1e4fd7659406c492fd6cfaf2066ba8773de45ca75e855590f856dc34a9"
 dependencies = [
 "serde_derive",
 ]
@@ -3649,20 +3610,20 @@ dependencies = [

 [[package]]
 name = "serde_derive"
-version = "1.0.180"
+version = "1.0.171"
 source = "registry+https://github.com/rust-lang/crates.io-index"
-checksum = "24e744d7782b686ab3b73267ef05697159cc0e5abbed3f47f9933165e5219036"
+checksum = "389894603bd18c46fa56231694f8d827779c0951a667087194cf9de94ed24682"
 dependencies = [
 "proc-macro2",
 "quote",
- "syn 2.0.28",
+ "syn 2.0.26",
 ]

 [[package]]
 name = "serde_json"
-version = "1.0.104"
+version = "1.0.103"
 source = "registry+https://github.com/rust-lang/crates.io-index"
-checksum = "076066c5f1078eac5b722a31827a8832fe108bed65dfa75e233c89f8206e976c"
+checksum = "d03b412469450d4404fe8499a268edd7f8b79fecb074b0d812ad64ca21f4031b"
 dependencies = [
 "indexmap 2.0.0",
 "itoa",
@@ -3872,9 +3833,9 @@ dependencies = [

 [[package]]
 name = "syn"
-version = "2.0.28"
+version = "2.0.26"
 source = "registry+https://github.com/rust-lang/crates.io-index"
-checksum = "04361975b3f5e348b2189d8dc55bc942f278b2d482a6a0365de5bdd62d351567"
+checksum = "45c3457aacde3c65315de5031ec191ce46604304d2446e803d71ade03308d970"
 dependencies = [
 "proc-macro2",
 "quote",
@@ -3961,22 +3922,22 @@ dependencies = [

 [[package]]
 name = "thiserror"
-version = "1.0.44"
+version = "1.0.43"
 source = "registry+https://github.com/rust-lang/crates.io-index"
-checksum = "611040a08a0439f8248d1990b111c95baa9c704c805fa1f62104b39655fd7f90"
+checksum = "a35fc5b8971143ca348fa6df4f024d4d55264f3468c71ad1c2f365b0a4d58c42"
 dependencies = [
 "thiserror-impl",
 ]

 [[package]]
 name = "thiserror-impl"
-version = "1.0.44"
+version = "1.0.43"
 source = "registry+https://github.com/rust-lang/crates.io-index"
-checksum = "090198534930841fab3a5d1bb637cde49e339654e606195f8d9c76eeb081dc96"
+checksum = "463fe12d7993d3b327787537ce8dd4dfa058de32fc2b195ef3cde03dc4771e8f"
 dependencies = [
 "proc-macro2",
 "quote",
- "syn 2.0.28",
+ "syn 2.0.26",
 ]

 [[package]]
@@ -4058,7 +4019,7 @@ checksum = "630bdcf245f78637c13ec01ffae6187cca34625e8c63150d424b59e55af2675e"
 dependencies = [
 "proc-macro2",
 "quote",
- "syn 2.0.28",
+ "syn 2.0.26",
 ]

 [[package]]
@@ -4383,7 +4344,7 @@ dependencies = [
 "once_cell",
 "proc-macro2",
 "quote",
- "syn 2.0.28",
+ "syn 2.0.26",
 "wasm-bindgen-shared",
 ]

@@ -4417,7 +4378,7 @@ checksum = "e128beba882dd1eb6200e1dc92ae6c5dbaa4311aa7bb211ca035779e5efc39f8"
 dependencies = [
 "proc-macro2",
 "quote",
- "syn 2.0.28",
+ "syn 2.0.26",
 "wasm-bindgen-backend",
 "wasm-bindgen-shared",
 ]
--- a/Cargo.toml
+++ b/Cargo.toml
@@ -18,7 +18,7 @@ members = [
 ]

 [workspace.package]
-version = "1.3.0"
+version = "1.3.1"
 authors = ["Quentin de Quelen <quentin@dequelen.me>", "Clément Renault <clement@meilisearch.com>"]
 description = "Meilisearch HTTP server"
 homepage = "https://meilisearch.com"
--- a/2
+++ b/2
@@ -17,7 +17,7 @@ RUN     set -eux; \
        if [ "$apkArch" = "aarch64" ]; then \
            export JEMALLOC_SYS_WITH_LG_PAGE=16; \
        fi && \
-        cargo build --release
+        cargo build --release --no-default-features --features "analytics mini-dashboard japanese"

 # Run
 FROM    alpine:3.16
--- a/PROFILING.md
+++ b/PROFILING.md
@@ -1,19 +0,0 @@
-# Profiling Meilisearch
-
-Search engine technologies are complex pieces of software that require thorough profiling tools. We chose to use [Puffin](https://github.com/EmbarkStudios/puffin), which the Rust gaming industry uses extensively. You can export and import the profiling reports using the top bar's _File_ menu options.
-
-![An example profiling with Puffin viewer](assets/profiling-example.png)
-
-## Profiling the Indexing Process
-
-When you enable the `profile-with-puffin` feature of Meilisearch, a Puffin HTTP server will run on Meilisearch and listen on the default _0.0.0.0:8585_ address. This server will record a "frame" whenever it executes the `IndexScheduler::tick` method.
-
-Once your Meilisearch is running and awaits new indexation operations, you must [install and run the `puffin_viewer` tool](https://github.com/EmbarkStudios/puffin/tree/main/puffin_viewer) to see the profiling results. I advise you to run the viewer with the `RUST_LOG=puffin_http::client=debug` environment variable to see the client trying to connect to your server.
-
-Another piece of advice on the Puffin viewer UI interface is to consider the _Merge children with same ID_ option. It can hide the exact actual timings at which events were sent. Please turn it off when you see strange gaps on the Flamegraph. It can help.
-
-## Profiling the Search Process
-
-We still need to take the time to profile the search side of the engine with Puffin. It would require time to profile the filtering phase, query parsing, creation, and execution. We could even profile the Actix HTTP server.
-
-The only issue we see is the framing system. Puffin requires a global frame-based profiling phase, which collides with Meilisearch's ability to accept and answer multiple requests on different threads simultaneously.
--- a/README.md
+++ b/README.md
@@ -1,20 +1,16 @@
 <p align="center">
-  <a href="https://www.meilisearch.com/?utm_campaign=oss&utm_source=github&utm_medium=meilisearch&utm_content=logo#gh-light-mode-only" target="_blank">
-    <img src="assets/meilisearch-logo-light.svg?sanitize=true#gh-light-mode-only">
-  </a>
-  <a href="https://www.meilisearch.com/?utm_campaign=oss&utm_source=github&utm_medium=meilisearch&utm_content=logo#gh-dark-mode-only" target="_blank">
-    <img src="assets/meilisearch-logo-dark.svg?sanitize=true#gh-dark-mode-only">
-  </a>
+  <img src="assets/meilisearch-logo-light.svg?sanitize=true#gh-light-mode-only">
+  <img src="assets/meilisearch-logo-dark.svg?sanitize=true#gh-dark-mode-only">
 </p>

 <h4 align="center">
-  <a href="https://www.meilisearch.com/?utm_campaign=oss&utm_source=github&utm_medium=meilisearch&utm_content=nav">Website</a> |
+  <a href="https://www.meilisearch.com">Website</a> |
  <a href="https://roadmap.meilisearch.com/tabs/1-under-consideration">Roadmap</a> |
-  <a href="https://www.meilisearch.com/pricing?utm_campaign=oss&utm_source=github&utm_medium=meilisearch&utm_content=nav">Meilisearch Cloud</a> |
-  <a href="https://blog.meilisearch.com/?utm_campaign=oss&utm_source=github&utm_medium=meilisearch&utm_content=nav">Blog</a> |
-  <a href="https://www.meilisearch.com/docs?utm_campaign=oss&utm_source=github&utm_medium=meilisearch&utm_content=nav">Documentation</a> |
-  <a href="https://www.meilisearch.com/docs/faq?utm_campaign=oss&utm_source=github&utm_medium=meilisearch&utm_content=nav">FAQ</a> |
-  <a href="https://discord.meilisearch.com/?utm_campaign=oss&utm_source=github&utm_medium=meilisearch&utm_content=nav">Discord</a>
+  <a href="https://www.meilisearch.com/pricing?utm_campaign=oss&utm_source=engine&utm_medium=meilisearch">Meilisearch Cloud</a> |
+  <a href="https://blog.meilisearch.com">Blog</a> |
+  <a href="https://www.meilisearch.com/docs">Documentation</a> |
+  <a href="https://www.meilisearch.com/docs/faq">FAQ</a> |
+  <a href="https://discord.meilisearch.com">Discord</a>
 </h4>

 <p align="center">
@@ -28,40 +24,40 @@
 Meilisearch helps you shape a delightful search experience in a snap, offering features that work out-of-the-box to speed up your workflow.

 <p align="center" name="demo">
-  <a href="https://where2watch.meilisearch.com/?utm_campaign=oss&utm_source=github&utm_medium=meilisearch&utm_content=demo-gif#gh-light-mode-only" target="_blank">
+  <a href="https://where2watch.meilisearch.com/#gh-light-mode-only" target="_blank">
    <img src="assets/demo-light.gif#gh-light-mode-only" alt="A bright colored application for finding movies screening near the user">
  </a>
-  <a href="https://where2watch.meilisearch.com/?utm_campaign=oss&utm_source=github&utm_medium=meilisearch&utm_content=demo-gif#gh-dark-mode-only" target="_blank">
+  <a href="https://where2watch.meilisearch.com/#gh-dark-mode-only" target="_blank">
    <img src="assets/demo-dark.gif#gh-dark-mode-only" alt="A dark colored application for finding movies screening near the user">
  </a>
 </p>

-🔥 [**Try it!**](https://where2watch.meilisearch.com/?utm_campaign=oss&utm_source=github&utm_medium=meilisearch&utm_content=demo-link) 🔥
+🔥 [**Try it!**](https://where2watch.meilisearch.com/) 🔥

 ## ✨ Features

 - **Search-as-you-type:** find search results in less than 50 milliseconds
- **[Typo tolerance](https://www.meilisearch.com/docs/learn/getting_started/customizing_relevancy?utm_campaign=oss&utm_source=github&utm_medium=meilisearch&utm_content=features#typo-tolerance):** get relevant matches even when queries contain typos and misspellings
- **[Filtering](https://www.meilisearch.com/docs/learn/fine_tuning_results/filtering?utm_campaign=oss&utm_source=github&utm_medium=meilisearch&utm_content=features) and [faceted search](https://www.meilisearch.com/docs/learn/fine_tuning_results/faceted_search?utm_campaign=oss&utm_source=github&utm_medium=meilisearch&utm_content=features):** enhance your user's search experience with custom filters and build a faceted search interface in a few lines of code
- **[Sorting](https://www.meilisearch.com/docs/learn/fine_tuning_results/sorting?utm_campaign=oss&utm_source=github&utm_medium=meilisearch&utm_content=features):** sort results based on price, date, or pretty much anything else your users need
- **[Synonym support](https://www.meilisearch.com/docs/learn/getting_started/customizing_relevancy?utm_campaign=oss&utm_source=github&utm_medium=meilisearch&utm_content=features#synonyms):** configure synonyms to include more relevant content in your search results
- **[Geosearch](https://www.meilisearch.com/docs/learn/fine_tuning_results/geosearch?utm_campaign=oss&utm_source=github&utm_medium=meilisearch&utm_content=features):** filter and sort documents based on geographic data
- **[Extensive language support](https://www.meilisearch.com/docs/learn/what_is_meilisearch/language?utm_campaign=oss&utm_source=github&utm_medium=meilisearch&utm_content=features):** search datasets in any language, with optimized support for Chinese, Japanese, Hebrew, and languages using the Latin alphabet
- **[Security management](https://www.meilisearch.com/docs/learn/security/master_api_keys?utm_campaign=oss&utm_source=github&utm_medium=meilisearch&utm_content=features):** control which users can access what data with API keys that allow fine-grained permissions handling
- **[Multi-Tenancy](https://www.meilisearch.com/docs/learn/security/tenant_tokens?utm_campaign=oss&utm_source=github&utm_medium=meilisearch&utm_content=features):** personalize search results for any number of application tenants
+- **[Typo tolerance](https://www.meilisearch.com/docs/learn/getting_started/customizing_relevancy#typo-tolerance):** get relevant matches even when queries contain typos and misspellings
+- **[Filtering](https://www.meilisearch.com/docs/learn/fine_tuning_results/filtering) and [faceted search](https://www.meilisearch.com/docs/learn/fine_tuning_results/faceted_search):** enhance your user's search experience with custom filters and build a faceted search interface in a few lines of code
+- **[Sorting](https://www.meilisearch.com/docs/learn/fine_tuning_results/sorting):** sort results based on price, date, or pretty much anything else your users need
+- **[Synonym support](https://www.meilisearch.com/docs/learn/getting_started/customizing_relevancy#synonyms):** configure synonyms to include more relevant content in your search results
+- **[Geosearch](https://www.meilisearch.com/docs/learn/fine_tuning_results/geosearch):** filter and sort documents based on geographic data
+- **[Extensive language support](https://www.meilisearch.com/docs/learn/what_is_meilisearch/language):** search datasets in any language, with optimized support for Chinese, Japanese, Hebrew, and languages using the Latin alphabet
+- **[Security management](https://www.meilisearch.com/docs/learn/security/master_api_keys):** control which users can access what data with API keys that allow fine-grained permissions handling
+- **[Multi-Tenancy](https://www.meilisearch.com/docs/learn/security/tenant_tokens):** personalize search results for any number of application tenants
 - **Highly Customizable:** customize Meilisearch to your specific needs or use our out-of-the-box and hassle-free presets
- **[RESTful API](https://www.meilisearch.com/docs/reference/api/overview?utm_campaign=oss&utm_source=github&utm_medium=meilisearch&utm_content=features):** integrate Meilisearch in your technical stack with our plugins and SDKs
+- **[RESTful API](https://www.meilisearch.com/docs/reference/api/overview):** integrate Meilisearch in your technical stack with our plugins and SDKs
 - **Easy to install, deploy, and maintain**

 ## 📖 Documentation

-You can consult Meilisearch's documentation at [https://www.meilisearch.com/docs](https://www.meilisearch.com/docs/?utm_campaign=oss&utm_source=github&utm_medium=meilisearch&utm_content=docs).
+You can consult Meilisearch's documentation at [https://www.meilisearch.com/docs](https://www.meilisearch.com/docs/).

 ## 🚀 Getting started

-For basic instructions on how to set up Meilisearch, add documents to an index, and search for documents, take a look at our [Quick Start](https://www.meilisearch.com/docs/learn/getting_started/quick_start?utm_campaign=oss&utm_source=github&utm_medium=meilisearch&utm_content=get-started) guide.
+For basic instructions on how to set up Meilisearch, add documents to an index, and search for documents, take a look at our [Quick Start](https://www.meilisearch.com/docs/learn/getting_started/quick_start) guide.

-You may also want to check out [Meilisearch 101](https://www.meilisearch.com/docs/learn/getting_started/filtering_and_sorting?utm_campaign=oss&utm_source=github&utm_medium=meilisearch&utm_content=get-started) for an introduction to some of Meilisearch's most popular features.
+You may also want to check out [Meilisearch 101](https://www.meilisearch.com/docs/learn/getting_started/filtering_and_sorting) for an introduction to some of Meilisearch's most popular features.

 ## ⚡ Supercharge your Meilisearch experience

@@ -71,29 +67,29 @@ Say goodbye to server deployment and manual updates with [Meilisearch Cloud](htt

 Install one of our SDKs in your project for seamless integration between Meilisearch and your favorite language or framework!

-Take a look at the complete [Meilisearch integration list](https://www.meilisearch.com/docs/learn/what_is_meilisearch/sdks?utm_campaign=oss&utm_source=github&utm_medium=meilisearch&utm_content=sdks-link).
+Take a look at the complete [Meilisearch integration list](https://www.meilisearch.com/docs/learn/what_is_meilisearch/sdks).

-[![Logos belonging to different languages and frameworks supported by Meilisearch, including React, Ruby on Rails, Go, Rust, and PHP](assets/integrations.png)](https://www.meilisearch.com/docs/learn/what_is_meilisearch/sdks?utm_campaign=oss&utm_source=github&utm_medium=meilisearch&utm_content=sdks-logos)
+[![Logos belonging to different languages and frameworks supported by Meilisearch, including React, Ruby on Rails, Go, Rust, and PHP](assets/integrations.png)](https://www.meilisearch.com/docs/learn/what_is_meilisearch/sdks)

 ## ⚙️ Advanced usage

-Experienced users will want to keep our [API Reference](https://www.meilisearch.com/docs/reference/api/overview?utm_campaign=oss&utm_source=github&utm_medium=meilisearch&utm_content=advanced) close at hand.
+Experienced users will want to keep our [API Reference](https://www.meilisearch.com/docs/reference/api/overview) close at hand.

-We also offer a wide range of dedicated guides to all Meilisearch features, such as [filtering](https://www.meilisearch.com/docs/learn/fine_tuning_results/filtering?utm_campaign=oss&utm_source=github&utm_medium=meilisearch&utm_content=advanced), [sorting](https://www.meilisearch.com/docs/learn/fine_tuning_results/sorting?utm_campaign=oss&utm_source=github&utm_medium=meilisearch&utm_content=advanced), [geosearch](https://www.meilisearch.com/docs/learn/fine_tuning_results/geosearch?utm_campaign=oss&utm_source=github&utm_medium=meilisearch&utm_content=advanced), [API keys](https://www.meilisearch.com/docs/learn/security/master_api_keys?utm_campaign=oss&utm_source=github&utm_medium=meilisearch&utm_content=advanced), and [tenant tokens](https://www.meilisearch.com/docs/learn/security/tenant_tokens?utm_campaign=oss&utm_source=github&utm_medium=meilisearch&utm_content=advanced).
+We also offer a wide range of dedicated guides to all Meilisearch features, such as [filtering](https://www.meilisearch.com/docs/learn/fine_tuning_results/filtering), [sorting](https://www.meilisearch.com/docs/learn/fine_tuning_results/sorting), [geosearch](https://www.meilisearch.com/docs/learn/fine_tuning_results/geosearch), [API keys](https://www.meilisearch.com/docs/learn/security/master_api_keys), and [tenant tokens](https://www.meilisearch.com/docs/learn/security/tenant_tokens).

-Finally, for more in-depth information, refer to our articles explaining fundamental Meilisearch concepts such as [documents](https://www.meilisearch.com/docs/learn/core_concepts/documents?utm_campaign=oss&utm_source=github&utm_medium=meilisearch&utm_content=advanced) and [indexes](https://www.meilisearch.com/docs/learn/core_concepts/indexes?utm_campaign=oss&utm_source=github&utm_medium=meilisearch&utm_content=advanced).
+Finally, for more in-depth information, refer to our articles explaining fundamental Meilisearch concepts such as [documents](https://www.meilisearch.com/docs/learn/core_concepts/documents) and [indexes](https://www.meilisearch.com/docs/learn/core_concepts/indexes).

 ## 📊 Telemetry

-Meilisearch collects **anonymized** data from users to help us improve our product. You can [deactivate this](https://www.meilisearch.com/docs/learn/what_is_meilisearch/telemetry?utm_campaign=oss&utm_source=github&utm_medium=meilisearch&utm_content=telemetry#how-to-disable-data-collection) whenever you want.
+Meilisearch collects **anonymized** data from users to help us improve our product. You can [deactivate this](https://www.meilisearch.com/docs/learn/what_is_meilisearch/telemetry#how-to-disable-data-collection) whenever you want.

-To request deletion of collected data, please write to us at [privacy@meilisearch.com](mailto:privacy@meilisearch.com). Don't forget to include your `Instance UID` in the message, as this helps us quickly find and delete your data.
+To request deletion of collected data, please write to us at [privacy@meilisearch.com](mailto:privacy@meilisearch.com). Don't forget to include your `Instance UID` in the message, as this helps us quickly find and delete your data.

-If you want to know more about the kind of data we collect and what we use it for, check the [telemetry section](https://www.meilisearch.com/docs/learn/what_is_meilisearch/telemetry?utm_campaign=oss&utm_source=github&utm_medium=meilisearch&utm_content=telemetry#how-to-disable-data-collection) of our documentation.
+If you want to know more about the kind of data we collect and what we use it for, check the [telemetry section](https://www.meilisearch.com/docs/learn/what_is_meilisearch/telemetry) of our documentation.

 ## 📫 Get in touch!

-Meilisearch is a search engine created by [Meili](https://www.welcometothejungle.com/en/companies/meilisearch), a software development company based in France and with team members all over the world. Want to know more about us? [Check out our blog!](https://blog.meilisearch.com/?utm_campaign=oss&utm_source=github&utm_medium=meilisearch&utm_content=contact)
+Meilisearch is a search engine created by [Meili](https://www.welcometothejungle.com/en/companies/meilisearch), a software development company based in France and with team members all over the world. Want to know more about us? [Check out our blog!](https://blog.meilisearch.com/)

 🗞 [Subscribe to our newsletter](https://meilisearch.us2.list-manage.com/subscribe?u=27870f7b71c908a8b359599fb&id=79582d828e) if you don't want to miss any updates! We promise we won't clutter your mailbox: we only send one edition every two months.

--- a/assets/profiling-example.png
+++ b/assets/profiling-example.png
--- a/benchmarks/Cargo.toml
+++ b/benchmarks/Cargo.toml
@@ -14,7 +14,7 @@ license.workspace = true
 anyhow = "1.0.70"
 csv = "1.2.1"
 milli = { path = "../milli" }
-mimalloc = { version = "0.1.37", default-features = false }
+mimalloc = { version = "0.1.36", default-features = false }
 serde_json = { version = "1.0.95", features = ["preserve_order"] }

 [dev-dependencies]
--- a/dump/src/lib.rs
+++ b/dump/src/lib.rs
@@ -262,9 +262,6 @@ pub(crate) mod test {
            sortable_attributes: Setting::Set(btreeset! { S("age") }),
            ranking_rules: Setting::NotSet,
            stop_words: Setting::NotSet,
-            non_separator_tokens: Setting::NotSet,
-            separator_tokens: Setting::NotSet,
-            dictionary: Setting::NotSet,
            synonyms: Setting::NotSet,
            distinct_attribute: Setting::NotSet,
            typo_tolerance: Setting::NotSet,
--- a/dump/src/reader/compat/v5_to_v6.rs
+++ b/dump/src/reader/compat/v5_to_v6.rs
@@ -340,9 +340,6 @@ impl<T> From<v5::Settings<T>> for v6::Settings<v6::Unchecked> {
                }
            },
            stop_words: settings.stop_words.into(),
-            non_separator_tokens: v6::Setting::NotSet,
-            separator_tokens: v6::Setting::NotSet,
-            dictionary: v6::Setting::NotSet,
            synonyms: settings.synonyms.into(),
            distinct_attribute: settings.distinct_attribute.into(),
            typo_tolerance: match settings.typo_tolerance {
--- a/dump/src/reader/snapshots/dumpreadertest__import_dump_v1-10.snap
+++ b/dump/src/reader/snapshots/dumpreadertest__import_dump_v1-10.snap
@@ -0,0 +1,24 @@
+---
+source: dump/src/reader/mod.rs
+expression: spells.settings().unwrap()
+---
+{
+  "displayedAttributes": [
+    "*"
+  ],
+  "searchableAttributes": [
+    "*"
+  ],
+  "filterableAttributes": [],
+  "sortableAttributes": [],
+  "rankingRules": [
+    "typo",
+    "words",
+    "proximity",
+    "attribute",
+    "exactness"
+  ],
+  "stopWords": [],
+  "synonyms": {},
+  "distinctAttribute": null
+}
--- a/dump/src/reader/snapshots/dumpreadertest__import_dump_v1-4.snap
+++ b/dump/src/reader/snapshots/dumpreadertest__import_dump_v1-4.snap
@@ -0,0 +1,38 @@
+---
+source: dump/src/reader/mod.rs
+expression: products.settings().unwrap()
+---
+{
+  "displayedAttributes": [
+    "*"
+  ],
+  "searchableAttributes": [
+    "*"
+  ],
+  "filterableAttributes": [],
+  "sortableAttributes": [],
+  "rankingRules": [
+    "typo",
+    "words",
+    "proximity",
+    "attribute",
+    "exactness"
+  ],
+  "stopWords": [],
+  "synonyms": {
+    "android": [
+      "phone",
+      "smartphone"
+    ],
+    "iphone": [
+      "phone",
+      "smartphone"
+    ],
+    "phone": [
+      "android",
+      "iphone",
+      "smartphone"
+    ]
+  },
+  "distinctAttribute": null
+}
--- a/dump/src/reader/snapshots/dumpreadertest__import_dump_v1-7.snap
+++ b/dump/src/reader/snapshots/dumpreadertest__import_dump_v1-7.snap
@@ -0,0 +1,31 @@
+---
+source: dump/src/reader/mod.rs
+expression: movies.settings().unwrap()
+---
+{
+  "displayedAttributes": [
+    "*"
+  ],
+  "searchableAttributes": [
+    "*"
+  ],
+  "filterableAttributes": [
+    "genres",
+    "id"
+  ],
+  "sortableAttributes": [
+    "genres",
+    "id"
+  ],
+  "rankingRules": [
+    "typo",
+    "words",
+    "proximity",
+    "attribute",
+    "exactness",
+    "release_date:asc"
+  ],
+  "stopWords": [],
+  "synonyms": {},
+  "distinctAttribute": null
+}
--- a/index-scheduler/Cargo.toml
+++ b/index-scheduler/Cargo.toml
@@ -22,7 +22,6 @@ log = "0.4.17"
 meilisearch-auth = { path = "../meilisearch-auth" }
 meilisearch-types = { path = "../meilisearch-types" }
 page_size = "0.5.0"
-puffin = "0.16.0"
 roaring = { version = "0.10.1", features = ["serde"] }
 serde = { version = "1.0.160", features = ["derive"] }
 serde_json = { version = "1.0.95", features = ["preserve_order"] }
--- a/index-scheduler/src/batch.rs
+++ b/index-scheduler/src/batch.rs
@@ -471,8 +471,6 @@ impl IndexScheduler {
        #[cfg(test)]
        self.maybe_fail(crate::tests::FailureLocation::InsideCreateBatch)?;

-        puffin::profile_function!();
-
        let enqueued = &self.get_status(rtxn, Status::Enqueued)?;
        let to_cancel = self.get_kind(rtxn, Kind::TaskCancelation)? & enqueued;

@@ -577,9 +575,6 @@ impl IndexScheduler {
            self.maybe_fail(crate::tests::FailureLocation::PanicInsideProcessBatch)?;
            self.breakpoint(crate::Breakpoint::InsideProcessBatch);
        }
-
-        puffin::profile_function!(format!("{:?}", batch));
-
        match batch {
            Batch::TaskCancelation { mut task, previous_started_at, previous_processing_tasks } => {
                // 1. Retrieve the tasks that matched the query at enqueue-time.
@@ -1116,8 +1111,6 @@ impl IndexScheduler {
        index: &'i Index,
        operation: IndexOperation,
    ) -> Result<Vec<Task>> {
-        puffin::profile_function!();
-
        match operation {
            IndexOperation::DocumentClear { mut tasks, .. } => {
                let count = milli::update::ClearDocuments::new(index_wtxn, index).execute()?;
--- a/index-scheduler/src/lib.rs
+++ b/index-scheduler/src/lib.rs
@@ -790,10 +790,19 @@ impl IndexScheduler {

        let mut res = BTreeMap::new();

+        let processing_tasks = { self.processing_tasks.read().unwrap().processing.len() };
+
        res.insert(
            "statuses".to_string(),
            enum_iterator::all::<Status>()
-                .map(|s| Ok((s.to_string(), self.get_status(&rtxn, s)?.len())))
+                .map(|s| {
+                    let tasks = self.get_status(&rtxn, s)?.len();
+                    match s {
+                        Status::Enqueued => Ok((s.to_string(), tasks - processing_tasks)),
+                        Status::Processing => Ok((s.to_string(), processing_tasks)),
+                        s => Ok((s.to_string(), tasks)),
+                    }
+                })
                .collect::<Result<BTreeMap<String, u64>>>()?,
        );
        res.insert(
@@ -1053,8 +1062,6 @@ impl IndexScheduler {
            self.breakpoint(Breakpoint::Start);
        }

-        puffin::GlobalProfiler::lock().new_frame();
-
        self.cleanup_task_queue()?;

        let rtxn = self.env.read_txn().map_err(Error::HeedTransaction)?;
@@ -4131,4 +4138,154 @@ mod tests {
        snapshot!(json_string!(tasks, { "[].enqueuedAt" => "[date]", "[].startedAt" => "[date]", "[].finishedAt" => "[date]", ".**.original_filter" => "[filter]", ".**.query" => "[query]" }), name: "everything_has_been_processed");
        drop(rtxn);
    }
+
+    #[test]
+    fn basic_get_stats() {
+        let (index_scheduler, mut handle) = IndexScheduler::test(true, vec![]);
+
+        let kind = index_creation_task("catto", "mouse");
+        let _task = index_scheduler.register(kind).unwrap();
+        let kind = index_creation_task("doggo", "sheep");
+        let _task = index_scheduler.register(kind).unwrap();
+        let kind = index_creation_task("whalo", "fish");
+        let _task = index_scheduler.register(kind).unwrap();
+
+        snapshot!(json_string!(index_scheduler.get_stats().unwrap()), @r###"
+        {
+          "indexes": {
+            "catto": 1,
+            "doggo": 1,
+            "whalo": 1
+          },
+          "statuses": {
+            "canceled": 0,
+            "enqueued": 3,
+            "failed": 0,
+            "processing": 0,
+            "succeeded": 0
+          },
+          "types": {
+            "documentAdditionOrUpdate": 0,
+            "documentDeletion": 0,
+            "dumpCreation": 0,
+            "indexCreation": 3,
+            "indexDeletion": 0,
+            "indexSwap": 0,
+            "indexUpdate": 0,
+            "settingsUpdate": 0,
+            "snapshotCreation": 0,
+            "taskCancelation": 0,
+            "taskDeletion": 0
+          }
+        }
+        "###);
+
+        handle.advance_till([Start, BatchCreated]);
+        snapshot!(json_string!(index_scheduler.get_stats().unwrap()), @r###"
+        {
+          "indexes": {
+            "catto": 1,
+            "doggo": 1,
+            "whalo": 1
+          },
+          "statuses": {
+            "canceled": 0,
+            "enqueued": 2,
+            "failed": 0,
+            "processing": 1,
+            "succeeded": 0
+          },
+          "types": {
+            "documentAdditionOrUpdate": 0,
+            "documentDeletion": 0,
+            "dumpCreation": 0,
+            "indexCreation": 3,
+            "indexDeletion": 0,
+            "indexSwap": 0,
+            "indexUpdate": 0,
+            "settingsUpdate": 0,
+            "snapshotCreation": 0,
+            "taskCancelation": 0,
+            "taskDeletion": 0
+          }
+        }
+        "###);
+
+        handle.advance_till([
+            InsideProcessBatch,
+            InsideProcessBatch,
+            ProcessBatchSucceeded,
+            AfterProcessing,
+            Start,
+            BatchCreated,
+        ]);
+        snapshot!(json_string!(index_scheduler.get_stats().unwrap()), @r###"
+        {
+          "indexes": {
+            "catto": 1,
+            "doggo": 1,
+            "whalo": 1
+          },
+          "statuses": {
+            "canceled": 0,
+            "enqueued": 1,
+            "failed": 0,
+            "processing": 1,
+            "succeeded": 1
+          },
+          "types": {
+            "documentAdditionOrUpdate": 0,
+            "documentDeletion": 0,
+            "dumpCreation": 0,
+            "indexCreation": 3,
+            "indexDeletion": 0,
+            "indexSwap": 0,
+            "indexUpdate": 0,
+            "settingsUpdate": 0,
+            "snapshotCreation": 0,
+            "taskCancelation": 0,
+            "taskDeletion": 0
+          }
+        }
+        "###);
+
+        // Process one more batch: the second task succeeds and the last one starts processing.
+        handle.advance_till([
+            InsideProcessBatch,
+            InsideProcessBatch,
+            ProcessBatchSucceeded,
+            AfterProcessing,
+            Start,
+            BatchCreated,
+        ]);
+        snapshot!(json_string!(index_scheduler.get_stats().unwrap()), @r###"
+        {
+          "indexes": {
+            "catto": 1,
+            "doggo": 1,
+            "whalo": 1
+          },
+          "statuses": {
+            "canceled": 0,
+            "enqueued": 0,
+            "failed": 0,
+            "processing": 1,
+            "succeeded": 2
+          },
+          "types": {
+            "documentAdditionOrUpdate": 0,
+            "documentDeletion": 0,
+            "dumpCreation": 0,
+            "indexCreation": 3,
+            "indexDeletion": 0,
+            "indexSwap": 0,
+            "indexUpdate": 0,
+            "settingsUpdate": 0,
+            "snapshotCreation": 0,
+            "taskCancelation": 0,
+            "taskDeletion": 0
+          }
+        }
+        "###);
+    }
 }
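The `get_stats` hunk above replaces the raw per-status counts with counts adjusted by the number of tasks currently held in memory as processing. A minimal standalone sketch of that adjustment follows. It assumes, as the hunk implies, that the stored `enqueued` count still includes tasks that are being processed. The `Status` enum and the `adjusted_status_counts` function here are hypothetical stand-ins, not the scheduler's real types, and the sketch uses `saturating_sub` where the diff uses a plain subtraction.

```rust
use std::collections::BTreeMap;

/// Hypothetical stand-in for the scheduler's task statuses.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum Status {
    Enqueued,
    Processing,
    Succeeded,
    Failed,
    Canceled,
}

impl Status {
    /// Lowercase name, matching the keys seen in the test snapshots.
    fn name(self) -> &'static str {
        match self {
            Status::Enqueued => "enqueued",
            Status::Processing => "processing",
            Status::Succeeded => "succeeded",
            Status::Failed => "failed",
            Status::Canceled => "canceled",
        }
    }
}

/// `stored` holds the per-status counts as read from storage (assumption:
/// tasks being processed are still counted as enqueued there).
/// `processing` is the number of tasks in the in-memory processing set.
fn adjusted_status_counts(stored: &[(Status, u64)], processing: u64) -> BTreeMap<String, u64> {
    stored
        .iter()
        .map(|&(status, count)| {
            let adjusted = match status {
                // Tasks being processed must not be reported as enqueued.
                // saturating_sub is this sketch's choice, not the diff's.
                Status::Enqueued => count.saturating_sub(processing),
                // The processing count comes from memory, not from storage.
                Status::Processing => processing,
                // Every other status is reported as stored.
                _ => count,
            };
            (status.name().to_string(), adjusted)
        })
        .collect()
}
```

With three stored enqueued tasks and one task in the processing set, the sketch reports 2 enqueued and 1 processing, which matches the second snapshot of `basic_get_stats`.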
--- a/meili-snap/src/lib.rs
+++ b/meili-snap/src/lib.rs
@@ -167,7 +167,9 @@ macro_rules! snapshot {
        let (settings, snap_name, _) = $crate::default_snapshot_settings_for_test(test_name, Some(&snap_name));
        settings.bind(|| {
            let snap = format!("{}", $value);
-            meili_snap::insta::assert_snapshot!(format!("{}", snap_name), snap);
+            insta::allow_duplicates! {
+                meili_snap::insta::assert_snapshot!(format!("{}", snap_name), snap);
+            }
        });
    };
    ($value:expr, @$inline:literal) => {
@@ -176,7 +178,9 @@ macro_rules! snapshot {
        let (settings, _, _) = $crate::default_snapshot_settings_for_test("", Some("_dummy_argument"));
        settings.bind(|| {
            let snap = format!("{}", $value);
-            meili_snap::insta::assert_snapshot!(snap, @$inline);
+            insta::allow_duplicates! {
+                meili_snap::insta::assert_snapshot!(snap, @$inline);
+            }
        });
    };
    ($value:expr) => {
@@ -194,7 +198,9 @@ macro_rules! snapshot {
        let (settings, snap_name, _) = $crate::default_snapshot_settings_for_test(test_name, None);
        settings.bind(|| {
            let snap = format!("{}", $value);
-            meili_snap::insta::assert_snapshot!(format!("{}", snap_name), snap);
+            insta::allow_duplicates! {
+                meili_snap::insta::assert_snapshot!(format!("{}", snap_name), snap);
+            }
        });
    };
 }
--- a/meilisearch-types/src/error.rs
+++ b/meilisearch-types/src/error.rs
@@ -259,9 +259,6 @@ InvalidSettingsRankingRules           , InvalidRequest       , BAD_REQUEST ;
 InvalidSettingsSearchableAttributes   , InvalidRequest       , BAD_REQUEST ;
 InvalidSettingsSortableAttributes     , InvalidRequest       , BAD_REQUEST ;
 InvalidSettingsStopWords              , InvalidRequest       , BAD_REQUEST ;
-InvalidSettingsNonSeparatorTokens     , InvalidRequest       , BAD_REQUEST ;
-InvalidSettingsSeparatorTokens        , InvalidRequest       , BAD_REQUEST ;
-InvalidSettingsDictionary             , InvalidRequest       , BAD_REQUEST ;
 InvalidSettingsSynonyms               , InvalidRequest       , BAD_REQUEST ;
 InvalidSettingsTypoTolerance          , InvalidRequest       , BAD_REQUEST ;
 InvalidState                          , Internal             , INTERNAL_SERVER_ERROR ;
--- a/meilisearch-types/src/settings.rs
+++ b/meilisearch-types/src/settings.rs
@@ -171,15 +171,6 @@ pub struct Settings<T> {
    #[deserr(default, error = DeserrJsonError<InvalidSettingsStopWords>)]
    pub stop_words: Setting<BTreeSet<String>>,
    #[serde(default, skip_serializing_if = "Setting::is_not_set")]
-    #[deserr(default, error = DeserrJsonError<InvalidSettingsNonSeparatorTokens>)]
-    pub non_separator_tokens: Setting<BTreeSet<String>>,
-    #[serde(default, skip_serializing_if = "Setting::is_not_set")]
-    #[deserr(default, error = DeserrJsonError<InvalidSettingsSeparatorTokens>)]
-    pub separator_tokens: Setting<BTreeSet<String>>,
-    #[serde(default, skip_serializing_if = "Setting::is_not_set")]
-    #[deserr(default, error = DeserrJsonError<InvalidSettingsDictionary>)]
-    pub dictionary: Setting<BTreeSet<String>>,
-    #[serde(default, skip_serializing_if = "Setting::is_not_set")]
    #[deserr(default, error = DeserrJsonError<InvalidSettingsSynonyms>)]
    pub synonyms: Setting<BTreeMap<String, Vec<String>>>,
    #[serde(default, skip_serializing_if = "Setting::is_not_set")]
@@ -210,9 +201,6 @@ impl Settings<Checked> {
            ranking_rules: Setting::Reset,
            stop_words: Setting::Reset,
            synonyms: Setting::Reset,
-            non_separator_tokens: Setting::Reset,
-            separator_tokens: Setting::Reset,
-            dictionary: Setting::Reset,
            distinct_attribute: Setting::Reset,
            typo_tolerance: Setting::Reset,
            faceting: Setting::Reset,
@@ -229,9 +217,6 @@ impl Settings<Checked> {
            sortable_attributes,
            ranking_rules,
            stop_words,
-            non_separator_tokens,
-            separator_tokens,
-            dictionary,
            synonyms,
            distinct_attribute,
            typo_tolerance,
@@ -247,9 +232,6 @@ impl Settings<Checked> {
            sortable_attributes,
            ranking_rules,
            stop_words,
-            non_separator_tokens,
-            separator_tokens,
-            dictionary,
            synonyms,
            distinct_attribute,
            typo_tolerance,
@@ -292,9 +274,6 @@ impl Settings<Unchecked> {
            ranking_rules: self.ranking_rules,
            stop_words: self.stop_words,
            synonyms: self.synonyms,
-            non_separator_tokens: self.non_separator_tokens,
-            separator_tokens: self.separator_tokens,
-            dictionary: self.dictionary,
            distinct_attribute: self.distinct_attribute,
            typo_tolerance: self.typo_tolerance,
            faceting: self.faceting,
@@ -356,28 +335,6 @@ pub fn apply_settings_to_builder(
        Setting::NotSet => (),
    }

-    match settings.non_separator_tokens {
-        Setting::Set(ref non_separator_tokens) => {
-            builder.set_non_separator_tokens(non_separator_tokens.clone())
-        }
-        Setting::Reset => builder.reset_non_separator_tokens(),
-        Setting::NotSet => (),
-    }
-
-    match settings.separator_tokens {
-        Setting::Set(ref separator_tokens) => {
-            builder.set_separator_tokens(separator_tokens.clone())
-        }
-        Setting::Reset => builder.reset_separator_tokens(),
-        Setting::NotSet => (),
-    }
-
-    match settings.dictionary {
-        Setting::Set(ref dictionary) => builder.set_dictionary(dictionary.clone()),
-        Setting::Reset => builder.reset_dictionary(),
-        Setting::NotSet => (),
-    }
-
    match settings.synonyms {
        Setting::Set(ref synonyms) => builder.set_synonyms(synonyms.clone().into_iter().collect()),
        Setting::Reset => builder.reset_synonyms(),
@@ -502,14 +459,15 @@ pub fn settings(
        })
        .transpose()?
        .unwrap_or_default();
-
-    let non_separator_tokens = index.non_separator_tokens(rtxn)?.unwrap_or_default();
-    let separator_tokens = index.separator_tokens(rtxn)?.unwrap_or_default();
-    let dictionary = index.dictionary(rtxn)?.unwrap_or_default();
-
    let distinct_field = index.distinct_field(rtxn)?.map(String::from);

-    let synonyms = index.user_defined_synonyms(rtxn)?;
+    // In milli, each word in the synonyms map was split on its separators. Since that
+    // information is lost, we put a space between the words.
+    let synonyms = index
+        .synonyms(rtxn)?
+        .iter()
+        .map(|(key, values)| (key.join(" "), values.iter().map(|value| value.join(" ")).collect()))
+        .collect();

    let min_typo_word_len = MinWordSizeTyposSetting {
        one_typo: Setting::Set(index.min_word_len_one_typo(rtxn)?),
@@ -562,9 +520,6 @@ pub fn settings(
        sortable_attributes: Setting::Set(sortable_attributes),
        ranking_rules: Setting::Set(criteria.iter().map(|c| c.clone().into()).collect()),
        stop_words: Setting::Set(stop_words),
-        non_separator_tokens: Setting::Set(non_separator_tokens),
-        separator_tokens: Setting::Set(separator_tokens),
-        dictionary: Setting::Set(dictionary),
        distinct_attribute: match distinct_field {
            Some(field) => Setting::Set(field),
            None => Setting::Reset,
@@ -687,9 +642,6 @@ pub(crate) mod test {
            sortable_attributes: Setting::NotSet,
            ranking_rules: Setting::NotSet,
            stop_words: Setting::NotSet,
-            non_separator_tokens: Setting::NotSet,
-            separator_tokens: Setting::NotSet,
-            dictionary: Setting::NotSet,
            synonyms: Setting::NotSet,
            distinct_attribute: Setting::NotSet,
            typo_tolerance: Setting::NotSet,
@@ -711,9 +663,6 @@ pub(crate) mod test {
            sortable_attributes: Setting::NotSet,
            ranking_rules: Setting::NotSet,
            stop_words: Setting::NotSet,
-            non_separator_tokens: Setting::NotSet,
-            separator_tokens: Setting::NotSet,
-            dictionary: Setting::NotSet,
            synonyms: Setting::NotSet,
            distinct_attribute: Setting::NotSet,
            typo_tolerance: Setting::NotSet,
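The `settings` hunk above rebuilds the user-facing synonyms by joining the stored word lists with spaces. A rough standalone sketch of that transformation is shown here. The `join_synonyms` name and the `HashMap<Vec<String>, Vec<Vec<String>>>` input type are assumptions made for illustration; the real return type of `index.synonyms(rtxn)` is not shown in this diff.

```rust
use std::collections::{BTreeMap, HashMap};

/// Rebuilds a display form of the synonyms from their split representation.
/// Assumption: keys and values are stored as lists of words. The original
/// separators are not stored, so a single space is used between words.
fn join_synonyms(
    split: &HashMap<Vec<String>, Vec<Vec<String>>>,
) -> BTreeMap<String, Vec<String>> {
    split
        .iter()
        .map(|(key, values)| {
            // Join the words of each synonym, then collect the synonyms of this key.
            let joined_values = values.iter().map(|value| value.join(" ")).collect();
            (key.join(" "), joined_values)
        })
        .collect()
}
```

One consequence of this approach is that a synonym originally written with another separator, such as a hyphen, comes back with spaces instead.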
--- a/meilisearch/Cargo.toml
+++ b/meilisearch/Cargo.toml
@@ -58,7 +58,7 @@ lazy_static = "1.4.0"
 log = "0.4.17"
 meilisearch-auth = { path = "../meilisearch-auth" }
 meilisearch-types = { path = "../meilisearch-types" }
-mimalloc = { version = "0.1.37", default-features = false }
+mimalloc = { version = "0.1.36", default-features = false }
 mime = "0.3.17"
 num_cpus = "1.15.0"
 obkv = "0.2.0"
@@ -69,8 +69,6 @@ permissive-json-pointer = { path = "../permissive-json-pointer" }
 pin-project-lite = "0.2.9"
 platform-dirs = "0.3.0"
 prometheus = { version = "0.13.3", features = ["process"] }
-puffin = "0.16.0"
-puffin_http = { version = "0.13.0", optional = true }
 rand = "0.8.5"
 rayon = "1.7.0"
 regex = "1.7.3"
@@ -135,18 +133,7 @@ zip = { version = "0.6.4", optional = true }
 [features]
 default = ["analytics", "meilisearch-types/all-tokenizations", "mini-dashboard"]
 analytics = ["segment"]
-profile-with-puffin = ["dep:puffin_http"]
-mini-dashboard = [
-    "actix-web-static-files",
-    "static-files",
-    "anyhow",
-    "cargo_toml",
-    "hex",
-    "reqwest",
-    "sha-1",
-    "tempfile",
-    "zip",
-]
+mini-dashboard = ["actix-web-static-files", "static-files", "anyhow", "cargo_toml", "hex", "reqwest", "sha-1", "tempfile", "zip"]
 chinese = ["meilisearch-types/chinese"]
 hebrew = ["meilisearch-types/hebrew"]
 japanese = ["meilisearch-types/japanese"]
--- a/meilisearch/src/main.rs
+++ b/meilisearch/src/main.rs
@@ -30,10 +30,6 @@ fn setup(opt: &Opt) -> anyhow::Result<()> {
 async fn main() -> anyhow::Result<()> {
    let (opt, config_read_from) = Opt::try_build()?;

-    #[cfg(feature = "profile-with-puffin")]
-    let _server = puffin_http::Server::new(&format!("0.0.0.0:{}", puffin_http::DEFAULT_PORT))?;
-    puffin::set_scopes_on(cfg!(feature = "profile-with-puffin"));
-
    anyhow::ensure!(
        !(cfg!(windows) && opt.experimental_reduce_indexing_memory_usage),
        "The `experimental-reduce-indexing-memory-usage` flag is not supported on Windows"
--- a/meilisearch/src/routes/indexes/settings.rs
+++ b/meilisearch/src/routes/indexes/settings.rs
@@ -310,81 +310,6 @@ make_setting_route!(
    }
 );

-make_setting_route!(
-    "/non-separator-tokens",
-    put,
-    std::collections::BTreeSet<String>,
-    meilisearch_types::deserr::DeserrJsonError<
-        meilisearch_types::error::deserr_codes::InvalidSettingsNonSeparatorTokens,
-    >,
-    non_separator_tokens,
-    "nonSeparatorTokens",
-    analytics,
-    |non_separator_tokens: &Option<std::collections::BTreeSet<String>>, req: &HttpRequest| {
-        use serde_json::json;
-
-        analytics.publish(
-            "nonSeparatorTokens Updated".to_string(),
-            json!({
-                "non_separator_tokens": {
-                    "total": non_separator_tokens.as_ref().map(|non_separator_tokens| non_separator_tokens.len()),
-                },
-            }),
-            Some(req),
-        );
-    }
-);
-
-make_setting_route!(
-    "/separator-tokens",
-    put,
-    std::collections::BTreeSet<String>,
-    meilisearch_types::deserr::DeserrJsonError<
-        meilisearch_types::error::deserr_codes::InvalidSettingsSeparatorTokens,
-    >,
-    separator_tokens,
-    "separatorTokens",
-    analytics,
-    |separator_tokens: &Option<std::collections::BTreeSet<String>>, req: &HttpRequest| {
-        use serde_json::json;
-
-        analytics.publish(
-            "separatorTokens Updated".to_string(),
-            json!({
-                "separator_tokens": {
-                    "total": separator_tokens.as_ref().map(|separator_tokens| separator_tokens.len()),
-                },
-            }),
-            Some(req),
-        );
-    }
-);
-
-make_setting_route!(
-    "/dictionary",
-    put,
-    std::collections::BTreeSet<String>,
-    meilisearch_types::deserr::DeserrJsonError<
-        meilisearch_types::error::deserr_codes::InvalidSettingsDictionary,
-    >,
-    dictionary,
-    "dictionary",
-    analytics,
-    |dictionary: &Option<std::collections::BTreeSet<String>>, req: &HttpRequest| {
-        use serde_json::json;
-
-        analytics.publish(
-            "dictionary Updated".to_string(),
-            json!({
-                "dictionary": {
-                    "total": dictionary.as_ref().map(|dictionary| dictionary.len()),
-                },
-            }),
-            Some(req),
-        );
-    }
-);
-
 make_setting_route!(
    "/synonyms",
    put,
@@ -541,9 +466,6 @@ generate_configure!(
    searchable_attributes,
    distinct_attribute,
    stop_words,
-    separator_tokens,
-    non_separator_tokens,
-    dictionary,
    synonyms,
    ranking_rules,
    typo_tolerance,
--- a/meilisearch/src/search.rs
+++ b/meilisearch/src/search.rs
@@ -491,20 +491,6 @@ pub fn perform_search(
        tokenizer_builder.allow_list(&script_lang_map);
    }

-    let separators = index.allowed_separators(&rtxn)?;
-    let separators: Option<Vec<_>> =
-        separators.as_ref().map(|x| x.iter().map(String::as_str).collect());
-    if let Some(ref separators) = separators {
-        tokenizer_builder.separators(separators);
-    }
-
-    let dictionary = index.dictionary(&rtxn)?;
-    let dictionary: Option<Vec<_>> =
-        dictionary.as_ref().map(|x| x.iter().map(String::as_str).collect());
-    if let Some(ref dictionary) = dictionary {
-        tokenizer_builder.words_dict(dictionary);
-    }
-
    let mut formatter_builder = MatcherBuilder::new(matching_words, tokenizer_builder.build());
    formatter_builder.crop_marker(query.crop_marker);
    formatter_builder.highlight_prefix(query.highlight_pre_tag);
--- a/meilisearch/tests/dumps/mod.rs
+++ b/meilisearch/tests/dumps/mod.rs
--- a/meilisearch/tests/search/geo.rs
+++ b/meilisearch/tests/search/geo.rs
@@ -1,3 +1,4 @@
+use meili_snap::{json_string, snapshot};
 use once_cell::sync::Lazy;
 use serde_json::{json, Value};

@@ -60,3 +61,59 @@ async fn geo_sort_with_geo_strings() {
        )
        .await;
 }
+
+#[actix_rt::test]
+async fn geo_bounding_box_with_string_and_number() {
+    let server = Server::new().await;
+    let index = server.index("test");
+
+    let documents = DOCUMENTS.clone();
+    index.update_settings_filterable_attributes(json!(["_geo"])).await;
+    index.update_settings_sortable_attributes(json!(["_geo"])).await;
+    index.add_documents(documents, None).await;
+    index.wait_task(2).await;
+
+    index
+        .search(
+            json!({
+                "filter": "_geoBoundingBox([89, 179], [-89, -179])",
+            }),
+            |response, code| {
+                assert_eq!(code, 200, "{}", response);
+                snapshot!(json_string!(response, { ".processingTimeMs" => "[time]" }), @r###"
+                {
+                  "hits": [
+                    {
+                      "id": 1,
+                      "name": "Taco Truck",
+                      "address": "444 Salsa Street, Burritoville",
+                      "type": "Mexican",
+                      "rating": 9,
+                      "_geo": {
+                        "lat": 34.0522,
+                        "lng": -118.2437
+                      }
+                    },
+                    {
+                      "id": 2,
+                      "name": "La Bella Italia",
+                      "address": "456 Elm Street, Townsville",
+                      "type": "Italian",
+                      "rating": 9,
+                      "_geo": {
+                        "lat": "45.4777599",
+                        "lng": "9.1967508"
+                      }
+                    }
+                  ],
+                  "query": "",
+                  "processingTimeMs": "[time]",
+                  "limit": 20,
+                  "offset": 0,
+                  "estimatedTotalHits": 2
+                }
+                "###);
+            },
+        )
+        .await;
+}
--- a/meilisearch/tests/settings/get_settings.rs
+++ b/meilisearch/tests/settings/get_settings.rs
@@ -16,9 +16,6 @@ static DEFAULT_SETTINGS_VALUES: Lazy<HashMap<&'static str, Value>> = Lazy::new(|
        json!(["words", "typo", "proximity", "attribute", "sort", "exactness"]),
    );
    map.insert("stop_words", json!([]));
-    map.insert("non_separator_tokens", json!([]));
-    map.insert("separator_tokens", json!([]));
-    map.insert("dictionary", json!([]));
    map.insert("synonyms", json!({}));
    map.insert(
        "faceting",
@@ -54,7 +51,7 @@ async fn get_settings() {
    let (response, code) = index.settings().await;
    assert_eq!(code, 200);
    let settings = response.as_object().unwrap();
-    assert_eq!(settings.keys().len(), 14);
+    assert_eq!(settings.keys().len(), 11);
    assert_eq!(settings["displayedAttributes"], json!(["*"]));
    assert_eq!(settings["searchableAttributes"], json!(["*"]));
    assert_eq!(settings["filterableAttributes"], json!([]));
@@ -65,9 +62,6 @@ async fn get_settings() {
        json!(["words", "typo", "proximity", "attribute", "sort", "exactness"])
    );
    assert_eq!(settings["stopWords"], json!([]));
-    assert_eq!(settings["nonSeparatorTokens"], json!([]));
-    assert_eq!(settings["separatorTokens"], json!([]));
-    assert_eq!(settings["dictionary"], json!([]));
    assert_eq!(
        settings["faceting"],
        json!({
@@ -278,9 +272,6 @@ test_setting_routes!(
    searchable_attributes put,
    distinct_attribute put,
    stop_words put,
-    separator_tokens put,
-    non_separator_tokens put,
-    dictionary put,
    ranking_rules put,
    synonyms put,
    pagination patch,
--- a/meilisearch/tests/settings/mod.rs
+++ b/meilisearch/tests/settings/mod.rs
@@ -1,4 +1,3 @@
 mod distinct;
 mod errors;
 mod get_settings;
-mod tokenizer_customization;
--- a/meilisearch/tests/settings/tokenizer_customization.rs
+++ b/meilisearch/tests/settings/tokenizer_customization.rs
@@ -1,467 +0,0 @@
-use meili_snap::{json_string, snapshot};
-use serde_json::json;
-
-use crate::common::Server;
-
-#[actix_rt::test]
-async fn set_and_reset() {
-    let server = Server::new().await;
-    let index = server.index("test");
-
-    let (_response, _code) = index
-        .update_settings(json!({
-            "nonSeparatorTokens": ["#", "&"],
-            "separatorTokens": ["&sep", "<br/>"],
-            "dictionary": ["J.R.R.", "J. R. R."],
-        }))
-        .await;
-    index.wait_task(0).await;
-
-    let (response, _) = index.settings().await;
-    snapshot!(json_string!(response["nonSeparatorTokens"]), @r###"
-    [
-      "#",
-      "&"
-    ]
-    "###);
-    snapshot!(json_string!(response["separatorTokens"]), @r###"
-    [
-      "&sep",
-      "<br/>"
-    ]
-    "###);
-    snapshot!(json_string!(response["dictionary"]), @r###"
-    [
-      "J. R. R.",
-      "J.R.R."
-    ]
-    "###);
-
-    index
-        .update_settings(json!({
-            "nonSeparatorTokens": null,
-            "separatorTokens": null,
-            "dictionary": null,
-        }))
-        .await;
-
-    index.wait_task(1).await;
-
-    let (response, _) = index.settings().await;
-    snapshot!(json_string!(response["nonSeparatorTokens"]), @"[]");
-    snapshot!(json_string!(response["separatorTokens"]), @"[]");
-    snapshot!(json_string!(response["dictionary"]), @"[]");
-}
-
-#[actix_rt::test]
-async fn set_and_search() {
-    let documents = json!([
-        {
-            "id": 1,
-            "content": "Mac & cheese",
-        },
-        {
-            "id": 2,
-            "content": "G#D#G#D#G#C#D#G#C#",
-        },
-        {
-            "id": 3,
-            "content": "Mac&sep&&sepcheese",
-        },
-    ]);
-
-    let server = Server::new().await;
-    let index = server.index("test");
-
-    index.add_documents(documents, None).await;
-    index.wait_task(0).await;
-
-    let (_response, _code) = index
-        .update_settings(json!({
-            "nonSeparatorTokens": ["#", "&"],
-            "separatorTokens": ["<br/>", "&sep"],
-            "dictionary": ["#", "A#", "B#", "C#", "D#", "E#", "F#", "G#"],
-        }))
-        .await;
-    index.wait_task(1).await;
-
-    index
-        .search(json!({"q": "&", "attributesToHighlight": ["content"]}), |response, code| {
-            snapshot!(code, @"200 OK");
-            snapshot!(json_string!(response["hits"]), @r###"
-            [
-              {
-                "id": 1,
-                "content": "Mac & cheese",
-                "_formatted": {
-                  "id": "1",
-                  "content": "Mac <em>&</em> cheese"
-                }
-              },
-              {
-                "id": 3,
-                "content": "Mac&sep&&sepcheese",
-                "_formatted": {
-                  "id": "3",
-                  "content": "Mac&sep<em>&</em>&sepcheese"
-                }
-              }
-            ]
-            "###);
-        })
-        .await;
-
-    index
-        .search(
-            json!({"q": "Mac & cheese", "attributesToHighlight": ["content"]}),
-            |response, code| {
-                snapshot!(code, @"200 OK");
-                snapshot!(json_string!(response["hits"]), @r###"
-                [
-                  {
-                    "id": 1,
-                    "content": "Mac & cheese",
-                    "_formatted": {
-                      "id": "1",
-                      "content": "<em>Mac</em> <em>&</em> <em>cheese</em>"
-                    }
-                  },
-                  {
-                    "id": 3,
-                    "content": "Mac&sep&&sepcheese",
-                    "_formatted": {
-                      "id": "3",
-                      "content": "<em>Mac</em>&sep<em>&</em>&sep<em>cheese</em>"
-                    }
-                  }
-                ]
-                "###);
-            },
-        )
-        .await;
-
-    index
-        .search(
-            json!({"q": "Mac&sep&&sepcheese", "attributesToHighlight": ["content"]}),
-            |response, code| {
-                snapshot!(code, @"200 OK");
-                snapshot!(json_string!(response["hits"]), @r###"
-                [
-                  {
-                    "id": 1,
-                    "content": "Mac & cheese",
-                    "_formatted": {
-                      "id": "1",
-                      "content": "<em>Mac</em> <em>&</em> <em>cheese</em>"
-                    }
-                  },
-                  {
-                    "id": 3,
-                    "content": "Mac&sep&&sepcheese",
-                    "_formatted": {
-                      "id": "3",
-                      "content": "<em>Mac</em>&sep<em>&</em>&sep<em>cheese</em>"
-                    }
-                  }
-                ]
-                "###);
-            },
-        )
-        .await;
-
-    index
-        .search(json!({"q": "C#D#G", "attributesToHighlight": ["content"]}), |response, code| {
-            snapshot!(code, @"200 OK");
-            snapshot!(json_string!(response["hits"]), @r###"
-            [
-              {
-                "id": 2,
-                "content": "G#D#G#D#G#C#D#G#C#",
-                "_formatted": {
-                  "id": "2",
-                  "content": "<em>G</em>#<em>D#</em><em>G</em>#<em>D#</em><em>G</em>#<em>C#</em><em>D#</em><em>G</em>#<em>C#</em>"
-                }
-              }
-            ]
-            "###);
-        })
-        .await;
-
-    index
-        .search(json!({"q": "#", "attributesToHighlight": ["content"]}), |response, code| {
-            snapshot!(code, @"200 OK");
-            snapshot!(json_string!(response["hits"]), @"[]");
-        })
-        .await;
-}
-
-#[actix_rt::test]
-async fn advanced_synergies() {
-    let documents = json!([
-        {
-            "id": 1,
-            "content": "J.R.R. Tolkien",
-        },
-        {
-            "id": 2,
-            "content": "J. R. R. Tolkien",
-        },
-        {
-            "id": 3,
-            "content": "jrr Tolkien",
-        },
-        {
-            "id": 4,
-            "content": "J.K. Rowlings",
-        },
-        {
-            "id": 5,
-            "content": "J. K. Rowlings",
-        },
-        {
-            "id": 6,
-            "content": "jk Rowlings",
-        },
-    ]);
-
-    let server = Server::new().await;
-    let index = server.index("test");
-
-    index.add_documents(documents, None).await;
-    index.wait_task(0).await;
-
-    let (_response, _code) = index
-        .update_settings(json!({
-            "dictionary": ["J.R.R.", "J. R. R."],
-            "synonyms": {
-                "J.R.R.": ["jrr", "J. R. R."],
-                "J. R. R.": ["jrr", "J.R.R."],
-                "jrr": ["J.R.R.", "J. R. R."],
-                "J.K.": ["jk", "J. K."],
-                "J. K.": ["jk", "J.K."],
-                "jk": ["J.K.", "J. K."],
-            }
-        }))
-        .await;
-    index.wait_task(1).await;
-
-    index
-        .search(json!({"q": "J.R.R.", "attributesToHighlight": ["content"]}), |response, code| {
-            snapshot!(code, @"200 OK");
-            snapshot!(json_string!(response["hits"]), @r###"
-            [
-              {
-                "id": 1,
-                "content": "J.R.R. Tolkien",
-                "_formatted": {
-                  "id": "1",
-                  "content": "<em>J.R.R.</em> Tolkien"
-                }
-              },
-              {
-                "id": 2,
-                "content": "J. R. R. Tolkien",
-                "_formatted": {
-                  "id": "2",
-                  "content": "<em>J. R. R.</em> Tolkien"
-                }
-              },
-              {
-                "id": 3,
-                "content": "jrr Tolkien",
-                "_formatted": {
-                  "id": "3",
-                  "content": "<em>jrr</em> Tolkien"
-                }
-              }
-            ]
-            "###);
-        })
-        .await;
-
-    index
-        .search(json!({"q": "jrr", "attributesToHighlight": ["content"]}), |response, code| {
-            snapshot!(code, @"200 OK");
-            snapshot!(json_string!(response["hits"]), @r###"
-            [
-              {
-                "id": 3,
-                "content": "jrr Tolkien",
-                "_formatted": {
-                  "id": "3",
-                  "content": "<em>jrr</em> Tolkien"
-                }
-              },
-              {
-                "id": 1,
-                "content": "J.R.R. Tolkien",
-                "_formatted": {
-                  "id": "1",
-                  "content": "<em>J.R.R.</em> Tolkien"
-                }
-              },
-              {
-                "id": 2,
-                "content": "J. R. R. Tolkien",
-                "_formatted": {
-                  "id": "2",
-                  "content": "<em>J. R. R.</em> Tolkien"
-                }
-              }
-            ]
-            "###);
-        })
-        .await;
-
-    index
-        .search(json!({"q": "J. R. R.", "attributesToHighlight": ["content"]}), |response, code| {
-            snapshot!(code, @"200 OK");
-            snapshot!(json_string!(response["hits"]), @r###"
-            [
-              {
-                "id": 2,
-                "content": "J. R. R. Tolkien",
-                "_formatted": {
-                  "id": "2",
-                  "content": "<em>J. R. R.</em> Tolkien"
-                }
-              },
-              {
-                "id": 1,
-                "content": "J.R.R. Tolkien",
-                "_formatted": {
-                  "id": "1",
-                  "content": "<em>J.R.R.</em> Tolkien"
-                }
-              },
-              {
-                "id": 3,
-                "content": "jrr Tolkien",
-                "_formatted": {
-                  "id": "3",
-                  "content": "<em>jrr</em> Tolkien"
-                }
-              }
-            ]
-            "###);
-        })
-        .await;
-
-    // Only update dictionary, the synonyms should be recomputed.
-    let (_response, _code) = index
-        .update_settings(json!({
-            "dictionary": ["J.R.R.", "J. R. R.", "J.K.", "J. K."],
-        }))
-        .await;
-    index.wait_task(2).await;
-
-    index
-        .search(json!({"q": "jk", "attributesToHighlight": ["content"]}), |response, code| {
-            snapshot!(code, @"200 OK");
-            snapshot!(json_string!(response["hits"]), @r###"
-            [
-              {
-                "id": 6,
-                "content": "jk Rowlings",
-                "_formatted": {
-                  "id": "6",
-                  "content": "<em>jk</em> Rowlings"
-                }
-              },
-              {
-                "id": 4,
-                "content": "J.K. Rowlings",
-                "_formatted": {
-                  "id": "4",
-                  "content": "<em>J.K.</em> Rowlings"
-                }
-              },
-              {
-                "id": 5,
-                "content": "J. K. Rowlings",
-                "_formatted": {
-                  "id": "5",
-                  "content": "<em>J. K.</em> Rowlings"
-                }
-              }
-            ]
-            "###);
-        })
-        .await;
-
-    index
-        .search(json!({"q": "J.K.", "attributesToHighlight": ["content"]}), |response, code| {
-            snapshot!(code, @"200 OK");
-            snapshot!(json_string!(response["hits"]), @r###"
-            [
-              {
-                "id": 4,
-                "content": "J.K. Rowlings",
-                "_formatted": {
-                  "id": "4",
-                  "content": "<em>J.K.</em> Rowlings"
-                }
-              },
-              {
-                "id": 5,
-                "content": "J. K. Rowlings",
-                "_formatted": {
-                  "id": "5",
-                  "content": "<em>J. K.</em> Rowlings"
-                }
-              },
-              {
-                "id": 6,
-                "content": "jk Rowlings",
-                "_formatted": {
-                  "id": "6",
-                  "content": "<em>jk</em> Rowlings"
-                }
-              }
-            ]
-            "###);
-        })
-        .await;
-
-    index
-        .search(json!({"q": "J. K.", "attributesToHighlight": ["content"]}), |response, code| {
-            snapshot!(code, @"200 OK");
-            snapshot!(json_string!(response["hits"]), @r###"
-            [
-              {
-                "id": 5,
-                "content": "J. K. Rowlings",
-                "_formatted": {
-                  "id": "5",
-                  "content": "<em>J. K.</em> Rowlings"
-                }
-              },
-              {
-                "id": 4,
-                "content": "J.K. Rowlings",
-                "_formatted": {
-                  "id": "4",
-                  "content": "<em>J.K.</em> Rowlings"
-                }
-              },
-              {
-                "id": 6,
-                "content": "jk Rowlings",
-                "_formatted": {
-                  "id": "6",
-                  "content": "<em>jk</em> Rowlings"
-                }
-              },
-              {
-                "id": 2,
-                "content": "J. R. R. Tolkien",
-                "_formatted": {
-                  "id": "2",
-                  "content": "<em>J. R.</em> R. Tolkien"
-                }
-              }
-            ]
-            "###);
-        })
-        .await;
-}
--- a/milli/Cargo.toml
+++ b/milli/Cargo.toml
@@ -65,16 +65,13 @@ filter-parser = { path = "../filter-parser" }
 # documents words self-join
 itertools = "0.10.5"

-# profiling
-puffin = "0.16.0"
-
 # logging
 log = "0.4.17"
 logging_timer = "1.1.0"
 csv = "1.2.1"

 [dev-dependencies]
-mimalloc = { version = "0.1.37", default-features = false }
+mimalloc = { version = "0.1.29", default-features = false }
 big_s = "1.0.2"
 insta = "1.29.0"
 maplit = "1.0.2"
--- a/milli/src/index.rs
+++ b/milli/src/index.rs
@@ -1,5 +1,5 @@
 use std::borrow::Cow;
-use std::collections::{BTreeMap, BTreeSet, HashMap, HashSet};
+use std::collections::{BTreeSet, HashMap, HashSet};
 use std::fs::File;
 use std::mem::size_of;
 use std::path::Path;
@@ -61,12 +61,8 @@ pub mod main_key {
    pub const USER_DEFINED_SEARCHABLE_FIELDS_KEY: &str = "user-defined-searchable-fields";
    pub const SOFT_EXTERNAL_DOCUMENTS_IDS_KEY: &str = "soft-external-documents-ids";
    pub const STOP_WORDS_KEY: &str = "stop-words";
-    pub const NON_SEPARATOR_TOKENS_KEY: &str = "non-separator-tokens";
-    pub const SEPARATOR_TOKENS_KEY: &str = "separator-tokens";
-    pub const DICTIONARY_KEY: &str = "dictionary";
    pub const STRING_FACETED_DOCUMENTS_IDS_PREFIX: &str = "string-faceted-documents-ids";
    pub const SYNONYMS_KEY: &str = "synonyms";
-    pub const USER_DEFINED_SYNONYMS_KEY: &str = "user-defined-synonyms";
    pub const WORDS_FST_KEY: &str = "words-fst";
    pub const WORDS_PREFIXES_FST_KEY: &str = "words-prefixes-fst";
    pub const CREATED_AT_KEY: &str = "created-at";
@@ -1059,116 +1055,18 @@ impl Index {
        }
    }

-    /* non separator tokens */
-
-    pub(crate) fn put_non_separator_tokens(
-        &self,
-        wtxn: &mut RwTxn,
-        set: &BTreeSet<String>,
-    ) -> heed::Result<()> {
-        self.main.put::<_, Str, SerdeBincode<_>>(wtxn, main_key::NON_SEPARATOR_TOKENS_KEY, set)
-    }
-
-    pub(crate) fn delete_non_separator_tokens(&self, wtxn: &mut RwTxn) -> heed::Result<bool> {
-        self.main.delete::<_, Str>(wtxn, main_key::NON_SEPARATOR_TOKENS_KEY)
-    }
-
-    pub fn non_separator_tokens(&self, rtxn: &RoTxn) -> Result<Option<BTreeSet<String>>> {
-        Ok(self.main.get::<_, Str, SerdeBincode<BTreeSet<String>>>(
-            rtxn,
-            main_key::NON_SEPARATOR_TOKENS_KEY,
-        )?)
-    }
-
-    /* separator tokens */
-
-    pub(crate) fn put_separator_tokens(
-        &self,
-        wtxn: &mut RwTxn,
-        set: &BTreeSet<String>,
-    ) -> heed::Result<()> {
-        self.main.put::<_, Str, SerdeBincode<_>>(wtxn, main_key::SEPARATOR_TOKENS_KEY, set)
-    }
-
-    pub(crate) fn delete_separator_tokens(&self, wtxn: &mut RwTxn) -> heed::Result<bool> {
-        self.main.delete::<_, Str>(wtxn, main_key::SEPARATOR_TOKENS_KEY)
-    }
-
-    pub fn separator_tokens(&self, rtxn: &RoTxn) -> Result<Option<BTreeSet<String>>> {
-        Ok(self
-            .main
-            .get::<_, Str, SerdeBincode<BTreeSet<String>>>(rtxn, main_key::SEPARATOR_TOKENS_KEY)?)
-    }
-
-    /* separators easing method */
-
-    pub fn allowed_separators(&self, rtxn: &RoTxn) -> Result<Option<BTreeSet<String>>> {
-        let default_separators =
-            charabia::separators::DEFAULT_SEPARATORS.iter().map(|s| s.to_string());
-        let mut separators: Option<BTreeSet<_>> = None;
-        if let Some(mut separator_tokens) = self.separator_tokens(rtxn)? {
-            separator_tokens.extend(default_separators.clone());
-            separators = Some(separator_tokens);
-        }
-
-        if let Some(non_separator_tokens) = self.non_separator_tokens(rtxn)? {
-            separators = separators
-                .or_else(|| Some(default_separators.collect()))
-                .map(|separators| &separators - &non_separator_tokens);
-        }
-
-        Ok(separators)
-    }
-
-    /* dictionary */
-
-    pub(crate) fn put_dictionary(
-        &self,
-        wtxn: &mut RwTxn,
-        set: &BTreeSet<String>,
-    ) -> heed::Result<()> {
-        self.main.put::<_, Str, SerdeBincode<_>>(wtxn, main_key::DICTIONARY_KEY, set)
-    }
-
-    pub(crate) fn delete_dictionary(&self, wtxn: &mut RwTxn) -> heed::Result<bool> {
-        self.main.delete::<_, Str>(wtxn, main_key::DICTIONARY_KEY)
-    }
-
-    pub fn dictionary(&self, rtxn: &RoTxn) -> Result<Option<BTreeSet<String>>> {
-        Ok(self
-            .main
-            .get::<_, Str, SerdeBincode<BTreeSet<String>>>(rtxn, main_key::DICTIONARY_KEY)?)
-    }
-
    /* synonyms */

    pub(crate) fn put_synonyms(
        &self,
        wtxn: &mut RwTxn,
        synonyms: &HashMap<Vec<String>, Vec<Vec<String>>>,
-        user_defined_synonyms: &BTreeMap<String, Vec<String>>,
    ) -> heed::Result<()> {
-        self.main.put::<_, Str, SerdeBincode<_>>(wtxn, main_key::SYNONYMS_KEY, synonyms)?;
-        self.main.put::<_, Str, SerdeBincode<_>>(
-            wtxn,
-            main_key::USER_DEFINED_SYNONYMS_KEY,
-            user_defined_synonyms,
-        )
+        self.main.put::<_, Str, SerdeBincode<_>>(wtxn, main_key::SYNONYMS_KEY, synonyms)
    }

    pub(crate) fn delete_synonyms(&self, wtxn: &mut RwTxn) -> heed::Result<bool> {
-        self.main.delete::<_, Str>(wtxn, main_key::SYNONYMS_KEY)?;
-        self.main.delete::<_, Str>(wtxn, main_key::USER_DEFINED_SYNONYMS_KEY)
-    }
-
-    pub fn user_defined_synonyms(
-        &self,
-        rtxn: &RoTxn,
-    ) -> heed::Result<BTreeMap<String, Vec<String>>> {
-        Ok(self
-            .main
-            .get::<_, Str, SerdeBincode<_>>(rtxn, main_key::USER_DEFINED_SYNONYMS_KEY)?
-            .unwrap_or_default())
+        self.main.delete::<_, Str>(wtxn, main_key::SYNONYMS_KEY)
    }

    pub fn synonyms(&self, rtxn: &RoTxn) -> heed::Result<HashMap<Vec<String>, Vec<Vec<String>>>> {
@@ -1820,11 +1718,11 @@ pub(crate) mod tests {
            .unwrap();
        index
            .add_documents(documents!([
-                { "id": 0, "_geo": { "lat": 0, "lng": 0 } },
-                { "id": 1, "_geo": { "lat": 0, "lng": -175 } },
-                { "id": 2, "_geo": { "lat": 0, "lng": 175 } },
+                { "id": 0, "_geo": { "lat": "0", "lng": "0" } },
+                { "id": 1, "_geo": { "lat": 0, "lng": "-175" } },
+                { "id": 2, "_geo": { "lat": "0", "lng": 175 } },
                { "id": 3, "_geo": { "lat": 85, "lng": 0 } },
-                { "id": 4, "_geo": { "lat": -85, "lng": 0 } },
+                { "id": 4, "_geo": { "lat": "-85", "lng": "0" } },
            ]))
            .unwrap();

--- a/milli/src/lib.rs
+++ b/milli/src/lib.rs
@@ -97,7 +97,7 @@ const MAX_LMDB_KEY_LENGTH: usize = 500;
 ///
 /// This number is determined by the keys of the different facet databases
 /// and adding a margin of safety.
-pub const MAX_FACET_VALUE_LENGTH: usize = MAX_LMDB_KEY_LENGTH - 20;
+pub const MAX_FACET_VALUE_LENGTH: usize = MAX_LMDB_KEY_LENGTH - 32;

 /// The maximum length a word can be
 pub const MAX_WORD_LENGTH: usize = MAX_LMDB_KEY_LENGTH / 2;
--- a/milli/src/search/new/mod.rs
+++ b/milli/src/search/new/mod.rs
@@ -488,20 +488,6 @@ pub fn execute_search(
            tokbuilder.stop_words(stop_words);
        }

-        let separators = ctx.index.allowed_separators(ctx.txn)?;
-        let separators: Option<Vec<_>> =
-            separators.as_ref().map(|x| x.iter().map(String::as_str).collect());
-        if let Some(ref separators) = separators {
-            tokbuilder.separators(separators);
-        }
-
-        let dictionary = ctx.index.dictionary(ctx.txn)?;
-        let dictionary: Option<Vec<_>> =
-            dictionary.as_ref().map(|x| x.iter().map(String::as_str).collect());
-        if let Some(ref dictionary) = dictionary {
-            tokbuilder.words_dict(dictionary);
-        }
-
        let script_lang_map = ctx.index.script_language(ctx.txn)?;
        if !script_lang_map.is_empty() {
            tokbuilder.allow_list(&script_lang_map);
--- a/milli/src/search/new/tests/integration.rs
+++ b/milli/src/search/new/tests/integration.rs
@@ -2,7 +2,7 @@ use std::io::Cursor;

 use big_s::S;
 use heed::EnvOpenOptions;
-use maplit::{btreemap, hashset};
+use maplit::{hashmap, hashset};

 use crate::documents::{DocumentsBatchBuilder, DocumentsBatchReader};
 use crate::update::{IndexDocuments, IndexDocumentsConfig, IndexerConfig, Settings};
@@ -33,7 +33,7 @@ pub fn setup_search_index_with_criteria(criteria: &[Criterion]) -> Index {
        S("tag"),
        S("asc_desc_rank"),
    });
-    builder.set_synonyms(btreemap! {
+    builder.set_synonyms(hashmap! {
        S("hello") => vec![S("good morning")],
        S("world") => vec![S("earth")],
        S("america") => vec![S("the united states")],
--- a/milli/src/search/new/tests/proximity.rs
+++ b/milli/src/search/new/tests/proximity.rs
@@ -15,7 +15,7 @@ they store fewer proximities than the regular word proximity DB.

 */

-use std::collections::BTreeMap;
+use std::collections::HashMap;

 use crate::index::tests::TempIndex;
 use crate::search::new::tests::collect_field_values;
@@ -336,7 +336,7 @@ fn test_proximity_split_word() {

    index
        .update_settings(|s| {
-            let mut syns = BTreeMap::new();
+            let mut syns = HashMap::new();
            syns.insert("xyz".to_owned(), vec!["sun flower".to_owned()]);
            s.set_synonyms(syns);
        })
--- a/milli/src/search/new/tests/typo.rs
+++ b/milli/src/search/new/tests/typo.rs
@@ -18,7 +18,7 @@ if `words` doesn't exist before it.
 14. Synonyms cost nothing according to the typo ranking rule
 */

-use std::collections::BTreeMap;
+use std::collections::HashMap;

 use crate::index::tests::TempIndex;
 use crate::search::new::tests::collect_field_values;
@@ -591,7 +591,7 @@ fn test_typo_synonyms() {
        .update_settings(|s| {
            s.set_criteria(vec![Criterion::Typo]);

-            let mut synonyms = BTreeMap::new();
+            let mut synonyms = HashMap::new();
            synonyms.insert("lackadaisical".to_owned(), vec!["lazy".to_owned()]);
            synonyms.insert("fast brownish".to_owned(), vec!["quick brown".to_owned()]);

--- a/milli/src/update/clear_documents.rs
+++ b/milli/src/update/clear_documents.rs
@@ -15,8 +15,6 @@ impl<'t, 'u, 'i> ClearDocuments<'t, 'u, 'i> {
    }

    pub fn execute(self) -> Result<u64> {
-        puffin::profile_function!();
-
        self.index.set_updated_at(self.wtxn, &OffsetDateTime::now_utc())?;
        let Index {
            env: _env,
--- a/milli/src/update/delete_documents.rs
+++ b/milli/src/update/delete_documents.rs
@@ -109,8 +109,6 @@ impl<'t, 'u, 'i> DeleteDocuments<'t, 'u, 'i> {
        Some(docid)
    }
    pub fn execute(self) -> Result<DocumentDeletionResult> {
-        puffin::profile_function!();
-
        let DetailedDocumentDeletionResult { deleted_documents, remaining_documents } =
            self.execute_inner()?;

--- a/milli/src/update/facet/mod.rs
+++ b/milli/src/update/facet/mod.rs
@@ -94,7 +94,7 @@ use crate::heed_codec::facet::{FacetGroupKey, FacetGroupKeyCodec, FacetGroupValu
 use crate::heed_codec::ByteSliceRefCodec;
 use crate::update::index_documents::create_sorter;
 use crate::update::merge_btreeset_string;
-use crate::{BEU16StrCodec, Index, Result, BEU16};
+use crate::{BEU16StrCodec, Index, Result, BEU16, MAX_FACET_VALUE_LENGTH};

 pub mod bulk;
 pub mod delete;
@@ -191,7 +191,16 @@ impl<'i> FacetsUpdate<'i> {
        for result in database.iter(wtxn)? {
            let (facet_group_key, ()) = result?;
            if let FacetGroupKey { field_id, level: 0, left_bound } = facet_group_key {
-                let normalized_facet = left_bound.normalize(&options);
+                let mut normalized_facet = left_bound.normalize(&options);
+                let normalized_truncated_facet: String;
+                if normalized_facet.len() > MAX_FACET_VALUE_LENGTH {
+                    normalized_truncated_facet = normalized_facet
+                        .char_indices()
+                        .take_while(|(idx, _)| *idx < MAX_FACET_VALUE_LENGTH)
+                        .map(|(_, c)| c)
+                        .collect();
+                    normalized_facet = normalized_truncated_facet.into();
+                }
                let set = BTreeSet::from_iter(std::iter::once(left_bound));
                let key = (field_id, normalized_facet.as_ref());
                let key = BEU16StrCodec::bytes_encode(&key).ok_or(heed::Error::Encoding)?;
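Note on the truncation in the hunk above: `take_while(|(idx, _)| *idx < MAX_FACET_VALUE_LENGTH)` keeps every character whose starting byte offset is below the limit, so a multi-byte character that starts just under the limit is kept whole and the result can be a few bytes longer than the limit. The minimal sketch below compares that predicate with the `idx + 4 < MAX_FACET_VALUE_LENGTH` form that appears in `extract_facet_string_docids.rs` in this same diff. The function names and the tiny limit of 8 are hypothetical and chosen for illustration only; this is not the code from milli.

```rust
/// Hypothetical small limit standing in for `MAX_FACET_VALUE_LENGTH`.
const LIMIT: usize = 8;

/// Keeps every character whose starting byte offset is below `max_len`.
/// The result is always valid UTF-8, but it may exceed `max_len` by up to
/// 3 bytes when the last kept character is multi-byte.
fn truncate_by_start_offset(s: &str, max_len: usize) -> String {
    s.char_indices()
        .take_while(|(idx, _)| *idx < max_len)
        .map(|(_, c)| c)
        .collect()
}

/// Same idea, but reserves 4 bytes (the maximum UTF-8 character width),
/// so the result always stays strictly below `max_len` bytes.
fn truncate_with_margin(s: &str, max_len: usize) -> String {
    s.char_indices()
        .take_while(|(idx, _)| idx + 4 < max_len)
        .map(|(_, c)| c)
        .collect()
}

fn main() {
    // ASCII input: one byte per character, so exactly LIMIT bytes are kept.
    let ascii = truncate_by_start_offset("abcdefghijkl", LIMIT);
    assert_eq!(ascii, "abcdefgh");

    // U+00E9 takes 2 bytes in UTF-8 and starts at offset 7 here,
    // so it is kept and the result is 9 bytes long.
    let accented = "aaaaaaa\u{e9}bbb";
    let by_offset = truncate_by_start_offset(accented, LIMIT);
    assert_eq!(by_offset, "aaaaaaa\u{e9}");
    assert_eq!(by_offset.len(), 9);

    // With the 4-byte margin only offsets 0..=3 pass the predicate.
    let with_margin = truncate_with_margin(accented, LIMIT);
    assert_eq!(with_margin, "aaaa");

    println!("{ascii} {by_offset} {with_margin}");
}
```

Whether the small overshoot matters depends on how much headroom the limit leaves below the real key-size bound. In this diff `MAX_FACET_VALUE_LENGTH` is defined as `MAX_LMDB_KEY_LENGTH - 32`, which the doc comment describes as including a margin of safety.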
--- a/milli/src/update/index_documents/enrich.rs
+++ b/milli/src/update/index_documents/enrich.rs
@@ -31,8 +31,6 @@ pub fn enrich_documents_batch<R: Read + Seek>(
    autogenerate_docids: bool,
    reader: DocumentsBatchReader<R>,
 ) -> Result<StdResult<EnrichedDocumentsBatchReader<R>, UserError>> {
-    puffin::profile_function!();
-
    let (mut cursor, mut documents_batch_index) = reader.into_cursor_and_fields_index();

    let mut external_ids = tempfile::tempfile().map(grenad::Writer::new)?;
--- a/milli/src/update/index_documents/extract/extract_docid_word_positions.rs
+++ b/milli/src/update/index_documents/extract/extract_docid_word_positions.rs
@@ -28,12 +28,8 @@ pub fn extract_docid_word_positions<R: io::Read + io::Seek>(
    indexer: GrenadParameters,
    searchable_fields: &Option<HashSet<FieldId>>,
    stop_words: Option<&fst::Set<&[u8]>>,
-    allowed_separators: Option<&Vec<&str>>,
-    dictionary: Option<&Vec<&str>>,
    max_positions_per_attributes: Option<u32>,
 ) -> Result<(RoaringBitmap, grenad::Reader<File>, ScriptLanguageDocidsMap)> {
-    puffin::profile_function!();
-
    let max_positions_per_attributes = max_positions_per_attributes
        .map_or(MAX_POSITION_PER_ATTRIBUTE, |max| max.min(MAX_POSITION_PER_ATTRIBUTE));
    let max_memory = indexer.max_memory_by_thread();
@@ -54,14 +50,6 @@ pub fn extract_docid_word_positions<R: io::Read + io::Seek>(
    if let Some(stop_words) = stop_words {
        tokenizer_builder.stop_words(stop_words);
    }
-    if let Some(dictionary) = dictionary {
-        // let dictionary: Vec<_> = dictionary.iter().map(String::as_str).collect();
-        tokenizer_builder.words_dict(dictionary.as_slice());
-    }
-    if let Some(separators) = allowed_separators {
-        // let separators: Vec<_> = separators.iter().map(String::as_str).collect();
-        tokenizer_builder.separators(separators.as_slice());
-    }
    let tokenizer = tokenizer_builder.build();

    let mut cursor = obkv_documents.into_cursor()?;
--- a/milli/src/update/index_documents/extract/extract_facet_number_docids.rs
+++ b/milli/src/update/index_documents/extract/extract_facet_number_docids.rs
@@ -20,8 +20,6 @@ pub fn extract_facet_number_docids<R: io::Read + io::Seek>(
    docid_fid_facet_number: grenad::Reader<R>,
    indexer: GrenadParameters,
 ) -> Result<grenad::Reader<File>> {
-    puffin::profile_function!();
-
    let max_memory = indexer.max_memory_by_thread();

    let mut facet_number_docids_sorter = create_sorter(
--- a/milli/src/update/index_documents/extract/extract_facet_string_docids.rs
+++ b/milli/src/update/index_documents/extract/extract_facet_string_docids.rs
@@ -18,8 +18,6 @@ pub fn extract_facet_string_docids<R: io::Read + io::Seek>(
    docid_fid_facet_string: grenad::Reader<R>,
    indexer: GrenadParameters,
 ) -> Result<grenad::Reader<File>> {
-    puffin::profile_function!();
-
    let max_memory = indexer.max_memory_by_thread();

    let mut facet_string_docids_sorter = create_sorter(
@@ -46,7 +44,7 @@ pub fn extract_facet_string_docids<R: io::Read + io::Seek>(
        if normalised_value.len() > MAX_FACET_VALUE_LENGTH {
            normalised_truncated_value = normalised_value
                .char_indices()
-                .take_while(|(idx, _)| idx + 4 < MAX_FACET_VALUE_LENGTH)
+                .take_while(|(idx, _)| *idx < MAX_FACET_VALUE_LENGTH)
                .map(|(_, c)| c)
                .collect();
            normalised_value = normalised_truncated_value.as_str();
--- a/milli/src/update/index_documents/extract/extract_fid_docid_facet_values.rs
+++ b/milli/src/update/index_documents/extract/extract_fid_docid_facet_values.rs
@@ -28,14 +28,14 @@ pub struct ExtractedFacetValues {
 ///
 /// Returns the generated grenad reader containing the docid the fid and the original value as key
 /// and the normalized value as value extracted from the given chunk of documents.
+/// We need the fid of the geofields to correctly parse them as numbers if they were sent as strings initially.
 #[logging_timer::time]
 pub fn extract_fid_docid_facet_values<R: io::Read + io::Seek>(
    obkv_documents: grenad::Reader<R>,
    indexer: GrenadParameters,
    faceted_fields: &HashSet<FieldId>,
+    geo_fields_ids: Option<(FieldId, FieldId)>,
 ) -> Result<ExtractedFacetValues> {
-    puffin::profile_function!();
-
    let max_memory = indexer.max_memory_by_thread();

    let mut fid_docid_facet_numbers_sorter = create_sorter(
@@ -84,7 +84,10 @@ pub fn extract_fid_docid_facet_values<R: io::Read + io::Seek>(

                let value = from_slice(field_bytes).map_err(InternalError::SerdeJson)?;

-                match extract_facet_values(&value) {
+                match extract_facet_values(
+                    &value,
+                    geo_fields_ids.map_or(false, |(lat, lng)| field_id == lat || field_id == lng),
+                ) {
                    FilterableValues::Null => {
                        facet_is_null_docids.entry(field_id).or_default().insert(document);
                    }
@@ -177,12 +180,13 @@ enum FilterableValues {
    Values { numbers: Vec<f64>, strings: Vec<(String, String)> },
 }

-fn extract_facet_values(value: &Value) -> FilterableValues {
+fn extract_facet_values(value: &Value, geo_field: bool) -> FilterableValues {
    fn inner_extract_facet_values(
        value: &Value,
        can_recurse: bool,
        output_numbers: &mut Vec<f64>,
        output_strings: &mut Vec<(String, String)>,
+        geo_field: bool,
    ) {
        match value {
            Value::Null => (),
@@ -193,13 +197,30 @@ fn extract_facet_values(value: &Value) -> FilterableValues {
                }
            }
            Value::String(original) => {
+                // if we're working on a geofield it MUST be something we can parse or else there was an internal error
+                // in the enrich pipeline. But since the enrich pipeline worked, we want to avoid crashing at all costs.
+                if geo_field {
+                    if let Ok(float) = original.parse() {
+                        output_numbers.push(float);
+                    } else {
+                        log::warn!(
+                            "Internal error, could not parse a geofield that has been validated. Please open an issue."
+                        )
+                    }
+                }
                let normalized = crate::normalize_facet(original);
                output_strings.push((normalized, original.clone()));
            }
            Value::Array(values) => {
                if can_recurse {
                    for value in values {
-                        inner_extract_facet_values(value, false, output_numbers, output_strings);
+                        inner_extract_facet_values(
+                            value,
+                            false,
+                            output_numbers,
+                            output_strings,
+                            geo_field,
+                        );
                    }
                }
            }
@@ -215,7 +236,7 @@ fn extract_facet_values(value: &Value) -> FilterableValues {
        otherwise => {
            let mut numbers = Vec::new();
            let mut strings = Vec::new();
-            inner_extract_facet_values(otherwise, true, &mut numbers, &mut strings);
+            inner_extract_facet_values(otherwise, true, &mut numbers, &mut strings, geo_field);
            FilterableValues::Values { numbers, strings }
        }
    }
--- a/milli/src/update/index_documents/extract/extract_fid_word_count_docids.rs
+++ b/milli/src/update/index_documents/extract/extract_fid_word_count_docids.rs
@@ -22,8 +22,6 @@ pub fn extract_fid_word_count_docids<R: io::Read + io::Seek>(
    docid_word_positions: grenad::Reader<R>,
    indexer: GrenadParameters,
 ) -> Result<grenad::Reader<File>> {
-    puffin::profile_function!();
-
    let max_memory = indexer.max_memory_by_thread();

    let mut fid_word_count_docids_sorter = create_sorter(
--- a/milli/src/update/index_documents/extract/extract_geo_points.rs
+++ b/milli/src/update/index_documents/extract/extract_geo_points.rs
@@ -19,8 +19,6 @@ pub fn extract_geo_points<R: io::Read + io::Seek>(
    primary_key_id: FieldId,
    (lat_fid, lng_fid): (FieldId, FieldId),
 ) -> Result<grenad::Reader<File>> {
-    puffin::profile_function!();
-
    let mut writer = create_writer(
        indexer.chunk_compression_type,
        indexer.chunk_compression_level,
--- a/milli/src/update/index_documents/extract/extract_vector_points.rs
+++ b/milli/src/update/index_documents/extract/extract_vector_points.rs
@@ -19,8 +19,6 @@ pub fn extract_vector_points<R: io::Read + io::Seek>(
    primary_key_id: FieldId,
    vectors_fid: FieldId,
 ) -> Result<grenad::Reader<File>> {
-    puffin::profile_function!();
-
    let mut writer = create_writer(
        indexer.chunk_compression_type,
        indexer.chunk_compression_level,
--- a/milli/src/update/index_documents/extract/extract_word_docids.rs
+++ b/milli/src/update/index_documents/extract/extract_word_docids.rs
@@ -27,8 +27,6 @@ pub fn extract_word_docids<R: io::Read + io::Seek>(
    indexer: GrenadParameters,
    exact_attributes: &HashSet<FieldId>,
 ) -> Result<(grenad::Reader<File>, grenad::Reader<File>)> {
-    puffin::profile_function!();
-
    let max_memory = indexer.max_memory_by_thread();

    let mut word_docids_sorter = create_sorter(
--- a/milli/src/update/index_documents/extract/extract_word_fid_docids.rs
+++ b/milli/src/update/index_documents/extract/extract_word_fid_docids.rs
@@ -15,8 +15,6 @@ pub fn extract_word_fid_docids<R: io::Read + io::Seek>(
    docid_word_positions: grenad::Reader<R>,
    indexer: GrenadParameters,
 ) -> Result<grenad::Reader<File>> {
-    puffin::profile_function!();
-
    let max_memory = indexer.max_memory_by_thread();

    let mut word_fid_docids_sorter = create_sorter(
--- a/milli/src/update/index_documents/extract/extract_word_pair_proximity_docids.rs
+++ b/milli/src/update/index_documents/extract/extract_word_pair_proximity_docids.rs
@@ -21,8 +21,6 @@ pub fn extract_word_pair_proximity_docids<R: io::Read + io::Seek>(
    docid_word_positions: grenad::Reader<R>,
    indexer: GrenadParameters,
 ) -> Result<grenad::Reader<File>> {
-    puffin::profile_function!();
-
    let max_memory = indexer.max_memory_by_thread();

    let mut word_pair_proximity_docids_sorter = create_sorter(
--- a/milli/src/update/index_documents/extract/extract_word_position_docids.rs
+++ b/milli/src/update/index_documents/extract/extract_word_position_docids.rs
@@ -18,8 +18,6 @@ pub fn extract_word_position_docids<R: io::Read + io::Seek>(
    docid_word_positions: grenad::Reader<R>,
    indexer: GrenadParameters,
 ) -> Result<grenad::Reader<File>> {
-    puffin::profile_function!();
-
    let max_memory = indexer.max_memory_by_thread();

    let mut word_position_docids_sorter = create_sorter(
--- a/milli/src/update/index_documents/extract/mod.rs
+++ b/milli/src/update/index_documents/extract/mod.rs
@@ -49,13 +49,9 @@ pub(crate) fn data_from_obkv_documents(
    geo_fields_ids: Option<(FieldId, FieldId)>,
    vectors_field_id: Option<FieldId>,
    stop_words: Option<fst::Set<&[u8]>>,
-    allowed_separators: Option<Vec<&str>>,
-    dictionary: Option<Vec<&str>>,
    max_positions_per_attributes: Option<u32>,
    exact_attributes: HashSet<FieldId>,
 ) -> Result<()> {
-    puffin::profile_function!();
-
    original_obkv_chunks
        .par_bridge()
        .map(|original_documents_chunk| {
@@ -78,8 +74,6 @@ pub(crate) fn data_from_obkv_documents(
                    geo_fields_ids,
                    vectors_field_id,
                    &stop_words,
-                    &allowed_separators,
-                    &dictionary,
                    max_positions_per_attributes,
                )
            })
@@ -244,13 +238,11 @@ fn spawn_extraction_task<FE, FS, M>(
    M::Output: Send,
 {
    rayon::spawn(move || {
-        puffin::profile_scope!("extract_multiple_chunks", name);
        let chunks: Result<M> =
            chunks.into_par_iter().map(|chunk| extract_fn(chunk, indexer)).collect();
        rayon::spawn(move || match chunks {
            Ok(chunks) => {
                debug!("merge {} database", name);
-                puffin::profile_scope!("merge_multiple_chunks", name);
                let reader = chunks.merge(merge_fn, &indexer);
                let _ = lmdb_writer_sx.send(reader.map(serialize_fn));
            }
@@ -293,8 +285,6 @@ fn send_and_extract_flattened_documents_data(
    geo_fields_ids: Option<(FieldId, FieldId)>,
    vectors_field_id: Option<FieldId>,
    stop_words: &Option<fst::Set<&[u8]>>,
-    allowed_separators: &Option<Vec<&str>>,
-    dictionary: &Option<Vec<&str>>,
    max_positions_per_attributes: Option<u32>,
 ) -> Result<(
    grenad::Reader<CursorClonableMmap>,
@@ -350,8 +340,6 @@ fn send_and_extract_flattened_documents_data(
                        indexer,
                        searchable_fields,
                        stop_words.as_ref(),
-                        allowed_separators.as_ref(),
-                        dictionary.as_ref(),
                        max_positions_per_attributes,
                    )?;

@@ -378,6 +366,7 @@ fn send_and_extract_flattened_documents_data(
                    flattened_documents_chunk.clone(),
                    indexer,
                    faceted_fields,
+                    geo_fields_ids,
                )?;

                // send docid_fid_facet_numbers_chunk to DB writer
--- a/milli/src/update/index_documents/helpers/grenad_helpers.rs
+++ b/milli/src/update/index_documents/helpers/grenad_helpers.rs
@@ -214,7 +214,6 @@ pub fn sorter_into_lmdb_database(
    sorter: Sorter<MergeFn>,
    merge: MergeFn,
 ) -> Result<()> {
-    puffin::profile_function!();
    debug!("Writing MTBL sorter...");
    let before = Instant::now();

--- a/milli/src/update/index_documents/mod.rs
+++ b/milli/src/update/index_documents/mod.rs
@@ -137,8 +137,6 @@ where
        mut self,
        reader: DocumentsBatchReader<R>,
    ) -> Result<(Self, StdResult<u64, UserError>)> {
-        puffin::profile_function!();
-
        // Early return when there is no document to add
        if reader.is_empty() {
            return Ok((self, Ok(0)));
@@ -177,8 +175,6 @@ where
        mut self,
        to_delete: Vec<String>,
    ) -> Result<(Self, StdResult<u64, UserError>)> {
-        puffin::profile_function!();
-
        // Early return when there is no document to add
        if to_delete.is_empty() {
            return Ok((self, Ok(0)));
@@ -198,8 +194,6 @@ where

    #[logging_timer::time("IndexDocuments::{}")]
    pub fn execute(mut self) -> Result<DocumentAdditionResult> {
-        puffin::profile_function!();
-
        if self.added_documents == 0 {
            let number_of_documents = self.index.number_of_documents(self.wtxn)?;
            return Ok(DocumentAdditionResult { indexed_documents: 0, number_of_documents });
@@ -238,8 +232,6 @@ where
        FP: Fn(UpdateIndexingStep) + Sync,
        FA: Fn() -> bool + Sync,
    {
-        puffin::profile_function!();
-
        let TransformOutput {
            primary_key,
            fields_ids_map,
@@ -316,12 +308,6 @@ where
        let vectors_field_id = self.index.fields_ids_map(self.wtxn)?.id("_vectors");

        let stop_words = self.index.stop_words(self.wtxn)?;
-        let separators = self.index.allowed_separators(self.wtxn)?;
-        let separators: Option<Vec<_>> =
-            separators.as_ref().map(|x| x.iter().map(String::as_str).collect());
-        let dictionary = self.index.dictionary(self.wtxn)?;
-        let dictionary: Option<Vec<_>> =
-            dictionary.as_ref().map(|x| x.iter().map(String::as_str).collect());
        let exact_attributes = self.index.exact_attributes_ids(self.wtxn)?;

        let pool_params = GrenadParameters {
@@ -336,7 +322,6 @@ where

        // Run extraction pipeline in parallel.
        pool.install(|| {
-            puffin::profile_scope!("extract_and_send_grenad_chunks");
            // split obkv file into several chunks
            let original_chunk_iter =
                grenad_obkv_into_chunks(original_documents, pool_params, documents_chunk_size);
@@ -359,8 +344,6 @@ where
                    geo_fields_ids,
                    vectors_field_id,
                    stop_words,
-                    separators,
-                    dictionary,
                    max_positions_per_attributes,
                    exact_attributes,
                )
@@ -494,8 +477,6 @@ where
        FP: Fn(UpdateIndexingStep) + Sync,
        FA: Fn() -> bool + Sync,
    {
-        puffin::profile_function!();
-
        // Merged databases are already been indexed, we start from this count;
        let mut databases_seen = MERGED_DATABASE_COUNT;

@@ -530,36 +511,26 @@ where
            return Err(Error::InternalError(InternalError::AbortedIndexation));
        }

-        let current_prefix_fst;
-        let common_prefix_fst_words_tmp;
-        let common_prefix_fst_words: Vec<_>;
-        let new_prefix_fst_words;
-        let del_prefix_fst_words;
+        let current_prefix_fst = self.index.words_prefixes_fst(self.wtxn)?;

-        {
-            puffin::profile_scope!("compute_prefix_diffs");
+        // We retrieve the common words between the previous and new prefix word fst.
+        let common_prefix_fst_words = fst_stream_into_vec(
+            previous_words_prefixes_fst.op().add(&current_prefix_fst).intersection(),
+        );
+        let common_prefix_fst_words: Vec<_> = common_prefix_fst_words
+            .as_slice()
+            .linear_group_by_key(|x| x.chars().next().unwrap())
+            .collect();

-            current_prefix_fst = self.index.words_prefixes_fst(self.wtxn)?;
+        // We retrieve the newly added words between the previous and new prefix word fst.
+        let new_prefix_fst_words = fst_stream_into_vec(
+            current_prefix_fst.op().add(&previous_words_prefixes_fst).difference(),
+        );

-            // We retrieve the common words between the previous and new prefix word fst.
-            common_prefix_fst_words_tmp = fst_stream_into_vec(
-                previous_words_prefixes_fst.op().add(&current_prefix_fst).intersection(),
-            );
-            common_prefix_fst_words = common_prefix_fst_words_tmp
-                .as_slice()
-                .linear_group_by_key(|x| x.chars().next().unwrap())
-                .collect();
-
-            // We retrieve the newly added words between the previous and new prefix word fst.
-            new_prefix_fst_words = fst_stream_into_vec(
-                current_prefix_fst.op().add(&previous_words_prefixes_fst).difference(),
-            );
-
-            // We compute the set of prefixes that are no more part of the prefix fst.
-            del_prefix_fst_words = fst_stream_into_hashset(
-                previous_words_prefixes_fst.op().add(&current_prefix_fst).difference(),
-            );
-        }
+        // We compute the set of prefixes that are no more part of the prefix fst.
+        let del_prefix_fst_words = fst_stream_into_hashset(
+            previous_words_prefixes_fst.op().add(&current_prefix_fst).difference(),
+        );

        databases_seen += 1;
        (self.progress)(UpdateIndexingStep::MergeDataIntoFinalDatabase {
@@ -697,8 +668,6 @@ fn execute_word_prefix_docids(
    common_prefix_fst_words: &[&[String]],
    del_prefix_fst_words: &HashSet<Vec<u8>>,
 ) -> Result<()> {
-    puffin::profile_function!();
-
    let cursor = reader.into_cursor()?;
    let mut builder = WordPrefixDocids::new(txn, word_docids_db, word_prefix_docids_db);
    builder.chunk_compression_type = indexer_config.chunk_compression_type;
--- a/milli/src/update/index_documents/transform.rs
+++ b/milli/src/update/index_documents/transform.rs
@@ -558,8 +558,6 @@ impl<'a, 'i> Transform<'a, 'i> {
    where
        F: Fn(UpdateIndexingStep) + Sync,
    {
-        puffin::profile_function!();
-
        let primary_key = self
            .index
            .primary_key(wtxn)?
--- a/milli/src/update/index_documents/typed_chunk.rs
+++ b/milli/src/update/index_documents/typed_chunk.rs
@@ -46,66 +46,6 @@ pub(crate) enum TypedChunk {
    ScriptLanguageDocids(HashMap<(Script, Language), RoaringBitmap>),
 }

-impl TypedChunk {
-    pub fn to_debug_string(&self) -> String {
-        match self {
-            TypedChunk::FieldIdDocidFacetStrings(grenad) => {
-                format!("FieldIdDocidFacetStrings {{ number_of_entries: {} }}", grenad.len())
-            }
-            TypedChunk::FieldIdDocidFacetNumbers(grenad) => {
-                format!("FieldIdDocidFacetNumbers {{ number_of_entries: {} }}", grenad.len())
-            }
-            TypedChunk::Documents(grenad) => {
-                format!("Documents {{ number_of_entries: {} }}", grenad.len())
-            }
-            TypedChunk::FieldIdWordcountDocids(grenad) => {
-                format!("FieldIdWordcountDocids {{ number_of_entries: {} }}", grenad.len())
-            }
-            TypedChunk::NewDocumentsIds(grenad) => {
-                format!("NewDocumentsIds {{ number_of_entries: {} }}", grenad.len())
-            }
-            TypedChunk::WordDocids { word_docids_reader, exact_word_docids_reader } => format!(
-                "WordDocids {{ word_docids_reader: {}, exact_word_docids_reader: {} }}",
-                word_docids_reader.len(),
-                exact_word_docids_reader.len()
-            ),
-            TypedChunk::WordPositionDocids(grenad) => {
-                format!("WordPositionDocids {{ number_of_entries: {} }}", grenad.len())
-            }
-            TypedChunk::WordFidDocids(grenad) => {
-                format!("WordFidDocids {{ number_of_entries: {} }}", grenad.len())
-            }
-            TypedChunk::WordPairProximityDocids(grenad) => {
-                format!("WordPairProximityDocids {{ number_of_entries: {} }}", grenad.len())
-            }
-            TypedChunk::FieldIdFacetStringDocids(grenad) => {
-                format!("FieldIdFacetStringDocids {{ number_of_entries: {} }}", grenad.len())
-            }
-            TypedChunk::FieldIdFacetNumberDocids(grenad) => {
-                format!("FieldIdFacetNumberDocids {{ number_of_entries: {} }}", grenad.len())
-            }
-            TypedChunk::FieldIdFacetExistsDocids(grenad) => {
-                format!("FieldIdFacetExistsDocids {{ number_of_entries: {} }}", grenad.len())
-            }
-            TypedChunk::FieldIdFacetIsNullDocids(grenad) => {
-                format!("FieldIdFacetIsNullDocids {{ number_of_entries: {} }}", grenad.len())
-            }
-            TypedChunk::FieldIdFacetIsEmptyDocids(grenad) => {
-                format!("FieldIdFacetIsEmptyDocids {{ number_of_entries: {} }}", grenad.len())
-            }
-            TypedChunk::GeoPoints(grenad) => {
-                format!("GeoPoints {{ number_of_entries: {} }}", grenad.len())
-            }
-            TypedChunk::VectorPoints(grenad) => {
-                format!("VectorPoints {{ number_of_entries: {} }}", grenad.len())
-            }
-            TypedChunk::ScriptLanguageDocids(grenad) => {
-                format!("ScriptLanguageDocids {{ number_of_entries: {} }}", grenad.len())
-            }
-        }
-    }
-}
-
 /// Write typed chunk in the corresponding LMDB database of the provided index.
 /// Return new documents seen.
 pub(crate) fn write_typed_chunk_into_index(
@@ -114,8 +54,6 @@ pub(crate) fn write_typed_chunk_into_index(
    wtxn: &mut RwTxn,
    index_is_empty: bool,
 ) -> Result<(RoaringBitmap, bool)> {
-    puffin::profile_function!(typed_chunk.to_debug_string());
-
    let mut is_merged_database = false;
    match typed_chunk {
        TypedChunk::Documents(obkv_documents_iter) => {
@@ -412,8 +350,6 @@ where
    FS: for<'a> Fn(&'a [u8], &'a mut Vec<u8>) -> Result<&'a [u8]>,
    FM: Fn(&[u8], &[u8], &mut Vec<u8>) -> Result<()>,
 {
-    puffin::profile_function!(format!("number of entries: {}", data.len()));
-
    let mut buffer = Vec::new();
    let database = database.remap_types::<ByteSlice, ByteSlice>();

@@ -456,8 +392,6 @@ where
    FS: for<'a> Fn(&'a [u8], &'a mut Vec<u8>) -> Result<&'a [u8]>,
    FM: Fn(&[u8], &[u8], &mut Vec<u8>) -> Result<()>,
 {
-    puffin::profile_function!(format!("number of entries: {}", data.len()));
-
    if !index_is_empty {
        return write_entries_into_database(
            data,
--- a/milli/src/update/prefix_word_pairs/mod.rs
+++ b/milli/src/update/prefix_word_pairs/mod.rs
@@ -50,8 +50,6 @@ impl<'t, 'u, 'i> PrefixWordPairsProximityDocids<'t, 'u, 'i> {
        common_prefix_fst_words: &[&'a [String]],
        del_prefix_fst_words: &HashSet<Vec<u8>>,
    ) -> Result<()> {
-        puffin::profile_function!();
-
        index_word_prefix_database(
            self.wtxn,
            self.index.word_pair_proximity_docids,
--- a/milli/src/update/prefix_word_pairs/prefix_word.rs
+++ b/milli/src/update/prefix_word_pairs/prefix_word.rs
@@ -27,8 +27,6 @@ pub fn index_prefix_word_database(
    chunk_compression_type: CompressionType,
    chunk_compression_level: Option<u32>,
 ) -> Result<()> {
-    puffin::profile_function!();
-
    let max_proximity = max_proximity - 1;
    debug!("Computing and writing the word prefix pair proximity docids into LMDB on disk...");

--- a/milli/src/update/prefix_word_pairs/word_prefix.rs
+++ b/milli/src/update/prefix_word_pairs/word_prefix.rs
@@ -191,7 +191,6 @@ pub fn index_word_prefix_database(
    chunk_compression_type: CompressionType,
    chunk_compression_level: Option<u32>,
 ) -> Result<()> {
-    puffin::profile_function!();
    debug!("Computing and writing the word prefix pair proximity docids into LMDB on disk...");

    // Make a prefix trie from the common prefixes that are shorter than self.max_prefix_length
--- a/milli/src/update/settings.rs
+++ b/milli/src/update/settings.rs
@@ -1,4 +1,4 @@
-use std::collections::{BTreeMap, BTreeSet, HashMap, HashSet};
+use std::collections::{BTreeSet, HashMap, HashSet};
 use std::result::Result as StdResult;

 use charabia::{Normalize, Tokenizer, TokenizerBuilder};
@@ -112,11 +112,8 @@ pub struct Settings<'a, 't, 'u, 'i> {
    sortable_fields: Setting<HashSet<String>>,
    criteria: Setting<Vec<Criterion>>,
    stop_words: Setting<BTreeSet<String>>,
-    non_separator_tokens: Setting<BTreeSet<String>>,
-    separator_tokens: Setting<BTreeSet<String>>,
-    dictionary: Setting<BTreeSet<String>>,
    distinct_field: Setting<String>,
-    synonyms: Setting<BTreeMap<String, Vec<String>>>,
+    synonyms: Setting<HashMap<String, Vec<String>>>,
    primary_key: Setting<String>,
    authorize_typos: Setting<bool>,
    min_word_len_two_typos: Setting<u8>,
@@ -144,9 +141,6 @@ impl<'a, 't, 'u, 'i> Settings<'a, 't, 'u, 'i> {
            sortable_fields: Setting::NotSet,
            criteria: Setting::NotSet,
            stop_words: Setting::NotSet,
-            non_separator_tokens: Setting::NotSet,
-            separator_tokens: Setting::NotSet,
-            dictionary: Setting::NotSet,
            distinct_field: Setting::NotSet,
            synonyms: Setting::NotSet,
            primary_key: Setting::NotSet,
@@ -211,39 +205,6 @@ impl<'a, 't, 'u, 'i> Settings<'a, 't, 'u, 'i> {
            if stop_words.is_empty() { Setting::Reset } else { Setting::Set(stop_words) }
    }

-    pub fn reset_non_separator_tokens(&mut self) {
-        self.non_separator_tokens = Setting::Reset;
-    }
-
-    pub fn set_non_separator_tokens(&mut self, non_separator_tokens: BTreeSet<String>) {
-        self.non_separator_tokens = if non_separator_tokens.is_empty() {
-            Setting::Reset
-        } else {
-            Setting::Set(non_separator_tokens)
-        }
-    }
-
-    pub fn reset_separator_tokens(&mut self) {
-        self.separator_tokens = Setting::Reset;
-    }
-
-    pub fn set_separator_tokens(&mut self, separator_tokens: BTreeSet<String>) {
-        self.separator_tokens = if separator_tokens.is_empty() {
-            Setting::Reset
-        } else {
-            Setting::Set(separator_tokens)
-        }
-    }
-
-    pub fn reset_dictionary(&mut self) {
-        self.dictionary = Setting::Reset;
-    }
-
-    pub fn set_dictionary(&mut self, dictionary: BTreeSet<String>) {
-        self.dictionary =
-            if dictionary.is_empty() { Setting::Reset } else { Setting::Set(dictionary) }
-    }
-
    pub fn reset_distinct_field(&mut self) {
        self.distinct_field = Setting::Reset;
    }
@@ -256,7 +217,7 @@ impl<'a, 't, 'u, 'i> Settings<'a, 't, 'u, 'i> {
        self.synonyms = Setting::Reset;
    }

-    pub fn set_synonyms(&mut self, synonyms: BTreeMap<String, Vec<String>>) {
+    pub fn set_synonyms(&mut self, synonyms: HashMap<String, Vec<String>>) {
        self.synonyms = if synonyms.is_empty() { Setting::Reset } else { Setting::Set(synonyms) }
    }

@@ -342,8 +303,6 @@ impl<'a, 't, 'u, 'i> Settings<'a, 't, 'u, 'i> {
        FP: Fn(UpdateIndexingStep) + Sync,
        FA: Fn() -> bool + Sync,
    {
-        puffin::profile_function!();
-
        let fields_ids_map = self.index.fields_ids_map(self.wtxn)?;
        // if the settings are set before any document update, we don't need to do anything, and
        // will set the primary key during the first document addition.
@@ -491,84 +450,9 @@ impl<'a, 't, 'u, 'i> Settings<'a, 't, 'u, 'i> {
        }
    }

-    fn update_non_separator_tokens(&mut self) -> Result<bool> {
-        let changes = match self.non_separator_tokens {
-            Setting::Set(ref non_separator_tokens) => {
-                let current = self.index.non_separator_tokens(self.wtxn)?;
-
-                // Does the new list differ from the previous one?
-                if current.map_or(true, |current| &current != non_separator_tokens) {
-                    self.index.put_non_separator_tokens(self.wtxn, non_separator_tokens)?;
-                    true
-                } else {
-                    false
-                }
-            }
-            Setting::Reset => self.index.delete_non_separator_tokens(self.wtxn)?,
-            Setting::NotSet => false,
-        };
-
-        // the synonyms must be updated if non separator tokens have been updated.
-        if changes && self.synonyms == Setting::NotSet {
-            self.synonyms = Setting::Set(self.index.user_defined_synonyms(self.wtxn)?);
-        }
-
-        Ok(changes)
-    }
-
-    fn update_separator_tokens(&mut self) -> Result<bool> {
-        let changes = match self.separator_tokens {
-            Setting::Set(ref separator_tokens) => {
-                let current = self.index.separator_tokens(self.wtxn)?;
-
-                // Does the new list differ from the previous one?
-                if current.map_or(true, |current| &current != separator_tokens) {
-                    self.index.put_separator_tokens(self.wtxn, separator_tokens)?;
-                    true
-                } else {
-                    false
-                }
-            }
-            Setting::Reset => self.index.delete_separator_tokens(self.wtxn)?,
-            Setting::NotSet => false,
-        };
-
-        // the synonyms must be updated if separator tokens have been updated.
-        if changes && self.synonyms == Setting::NotSet {
-            self.synonyms = Setting::Set(self.index.user_defined_synonyms(self.wtxn)?);
-        }
-
-        Ok(changes)
-    }
-
-    fn update_dictionary(&mut self) -> Result<bool> {
-        let changes = match self.dictionary {
-            Setting::Set(ref dictionary) => {
-                let current = self.index.dictionary(self.wtxn)?;
-
-                // Does the new list differ from the previous one?
-                if current.map_or(true, |current| &current != dictionary) {
-                    self.index.put_dictionary(self.wtxn, dictionary)?;
-                    true
-                } else {
-                    false
-                }
-            }
-            Setting::Reset => self.index.delete_dictionary(self.wtxn)?,
-            Setting::NotSet => false,
-        };
-
-        // the synonyms must be updated if dictionary has been updated.
-        if changes && self.synonyms == Setting::NotSet {
-            self.synonyms = Setting::Set(self.index.user_defined_synonyms(self.wtxn)?);
-        }
-
-        Ok(changes)
-    }
-
    fn update_synonyms(&mut self) -> Result<bool> {
        match self.synonyms {
-            Setting::Set(ref user_synonyms) => {
+            Setting::Set(ref synonyms) => {
                fn normalize(tokenizer: &Tokenizer, text: &str) -> Vec<String> {
                    tokenizer
                        .tokenize(text)
@@ -587,25 +471,10 @@ impl<'a, 't, 'u, 'i> Settings<'a, 't, 'u, 'i> {
                if let Some(ref stop_words) = stop_words {
                    builder.stop_words(stop_words);
                }
-
-                let separators = self.index.allowed_separators(self.wtxn)?;
-                let separators: Option<Vec<_>> =
-                    separators.as_ref().map(|x| x.iter().map(String::as_str).collect());
-                if let Some(ref separators) = separators {
-                    builder.separators(separators);
-                }
-
-                let dictionary = self.index.dictionary(self.wtxn)?;
-                let dictionary: Option<Vec<_>> =
-                    dictionary.as_ref().map(|x| x.iter().map(String::as_str).collect());
-                if let Some(ref dictionary) = dictionary {
-                    builder.words_dict(dictionary);
-                }
-
                let tokenizer = builder.build();

                let mut new_synonyms = HashMap::new();
-                for (word, synonyms) in user_synonyms {
+                for (word, synonyms) in synonyms {
                    // Normalize both the word and associated synonyms.
                    let normalized_word = normalize(&tokenizer, word);
                    let normalized_synonyms =
@@ -626,7 +495,7 @@ impl<'a, 't, 'u, 'i> Settings<'a, 't, 'u, 'i> {
                let old_synonyms = self.index.synonyms(self.wtxn)?;

                if new_synonyms != old_synonyms {
-                    self.index.put_synonyms(self.wtxn, &new_synonyms, user_synonyms)?;
+                    self.index.put_synonyms(self.wtxn, &new_synonyms)?;
                    Ok(true)
                } else {
                    Ok(false)
@@ -886,17 +755,11 @@ impl<'a, 't, 'u, 'i> Settings<'a, 't, 'u, 'i> {
        let faceted_updated = old_faceted_fields != new_faceted_fields;

        let stop_words_updated = self.update_stop_words()?;
-        let non_separator_tokens_updated = self.update_non_separator_tokens()?;
-        let separator_tokens_updated = self.update_separator_tokens()?;
-        let dictionary_updated = self.update_dictionary()?;
        let synonyms_updated = self.update_synonyms()?;
        let searchable_updated = self.update_searchable()?;
        let exact_attributes_updated = self.update_exact_attributes()?;

        if stop_words_updated
-            || non_separator_tokens_updated
-            || separator_tokens_updated
-            || dictionary_updated
            || faceted_updated
            || synonyms_updated
            || searchable_updated
@@ -913,7 +776,7 @@ impl<'a, 't, 'u, 'i> Settings<'a, 't, 'u, 'i> {
 mod tests {
    use big_s::S;
    use heed::types::ByteSlice;
-    use maplit::{btreemap, btreeset, hashset};
+    use maplit::{btreeset, hashmap, hashset};

    use super::*;
    use crate::error::Error;
@@ -1379,7 +1242,7 @@ mod tests {
        // In the same transaction provide some synonyms
        index
            .update_settings_using_wtxn(&mut wtxn, |settings| {
-                settings.set_synonyms(btreemap! {
+                settings.set_synonyms(hashmap! {
                    "blini".to_string() => vec!["crepes".to_string()],
                    "super like".to_string() => vec!["love".to_string()],
                    "puppies".to_string() => vec!["dogs".to_string(), "doggos".to_string()]
@@ -1675,9 +1538,6 @@ mod tests {
                    sortable_fields,
                    criteria,
                    stop_words,
-                    non_separator_tokens,
-                    separator_tokens,
-                    dictionary,
                    distinct_field,
                    synonyms,
                    primary_key,
@@ -1696,9 +1556,6 @@ mod tests {
                assert!(matches!(sortable_fields, Setting::NotSet));
                assert!(matches!(criteria, Setting::NotSet));
                assert!(matches!(stop_words, Setting::NotSet));
-                assert!(matches!(non_separator_tokens, Setting::NotSet));
-                assert!(matches!(separator_tokens, Setting::NotSet));
-                assert!(matches!(dictionary, Setting::NotSet));
                assert!(matches!(distinct_field, Setting::NotSet));
                assert!(matches!(synonyms, Setting::NotSet));
                assert!(matches!(primary_key, Setting::NotSet));
--- a/milli/src/update/word_prefix_docids.rs
+++ b/milli/src/update/word_prefix_docids.rs
@@ -45,8 +45,6 @@ impl<'t, 'u, 'i> WordPrefixDocids<'t, 'u, 'i> {
        common_prefix_fst_words: &[&[String]],
        del_prefix_fst_words: &HashSet<Vec<u8>>,
    ) -> Result<()> {
-        puffin::profile_function!();
-
        // It is forbidden to keep a mutable reference into the database
        // and write into it at the same time, therefore we write into another file.
        let mut prefix_docids_sorter = create_sorter(
--- a/milli/src/update/words_prefix_integer_docids.rs
+++ b/milli/src/update/words_prefix_integer_docids.rs
@@ -50,7 +50,6 @@ impl<'t, 'u, 'i> WordPrefixIntegerDocids<'t, 'u, 'i> {
        common_prefix_fst_words: &[&[String]],
        del_prefix_fst_words: &HashSet<Vec<u8>>,
    ) -> Result<()> {
-        puffin::profile_function!();
        debug!("Computing and writing the word levels integers docids into LMDB on disk...");

        let mut prefix_integer_docids_sorter = create_sorter(
--- a/milli/src/update/words_prefixes_fst.rs
+++ b/milli/src/update/words_prefixes_fst.rs
@@ -42,8 +42,6 @@ impl<'t, 'u, 'i> WordsPrefixesFst<'t, 'u, 'i> {

    #[logging_timer::time("WordsPrefixesFst::{}")]
    pub fn execute(self) -> Result<()> {
-        puffin::profile_function!();
-
        let words_fst = self.index.words_fst(self.wtxn)?;

        let mut current_prefix = vec![SmallString32::new(); self.max_prefix_length];
--- a/milli/tests/search/mod.rs
+++ b/milli/tests/search/mod.rs
@@ -5,7 +5,7 @@ use std::io::Cursor;
 use big_s::S;
 use either::{Either, Left, Right};
 use heed::EnvOpenOptions;
-use maplit::{btreemap, hashset};
+use maplit::{hashmap, hashset};
 use milli::documents::{DocumentsBatchBuilder, DocumentsBatchReader};
 use milli::update::{IndexDocuments, IndexDocumentsConfig, IndexerConfig, Settings};
 use milli::{AscDesc, Criterion, DocumentId, Index, Member, Object, TermsMatchingStrategy};
@@ -51,7 +51,7 @@ pub fn setup_search_index_with_criteria(criteria: &[Criterion]) -> Index {
        S("tag"),
        S("asc_desc_rank"),
    });
-    builder.set_synonyms(btreemap! {
+    builder.set_synonyms(hashmap! {
        S("hello") => vec![S("good morning")],
        S("world") => vec![S("earth")],
        S("america") => vec![S("the united states")],
Author	SHA1	Message	Date
ManyTheFish	46c117d9b8	Merge remote-tracking branch 'origin/release-v1.3.1' into japanese-docker-image	2023-08-10 13:59:33 +02:00
meili-bors[bot]	ef3d098b4d	Merge #3976 3976: Fix the get stats method r=ManyTheFish a=irevoire # Pull Request - The get stats method of the index-scheduler was not using at all the processing tasks. That was returning a wrong number of enqueued tasks and 0 processing tasks. - Added a test - Currently this method was ONLY used to compute the `meilisearch_nb_tasks` field of the experimental feature metrics. ## Related issue Fixes https://github.com/meilisearch/meilisearch/issues/3972 Co-authored-by: Tamo <tamo@meilisearch.com>	2023-08-10 10:55:50 +00:00
meili-bors[bot]	44c1900f36	Merge #3986 3986: Fix geo bounding box with strings r=ManyTheFish a=irevoire # Pull Request When sending a document with one geofield of type string (i.e.: `{ "_geo": { "lat": 12, "lng": "13" }}`), the geobounding box would exclude this document. This PR fixes this issue by automatically parsing the string value in case we're working on a geofield. ## Related issue Fixes https://github.com/meilisearch/meilisearch/issues/3973 ## What does this PR do? - Automatically parse the facet value iif we're working on a geofield. - Make insta works with snapshots in loops or closure executed multiple times. (you may need to update your cli if it panics after this PR: `cargo install cargo-insta`). - Add one integration test in milli and in meilisearch to ensure it works forever. - Add three snapshots for the dump that mysteriously disappeared I don't know how Co-authored-by: Tamo <tamo@meilisearch.com>	2023-08-09 07:58:15 +00:00
meili-bors[bot]	04671d0751	Merge #3981 3981: Truncate the normalized long facets used in the search for facet value r=irevoire a=ManyTheFish # Pull Request Truncate the normalized long facets used in the search for facet value ## targeted release v1.3.1 ## Related issue Fixes #3978 Co-authored-by: ManyTheFish <many@meilisearch.com>	2023-08-08 15:07:07 +00:00
Tamo	4f4c669d50	add back some dump snapshots that disappeared. it's completely unrelated to this PR	2023-08-08 16:58:14 +02:00
ManyTheFish	35758db9ec	Truncate the normalized long facets used in search for facet value	2023-08-08 16:38:30 +02:00
Tamo	4988199bb9	ensure the geoboundingbox works with strings and int geofields in milli and meilisearch	2023-08-08 16:29:25 +02:00
Tamo	83991ee770	enable the multi-snapshot attribute in insta. This will let us use insta in loops	2023-08-08 16:28:38 +02:00
Tamo	9d061cec26	automatically parse the filterable attribute to float if it's a geo field	2023-08-08 16:28:07 +02:00
Tamo	fe819a9d80	fix the get stats method It was not taking into account the processing tasks at all	2023-08-08 13:21:15 +02:00
meili-bors[bot]	e338ceb97f	Merge #3982 3982: Update version for the next release (v1.3.1) in Cargo.toml r=irevoire a=meili-bot ⚠️ This PR is automatically generated. Check the new version is the expected one and Cargo.lock has been updated before merging. Co-authored-by: irevoire <irevoire@users.noreply.github.com>	2023-08-08 10:30:56 +00:00
irevoire	75c87d5391	Update version for the next release (v1.3.1) in Cargo.toml	2023-08-08 10:30:06 +00:00
ManyTheFish	425c88af7e	Merge branch 'release-v1.3.0' into japanese-docker-image	2023-07-31 16:02:58 +02:00
ManyTheFish	40024e5307	Activate only the necessary features for Japanese	2023-07-03 18:53:59 +02:00