Compare commits

...

41 Commits

SHA1 Message Date
6a5a834f27 Lazily compute the FSTs during indexing 2025-04-03 16:04:35 +02:00
418fa47963 Merge pull request #5313 from barloes/fixRankingScoreThresholdRankingIssue
fix for rankingScoreThreshold changes the results' ranking
2025-04-01 13:10:55 +00:00
0656a0d515 Optimize roaring operation
Co-authored-by: Many the fish <many@meilisearch.com>
2025-04-01 14:25:27 +02:00
e36a8c50b9 Merge pull request #5478 from meilisearch/enforce-embedding-dimensions
Enforce embedding dimensions
2025-03-31 15:31:29 +00:00
08ff135ad6 Fix test 2025-03-31 15:27:49 +02:00
f729864466 Check dimension mismatch at insertion time 2025-03-31 15:27:49 +02:00
94ea263bef Add new error for dimensions mismatch during indexing 2025-03-31 15:27:49 +02:00
0e475cb5e6 fix warn and show what meilisearch understood of the vectors in the cursed test 2025-03-31 13:49:22 +02:00
62de70b73c Document problematic case in test and acknowledge PR comment 2025-03-31 13:49:22 +02:00
7707fb18dd add embedding with dimension mismatch test case 2025-03-31 13:49:22 +02:00
bb2e9419d3 Merge pull request #5468 from meilisearch/more-precise-post-processing
More Precise Post Processing
2025-03-27 10:07:09 +00:00
cf68713145 Merge pull request #5465 from meilisearch/improve-stats-perf
Improve documents stats performances
2025-03-27 09:20:14 +00:00
811143cbe9 Add more progress precision when doing post processing 2025-03-27 10:17:28 +01:00
c670e9a39b Make sure the snaps are happy 2025-03-26 20:03:35 +01:00
65f1b13475 Merge pull request #5464 from meilisearch/camel-case-database-sizes
Prefer camelCase for internal database sizes db name
2025-03-26 16:40:39 +00:00
db7ce03763 Improve the performances of computing the size of the documents database 2025-03-26 17:40:12 +01:00
7ed9adde29 Prefer camelCase for internal database sizes db name 2025-03-26 16:45:52 +01:00
9ce7ccfbe7 Merge pull request #5457 from meilisearch/show-database-sizes-changes
Show database sizes batches
2025-03-26 10:19:40 +00:00
3deb1ef78f Fix the snapshots again 2025-03-26 10:38:49 +01:00
5820d822c8 Add more details about the finalizing progress step 2025-03-26 09:49:43 +01:00
637bea0370 Compute and store the database sizes 2025-03-26 09:49:42 +01:00
fd079c6757 Add an index method to get the database sizes 2025-03-25 16:30:51 +01:00
182e5d5632 Add database sizes stats to the batches 2025-03-25 16:30:15 +01:00
82aee6a9af Merge pull request #5415 from meilisearch/isolate-word-fst-usage
Isolate word fst usage
2025-03-25 11:43:37 +00:00
fca947219f Merge pull request #5402 from meilisearch/do-not-reindex-searchable-order-change
Avoid reindexing searchable order changes
2025-03-25 07:03:14 +00:00
fb7ae9f97f Merge pull request #5454 from meilisearch/update-charabia-v0.9.3
Update Charabia v0.9.3
2025-03-24 22:34:51 +00:00
cd421fea1e Merge pull request #5456 from meilisearch/fix-CI
Fix CI to work with merge queues
2025-03-25 09:55:59 +00:00
1ad4235beb Remove the bors file 2025-03-25 10:05:41 +01:00
de6c7e551e Remove bors references from the repository 2025-03-25 10:04:38 +01:00
c0fe70c5f0 Make the CI work with merge queue grouping 2025-03-25 10:04:24 +01:00
a09d08c7b6 Avoid reindexing searchable order changes
Update settings.rs

Update settings.rs
2025-03-24 16:26:52 +01:00
2e6aa63efc Update Charabia v0.9.3 2025-03-24 14:32:21 +01:00
f9807ba32e Fix logic when results are below the threshold 2025-03-19 11:34:53 +01:00
8c8cc59a6c remove new line added by accident 2025-03-19 11:34:53 +01:00
f540a69ac3 add 1 to index so it points to correct position 2025-03-19 11:34:52 +01:00
7df2bdfb15 Merge #5436
5436: Update mini-dashboard to v0.2.19 version r=Kerollmops a=curquiza

Fixes mini dashboard to prevent the panel from popping up every time

Fixed by `@mdubus` 👍 

Co-authored-by: curquiza <clementine@meilisearch.com>
2025-03-18 16:24:31 +00:00
71f7456748 Update mini-dashboard to v0.2.19 version 2025-03-18 12:48:38 +01:00
c98b313d03 Merge #5426
5426: Bump zip from 2.2.2 to 2.3.0 r=Kerollmops a=dependabot[bot]

Bumps [zip](https://github.com/zip-rs/zip2) from 2.2.2 to 2.3.0.
Release notes (sourced from [zip's releases](https://github.com/zip-rs/zip2/releases)):

**v2.3.0**

🚀 Features
- Add support for NTFS extra field ([#279](https://redirect.github.com/zip-rs/zip2/pull/279))

🐛 Bug Fixes
- *(test)* Conditionalize a zip64 doctest ([#308](https://redirect.github.com/zip-rs/zip2/pull/308))
- fix failing tests, remove symlink loop check
- Canonicalize output path to avoid false negatives
- Symlink handling in stream extraction
- Canonicalize output paths and symlink targets, and ensure they descend from the destination

⚙️ Miscellaneous Tasks
- Fix clippy and cargo fmt warnings ([#310](https://redirect.github.com/zip-rs/zip2/pull/310))

**v2.2.3**

🚜 Refactor
- Change the inner structure of `DateTime` ([#267](https://redirect.github.com/zip-rs/zip2/issues/267))

⚙️ Miscellaneous Tasks
- `cargo fix --edition`
Changelog: the entries for [2.3.0](https://github.com/zip-rs/zip2/compare/v2.2.3...v2.3.0) (2025-03-16) and [2.2.3](https://github.com/zip-rs/zip2/compare/v2.2.2...v2.2.3) (2025-02-26) in [zip's changelog](https://github.com/zip-rs/zip2/blob/master/CHANGELOG.md) repeat the items above.
Commits
- [`6eab5f5`](6eab5f5cc6) chore: release v2.3.0 ([#300](https://redirect.github.com/zip-rs/zip2/issues/300))
- [`e4aee20`](e4aee2050f) implement `ZipFile::options` + refactor options normalization ([#305](https://redirect.github.com/zip-rs/zip2/issues/305))
- [`ea8a7bb`](ea8a7bba24) fix(test): Conditionalize a zip64 doctest ([#308](https://redirect.github.com/zip-rs/zip2/issues/308))
- [`365c81a`](365c81a39f) Use `xz2` crate instead of a custom implementation ([#306](https://redirect.github.com/zip-rs/zip2/issues/306))
- [`ae94b34`](ae94b3452b) chore: Fix clippy and cargo fmt warnings ([#310](https://redirect.github.com/zip-rs/zip2/issues/310))
- [`a2e062f`](a2e062f370) Merge commit from fork
- [`0199ac2`](0199ac2cb8) Simplify handling for symlink targets
- [`977bb94`](977bb9479d) fix failing tests, remove symlink loop check
- [`3cb29e7`](3cb29e70d1) Partial fix for tests
- [`2182b07`](2182b07686) Refactor
- Additional commits viewable in the [compare view](https://github.com/zip-rs/zip2/compare/v2.2.2...v2.3.0)


[![Dependabot compatibility score](https://dependabot-badges.githubapp.com/badges/compatibility_score?dependency-name=zip&package-manager=cargo&previous-version=2.2.2&new-version=2.3.0)](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores)

Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting `@dependabot rebase`.


---

**Dependabot commands and options**

You can trigger Dependabot actions by commenting on this PR:
- `@dependabot rebase` will rebase this PR
- `@dependabot recreate` will recreate this PR, overwriting any edits that have been made to it
- `@dependabot merge` will merge this PR after your CI passes on it
- `@dependabot squash and merge` will squash and merge this PR after your CI passes on it
- `@dependabot cancel merge` will cancel a previously requested merge and block automerging
- `@dependabot reopen` will reopen this PR if it is closed
- `@dependabot close` will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually
- `@dependabot show <dependency name> ignore conditions` will show all of the ignore conditions of the specified dependency
- `@dependabot ignore this major version` will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)
- `@dependabot ignore this minor version` will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)
- `@dependabot ignore this dependency` will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)
You can disable automated security fix PRs for this repo from the [Security Alerts page](https://github.com/meilisearch/meilisearch/network/alerts).


Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-03-18 08:57:11 +00:00
69678ed8e1 Bump zip from 2.2.2 to 2.3.0
Bumps [zip](https://github.com/zip-rs/zip2) from 2.2.2 to 2.3.0.
- [Release notes](https://github.com/zip-rs/zip2/releases)
- [Changelog](https://github.com/zip-rs/zip2/blob/master/CHANGELOG.md)
- [Commits](https://github.com/zip-rs/zip2/compare/v2.2.2...v2.3.0)

---
updated-dependencies:
- dependency-name: zip
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>
2025-03-18 00:19:49 +00:00
6ec1d2b712 Merge #5423
5423: Bump ring to v0.17.14 to compile on old aarch64 r=irevoire a=Kerollmops

This PR will fix [this CI issue](https://github.com/meilisearch/meilisearch/actions/runs/13896085925/job/38876941154) where ring v0.17.13 breaks the compilation on old aarch64 machines by bumping its version to v0.17.14.

Co-authored-by: Kerollmops <clement@meilisearch.com>
2025-03-17 12:53:02 +00:00
49dd50dab2 Bump ring to v0.17.14 to compile on old aarch64 2025-03-17 11:29:17 +01:00
37 changed files with 865 additions and 395 deletions

View File

@ -6,11 +6,7 @@ on:
# Everyday at 5:00am
- cron: "0 5 * * *"
pull_request:
push:
# trying and staging branches are for Bors config
branches:
- trying
- staging
merge_group:
env:
CARGO_TERM_COLOR: always

View File

@ -150,7 +150,7 @@ Some notes on GitHub PRs:
- The PR title should be accurate and descriptive of the changes.
- [Convert your PR as a draft](https://help.github.com/en/github/collaborating-with-issues-and-pull-requests/changing-the-stage-of-a-pull-request) if your changes are a work in progress: no one will review it until you pass your PR as ready for review.<br>
The draft PRs are recommended when you want to show that you are working on something and make your work visible.
- The branch related to the PR must be **up-to-date with `main`** before merging. Fortunately, this project uses [Bors](https://github.com/bors-ng/bors-ng) to automatically enforce this requirement without the PR author having to rebase manually.
- The branch related to the PR must be **up-to-date with `main`** before merging. Fortunately, this project uses [GitHub Merge Queues](https://github.blog/news-insights/product-news/github-merge-queue-is-generally-available/) to automatically enforce this requirement without the PR author having to rebase manually.
## Release Process (for internal team only)
@ -158,8 +158,7 @@ Meilisearch tools follow the [Semantic Versioning Convention](https://semver.org
### Automation to rebase and Merge the PRs
This project integrates a bot that helps us manage pull requests merging.<br>
_[Read more about this](https://github.com/meilisearch/integration-guides/blob/main/resources/bors.md)._
This project uses GitHub Merge Queues that helps us manage pull requests merging.
### How to Publish a new Release

Cargo.lock (generated, 222 changes)
View File

@ -258,7 +258,7 @@ version = "0.7.8"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "891477e0c6a8957309ee5c45a6368af3ae14bb510732d2684ffa19af310920f9"
dependencies = [
"getrandom",
"getrandom 0.2.15",
"once_cell",
"version_check",
]
@ -271,7 +271,7 @@ checksum = "e89da841a80418a9b391ebaea17f5c112ffaaa96f621d2c285b5174da76b9011"
dependencies = [
"cfg-if",
"const-random",
"getrandom",
"getrandom 0.2.15",
"once_cell",
"version_check",
"zerocopy",
@ -790,22 +790,20 @@ dependencies = [
[[package]]
name = "bzip2"
version = "0.4.4"
version = "0.5.2"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "bdb116a6ef3f6c3698828873ad02c3014b3c85cadb88496095628e3ef1e347f8"
checksum = "49ecfb22d906f800d4fe833b6282cf4dc1c298f5057ca0b5445e5c209735ca47"
dependencies = [
"bzip2-sys",
"libc",
]
[[package]]
name = "bzip2-sys"
version = "0.1.11+1.0.8"
version = "0.1.13+1.0.8"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "736a955f3fa7875102d57c82b8cac37ec45224a07fd32d58f9f7a186b6cd4cdc"
checksum = "225bff33b2141874fe80d71e07d6eec4f85c5c216453dd96388240f96e1acc14"
dependencies = [
"cc",
"libc",
"pkg-config",
]
@ -978,9 +976,9 @@ dependencies = [
[[package]]
name = "charabia"
version = "0.9.2"
version = "0.9.3"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "cf8921fe4d53ab8f9e8f9b72ce6f91726cfc40fffab1243d27db406b5e2e9cc2"
checksum = "650d52f87a36472ea1c803dee49d6bfd23d426efa9363e2f4c4a0e6a236d3407"
dependencies = [
"aho-corasick",
"csv",
@ -1143,7 +1141,7 @@ version = "0.1.16"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "f9d839f2a20b0aee515dc581a6172f2321f96cab76c1a38a4c584a194955390e"
dependencies = [
"getrandom",
"getrandom 0.2.15",
"once_cell",
"tiny-keccak",
]
@ -2216,10 +2214,24 @@ dependencies = [
"cfg-if",
"js-sys",
"libc",
"wasi",
"wasi 0.11.0+wasi-snapshot-preview1",
"wasm-bindgen",
]
[[package]]
name = "getrandom"
version = "0.3.1"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "43a49c392881ce6d5c3b8cb70f98717b7c07aabbdff06687b9030dbfbe2725f8"
dependencies = [
"cfg-if",
"js-sys",
"libc",
"wasi 0.13.3+wasi-0.2.2",
"wasm-bindgen",
"windows-targets 0.52.6",
]
[[package]]
name = "gimli"
version = "0.27.3"
@ -2733,6 +2745,7 @@ dependencies = [
"bincode",
"bumpalo",
"bumparaw-collections",
"byte-unit",
"convert_case 0.6.0",
"crossbeam-channel",
"csv",
@ -2741,6 +2754,7 @@ dependencies = [
"enum-iterator",
"file-store",
"flate2",
"indexmap",
"insta",
"maplit",
"meili-snap",
@ -2923,10 +2937,11 @@ dependencies = [
[[package]]
name = "js-sys"
version = "0.3.69"
version = "0.3.77"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "29c15563dc2726973df627357ce0c9ddddbea194836909d655df6a75d2cf296d"
checksum = "1cfaf33c695fc6e08064efbc1f72ec937429614f25eef83af942d0e227c3a28f"
dependencies = [
"once_cell",
"wasm-bindgen",
]
@ -3062,9 +3077,9 @@ dependencies = [
[[package]]
name = "lindera"
version = "0.32.2"
version = "0.32.3"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "c6cbc1aad631a7da0a7e9bc4b8669fa92ac9ca8eeb7b35a807376dd3034443ff"
checksum = "832c220475557e3b44a46cad1862b57f010f0c6e93d771d0e628e08689c068b1"
dependencies = [
"lindera-analyzer",
"lindera-core",
@ -3075,9 +3090,9 @@ dependencies = [
[[package]]
name = "lindera-analyzer"
version = "0.32.2"
version = "0.32.3"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "74508ffbb24e36905d1718b261460e378a748029b07bcd7e06f0d18500b8194c"
checksum = "a8e26651714abf5167e6b6a80f5cdaa0cad41c5fcb84d8ba96bebafcb9029339"
dependencies = [
"anyhow",
"bincode",
@ -3105,9 +3120,9 @@ dependencies = [
[[package]]
name = "lindera-assets"
version = "0.32.2"
version = "0.32.3"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "6a677c371ecb3bd02b751be306ea09876cd47cf426303ad5f10a3fd6f9a4ded6"
checksum = "ebb01f1ca53c1e642234c6c7fdb9ac664ad0c1ab9502f33e4200201bac7e6ce7"
dependencies = [
"encoding",
"flate2",
@ -3118,9 +3133,9 @@ dependencies = [
[[package]]
name = "lindera-cc-cedict"
version = "0.32.2"
version = "0.32.3"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "c35944000d05a177e981f037b5f0805f283b32f05a0c35713003bef136ca8cb4"
checksum = "5f7618d9aa947fdd7c38eae2b79f0fd237ecb5067608f1363610ba20d20ab5a8"
dependencies = [
"bincode",
"byteorder",
@ -3132,9 +3147,9 @@ dependencies = [
[[package]]
name = "lindera-cc-cedict-builder"
version = "0.32.2"
version = "0.32.3"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "85b8f642bc9c9130682569975772a17336c6aab26d11fc0f823f3e663167ace6"
checksum = "efdbcb809d81428935d601a78c94bfb39500749213f7320705f427a7a1d31aec"
dependencies = [
"anyhow",
"lindera-core",
@ -3144,9 +3159,9 @@ dependencies = [
[[package]]
name = "lindera-compress"
version = "0.32.2"
version = "0.32.3"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "a7825d8d63592aa5727d67bd209170ac82df56c369533efbf0ddbac277bb68ec"
checksum = "eac178afa2456dac469d3b1a2d7fbaf3e1ea796a1f52321e8ac29545a53c239c"
dependencies = [
"anyhow",
"flate2",
@ -3155,9 +3170,9 @@ dependencies = [
[[package]]
name = "lindera-core"
version = "0.32.2"
version = "0.32.3"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "0c28191456debc98af6aa5f7db77872471983e9fa2a737b1c232b6ef543aed62"
checksum = "649777465f48147ce593ab6db347e235e3af8f693a23f4437be94a1cdbdf5fdf"
dependencies = [
"anyhow",
"bincode",
@ -3172,9 +3187,9 @@ dependencies = [
[[package]]
name = "lindera-decompress"
version = "0.32.2"
version = "0.32.3"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "4788a1ead2f63f3fc2888109272921dedd86a87b7d0bf05e9daab46600daac51"
checksum = "9e3faaceb85e43ac250021866c6db3cdc9997b44b3d3ea498594d04edc91fc45"
dependencies = [
"anyhow",
"flate2",
@ -3183,9 +3198,9 @@ dependencies = [
[[package]]
name = "lindera-dictionary"
version = "0.32.2"
version = "0.32.3"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "bdf5f91725e32b9a21b1656baa7030766c9bafc4de4b4ddeb8ffdde7224dd2f6"
checksum = "31e15b2d2d8a4ad45f2e373a084931cf3dfbde15f124044e2436bb920af3366c"
dependencies = [
"anyhow",
"bincode",
@ -3208,9 +3223,9 @@ dependencies = [
[[package]]
name = "lindera-dictionary-builder"
version = "0.32.2"
version = "0.32.3"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "e41f00ba7ac541b0ffd8c30e7a73f2dd197546cc5780462ec4f2e4782945a780"
checksum = "59802949110545b59b663917ed3fd55dc3b3a8cde6bd20137d7fe24372cfb9aa"
dependencies = [
"anyhow",
"bincode",
@ -3230,9 +3245,9 @@ dependencies = [
[[package]]
name = "lindera-filter"
version = "0.32.2"
version = "0.32.3"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "273d27e01e1377e2647314a4a5b9bdca4b52a867b319069ebae8c10191146eca"
checksum = "1320f118c3fc9e897f4ebfc16864e5ef8c0b06ba769c0a50e53f193f9d682bf8"
dependencies = [
"anyhow",
"csv",
@ -3255,9 +3270,9 @@ dependencies = [
[[package]]
name = "lindera-ipadic"
version = "0.32.2"
version = "0.32.3"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "b97a52ff0af5acb700093badaf7078051ab9ffd9071859724445a60193995f1f"
checksum = "5b4731bf3730f1f38266d7ee9bca7d460cd336645c9dfd4e6a1082e58ab1e993"
dependencies = [
"bincode",
"byteorder",
@ -3269,9 +3284,9 @@ dependencies = [
[[package]]
name = "lindera-ipadic-builder"
version = "0.32.2"
version = "0.32.3"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "bf5031c52686128db13f774b2c5a8abfd52b4cc1f904041d8411aa19d630ce4d"
checksum = "309966c12e682f67205c3cd3c8dc55bbdcd1eb3b5c7c5cb41fb8acd18906d340"
dependencies = [
"anyhow",
"lindera-core",
@ -3281,9 +3296,9 @@ dependencies = [
[[package]]
name = "lindera-ipadic-neologd"
version = "0.32.2"
version = "0.32.3"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "d6b36764b27b169aa11d24888141f206a6c246a5b195c1e67127485bac512fb6"
checksum = "e90e919b4cfb9962d24ee1e1d50a7c163bbf356376495ad66d1996e20b9f9e44"
dependencies = [
"bincode",
"byteorder",
@ -3295,9 +3310,9 @@ dependencies = [
[[package]]
name = "lindera-ipadic-neologd-builder"
version = "0.32.2"
version = "0.32.3"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "abf36e40ace904741efdd883ed5c4dba6425f65156a0fb5d3f73a386335950dc"
checksum = "7e517df0d501f9f8bf3126da20fc8cb9a5e37921e0eec1824d7a62f096463e02"
dependencies = [
"anyhow",
"lindera-core",
@ -3307,9 +3322,9 @@ dependencies = [
[[package]]
name = "lindera-ko-dic"
version = "0.32.2"
version = "0.32.3"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "4c92a1a3564b531953f0238cbcea392f2905f7b27b449978cf9e702a80e1086d"
checksum = "e9c6da4e68bc8b452a54b96d65361ebdceb4b6f36ecf262425c0e1f77960ae82"
dependencies = [
"bincode",
"byteorder",
@ -3322,9 +3337,9 @@ dependencies = [
[[package]]
name = "lindera-ko-dic-builder"
version = "0.32.2"
version = "0.32.3"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "9f2c60425abc1548570c2568858f74a1f042105ecd89faa39c651b4315350fd9"
checksum = "afc95884cc8f6dfb176caf5991043a4acf94c359215bbd039ea765e00454f271"
dependencies = [
"anyhow",
"lindera-core",
@ -3334,9 +3349,9 @@ dependencies = [
[[package]]
name = "lindera-tokenizer"
version = "0.32.2"
version = "0.32.3"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "903e558981bcb6f59870aa7d6b4bcb09e8f7db778886a6a70f67fd74c9fa2ca3"
checksum = "d122042e1232a55c3604692445952a134e523822e9b4b9ab32a53ff890037ad4"
dependencies = [
"bincode",
"lindera-core",
@ -3348,9 +3363,9 @@ dependencies = [
[[package]]
name = "lindera-unidic"
version = "0.32.2"
version = "0.32.3"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "d227c3ce9cbd905f865c46c65a0470fd04e89b71104d7f92baa71a212ffe1d4b"
checksum = "cbffae1fb2f2614abdcb50f99b138476dbac19862ffa57bfdc9c7b5d5b22a90c"
dependencies = [
"bincode",
"byteorder",
@ -3363,9 +3378,9 @@ dependencies = [
[[package]]
name = "lindera-unidic-builder"
version = "0.32.2"
version = "0.32.3"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "99e2c50015c242e02c451acb6748667ac6fd1d3d667cd7db48cd89e2f2d2377e"
checksum = "fe50055327712ebd1bcc74b657cf78c728a78b9586e3f99d5dd0b6a0be221c5d"
dependencies = [
"anyhow",
"lindera-core",
@ -3518,6 +3533,17 @@ dependencies = [
"crc",
]
[[package]]
name = "lzma-sys"
version = "0.1.20"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "5fda04ab3764e6cde78b9974eec4f779acaba7c4e84b36eca3cf77c581b85d27"
dependencies = [
"cc",
"libc",
"pkg-config",
]
[[package]]
name = "macro_rules_attribute"
version = "0.2.0"
@ -3656,7 +3682,7 @@ dependencies = [
"uuid",
"wiremock",
"yaup",
"zip 2.2.2",
"zip 2.3.0",
]
[[package]]
@ -3882,7 +3908,7 @@ checksum = "a4a650543ca06a924e8b371db273b2756685faae30f8487da1b56505a8f78b0c"
dependencies = [
"libc",
"log",
"wasi",
"wasi 0.11.0+wasi-snapshot-preview1",
"windows-sys 0.48.0",
]
@ -3893,7 +3919,7 @@ source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "2886843bf800fba2e3377cff24abf6379b4c4d5c6681eaf9ea5b0d15090450bd"
dependencies = [
"libc",
"wasi",
"wasi 0.11.0+wasi-snapshot-preview1",
"windows-sys 0.52.0",
]
@ -4670,7 +4696,7 @@ version = "0.6.4"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "ec0be4795e2f6a28069bec0b5ff3e2ac9bafc99e6a9a7dc3547996c5c816922c"
dependencies = [
"getrandom",
"getrandom 0.2.15",
]
[[package]]
@ -4762,7 +4788,7 @@ version = "0.4.3"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "b033d837a7cf162d7993aded9304e30a83213c648b6e389db233191f891e5c2b"
dependencies = [
"getrandom",
"getrandom 0.2.15",
"redox_syscall 0.2.16",
"thiserror 1.0.69",
]
@ -4886,13 +4912,13 @@ dependencies = [
[[package]]
name = "ring"
version = "0.17.13"
version = "0.17.14"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "70ac5d832aa16abd7d1def883a8545280c20a60f523a370aa3a9617c2b8550ee"
checksum = "a4689e6c2294d81e88dc6261c768b63bc4fcdb852be6d1352498b114f61383b7"
dependencies = [
"cc",
"cfg-if",
"getrandom",
"getrandom 0.2.15",
"libc",
"untrusted",
"windows-sys 0.52.0",
@ -5576,7 +5602,7 @@ checksum = "9a8a559c81686f576e8cd0290cd2a24a2a9ad80c98b3478856500fcbd7acd704"
dependencies = [
"cfg-if",
"fastrand",
"getrandom",
"getrandom 0.2.15",
"once_cell",
"rustix",
"windows-sys 0.52.0",
@ -5751,7 +5777,7 @@ dependencies = [
"aho-corasick",
"derive_builder 0.12.0",
"esaxx-rs",
"getrandom",
"getrandom 0.2.15",
"itertools 0.12.1",
"lazy_static",
"log",
@ -6094,9 +6120,9 @@ checksum = "3354b9ac3fae1ff6755cb6db53683adb661634f67557942dea4facebec0fee4b"
[[package]]
name = "unicode-normalization"
version = "0.1.23"
version = "0.1.24"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "a56d1686db2308d901306f92a263857ef59ea39678a5458e7cb17f01415101f5"
checksum = "5033c97c4262335cded6d6fc3e5c18ab755e1a3dc96376350f3d8e9f009ad956"
dependencies = [
"tinyvec",
]
@ -6238,7 +6264,7 @@ version = "1.11.0"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "f8c5f0a0af699448548ad1a2fbf920fb4bee257eae39953ba95cb84891a0446a"
dependencies = [
"getrandom",
"getrandom 0.2.15",
"serde",
]
@ -6335,24 +6361,34 @@ source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "9c8d87e72b64a3b4db28d11ce29237c246188f4f51057d65a7eab63b7987e423"
[[package]]
name = "wasm-bindgen"
version = "0.2.92"
name = "wasi"
version = "0.13.3+wasi-0.2.2"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "4be2531df63900aeb2bca0daaaddec08491ee64ceecbee5076636a3b026795a8"
checksum = "26816d2e1a4a36a2940b96c5296ce403917633dff8f3440e9b236ed6f6bacad2"
dependencies = [
"wit-bindgen-rt",
]
[[package]]
name = "wasm-bindgen"
version = "0.2.100"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "1edc8929d7499fc4e8f0be2262a241556cfc54a0bea223790e71446f2aab1ef5"
dependencies = [
"cfg-if",
"once_cell",
"rustversion",
"wasm-bindgen-macro",
]
[[package]]
name = "wasm-bindgen-backend"
version = "0.2.92"
version = "0.2.100"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "614d787b966d3989fa7bb98a654e369c762374fd3213d212cfc0251257e747da"
checksum = "2f0a0651a5c2bc21487bde11ee802ccaf4c51935d0d3d42a6101f98161700bc6"
dependencies = [
"bumpalo",
"log",
"once_cell",
"proc-macro2",
"quote",
"syn 2.0.87",
@ -6373,9 +6409,9 @@ dependencies = [
[[package]]
name = "wasm-bindgen-macro"
version = "0.2.92"
version = "0.2.100"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "a1f8823de937b71b9460c0c34e25f3da88250760bec0ebac694b49997550d726"
checksum = "7fe63fc6d09ed3792bd0897b314f53de8e16568c2b3f7982f468c0bf9bd0b407"
dependencies = [
"quote",
"wasm-bindgen-macro-support",
@ -6383,9 +6419,9 @@ dependencies = [
[[package]]
name = "wasm-bindgen-macro-support"
version = "0.2.92"
version = "0.2.100"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "e94f17b526d0a461a191c78ea52bbce64071ed5c04c9ffe424dcb38f74171bb7"
checksum = "8ae87ea40c9f689fc23f209965b6fb8a99ad69aeeb0231408be24920604395de"
dependencies = [
"proc-macro2",
"quote",
@ -6396,9 +6432,12 @@ dependencies = [
[[package]]
name = "wasm-bindgen-shared"
version = "0.2.92"
version = "0.2.100"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "af190c94f2773fdb3729c55b007a722abb5384da03bc0986df4c289bf5567e96"
checksum = "1a05d73b933a847d6cccdda8f838a22ff101ad9bf93e33684f39c1f5f0eece3d"
dependencies = [
"unicode-ident",
]
[[package]]
name = "wasm-streams"
@ -6803,6 +6842,15 @@ dependencies = [
"url",
]
[[package]]
name = "wit-bindgen-rt"
version = "0.33.0"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "3268f3d866458b787f390cf61f4bbb563b922d091359f9608842999eaee3943c"
dependencies = [
"bitflags 2.9.0",
]
[[package]]
name = "write16"
version = "1.0.0"
@ -6858,6 +6906,15 @@ dependencies = [
"uuid",
]
[[package]]
name = "xz2"
version = "0.1.7"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "388c44dc09d76f1536602ead6d325eb532f5c122f17782bd57fb47baeeb767e2"
dependencies = [
"lzma-sys",
]
[[package]]
name = "yada"
version = "0.5.1"
@ -6999,9 +7056,9 @@ dependencies = [
[[package]]
name = "zip"
version = "2.2.2"
version = "2.3.0"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "ae9c1ea7b3a5e1f4b922ff856a129881167511563dc219869afe3787fc0c1a45"
checksum = "84e9a772a54b54236b9b744aaaf8d7be01b4d6e99725523cb82cb32d1c81b1d7"
dependencies = [
"aes",
"arbitrary",
@ -7012,15 +7069,16 @@ dependencies = [
"deflate64",
"displaydoc",
"flate2",
"getrandom 0.3.1",
"hmac",
"indexmap",
"lzma-rs",
"memchr",
"pbkdf2",
"rand",
"sha1",
"thiserror 2.0.9",
"time",
"xz2",
"zeroize",
"zopfli",
"zstd",

View File

@ -20,7 +20,7 @@
<p align="center">
<a href="https://deps.rs/repo/github/meilisearch/meilisearch"><img src="https://deps.rs/repo/github/meilisearch/meilisearch/status.svg" alt="Dependency status"></a>
<a href="https://github.com/meilisearch/meilisearch/blob/main/LICENSE"><img src="https://img.shields.io/badge/license-MIT-informational" alt="License"></a>
<a href="https://ms-bors.herokuapp.com/repositories/52"><img src="https://bors.tech/images/badge_small.svg" alt="Bors enabled"></a>
<a href="https://github.com/meilisearch/meilisearch/queue"><img alt="Merge Queues enabled" src="https://img.shields.io/badge/Merge_Queues-enabled-%2357cf60?logo=github"></a>
</p>
<p align="center">⚡ A lightning-fast search engine that fits effortlessly into your apps, websites, and workflow 🔍</p>

View File

@ -1,10 +0,0 @@
status = [
'Tests on ubuntu-22.04',
'Tests on macos-13',
'Tests on windows-2022',
'Run Clippy',
'Run Rustfmt',
'Run tests in debug',
]
# 3 hours timeout
timeout-sec = 10800

View File

@ -326,6 +326,7 @@ pub(crate) mod test {
index_uids: maplit::btreemap! { "doggo".to_string() => 1 },
progress_trace: Default::default(),
write_channel_congestion: None,
internal_database_sizes: Default::default(),
},
enqueued_at: Some(BatchEnqueuedAt {
earliest: datetime!(2022-11-11 0:00 UTC),

View File

@ -13,6 +13,7 @@ license.workspace = true
[dependencies]
anyhow = "1.0.95"
bincode = "1.3.3"
byte-unit = "5.1.6"
bumpalo = "3.16.0"
bumparaw-collections = "0.1.4"
convert_case = "0.6.0"
@ -22,6 +23,7 @@ dump = { path = "../dump" }
enum-iterator = "2.1.0"
file-store = { path = "../file-store" }
flate2 = "1.0.35"
indexmap = "2.7.0"
meilisearch-auth = { path = "../meilisearch-auth" }
meilisearch-types = { path = "../meilisearch-types" }
memmap2 = "0.9.5"

View File

@ -344,6 +344,7 @@ pub fn snapshot_batch(batch: &Batch) -> String {
let Batch { uid, details, stats, started_at, finished_at, progress: _, enqueued_at } = batch;
let stats = BatchStats {
progress_trace: Default::default(),
internal_database_sizes: Default::default(),
write_channel_congestion: None,
..stats.clone()
};

View File

@ -64,6 +64,13 @@ make_enum_progress! {
}
}
make_enum_progress! {
pub enum FinalizingIndexStep {
Committing,
ComputingStats,
}
}
make_enum_progress! {
pub enum TaskCancelationProgress {
RetrievingTasks,
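
For context, these `make_enum_progress!` blocks declare the ordered steps a batch reports while it runs; the two new `FinalizingIndexStep` variants surface in the scheduler hunk further down. A minimal sketch of the step-enum pattern, assuming nothing about the real macro's expansion:

```rust
// Hypothetical stand-in for a step enum like the one declared above; the real
// `make_enum_progress!` expansion is not part of this diff. The variant order
// gives a "step N of M" readout for the finalizing phase.
#[derive(Clone, Copy, Debug)]
enum FinalizingIndexStep {
    Committing,
    ComputingStats,
}

impl FinalizingIndexStep {
    const TOTAL: u32 = 2;
}

fn main() {
    for step in [FinalizingIndexStep::Committing, FinalizingIndexStep::ComputingStats] {
        println!("{:?}: step {} of {}", step, step as u32 + 1, FinalizingIndexStep::TOTAL);
    }
}
```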

View File

@ -20,10 +20,12 @@ use std::path::PathBuf;
use std::sync::atomic::{AtomicBool, AtomicU32, Ordering};
use std::sync::Arc;
use convert_case::{Case, Casing as _};
use meilisearch_types::error::ResponseError;
use meilisearch_types::heed::{Env, WithoutTls};
use meilisearch_types::milli;
use meilisearch_types::tasks::Status;
use process_batch::ProcessBatchInfo;
use rayon::current_num_threads;
use rayon::iter::{IntoParallelIterator, ParallelIterator};
use roaring::RoaringBitmap;
@ -223,16 +225,16 @@ impl IndexScheduler {
let mut stop_scheduler_forever = false;
let mut wtxn = self.env.write_txn().map_err(Error::HeedTransaction)?;
let mut canceled = RoaringBitmap::new();
let mut congestion = None;
let mut process_batch_info = ProcessBatchInfo::default();
match res {
Ok((tasks, cong)) => {
Ok((tasks, info)) => {
#[cfg(test)]
self.breakpoint(crate::test_utils::Breakpoint::ProcessBatchSucceeded);
let (task_progress, task_progress_obj) = AtomicTaskStep::new(tasks.len() as u32);
progress.update_progress(task_progress_obj);
congestion = cong;
process_batch_info = info;
let mut success = 0;
let mut failure = 0;
let mut canceled_by = None;
@ -350,6 +352,9 @@ impl IndexScheduler {
// We must re-add the canceled task so they're part of the same batch.
ids |= canceled;
let ProcessBatchInfo { congestion, pre_commit_dabases_sizes, post_commit_dabases_sizes } =
process_batch_info;
processing_batch.stats.progress_trace =
progress.accumulated_durations().into_iter().map(|(k, v)| (k, v.into())).collect();
processing_batch.stats.write_channel_congestion = congestion.map(|congestion| {
@ -359,6 +364,33 @@ impl IndexScheduler {
congestion_info.insert("blocking_ratio".into(), congestion.congestion_ratio().into());
congestion_info
});
processing_batch.stats.internal_database_sizes = pre_commit_dabases_sizes
.iter()
.flat_map(|(dbname, pre_size)| {
post_commit_dabases_sizes
.get(dbname)
.map(|post_size| {
use byte_unit::{Byte, UnitType::Binary};
use std::cmp::Ordering::{Equal, Greater, Less};
let post = Byte::from_u64(*post_size as u64).get_appropriate_unit(Binary);
let diff_size = post_size.abs_diff(*pre_size) as u64;
let diff = Byte::from_u64(diff_size).get_appropriate_unit(Binary);
let sign = match post_size.cmp(pre_size) {
Equal => return None,
Greater => "+",
Less => "-",
};
Some((
dbname.to_case(Case::Camel),
format!("{post:#.2} ({sign}{diff:#.2})").into(),
))
})
.into_iter()
.flatten()
})
.collect();
if let Some(congestion) = congestion {
tracing::debug!(
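
For reference, the size-diff entries built above render through `byte_unit` and `convert_case`. A minimal standalone sketch of that formatting, mirroring the hunk; the `word_docids` name and the byte counts are placeholders:

```rust
use byte_unit::{Byte, UnitType::Binary};
use convert_case::{Case, Casing as _};

fn main() {
    // Hypothetical before/after sizes (in bytes) for one internal database.
    let (dbname, pre_size, post_size) = ("word_docids", 4_194_304_u64, 6_291_456_u64);

    let post = Byte::from_u64(post_size).get_appropriate_unit(Binary);
    let diff = Byte::from_u64(post_size.abs_diff(pre_size)).get_appropriate_unit(Binary);
    let sign = if post_size > pre_size { "+" } else { "-" };

    // Same format string as the hunk above; prints something like
    // `wordDocids: 6.00 MiB (+2.00 MiB)`.
    println!("{}: {post:#.2} ({sign}{diff:#.2})", dbname.to_case(Case::Camel));
}
```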

View File

@ -12,7 +12,7 @@ use roaring::RoaringBitmap;
use super::create_batch::Batch;
use crate::processing::{
AtomicBatchStep, AtomicTaskStep, CreateIndexProgress, DeleteIndexProgress,
AtomicBatchStep, AtomicTaskStep, CreateIndexProgress, DeleteIndexProgress, FinalizingIndexStep,
InnerSwappingTwoIndexes, SwappingTheIndexes, TaskCancelationProgress, TaskDeletionProgress,
UpdateIndexProgress,
};
@ -22,6 +22,16 @@ use crate::utils::{
};
use crate::{Error, IndexScheduler, Result, TaskId};
#[derive(Debug, Default)]
pub struct ProcessBatchInfo {
/// The write channel congestion. None when unavailable: settings update.
pub congestion: Option<ChannelCongestion>,
/// The sizes of the different databases before starting the indexation.
pub pre_commit_dabases_sizes: indexmap::IndexMap<&'static str, usize>,
/// The sizes of the different databases after commiting the indexation.
pub post_commit_dabases_sizes: indexmap::IndexMap<&'static str, usize>,
}
impl IndexScheduler {
/// Apply the operation associated with the given batch.
///
@ -35,7 +45,7 @@ impl IndexScheduler {
batch: Batch,
current_batch: &mut ProcessingBatch,
progress: Progress,
) -> Result<(Vec<Task>, Option<ChannelCongestion>)> {
) -> Result<(Vec<Task>, ProcessBatchInfo)> {
#[cfg(test)]
{
self.maybe_fail(crate::test_utils::FailureLocation::InsideProcessBatch)?;
@ -76,7 +86,7 @@ impl IndexScheduler {
canceled_tasks.push(task);
Ok((canceled_tasks, None))
Ok((canceled_tasks, ProcessBatchInfo::default()))
}
Batch::TaskDeletions(mut tasks) => {
// 1. Retrieve the tasks that matched the query at enqueue-time.
@ -115,14 +125,14 @@ impl IndexScheduler {
_ => unreachable!(),
}
}
Ok((tasks, None))
}
Batch::SnapshotCreation(tasks) => {
self.process_snapshot(progress, tasks).map(|tasks| (tasks, None))
}
Batch::Dump(task) => {
self.process_dump_creation(progress, task).map(|tasks| (tasks, None))
Ok((tasks, ProcessBatchInfo::default()))
}
Batch::SnapshotCreation(tasks) => self
.process_snapshot(progress, tasks)
.map(|tasks| (tasks, ProcessBatchInfo::default())),
Batch::Dump(task) => self
.process_dump_creation(progress, task)
.map(|tasks| (tasks, ProcessBatchInfo::default())),
Batch::IndexOperation { op, must_create_index } => {
let index_uid = op.index_uid().to_string();
let index = if must_create_index {
@ -139,10 +149,12 @@ impl IndexScheduler {
.set_currently_updating_index(Some((index_uid.clone(), index.clone())));
let mut index_wtxn = index.write_txn()?;
let pre_commit_dabases_sizes = index.database_sizes(&index_wtxn)?;
let (tasks, congestion) =
self.apply_index_operation(&mut index_wtxn, &index, op, progress)?;
self.apply_index_operation(&mut index_wtxn, &index, op, &progress)?;
{
progress.update_progress(FinalizingIndexStep::Committing);
let span = tracing::trace_span!(target: "indexing::scheduler", "commit");
let _entered = span.enter();
@ -153,12 +165,15 @@ impl IndexScheduler {
// stats of the index. Since the tasks have already been processed and
// this is a non-critical operation. If it fails, we should not fail
// the entire batch.
let mut post_commit_dabases_sizes = None;
let res = || -> Result<()> {
progress.update_progress(FinalizingIndexStep::ComputingStats);
let index_rtxn = index.read_txn()?;
let stats = crate::index_mapper::IndexStats::new(&index, &index_rtxn)
.map_err(|e| Error::from_milli(e, Some(index_uid.to_string())))?;
let mut wtxn = self.env.write_txn()?;
self.index_mapper.store_stats_of(&mut wtxn, &index_uid, &stats)?;
post_commit_dabases_sizes = Some(index.database_sizes(&index_rtxn)?);
wtxn.commit()?;
Ok(())
}();
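
Note: the `let res = || -> Result<()> { ... }();` shape above is an immediately-invoked closure. It bundles the fallible stats refresh so `?` works locally and a failure can be logged without failing the whole batch. A self-contained sketch of the idiom, using only the standard library:

```rust
fn main() {
    // Immediately-invoked closure: group fallible steps so `?` works locally,
    // then decide below what a failure means for the caller.
    let res = || -> Result<(), std::io::Error> {
        let contents = std::fs::read_to_string("/tmp/does-not-exist")?;
        println!("{contents}");
        Ok(())
    }();

    // Non-critical path: log and move on instead of propagating the error.
    if let Err(e) = res {
        eprintln!("non-critical step failed: {e}");
    }
}
```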
@ -171,7 +186,16 @@ impl IndexScheduler {
),
}
Ok((tasks, congestion))
let info = ProcessBatchInfo {
congestion,
// In case we fail to the get post-commit sizes we decide
// that nothing changed and use the pre-commit sizes.
post_commit_dabases_sizes: post_commit_dabases_sizes
.unwrap_or_else(|| pre_commit_dabases_sizes.clone()),
pre_commit_dabases_sizes,
};
Ok((tasks, info))
}
Batch::IndexCreation { index_uid, primary_key, task } => {
progress.update_progress(CreateIndexProgress::CreatingTheIndex);
@ -239,7 +263,7 @@ impl IndexScheduler {
),
}
Ok((vec![task], None))
Ok((vec![task], ProcessBatchInfo::default()))
}
Batch::IndexDeletion { index_uid, index_has_been_created, mut tasks } => {
progress.update_progress(DeleteIndexProgress::DeletingTheIndex);
@ -273,7 +297,9 @@ impl IndexScheduler {
};
}
Ok((tasks, None))
// Here we could also show that all the internal database sizes goes to 0
// but it would mean opening the index and that's costly.
Ok((tasks, ProcessBatchInfo::default()))
}
Batch::IndexSwap { mut task } => {
progress.update_progress(SwappingTheIndexes::EnsuringCorrectnessOfTheSwap);
@ -321,7 +347,7 @@ impl IndexScheduler {
}
wtxn.commit()?;
task.status = Status::Succeeded;
Ok((vec![task], None))
Ok((vec![task], ProcessBatchInfo::default()))
}
Batch::UpgradeDatabase { mut tasks } => {
let KindWithContent::UpgradeDatabase { from } = tasks.last().unwrap().kind else {
@ -351,7 +377,7 @@ impl IndexScheduler {
task.error = None;
}
Ok((tasks, None))
Ok((tasks, ProcessBatchInfo::default()))
}
}
}

View File

@ -32,7 +32,7 @@ impl IndexScheduler {
index_wtxn: &mut RwTxn<'i>,
index: &'i Index,
operation: IndexOperation,
progress: Progress,
progress: &Progress,
) -> Result<(Vec<Task>, Option<ChannelCongestion>)> {
let indexer_alloc = Bump::new();
let started_processing_at = std::time::Instant::now();
@ -186,7 +186,7 @@ impl IndexScheduler {
&document_changes,
embedders,
&|| must_stop_processing.get(),
&progress,
progress,
)
.map_err(|e| Error::from_milli(e, Some(index_uid.clone())))?,
);
@ -307,7 +307,7 @@ impl IndexScheduler {
&document_changes,
embedders,
&|| must_stop_processing.get(),
&progress,
progress,
)
.map_err(|err| Error::from_milli(err, Some(index_uid.clone())))?,
);
@ -465,7 +465,7 @@ impl IndexScheduler {
&document_changes,
embedders,
&|| must_stop_processing.get(),
&progress,
progress,
)
.map_err(|err| Error::from_milli(err, Some(index_uid.clone())))?,
);
@ -520,7 +520,7 @@ impl IndexScheduler {
index_uid: index_uid.clone(),
tasks: cleared_tasks,
},
progress.clone(),
progress,
)?;
let (settings_tasks, _congestion) = self.apply_index_operation(

View File

@ -64,4 +64,6 @@ pub struct BatchStats {
pub progress_trace: serde_json::Map<String, serde_json::Value>,
#[serde(default, skip_serializing_if = "Option::is_none")]
pub write_channel_congestion: Option<serde_json::Map<String, serde_json::Value>>,
#[serde(default, skip_serializing_if = "serde_json::Map::is_empty")]
pub internal_database_sizes: serde_json::Map<String, serde_json::Value>,
}
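
The new field's `skip_serializing_if` attribute keeps `internalDatabaseSizes` out of batch payloads until it has entries. A minimal sketch of that behavior; the `rename_all = "camelCase"` attribute here is an assumption, inferred from the camelCase key in the test snapshots below:

```rust
use serde::{Deserialize, Serialize};

#[derive(Debug, Default, Serialize, Deserialize)]
#[serde(rename_all = "camelCase")] // assumed: matches the `internalDatabaseSizes` key in the snapshots
struct BatchStatsSketch {
    #[serde(default, skip_serializing_if = "serde_json::Map::is_empty")]
    internal_database_sizes: serde_json::Map<String, serde_json::Value>,
}

fn main() {
    // Empty map: the key is omitted from the JSON entirely.
    let empty = BatchStatsSketch::default();
    assert_eq!(serde_json::to_string(&empty).unwrap(), "{}");

    // Non-empty map: serialized under the camelCase name.
    let mut sizes = serde_json::Map::new();
    sizes.insert("wordDocids".into(), "6.00 MiB (+2.00 MiB)".into());
    let stats = BatchStatsSketch { internal_database_sizes: sizes };
    println!("{}", serde_json::to_string(&stats).unwrap());
}
```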

View File

@ -454,7 +454,10 @@ impl ErrorCode for milli::Error {
}
UserError::CriterionError(_) => Code::InvalidSettingsRankingRules,
UserError::InvalidGeoField { .. } => Code::InvalidDocumentGeoField,
UserError::InvalidVectorDimensions { .. } => Code::InvalidVectorDimensions,
UserError::InvalidVectorDimensions { .. }
| UserError::InvalidIndexingVectorDimensions { .. } => {
Code::InvalidVectorDimensions
}
UserError::InvalidVectorsMapType { .. }
| UserError::InvalidVectorsEmbedderConf { .. } => Code::InvalidVectorsType,
UserError::TooManyVectors(_, _) => Code::TooManyVectors,

View File

@ -30,11 +30,7 @@ actix-web = { version = "4.9.0", default-features = false, features = [
anyhow = { version = "1.0.95", features = ["backtrace"] }
async-trait = "0.1.85"
bstr = "1.11.3"
byte-unit = { version = "5.1.6", default-features = false, features = [
"std",
"byte",
"serde",
] }
byte-unit = { version = "5.1.6", features = ["serde"] }
bytes = "1.9.0"
clap = { version = "4.5.24", features = ["derive", "env"] }
crossbeam-channel = "0.5.14"
@ -140,7 +136,7 @@ reqwest = { version = "0.12.12", features = [
sha-1 = { version = "0.10.1", optional = true }
static-files = { version = "0.2.4", optional = true }
tempfile = { version = "3.15.0", optional = true }
zip = { version = "2.2.2", optional = true }
zip = { version = "2.3.0", optional = true }
[features]
default = ["meilisearch-types/all-tokenizations", "mini-dashboard"]
@ -170,5 +166,5 @@ german = ["meilisearch-types/german"]
turkish = ["meilisearch-types/turkish"]
[package.metadata.mini-dashboard]
assets-url = "https://github.com/meilisearch/mini-dashboard/releases/download/v0.2.18/build.zip"
sha1 = "b408a30dcb6e20cddb0c153c23385bcac4c8e912"
assets-url = "https://github.com/meilisearch/mini-dashboard/releases/download/v0.2.19/build.zip"
sha1 = "7974430d5277c97f67cf6e95eec6faaac2788834"

View File

@ -518,7 +518,7 @@ impl From<index_scheduler::IndexStats> for IndexStats {
.inner_stats
.number_of_documents
.unwrap_or(stats.inner_stats.documents_database_stats.number_of_entries()),
raw_document_db_size: stats.inner_stats.documents_database_stats.total_value_size(),
raw_document_db_size: stats.inner_stats.documents_database_stats.total_size(),
avg_document_size: stats.inner_stats.documents_database_stats.average_value_size(),
is_indexing: stats.is_indexing,
number_of_embeddings: stats.inner_stats.number_of_embeddings,

View File

@ -281,7 +281,8 @@ async fn test_summarized_document_addition_or_update() {
".startedAt" => "[date]",
".finishedAt" => "[date]",
".stats.progressTrace" => "[progressTrace]",
".stats.writeChannelCongestion" => "[writeChannelCongestion]"
".stats.writeChannelCongestion" => "[writeChannelCongestion]",
".stats.internalDatabaseSizes" => "[internalDatabaseSizes]"
},
@r###"
{
@ -303,7 +304,8 @@ async fn test_summarized_document_addition_or_update() {
"test": 1
},
"progressTrace": "[progressTrace]",
"writeChannelCongestion": "[writeChannelCongestion]"
"writeChannelCongestion": "[writeChannelCongestion]",
"internalDatabaseSizes": "[internalDatabaseSizes]"
},
"duration": "[duration]",
"startedAt": "[date]",
@ -322,7 +324,8 @@ async fn test_summarized_document_addition_or_update() {
".startedAt" => "[date]",
".finishedAt" => "[date]",
".stats.progressTrace" => "[progressTrace]",
".stats.writeChannelCongestion" => "[writeChannelCongestion]"
".stats.writeChannelCongestion" => "[writeChannelCongestion]",
".stats.internalDatabaseSizes" => "[internalDatabaseSizes]"
},
@r###"
{
@ -407,7 +410,8 @@ async fn test_summarized_delete_documents_by_batch() {
".startedAt" => "[date]",
".finishedAt" => "[date]",
".stats.progressTrace" => "[progressTrace]",
".stats.writeChannelCongestion" => "[writeChannelCongestion]"
".stats.writeChannelCongestion" => "[writeChannelCongestion]",
".stats.internalDatabaseSizes" => "[internalDatabaseSizes]"
},
@r###"
{
@ -495,7 +499,8 @@ async fn test_summarized_delete_documents_by_filter() {
".startedAt" => "[date]",
".finishedAt" => "[date]",
".stats.progressTrace" => "[progressTrace]",
".stats.writeChannelCongestion" => "[writeChannelCongestion]"
".stats.writeChannelCongestion" => "[writeChannelCongestion]",
".stats.internalDatabaseSizes" => "[internalDatabaseSizes]"
},
@r###"
{
@ -537,7 +542,8 @@ async fn test_summarized_delete_documents_by_filter() {
".startedAt" => "[date]",
".finishedAt" => "[date]",
".stats.progressTrace" => "[progressTrace]",
".stats.writeChannelCongestion" => "[writeChannelCongestion]"
".stats.writeChannelCongestion" => "[writeChannelCongestion]",
".stats.internalDatabaseSizes" => "[internalDatabaseSizes]"
},
@r#"
{
@ -623,7 +629,8 @@ async fn test_summarized_delete_document_by_id() {
".startedAt" => "[date]",
".finishedAt" => "[date]",
".stats.progressTrace" => "[progressTrace]",
".stats.writeChannelCongestion" => "[writeChannelCongestion]"
".stats.writeChannelCongestion" => "[writeChannelCongestion]",
".stats.internalDatabaseSizes" => "[internalDatabaseSizes]"
},
@r#"
{
@ -679,7 +686,8 @@ async fn test_summarized_settings_update() {
".startedAt" => "[date]",
".finishedAt" => "[date]",
".stats.progressTrace" => "[progressTrace]",
".stats.writeChannelCongestion" => "[writeChannelCongestion]"
".stats.writeChannelCongestion" => "[writeChannelCongestion]",
".stats.internalDatabaseSizes" => "[internalDatabaseSizes]"
},
@r###"
{

View File

@ -157,11 +157,14 @@ async fn delete_document_by_filter() {
index.wait_task(task.uid()).await.succeeded();
let (stats, _) = index.stats().await;
snapshot!(json_string!(stats), @r###"
snapshot!(json_string!(stats, {
".rawDocumentDbSize" => "[size]",
".avgDocumentSize" => "[size]",
}), @r###"
{
"numberOfDocuments": 4,
"rawDocumentDbSize": 42,
"avgDocumentSize": 10,
"rawDocumentDbSize": "[size]",
"avgDocumentSize": "[size]",
"isIndexing": false,
"numberOfEmbeddings": 0,
"numberOfEmbeddedDocuments": 0,
@ -208,11 +211,14 @@ async fn delete_document_by_filter() {
"###);
let (stats, _) = index.stats().await;
snapshot!(json_string!(stats), @r###"
snapshot!(json_string!(stats, {
".rawDocumentDbSize" => "[size]",
".avgDocumentSize" => "[size]",
}), @r###"
{
"numberOfDocuments": 2,
"rawDocumentDbSize": 16,
"avgDocumentSize": 8,
"rawDocumentDbSize": "[size]",
"avgDocumentSize": "[size]",
"isIndexing": false,
"numberOfEmbeddings": 0,
"numberOfEmbeddedDocuments": 0,
@ -278,11 +284,14 @@ async fn delete_document_by_filter() {
"###);
let (stats, _) = index.stats().await;
snapshot!(json_string!(stats), @r###"
snapshot!(json_string!(stats, {
".rawDocumentDbSize" => "[size]",
".avgDocumentSize" => "[size]",
}), @r###"
{
"numberOfDocuments": 1,
"rawDocumentDbSize": 12,
"avgDocumentSize": 12,
"rawDocumentDbSize": "[size]",
"avgDocumentSize": "[size]",
"isIndexing": false,
"numberOfEmbeddings": 0,
"numberOfEmbeddedDocuments": 0,

View File

@ -28,12 +28,15 @@ async fn import_dump_v1_movie_raw() {
let (stats, code) = index.stats().await;
snapshot!(code, @"200 OK");
snapshot!(
json_string!(stats),
json_string!(stats, {
".rawDocumentDbSize" => "[size]",
".avgDocumentSize" => "[size]",
}),
@r###"
{
"numberOfDocuments": 53,
"rawDocumentDbSize": 21965,
"avgDocumentSize": 414,
"rawDocumentDbSize": "[size]",
"avgDocumentSize": "[size]",
"isIndexing": false,
"numberOfEmbeddings": 0,
"numberOfEmbeddedDocuments": 0,
@ -185,12 +188,15 @@ async fn import_dump_v1_movie_with_settings() {
let (stats, code) = index.stats().await;
snapshot!(code, @"200 OK");
snapshot!(
json_string!(stats),
json_string!(stats, {
".rawDocumentDbSize" => "[size]",
".avgDocumentSize" => "[size]",
}),
@r###"
{
"numberOfDocuments": 53,
"rawDocumentDbSize": 21965,
"avgDocumentSize": 414,
"rawDocumentDbSize": "[size]",
"avgDocumentSize": "[size]",
"isIndexing": false,
"numberOfEmbeddings": 0,
"numberOfEmbeddedDocuments": 0,
@ -355,12 +361,15 @@ async fn import_dump_v1_rubygems_with_settings() {
let (stats, code) = index.stats().await;
snapshot!(code, @"200 OK");
snapshot!(
json_string!(stats),
json_string!(stats, {
".rawDocumentDbSize" => "[size]",
".avgDocumentSize" => "[size]",
}),
@r###"
{
"numberOfDocuments": 53,
"rawDocumentDbSize": 8606,
"avgDocumentSize": 162,
"rawDocumentDbSize": "[size]",
"avgDocumentSize": "[size]",
"isIndexing": false,
"numberOfEmbeddings": 0,
"numberOfEmbeddedDocuments": 0,
@ -522,12 +531,15 @@ async fn import_dump_v2_movie_raw() {
let (stats, code) = index.stats().await;
snapshot!(code, @"200 OK");
snapshot!(
json_string!(stats),
json_string!(stats, {
".rawDocumentDbSize" => "[size]",
".avgDocumentSize" => "[size]",
}),
@r###"
{
"numberOfDocuments": 53,
"rawDocumentDbSize": 21965,
"avgDocumentSize": 414,
"rawDocumentDbSize": "[size]",
"avgDocumentSize": "[size]",
"isIndexing": false,
"numberOfEmbeddings": 0,
"numberOfEmbeddedDocuments": 0,
@ -679,12 +691,15 @@ async fn import_dump_v2_movie_with_settings() {
let (stats, code) = index.stats().await;
snapshot!(code, @"200 OK");
snapshot!(
json_string!(stats),
json_string!(stats, {
".rawDocumentDbSize" => "[size]",
".avgDocumentSize" => "[size]",
}),
@r###"
{
"numberOfDocuments": 53,
"rawDocumentDbSize": 21965,
"avgDocumentSize": 414,
"rawDocumentDbSize": "[size]",
"avgDocumentSize": "[size]",
"isIndexing": false,
"numberOfEmbeddings": 0,
"numberOfEmbeddedDocuments": 0,
@ -846,12 +861,15 @@ async fn import_dump_v2_rubygems_with_settings() {
let (stats, code) = index.stats().await;
snapshot!(code, @"200 OK");
snapshot!(
json_string!(stats),
json_string!(stats, {
".rawDocumentDbSize" => "[size]",
".avgDocumentSize" => "[size]",
}),
@r###"
{
"numberOfDocuments": 53,
"rawDocumentDbSize": 8606,
"avgDocumentSize": 162,
"rawDocumentDbSize": "[size]",
"avgDocumentSize": "[size]",
"isIndexing": false,
"numberOfEmbeddings": 0,
"numberOfEmbeddedDocuments": 0,
@ -1010,12 +1028,15 @@ async fn import_dump_v3_movie_raw() {
let (stats, code) = index.stats().await;
snapshot!(code, @"200 OK");
snapshot!(
json_string!(stats),
json_string!(stats, {
".rawDocumentDbSize" => "[size]",
".avgDocumentSize" => "[size]",
}),
@r###"
{
"numberOfDocuments": 53,
"rawDocumentDbSize": 21965,
"avgDocumentSize": 414,
"rawDocumentDbSize": "[size]",
"avgDocumentSize": "[size]",
"isIndexing": false,
"numberOfEmbeddings": 0,
"numberOfEmbeddedDocuments": 0,
@ -1167,12 +1188,15 @@ async fn import_dump_v3_movie_with_settings() {
let (stats, code) = index.stats().await;
snapshot!(code, @"200 OK");
snapshot!(
json_string!(stats),
json_string!(stats, {
".rawDocumentDbSize" => "[size]",
".avgDocumentSize" => "[size]",
}),
@r###"
{
"numberOfDocuments": 53,
"rawDocumentDbSize": 21965,
"avgDocumentSize": 414,
"rawDocumentDbSize": "[size]",
"avgDocumentSize": "[size]",
"isIndexing": false,
"numberOfEmbeddings": 0,
"numberOfEmbeddedDocuments": 0,
@ -1334,12 +1358,15 @@ async fn import_dump_v3_rubygems_with_settings() {
let (stats, code) = index.stats().await;
snapshot!(code, @"200 OK");
snapshot!(
json_string!(stats),
json_string!(stats, {
".rawDocumentDbSize" => "[size]",
".avgDocumentSize" => "[size]",
}),
@r###"
{
"numberOfDocuments": 53,
"rawDocumentDbSize": 8606,
"avgDocumentSize": 162,
"rawDocumentDbSize": "[size]",
"avgDocumentSize": "[size]",
"isIndexing": false,
"numberOfEmbeddings": 0,
"numberOfEmbeddedDocuments": 0,
@ -1498,12 +1525,15 @@ async fn import_dump_v4_movie_raw() {
let (stats, code) = index.stats().await;
snapshot!(code, @"200 OK");
snapshot!(
json_string!(stats),
json_string!(stats, {
".rawDocumentDbSize" => "[size]",
".avgDocumentSize" => "[size]",
}),
@r###"
{
"numberOfDocuments": 53,
"rawDocumentDbSize": 21965,
"avgDocumentSize": 414,
"rawDocumentDbSize": "[size]",
"avgDocumentSize": "[size]",
"isIndexing": false,
"numberOfEmbeddings": 0,
"numberOfEmbeddedDocuments": 0,
@ -1655,12 +1685,15 @@ async fn import_dump_v4_movie_with_settings() {
let (stats, code) = index.stats().await;
snapshot!(code, @"200 OK");
snapshot!(
json_string!(stats),
json_string!(stats, {
".rawDocumentDbSize" => "[size]",
".avgDocumentSize" => "[size]",
}),
@r###"
{
"numberOfDocuments": 53,
"rawDocumentDbSize": 21965,
"avgDocumentSize": 414,
"rawDocumentDbSize": "[size]",
"avgDocumentSize": "[size]",
"isIndexing": false,
"numberOfEmbeddings": 0,
"numberOfEmbeddedDocuments": 0,
@ -1822,12 +1855,15 @@ async fn import_dump_v4_rubygems_with_settings() {
let (stats, code) = index.stats().await;
snapshot!(code, @"200 OK");
snapshot!(
json_string!(stats),
json_string!(stats, {
".rawDocumentDbSize" => "[size]",
".avgDocumentSize" => "[size]",
}),
@r###"
{
"numberOfDocuments": 53,
"rawDocumentDbSize": 8606,
"avgDocumentSize": 162,
"rawDocumentDbSize": "[size]",
"avgDocumentSize": "[size]",
"isIndexing": false,
"numberOfEmbeddings": 0,
"numberOfEmbeddedDocuments": 0,
@ -1994,11 +2030,14 @@ async fn import_dump_v5() {
let (stats, code) = index1.stats().await;
snapshot!(code, @"200 OK");
snapshot!(json_string!(stats), @r###"
snapshot!(json_string!(stats, {
".rawDocumentDbSize" => "[size]",
".avgDocumentSize" => "[size]",
}), @r###"
{
"numberOfDocuments": 10,
"rawDocumentDbSize": 6782,
"avgDocumentSize": 678,
"rawDocumentDbSize": "[size]",
"avgDocumentSize": "[size]",
"isIndexing": false,
"numberOfEmbeddings": 0,
"numberOfEmbeddedDocuments": 0,
@ -2031,12 +2070,15 @@ async fn import_dump_v5() {
let (stats, code) = index2.stats().await;
snapshot!(code, @"200 OK");
snapshot!(
json_string!(stats),
json_string!(stats, {
".rawDocumentDbSize" => "[size]",
".avgDocumentSize" => "[size]",
}),
@r###"
{
"numberOfDocuments": 10,
"rawDocumentDbSize": 6782,
"avgDocumentSize": 678,
"rawDocumentDbSize": "[size]",
"avgDocumentSize": "[size]",
"isIndexing": false,
"numberOfEmbeddings": 0,
"numberOfEmbeddedDocuments": 0,
@ -2237,6 +2279,7 @@ async fn import_dump_v6_containing_batches_and_enqueued_tasks() {
".results[0].duration" => "[date]",
".results[0].stats.progressTrace" => "[progressTrace]",
".results[0].stats.writeChannelCongestion" => "[writeChannelCongestion]",
".results[0].stats.internalDatabaseSizes" => "[internalDatabaseSizes]",
}), name: "batches");
let (indexes, code) = server.list_indexes(None, None).await;

View File

@ -110,11 +110,14 @@ async fn add_remove_embeddings() {
index.wait_task(response.uid()).await.succeeded();
let (stats, _code) = index.stats().await;
snapshot!(json_string!(stats), @r###"
snapshot!(json_string!(stats, {
".rawDocumentDbSize" => "[size]",
".avgDocumentSize" => "[size]",
}), @r###"
{
"numberOfDocuments": 2,
"rawDocumentDbSize": 27,
"avgDocumentSize": 13,
"rawDocumentDbSize": "[size]",
"avgDocumentSize": "[size]",
"isIndexing": false,
"numberOfEmbeddings": 5,
"numberOfEmbeddedDocuments": 2,
@ -135,11 +138,14 @@ async fn add_remove_embeddings() {
index.wait_task(response.uid()).await.succeeded();
let (stats, _code) = index.stats().await;
snapshot!(json_string!(stats), @r###"
snapshot!(json_string!(stats, {
".rawDocumentDbSize" => "[size]",
".avgDocumentSize" => "[size]",
}), @r###"
{
"numberOfDocuments": 2,
"rawDocumentDbSize": 27,
"avgDocumentSize": 13,
"rawDocumentDbSize": "[size]",
"avgDocumentSize": "[size]",
"isIndexing": false,
"numberOfEmbeddings": 3,
"numberOfEmbeddedDocuments": 2,
@ -160,11 +166,14 @@ async fn add_remove_embeddings() {
index.wait_task(response.uid()).await.succeeded();
let (stats, _code) = index.stats().await;
snapshot!(json_string!(stats), @r###"
snapshot!(json_string!(stats, {
".rawDocumentDbSize" => "[size]",
".avgDocumentSize" => "[size]",
}), @r###"
{
"numberOfDocuments": 2,
"rawDocumentDbSize": 27,
"avgDocumentSize": 13,
"rawDocumentDbSize": "[size]",
"avgDocumentSize": "[size]",
"isIndexing": false,
"numberOfEmbeddings": 2,
"numberOfEmbeddedDocuments": 2,
@@ -186,11 +195,14 @@ async fn add_remove_embeddings() {
index.wait_task(response.uid()).await.succeeded();
let (stats, _code) = index.stats().await;
snapshot!(json_string!(stats), @r###"
snapshot!(json_string!(stats, {
".rawDocumentDbSize" => "[size]",
".avgDocumentSize" => "[size]",
}), @r###"
{
"numberOfDocuments": 2,
"rawDocumentDbSize": 27,
"avgDocumentSize": 13,
"rawDocumentDbSize": "[size]",
"avgDocumentSize": "[size]",
"isIndexing": false,
"numberOfEmbeddings": 2,
"numberOfEmbeddedDocuments": 1,
@@ -236,11 +248,14 @@ async fn add_remove_embedded_documents() {
index.wait_task(response.uid()).await.succeeded();
let (stats, _code) = index.stats().await;
snapshot!(json_string!(stats), @r###"
snapshot!(json_string!(stats, {
".rawDocumentDbSize" => "[size]",
".avgDocumentSize" => "[size]",
}), @r###"
{
"numberOfDocuments": 2,
"rawDocumentDbSize": 27,
"avgDocumentSize": 13,
"rawDocumentDbSize": "[size]",
"avgDocumentSize": "[size]",
"isIndexing": false,
"numberOfEmbeddings": 5,
"numberOfEmbeddedDocuments": 2,
@@ -257,11 +272,14 @@ async fn add_remove_embedded_documents() {
index.wait_task(response.uid()).await.succeeded();
let (stats, _code) = index.stats().await;
snapshot!(json_string!(stats), @r###"
snapshot!(json_string!(stats, {
".rawDocumentDbSize" => "[size]",
".avgDocumentSize" => "[size]",
}), @r###"
{
"numberOfDocuments": 1,
"rawDocumentDbSize": 13,
"avgDocumentSize": 13,
"rawDocumentDbSize": "[size]",
"avgDocumentSize": "[size]",
"isIndexing": false,
"numberOfEmbeddings": 3,
"numberOfEmbeddedDocuments": 1,
@@ -290,11 +308,14 @@ async fn update_embedder_settings() {
index.wait_task(response.uid()).await.succeeded();
let (stats, _code) = index.stats().await;
snapshot!(json_string!(stats), @r###"
snapshot!(json_string!(stats, {
".rawDocumentDbSize" => "[size]",
".avgDocumentSize" => "[size]",
}), @r###"
{
"numberOfDocuments": 2,
"rawDocumentDbSize": 108,
"avgDocumentSize": 54,
"rawDocumentDbSize": "[size]",
"avgDocumentSize": "[size]",
"isIndexing": false,
"numberOfEmbeddings": 0,
"numberOfEmbeddedDocuments": 0,
@@ -326,11 +347,14 @@ async fn update_embedder_settings() {
server.wait_task(response.uid()).await.succeeded();
let (stats, _code) = index.stats().await;
snapshot!(json_string!(stats), @r###"
snapshot!(json_string!(stats, {
".rawDocumentDbSize" => "[size]",
".avgDocumentSize" => "[size]",
}), @r###"
{
"numberOfDocuments": 2,
"rawDocumentDbSize": 108,
"avgDocumentSize": 54,
"rawDocumentDbSize": "[size]",
"avgDocumentSize": "[size]",
"isIndexing": false,
"numberOfEmbeddings": 3,
"numberOfEmbeddedDocuments": 2,


@@ -133,7 +133,9 @@ async fn check_the_index_scheduler(server: &Server) {
let (stats, _) = server.stats().await;
assert_json_snapshot!(stats, {
".databaseSize" => "[bytes]",
".usedDatabaseSize" => "[bytes]"
".usedDatabaseSize" => "[bytes]",
".indexes.kefir.rawDocumentDbSize" => "[bytes]",
".indexes.kefir.avgDocumentSize" => "[bytes]",
},
@r###"
{
@@ -143,8 +145,8 @@ async fn check_the_index_scheduler(server: &Server) {
"indexes": {
"kefir": {
"numberOfDocuments": 1,
"rawDocumentDbSize": 109,
"avgDocumentSize": 109,
"rawDocumentDbSize": "[bytes]",
"avgDocumentSize": "[bytes]",
"isIndexing": false,
"numberOfEmbeddings": 0,
"numberOfEmbeddedDocuments": 0,
@@ -193,31 +195,33 @@ async fn check_the_index_scheduler(server: &Server) {
// Tests all the batches query parameters
let (batches, _) = server.batches_filter("uids=10").await;
snapshot!(json_string!(batches, { ".results[0].duration" => "[duration]", ".results[0].enqueuedAt" => "[date]", ".results[0].startedAt" => "[date]", ".results[0].finishedAt" => "[date]", ".results[0].stats.progressTrace" => "[progressTrace]", ".results[0].stats.writeChannelCongestion" => "[writeChannelCongestion]" }), name: "batches_filter_uids_equal_10");
snapshot!(json_string!(batches, { ".results[0].duration" => "[duration]", ".results[0].enqueuedAt" => "[date]", ".results[0].startedAt" => "[date]", ".results[0].finishedAt" => "[date]", ".results[0].stats.progressTrace" => "[progressTrace]", ".results[0].stats.internalDatabaseSizes" => "[internalDatabaseSizes]", ".results[0].stats.writeChannelCongestion" => "[writeChannelCongestion]" }), name: "batches_filter_uids_equal_10");
let (batches, _) = server.batches_filter("batchUids=10").await;
snapshot!(json_string!(batches, { ".results[0].duration" => "[duration]", ".results[0].enqueuedAt" => "[date]", ".results[0].startedAt" => "[date]", ".results[0].finishedAt" => "[date]", ".results[0].stats.progressTrace" => "[progressTrace]", ".results[0].stats.writeChannelCongestion" => "[writeChannelCongestion]" }), name: "batches_filter_batchUids_equal_10");
snapshot!(json_string!(batches, { ".results[0].duration" => "[duration]", ".results[0].enqueuedAt" => "[date]", ".results[0].startedAt" => "[date]", ".results[0].finishedAt" => "[date]", ".results[0].stats.progressTrace" => "[progressTrace]", ".results[0].stats.internalDatabaseSizes" => "[internalDatabaseSizes]", ".results[0].stats.writeChannelCongestion" => "[writeChannelCongestion]" }), name: "batches_filter_batchUids_equal_10");
let (batches, _) = server.batches_filter("statuses=canceled").await;
snapshot!(json_string!(batches, { ".results[0].duration" => "[duration]", ".results[0].enqueuedAt" => "[date]", ".results[0].startedAt" => "[date]", ".results[0].finishedAt" => "[date]", ".results[0].stats.progressTrace" => "[progressTrace]", ".results[0].stats.writeChannelCongestion" => "[writeChannelCongestion]" }), name: "batches_filter_statuses_equal_canceled");
snapshot!(json_string!(batches, { ".results[0].duration" => "[duration]", ".results[0].enqueuedAt" => "[date]", ".results[0].startedAt" => "[date]", ".results[0].finishedAt" => "[date]", ".results[0].stats.progressTrace" => "[progressTrace]", ".results[0].stats.internalDatabaseSizes" => "[internalDatabaseSizes]", ".results[0].stats.writeChannelCongestion" => "[writeChannelCongestion]" }), name: "batches_filter_statuses_equal_canceled");
// the `types` parameter has already been tested above to retrieve the upgrade database
let (batches, _) = server.batches_filter("canceledBy=19").await;
snapshot!(json_string!(batches, { ".results[0].duration" => "[duration]", ".results[0].enqueuedAt" => "[date]", ".results[0].startedAt" => "[date]", ".results[0].finishedAt" => "[date]", ".results[0].stats.progressTrace" => "[progressTrace]", ".results[0].stats.writeChannelCongestion" => "[writeChannelCongestion]" }), name: "batches_filter_canceledBy_equal_19");
snapshot!(json_string!(batches, { ".results[0].duration" => "[duration]", ".results[0].enqueuedAt" => "[date]", ".results[0].startedAt" => "[date]", ".results[0].finishedAt" => "[date]", ".results[0].stats.progressTrace" => "[progressTrace]", ".results[0].stats.internalDatabaseSizes" => "[internalDatabaseSizes]", ".results[0].stats.writeChannelCongestion" => "[writeChannelCongestion]" }), name: "batches_filter_canceledBy_equal_19");
let (batches, _) = server.batches_filter("beforeEnqueuedAt=2025-01-16T16:47:41Z").await;
snapshot!(json_string!(batches, { ".results[0].duration" => "[duration]", ".results[0].enqueuedAt" => "[date]", ".results[0].startedAt" => "[date]", ".results[0].finishedAt" => "[date]", ".results[0].stats.progressTrace" => "[progressTrace]", ".results[0].stats.writeChannelCongestion" => "[writeChannelCongestion]" }), name: "batches_filter_beforeEnqueuedAt_equal_2025-01-16T16_47_41");
snapshot!(json_string!(batches, { ".results[0].duration" => "[duration]", ".results[0].enqueuedAt" => "[date]", ".results[0].startedAt" => "[date]", ".results[0].finishedAt" => "[date]", ".results[0].stats.progressTrace" => "[progressTrace]", ".results[0].stats.internalDatabaseSizes" => "[internalDatabaseSizes]", ".results[0].stats.writeChannelCongestion" => "[writeChannelCongestion]" }), name: "batches_filter_beforeEnqueuedAt_equal_2025-01-16T16_47_41");
let (batches, _) = server.batches_filter("afterEnqueuedAt=2025-01-16T16:47:41Z").await;
snapshot!(json_string!(batches, { ".results[0].duration" => "[duration]", ".results[0].enqueuedAt" => "[date]", ".results[0].startedAt" => "[date]", ".results[0].finishedAt" => "[date]", ".results[0].stats.progressTrace" => "[progressTrace]", ".results[0].stats.writeChannelCongestion" => "[writeChannelCongestion]" }), name: "batches_filter_afterEnqueuedAt_equal_2025-01-16T16_47_41");
snapshot!(json_string!(batches, { ".results[0].duration" => "[duration]", ".results[0].enqueuedAt" => "[date]", ".results[0].startedAt" => "[date]", ".results[0].finishedAt" => "[date]", ".results[0].stats.progressTrace" => "[progressTrace]", ".results[0].stats.internalDatabaseSizes" => "[internalDatabaseSizes]", ".results[0].stats.writeChannelCongestion" => "[writeChannelCongestion]" }), name: "batches_filter_afterEnqueuedAt_equal_2025-01-16T16_47_41");
let (batches, _) = server.batches_filter("beforeStartedAt=2025-01-16T16:47:41Z").await;
snapshot!(json_string!(batches, { ".results[0].duration" => "[duration]", ".results[0].enqueuedAt" => "[date]", ".results[0].startedAt" => "[date]", ".results[0].finishedAt" => "[date]", ".results[0].stats.progressTrace" => "[progressTrace]", ".results[0].stats.writeChannelCongestion" => "[writeChannelCongestion]" }), name: "batches_filter_beforeStartedAt_equal_2025-01-16T16_47_41");
snapshot!(json_string!(batches, { ".results[0].duration" => "[duration]", ".results[0].enqueuedAt" => "[date]", ".results[0].startedAt" => "[date]", ".results[0].finishedAt" => "[date]", ".results[0].stats.progressTrace" => "[progressTrace]", ".results[0].stats.internalDatabaseSizes" => "[internalDatabaseSizes]", ".results[0].stats.writeChannelCongestion" => "[writeChannelCongestion]" }), name: "batches_filter_beforeStartedAt_equal_2025-01-16T16_47_41");
let (batches, _) = server.batches_filter("afterStartedAt=2025-01-16T16:47:41Z").await;
snapshot!(json_string!(batches, { ".results[0].duration" => "[duration]", ".results[0].enqueuedAt" => "[date]", ".results[0].startedAt" => "[date]", ".results[0].finishedAt" => "[date]", ".results[0].stats.progressTrace" => "[progressTrace]", ".results[0].stats.writeChannelCongestion" => "[writeChannelCongestion]" }), name: "batches_filter_afterStartedAt_equal_2025-01-16T16_47_41");
snapshot!(json_string!(batches, { ".results[0].duration" => "[duration]", ".results[0].enqueuedAt" => "[date]", ".results[0].startedAt" => "[date]", ".results[0].finishedAt" => "[date]", ".results[0].stats.progressTrace" => "[progressTrace]", ".results[0].stats.internalDatabaseSizes" => "[internalDatabaseSizes]", ".results[0].stats.writeChannelCongestion" => "[writeChannelCongestion]" }), name: "batches_filter_afterStartedAt_equal_2025-01-16T16_47_41");
let (batches, _) = server.batches_filter("beforeFinishedAt=2025-01-16T16:47:41Z").await;
snapshot!(json_string!(batches, { ".results[0].duration" => "[duration]", ".results[0].enqueuedAt" => "[date]", ".results[0].startedAt" => "[date]", ".results[0].finishedAt" => "[date]", ".results[0].stats.progressTrace" => "[progressTrace]", ".results[0].stats.writeChannelCongestion" => "[writeChannelCongestion]" }), name: "batches_filter_beforeFinishedAt_equal_2025-01-16T16_47_41");
snapshot!(json_string!(batches, { ".results[0].duration" => "[duration]", ".results[0].enqueuedAt" => "[date]", ".results[0].startedAt" => "[date]", ".results[0].finishedAt" => "[date]", ".results[0].stats.progressTrace" => "[progressTrace]", ".results[0].stats.internalDatabaseSizes" => "[internalDatabaseSizes]", ".results[0].stats.writeChannelCongestion" => "[writeChannelCongestion]" }), name: "batches_filter_beforeFinishedAt_equal_2025-01-16T16_47_41");
let (batches, _) = server.batches_filter("afterFinishedAt=2025-01-16T16:47:41Z").await;
snapshot!(json_string!(batches, { ".results[0].duration" => "[duration]", ".results[0].enqueuedAt" => "[date]", ".results[0].startedAt" => "[date]", ".results[0].finishedAt" => "[date]", ".results[0].stats.progressTrace" => "[progressTrace]", ".results[0].stats.writeChannelCongestion" => "[writeChannelCongestion]" }), name: "batches_filter_afterFinishedAt_equal_2025-01-16T16_47_41");
snapshot!(json_string!(batches, { ".results[0].duration" => "[duration]", ".results[0].enqueuedAt" => "[date]", ".results[0].startedAt" => "[date]", ".results[0].finishedAt" => "[date]", ".results[0].stats.progressTrace" => "[progressTrace]", ".results[0].stats.internalDatabaseSizes" => "[internalDatabaseSizes]", ".results[0].stats.writeChannelCongestion" => "[writeChannelCongestion]" }), name: "batches_filter_afterFinishedAt_equal_2025-01-16T16_47_41");
let (stats, _) = server.stats().await;
assert_json_snapshot!(stats, {
".databaseSize" => "[bytes]",
".usedDatabaseSize" => "[bytes]"
".usedDatabaseSize" => "[bytes]",
".indexes.kefir.rawDocumentDbSize" => "[bytes]",
".indexes.kefir.avgDocumentSize" => "[bytes]",
},
@r###"
{
@@ -227,8 +231,8 @@ async fn check_the_index_scheduler(server: &Server) {
"indexes": {
"kefir": {
"numberOfDocuments": 1,
"rawDocumentDbSize": 109,
"avgDocumentSize": 109,
"rawDocumentDbSize": "[bytes]",
"avgDocumentSize": "[bytes]",
"isIndexing": false,
"numberOfEmbeddings": 0,
"numberOfEmbeddedDocuments": 0,
@@ -245,11 +249,14 @@ async fn check_the_index_scheduler(server: &Server) {
"###);
let index = server.index("kefir");
let (stats, _) = index.stats().await;
snapshot!(stats, @r###"
snapshot!(json_string!(stats, {
".rawDocumentDbSize" => "[bytes]",
".avgDocumentSize" => "[bytes]",
}), @r###"
{
"numberOfDocuments": 1,
"rawDocumentDbSize": 109,
"avgDocumentSize": 109,
"rawDocumentDbSize": "[bytes]",
"avgDocumentSize": "[bytes]",
"isIndexing": false,
"numberOfEmbeddings": 0,
"numberOfEmbeddedDocuments": 0,


@@ -164,6 +164,87 @@ async fn add_remove_user_provided() {
"###);
}
#[actix_rt::test]
async fn user_provide_mismatched_embedding_dimension() {
let server = Server::new().await;
let index = server.index("doggo");
let (response, code) = index
.update_settings(json!({
"embedders": {
"manual": {
"source": "userProvided",
"dimensions": 3,
}
},
}))
.await;
snapshot!(code, @"202 Accepted");
server.wait_task(response.uid()).await.succeeded();
let documents = json!([
{"id": 0, "name": "kefir", "_vectors": { "manual": [0, 0] }},
]);
let (value, code) = index.add_documents(documents, None).await;
snapshot!(code, @"202 Accepted");
let task = index.wait_task(value.uid()).await;
snapshot!(task, @r###"
{
"uid": "[uid]",
"batchUid": "[batch_uid]",
"indexUid": "doggo",
"status": "failed",
"type": "documentAdditionOrUpdate",
"canceledBy": null,
"details": {
"receivedDocuments": 1,
"indexedDocuments": 0
},
"error": {
"message": "Index `doggo`: Invalid vector dimensions in document with id `0` in `._vectors.manual`.\n - note: embedding #0 has dimensions 2\n - note: embedder `manual` requires 3",
"code": "invalid_vector_dimensions",
"type": "invalid_request",
"link": "https://docs.meilisearch.com/errors#invalid_vector_dimensions"
},
"duration": "[duration]",
"enqueuedAt": "[date]",
"startedAt": "[date]",
"finishedAt": "[date]"
}
"###);
let new_document = json!([
{"id": 0, "name": "kefir", "_vectors": { "manual": [[0, 0], [1, 1], [2, 2]] }},
]);
let (response, code) = index.add_documents(new_document, None).await;
snapshot!(code, @"202 Accepted");
let task = index.wait_task(response.uid()).await;
snapshot!(task, @r###"
{
"uid": "[uid]",
"batchUid": "[batch_uid]",
"indexUid": "doggo",
"status": "failed",
"type": "documentAdditionOrUpdate",
"canceledBy": null,
"details": {
"receivedDocuments": 1,
"indexedDocuments": 0
},
"error": {
"message": "Index `doggo`: Invalid vector dimensions in document with id `0` in `._vectors.manual`.\n - note: embedding #0 has dimensions 2\n - note: embedder `manual` requires 3",
"code": "invalid_vector_dimensions",
"type": "invalid_request",
"link": "https://docs.meilisearch.com/errors#invalid_vector_dimensions"
},
"duration": "[duration]",
"enqueuedAt": "[date]",
"startedAt": "[date]",
"finishedAt": "[date]"
}
"###);
}
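For contrast with the two failing payloads above, a hedged sketch of payloads that should satisfy the `dimensions: 3` embedder; the values are invented, and the multi-embedding shape follows the `[[…], […]]` form exercised by the test:

```rust
use serde_json::json;

fn main() {
    // One 3-dimensional embedding for the document...
    let _valid_single = json!([
        {"id": 0, "name": "kefir", "_vectors": { "manual": [0.0, 0.0, 1.0] }},
    ]);
    // ...or several embeddings, each of which must be 3-dimensional.
    let _valid_multiple = json!([
        {"id": 0, "name": "kefir", "_vectors": { "manual": [[0, 0, 0], [1, 1, 1]] }},
    ]);
}
```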
async fn generate_default_user_provided_documents(server: &Server) -> Index {
let index = server.index("doggo");


@@ -18,7 +18,7 @@ bincode = "1.3.3"
bstr = "1.11.3"
bytemuck = { version = "1.21.0", features = ["extern_crate_alloc"] }
byteorder = "1.5.0"
charabia = { version = "0.9.2", default-features = false }
charabia = { version = "0.9.3", default-features = false }
concat-arrays = "0.1.2"
convert_case = "0.6.0"
crossbeam-channel = "0.5.14"


@@ -1,8 +1,13 @@
use heed::types::Bytes;
use std::mem;
use heed::Database;
use heed::DatabaseStat;
use heed::RoTxn;
use heed::Unspecified;
use serde::{Deserialize, Serialize};
use crate::BEU32;
#[derive(Debug, Clone, Copy, Serialize, Deserialize, PartialEq, Eq, Default)]
#[serde(rename_all = "camelCase")]
/// The stats of a database.
@@ -20,58 +25,24 @@ impl DatabaseStats {
///
/// This function iterates over the whole database and computes the stats.
/// It is not efficient and should be cached somewhere.
pub(crate) fn new(database: Database<Bytes, Bytes>, rtxn: &RoTxn<'_>) -> heed::Result<Self> {
let mut database_stats =
Self { number_of_entries: 0, total_key_size: 0, total_value_size: 0 };
pub(crate) fn new(
database: Database<BEU32, Unspecified>,
rtxn: &RoTxn<'_>,
) -> heed::Result<Self> {
let DatabaseStat { page_size, depth: _, branch_pages, leaf_pages, overflow_pages, entries } =
database.stat(rtxn)?;
let mut iter = database.iter(rtxn)?;
while let Some((key, value)) = iter.next().transpose()? {
let key_size = key.len() as u64;
let value_size = value.len() as u64;
database_stats.total_key_size += key_size;
database_stats.total_value_size += value_size;
}
// We take the total size from the page counts; the overflow pages contain only the values.
let total_size = (branch_pages + leaf_pages + overflow_pages) * page_size as usize;
// We compute an estimated size for the keys.
let total_key_size = entries * (mem::size_of::<u32>() + 4);
let total_value_size = total_size - total_key_size;
database_stats.number_of_entries = database.len(rtxn)?;
Ok(database_stats)
}
/// Recomputes the stats of the database and returns the new stats.
///
/// This function is used to update the stats of the database when some keys are modified.
/// It is more efficient than the `new` function because it does not iterate over the whole database but only over the modified keys, comparing the before and after states.
pub(crate) fn recompute<I, K>(
mut stats: Self,
database: Database<Bytes, Bytes>,
before_rtxn: &RoTxn<'_>,
after_rtxn: &RoTxn<'_>,
modified_keys: I,
) -> heed::Result<Self>
where
I: IntoIterator<Item = K>,
K: AsRef<[u8]>,
{
for key in modified_keys {
let key = key.as_ref();
if let Some(value) = database.get(after_rtxn, key)? {
let key_size = key.len() as u64;
let value_size = value.len() as u64;
stats.total_key_size = stats.total_key_size.saturating_add(key_size);
stats.total_value_size = stats.total_value_size.saturating_add(value_size);
}
if let Some(value) = database.get(before_rtxn, key)? {
let key_size = key.len() as u64;
let value_size = value.len() as u64;
stats.total_key_size = stats.total_key_size.saturating_sub(key_size);
stats.total_value_size = stats.total_value_size.saturating_sub(value_size);
}
}
stats.number_of_entries = database.len(after_rtxn)?;
Ok(stats)
Ok(Self {
number_of_entries: entries as u64,
total_key_size: total_key_size as u64,
total_value_size: total_value_size as u64,
})
}
pub fn average_key_size(&self) -> u64 {
@@ -86,6 +57,10 @@ impl DatabaseStats {
self.number_of_entries
}
pub fn total_size(&self) -> u64 {
self.total_key_size + self.total_value_size
}
pub fn total_key_size(&self) -> u64 {
self.total_key_size
}
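The rewritten `new` replaces a full table scan with arithmetic over LMDB page statistics. A standalone restatement of that estimation, assuming the `DatabaseStat` fields destructured above; the saturating subtraction is a guard added in this sketch, not present in the original:

```rust
use std::mem;

/// Estimate key/value sizes from LMDB page counts instead of iterating.
fn estimate_sizes(
    page_size: u32,
    branch_pages: usize,
    leaf_pages: usize,
    overflow_pages: usize,
    entries: usize,
) -> (u64, u64) {
    // All pages owned by the database, in bytes.
    let total_size = (branch_pages + leaf_pages + overflow_pages) * page_size as usize;
    // Keys are big-endian u32s (4 bytes) plus an assumed ~4 bytes of per-entry overhead.
    let total_key_size = entries * (mem::size_of::<u32>() + 4);
    // Whatever remains is attributed to the values.
    let total_value_size = total_size.saturating_sub(total_key_size);
    (total_key_size as u64, total_value_size as u64)
}
```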


@@ -129,6 +129,14 @@ and can not be more than 511 bytes.", .document_id.to_string()
InvalidGeoField(#[from] GeoError),
#[error("Invalid vector dimensions: expected: `{}`, found: `{}`.", .expected, .found)]
InvalidVectorDimensions { expected: usize, found: usize },
#[error("Invalid vector dimensions in document with id `{document_id}` in `._vectors.{embedder_name}`.\n - note: embedding #{embedding_index} has dimensions {found}\n - note: embedder `{embedder_name}` requires {expected}")]
InvalidIndexingVectorDimensions {
embedder_name: String,
document_id: String,
embedding_index: usize,
expected: usize,
found: usize,
},
#[error("The `_vectors` field in the document with id: `{document_id}` is not an object. Was expecting an object with a key for each embedder with manually provided vectors, but instead got `{value}`")]
InvalidVectorsMapType { document_id: String, value: Value },
#[error("Bad embedder configuration in the document with id: `{document_id}`. {error}")]


@@ -3,8 +3,9 @@ use std::collections::{BTreeMap, BTreeSet, HashMap, HashSet};
use std::fs::File;
use std::path::Path;
use heed::{types::*, WithoutTls};
use heed::{types::*, DatabaseStat, WithoutTls};
use heed::{CompactionOption, Database, RoTxn, RwTxn, Unspecified};
use indexmap::IndexMap;
use roaring::RoaringBitmap;
use rstar::RTree;
use serde::{Deserialize, Serialize};
@@ -410,38 +411,6 @@ impl Index {
Ok(count.unwrap_or_default())
}
/// Updates the stats of the documents database based on the previous stats and the modified docids.
pub fn update_documents_stats(
&self,
wtxn: &mut RwTxn<'_>,
modified_docids: roaring::RoaringBitmap,
) -> Result<()> {
let before_rtxn = self.read_txn()?;
let document_stats = match self.documents_stats(&before_rtxn)? {
Some(before_stats) => DatabaseStats::recompute(
before_stats,
self.documents.remap_types(),
&before_rtxn,
wtxn,
modified_docids.iter().map(|docid| docid.to_be_bytes()),
)?,
None => {
// This should never happen when there are already documents in the index; the documents stats should be present.
// If it happens, it means that the index was not properly initialized/upgraded.
debug_assert_eq!(
self.documents.len(&before_rtxn)?,
0,
"The documents stats should be present when there are documents in the index"
);
tracing::warn!("No documents stats found, creating new ones");
DatabaseStats::new(self.documents.remap_types(), &*wtxn)?
}
};
self.put_documents_stats(wtxn, document_stats)?;
Ok(())
}
/// Writes the stats of the documents database.
pub fn put_documents_stats(
&self,
@@ -1768,6 +1737,109 @@ impl Index {
Ok(self.word_docids.remap_data_type::<DecodeIgnore>().get(rtxn, word)?.is_some()
|| self.exact_word_docids.remap_data_type::<DecodeIgnore>().get(rtxn, word)?.is_some())
}
/// Returns the size in bytes of each index database, as seen by the given rtxn.
pub fn database_sizes(&self, rtxn: &RoTxn<'_>) -> heed::Result<IndexMap<&'static str, usize>> {
let Self {
env: _,
main,
external_documents_ids,
word_docids,
exact_word_docids,
word_prefix_docids,
exact_word_prefix_docids,
word_pair_proximity_docids,
word_position_docids,
word_fid_docids,
word_prefix_position_docids,
word_prefix_fid_docids,
field_id_word_count_docids,
facet_id_f64_docids,
facet_id_string_docids,
facet_id_normalized_string_strings,
facet_id_string_fst,
facet_id_exists_docids,
facet_id_is_null_docids,
facet_id_is_empty_docids,
field_id_docid_facet_f64s,
field_id_docid_facet_strings,
vector_arroy,
embedder_category_id,
documents,
} = self;
fn compute_size(stats: DatabaseStat) -> usize {
let DatabaseStat {
page_size,
depth: _,
branch_pages,
leaf_pages,
overflow_pages,
entries: _,
} = stats;
(branch_pages + leaf_pages + overflow_pages) * page_size as usize
}
let mut sizes = IndexMap::new();
sizes.insert("main", main.stat(rtxn).map(compute_size)?);
sizes
.insert("external_documents_ids", external_documents_ids.stat(rtxn).map(compute_size)?);
sizes.insert("word_docids", word_docids.stat(rtxn).map(compute_size)?);
sizes.insert("exact_word_docids", exact_word_docids.stat(rtxn).map(compute_size)?);
sizes.insert("word_prefix_docids", word_prefix_docids.stat(rtxn).map(compute_size)?);
sizes.insert(
"exact_word_prefix_docids",
exact_word_prefix_docids.stat(rtxn).map(compute_size)?,
);
sizes.insert(
"word_pair_proximity_docids",
word_pair_proximity_docids.stat(rtxn).map(compute_size)?,
);
sizes.insert("word_position_docids", word_position_docids.stat(rtxn).map(compute_size)?);
sizes.insert("word_fid_docids", word_fid_docids.stat(rtxn).map(compute_size)?);
sizes.insert(
"word_prefix_position_docids",
word_prefix_position_docids.stat(rtxn).map(compute_size)?,
);
sizes
.insert("word_prefix_fid_docids", word_prefix_fid_docids.stat(rtxn).map(compute_size)?);
sizes.insert(
"field_id_word_count_docids",
field_id_word_count_docids.stat(rtxn).map(compute_size)?,
);
sizes.insert("facet_id_f64_docids", facet_id_f64_docids.stat(rtxn).map(compute_size)?);
sizes
.insert("facet_id_string_docids", facet_id_string_docids.stat(rtxn).map(compute_size)?);
sizes.insert(
"facet_id_normalized_string_strings",
facet_id_normalized_string_strings.stat(rtxn).map(compute_size)?,
);
sizes.insert("facet_id_string_fst", facet_id_string_fst.stat(rtxn).map(compute_size)?);
sizes
.insert("facet_id_exists_docids", facet_id_exists_docids.stat(rtxn).map(compute_size)?);
sizes.insert(
"facet_id_is_null_docids",
facet_id_is_null_docids.stat(rtxn).map(compute_size)?,
);
sizes.insert(
"facet_id_is_empty_docids",
facet_id_is_empty_docids.stat(rtxn).map(compute_size)?,
);
sizes.insert(
"field_id_docid_facet_f64s",
field_id_docid_facet_f64s.stat(rtxn).map(compute_size)?,
);
sizes.insert(
"field_id_docid_facet_strings",
field_id_docid_facet_strings.stat(rtxn).map(compute_size)?,
);
sizes.insert("vector_arroy", vector_arroy.stat(rtxn).map(compute_size)?);
sizes.insert("embedder_category_id", embedder_category_id.stat(rtxn).map(compute_size)?);
sizes.insert("documents", documents.stat(rtxn).map(compute_size)?);
Ok(sizes)
}
}
#[derive(Debug, Deserialize, Serialize)]
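A hedged usage sketch for the new `database_sizes` method; it assumes milli's public `Index` type and its usual error conversions, and the helper name is invented:

```rust
/// Print every internal database of an index, largest first.
fn print_database_sizes(index: &milli::Index) -> milli::Result<()> {
    let rtxn = index.read_txn()?;
    let mut sizes: Vec<_> = index.database_sizes(&rtxn)?.into_iter().collect();
    sizes.sort_by_key(|(_, size)| std::cmp::Reverse(*size));
    for (name, size) in sizes {
        println!("{name}: {size} bytes");
    }
    Ok(())
}
```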


@@ -173,16 +173,18 @@ pub fn bucket_sort<'ctx, Q: RankingRuleQueryTrait>(
ranking_rule_scores.push(ScoreDetails::Skipped);
// remove candidates from the universe without adding them to the results if their score is below the threshold
if let Some(ranking_score_threshold) = ranking_score_threshold {
let current_score = ScoreDetails::global_score(ranking_rule_scores.iter());
if current_score < ranking_score_threshold {
all_candidates -= bucket | &ranking_rule_universes[cur_ranking_rule_index];
back!();
continue;
}
}
let is_below_threshold =
ranking_score_threshold.is_some_and(|ranking_score_threshold| {
let current_score = ScoreDetails::global_score(ranking_rule_scores.iter());
current_score < ranking_score_threshold
});
maybe_add_to_results!(bucket);
if is_below_threshold {
all_candidates -= &bucket;
all_candidates -= &ranking_rule_universes[cur_ranking_rule_index];
} else {
maybe_add_to_results!(bucket);
}
ranking_rule_scores.pop();
@@ -237,23 +239,24 @@ pub fn bucket_sort<'ctx, Q: RankingRuleQueryTrait>(
);
// remove candidates from the universe without adding them to the results if their score is below the threshold
if let Some(ranking_score_threshold) = ranking_score_threshold {
let is_below_threshold = ranking_score_threshold.is_some_and(|ranking_score_threshold| {
let current_score = ScoreDetails::global_score(ranking_rule_scores.iter());
if current_score < ranking_score_threshold {
all_candidates -=
next_bucket.candidates | &ranking_rule_universes[cur_ranking_rule_index];
back!();
continue;
}
}
current_score < ranking_score_threshold
});
ranking_rule_universes[cur_ranking_rule_index] -= &next_bucket.candidates;
if cur_ranking_rule_index == ranking_rules_len - 1
|| (scoring_strategy == ScoringStrategy::Skip && next_bucket.candidates.len() <= 1)
|| cur_offset + (next_bucket.candidates.len() as usize) < from
|| is_below_threshold
{
maybe_add_to_results!(next_bucket.candidates);
if is_below_threshold {
all_candidates -= &next_bucket.candidates;
all_candidates -= &ranking_rule_universes[cur_ranking_rule_index];
} else {
maybe_add_to_results!(next_bucket.candidates);
}
ranking_rule_scores.pop();
continue;
}
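Both hunks replace the early `back!()`/`continue` exits with a computed `is_below_threshold` flag, so a below-threshold bucket is consumed and subtracted from `all_candidates` without ever reaching the results. A self-contained sketch of the decision, modelling the global score as a plain product of per-rule scores (the real `ScoreDetails::global_score` is more involved):

```rust
/// Keep a bucket when no threshold is set, or when its global score reaches it.
fn keep_bucket(rule_scores: &[f64], ranking_score_threshold: Option<f64>) -> bool {
    let is_below_threshold = ranking_score_threshold.is_some_and(|threshold| {
        // Stand-in for ScoreDetails::global_score.
        let current_score: f64 = rule_scores.iter().product();
        current_score < threshold
    });
    !is_below_threshold
}

fn main() {
    assert!(keep_bucket(&[0.9, 0.8], Some(0.5))); // 0.72 >= 0.5: kept
    assert!(!keep_bucket(&[0.9, 0.5], Some(0.5))); // 0.45 < 0.5: dropped
    assert!(keep_bucket(&[0.1], None)); // no threshold: always kept
}
```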


@@ -28,6 +28,7 @@ pub use self::helpers::*;
pub use self::transform::{Transform, TransformOutput};
use super::facet::clear_facet_levels_based_on_settings_diff;
use super::new::StdResult;
use crate::database_stats::DatabaseStats;
use crate::documents::{obkv_to_object, DocumentsBatchReader};
use crate::error::{Error, InternalError};
use crate::index::{PrefixSearch, PrefixSettings};
@@ -476,7 +477,8 @@ where
if !settings_diff.settings_update_only {
// Update the stats of the documents database when there is a document update.
self.index.update_documents_stats(self.wtxn, modified_docids)?;
let stats = DatabaseStats::new(self.index.documents.remap_data_type(), self.wtxn)?;
self.index.put_documents_stats(self.wtxn, stats)?;
}
// We write the field distribution into the main database
self.index.put_field_distribution(self.wtxn, &field_distribution)?;


@@ -121,6 +121,7 @@ impl<'a, 'b, 'extractor> Extractor<'extractor> for EmbeddingExtractor<'a, 'b> {
// do we have set embeddings?
if let Some(embeddings) = new_vectors.embeddings {
chunks.set_vectors(
update.external_document_id(),
update.docid(),
embeddings
.into_vec(&context.doc_alloc, embedder_name)
@@ -128,7 +129,7 @@ impl<'a, 'b, 'extractor> Extractor<'extractor> for EmbeddingExtractor<'a, 'b> {
document_id: update.external_document_id().to_string(),
error: error.to_string(),
})?,
);
)?;
} else if new_vectors.regenerate {
let new_rendered = prompt.render_document(
update.external_document_id(),
@@ -209,6 +210,7 @@ impl<'a, 'b, 'extractor> Extractor<'extractor> for EmbeddingExtractor<'a, 'b> {
chunks.set_regenerate(insertion.docid(), new_vectors.regenerate);
if let Some(embeddings) = new_vectors.embeddings {
chunks.set_vectors(
insertion.external_document_id(),
insertion.docid(),
embeddings
.into_vec(&context.doc_alloc, embedder_name)
@@ -218,7 +220,7 @@ impl<'a, 'b, 'extractor> Extractor<'extractor> for EmbeddingExtractor<'a, 'b> {
.to_string(),
error: error.to_string(),
})?,
);
)?;
} else if new_vectors.regenerate {
let rendered = prompt.render_document(
insertion.external_document_id(),
@@ -273,6 +275,7 @@ struct Chunks<'a, 'b, 'extractor> {
embedder: &'a Embedder,
embedder_id: u8,
embedder_name: &'a str,
dimensions: usize,
prompt: &'a Prompt,
possible_embedding_mistakes: &'a PossibleEmbeddingMistakes,
user_provided: &'a RefCell<EmbeddingExtractorData<'extractor>>,
@@ -297,6 +300,7 @@ impl<'a, 'b, 'extractor> Chunks<'a, 'b, 'extractor> {
let capacity = embedder.prompt_count_in_chunk_hint() * embedder.chunk_count_hint();
let texts = BVec::with_capacity_in(capacity, doc_alloc);
let ids = BVec::with_capacity_in(capacity, doc_alloc);
let dimensions = embedder.dimensions();
Self {
texts,
ids,
@@ -309,6 +313,7 @@ impl<'a, 'b, 'extractor> Chunks<'a, 'b, 'extractor> {
embedder_name,
user_provided,
has_manual_generation: None,
dimensions,
}
}
@@ -490,7 +495,25 @@ impl<'a, 'b, 'extractor> Chunks<'a, 'b, 'extractor> {
}
}
fn set_vectors(&self, docid: DocumentId, embeddings: Vec<Embedding>) {
fn set_vectors(
&self,
external_docid: &'a str,
docid: DocumentId,
embeddings: Vec<Embedding>,
) -> Result<()> {
for (embedding_index, embedding) in embeddings.iter().enumerate() {
if embedding.len() != self.dimensions {
return Err(UserError::InvalidIndexingVectorDimensions {
expected: self.dimensions,
found: embedding.len(),
embedder_name: self.embedder_name.to_string(),
document_id: external_docid.to_string(),
embedding_index,
}
.into());
}
}
self.sender.set_vectors(docid, self.embedder_id, embeddings).unwrap();
Ok(())
}
}
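A detached restatement of the check added to `set_vectors`: every user-provided embedding must match the embedder's configured dimension count, and the error pinpoints which embedding failed. The free function and its `String` error are simplifications of `UserError::InvalidIndexingVectorDimensions`:

```rust
fn check_dimensions(
    external_docid: &str,
    embedder_name: &str,
    expected: usize,
    embeddings: &[Vec<f32>],
) -> Result<(), String> {
    for (embedding_index, embedding) in embeddings.iter().enumerate() {
        if embedding.len() != expected {
            // Mirrors the message format of InvalidIndexingVectorDimensions.
            return Err(format!(
                "Invalid vector dimensions in document with id `{external_docid}` in \
                 `._vectors.{embedder_name}`.\n - note: embedding #{embedding_index} has \
                 dimensions {}\n - note: embedder `{embedder_name}` requires {expected}",
                embedding.len()
            ));
        }
    }
    Ok(())
}
```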


@@ -144,7 +144,7 @@ impl<'indexer> FacetSearchBuilder<'indexer> {
let mut merger_iter = builder.build().into_stream_merger_iter()?;
let mut current_field_id = None;
let mut fst;
let mut fst_merger_builder: Option<FstMergerBuilder> = None;
let mut fst_merger_builder: Option<FstMergerBuilder<_>> = None;
while let Some((key, deladd)) = merger_iter.next()? {
let (field_id, normalized_facet_string) =
BEU16StrCodec::bytes_decode(key).map_err(heed::Error::Encoding)?;
@@ -153,12 +153,13 @@ impl<'indexer> FacetSearchBuilder<'indexer> {
if let (Some(current_field_id), Some(fst_merger_builder)) =
(current_field_id, fst_merger_builder)
{
let mmap = fst_merger_builder.build(&mut callback)?;
index.facet_id_string_fst.remap_data_type::<Bytes>().put(
wtxn,
&current_field_id,
&mmap,
)?;
if let Some(mmap) = fst_merger_builder.build(&mut callback)? {
index.facet_id_string_fst.remap_data_type::<Bytes>().put(
wtxn,
&current_field_id,
&mmap,
)?;
}
}
fst = index.facet_id_string_fst.get(rtxn, &field_id)?;
@@ -209,8 +210,9 @@ impl<'indexer> FacetSearchBuilder<'indexer> {
}
if let (Some(field_id), Some(fst_merger_builder)) = (current_field_id, fst_merger_builder) {
let mmap = fst_merger_builder.build(&mut callback)?;
index.facet_id_string_fst.remap_data_type::<Bytes>().put(wtxn, &field_id, &mmap)?;
if let Some(mmap) = fst_merger_builder.build(&mut callback)? {
index.facet_id_string_fst.remap_data_type::<Bytes>().put(wtxn, &field_id, &mmap)?;
}
}
Ok(())


@@ -1,25 +1,27 @@
use std::fs::File;
use std::io::BufWriter;
use fst::{Set, SetBuilder, Streamer};
use fst::{IntoStreamer, Set, SetBuilder, Streamer};
use memmap2::Mmap;
use tempfile::tempfile;
use crate::update::del_add::DelAdd;
use crate::{InternalError, Result};
pub struct FstMergerBuilder<'a> {
pub struct FstMergerBuilder<'a, D: AsRef<[u8]>> {
fst: Option<&'a Set<D>>,
stream: Option<fst::set::Stream<'a>>,
fst_builder: SetBuilder<BufWriter<File>>,
fst_builder: Option<SetBuilder<BufWriter<File>>>,
last: Option<Vec<u8>>,
inserted_words: usize,
}
impl<'a> FstMergerBuilder<'a> {
pub fn new<D: AsRef<[u8]>>(fst: Option<&'a Set<D>>) -> Result<Self> {
impl<'a, D: AsRef<[u8]>> FstMergerBuilder<'a, D> {
pub fn new(fst: Option<&'a Set<D>>) -> Result<Self> {
Ok(Self {
fst,
stream: fst.map(|fst| fst.stream()),
fst_builder: SetBuilder::new(BufWriter::new(tempfile()?))?,
fst_builder: None,
last: None,
inserted_words: 0,
})
@@ -110,11 +112,17 @@ impl<'a> FstMergerBuilder<'a> {
is_modified: bool,
insertion_callback: &mut impl FnMut(&[u8], DelAdd, bool) -> Result<()>,
) -> Result<()> {
// Addition: We insert the word
// Deletion: We delete the word by not inserting it
if deladd == DelAdd::Addition {
self.inserted_words += 1;
self.fst_builder.insert(bytes)?;
if is_modified && self.fst_builder.is_none() {
self.build_new_fst(bytes)?;
}
if let Some(fst_builder) = self.fst_builder.as_mut() {
// Addition: We insert the word
// Deletion: We delete the word by not inserting it
if deladd == DelAdd::Addition {
self.inserted_words += 1;
fst_builder.insert(bytes)?;
}
}
insertion_callback(bytes, deladd, is_modified)?;
@@ -122,6 +130,19 @@ impl<'a> FstMergerBuilder<'a> {
Ok(())
}
// Lazily build the new fst
fn build_new_fst(&mut self, bytes: &[u8]) -> Result<()> {
let mut fst_builder = SetBuilder::new(BufWriter::new(tempfile()?))?;
if let Some(fst) = self.fst {
fst_builder.extend_stream(fst.range().lt(bytes).into_stream())?;
}
self.fst_builder = Some(fst_builder);
Ok(())
}
fn drain_stream(
&mut self,
insertion_callback: &mut impl FnMut(&[u8], DelAdd, bool) -> Result<()>,
@@ -142,16 +163,20 @@ impl<'a> FstMergerBuilder<'a> {
pub fn build(
mut self,
insertion_callback: &mut impl FnMut(&[u8], DelAdd, bool) -> Result<()>,
) -> Result<Mmap> {
) -> Result<Option<Mmap>> {
self.drain_stream(insertion_callback)?;
let fst_file = self
.fst_builder
.into_inner()?
.into_inner()
.map_err(|_| InternalError::IndexingMergingKeys { process: "building-fst" })?;
let fst_mmap = unsafe { Mmap::map(&fst_file)? };
match self.fst_builder {
Some(fst_builder) => {
let fst_file = fst_builder
.into_inner()?
.into_inner()
.map_err(|_| InternalError::IndexingMergingKeys { process: "building-fst" })?;
let fst_mmap = unsafe { Mmap::map(&fst_file)? };
Ok(fst_mmap)
Ok(Some(fst_mmap))
}
None => Ok(None),
}
}
}
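The merger now stays dormant until it meets the first modified key; only then does it create a `SetBuilder` and bulk-copy the untouched head of the previous FST through a range stream. A minimal sketch of that first step using the same `fst` crate APIs as the diff (the function name and the memory-backed builder are choices of this sketch):

```rust
use fst::{IntoStreamer, Set, SetBuilder};

/// Copy every key below the first changed one in a single streamed pass,
/// then hand the builder back so the caller can keep inserting keys in
/// sorted order. If no key is ever modified, no builder is created at all.
fn start_lazy_rebuild(
    old: &Set<Vec<u8>>,
    first_changed_key: &[u8],
) -> Result<SetBuilder<Vec<u8>>, fst::Error> {
    let mut builder = SetBuilder::memory();
    builder.extend_stream(old.range().lt(first_changed_key).into_stream())?;
    Ok(builder)
}
```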


@@ -234,7 +234,6 @@ where
embedders,
field_distribution,
document_ids,
modified_docids,
)?;
Ok(congestion)


@@ -7,12 +7,13 @@ use itertools::{merge_join_by, EitherOrBoth};
use super::document_changes::IndexingContext;
use crate::facet::FacetType;
use crate::index::main_key::{WORDS_FST_KEY, WORDS_PREFIXES_FST_KEY};
use crate::progress::Progress;
use crate::update::del_add::DelAdd;
use crate::update::facet::new_incremental::FacetsUpdateIncremental;
use crate::update::facet::{FACET_GROUP_SIZE, FACET_MAX_GROUP_SIZE, FACET_MIN_LEVEL_SIZE};
use crate::update::new::facet_search_builder::FacetSearchBuilder;
use crate::update::new::merger::FacetFieldIdDelta;
use crate::update::new::steps::IndexingStep;
use crate::update::new::steps::{IndexingStep, PostProcessingFacets, PostProcessingWords};
use crate::update::new::word_fst_builder::{PrefixData, PrefixDelta, WordFstBuilder};
use crate::update::new::words_prefix_docids::{
compute_exact_word_prefix_docids, compute_word_prefix_docids, compute_word_prefix_fid_docids,
@@ -33,11 +34,23 @@
{
let index = indexing_context.index;
indexing_context.progress.update_progress(IndexingStep::PostProcessingFacets);
compute_facet_level_database(index, wtxn, facet_field_ids_delta, &mut global_fields_ids_map)?;
compute_facet_search_database(index, wtxn, global_fields_ids_map)?;
compute_facet_level_database(
index,
wtxn,
facet_field_ids_delta,
&mut global_fields_ids_map,
indexing_context.progress,
)?;
compute_facet_search_database(index, wtxn, global_fields_ids_map, indexing_context.progress)?;
indexing_context.progress.update_progress(IndexingStep::PostProcessingWords);
if let Some(prefix_delta) = compute_word_fst(index, wtxn)? {
compute_prefix_database(index, wtxn, prefix_delta, indexing_context.grenad_parameters)?;
if let Some(prefix_delta) = compute_word_fst(index, wtxn, indexing_context.progress)? {
compute_prefix_database(
index,
wtxn,
prefix_delta,
indexing_context.grenad_parameters,
indexing_context.progress,
)?;
};
Ok(())
}
@@ -48,21 +61,32 @@ fn compute_prefix_database(
wtxn: &mut RwTxn,
prefix_delta: PrefixDelta,
grenad_parameters: &GrenadParameters,
progress: &Progress,
) -> Result<()> {
let PrefixDelta { modified, deleted } = prefix_delta;
// Compute word prefix docids
progress.update_progress(PostProcessingWords::WordPrefixDocids);
compute_word_prefix_docids(wtxn, index, &modified, &deleted, grenad_parameters)?;
// Compute exact word prefix docids
progress.update_progress(PostProcessingWords::ExactWordPrefixDocids);
compute_exact_word_prefix_docids(wtxn, index, &modified, &deleted, grenad_parameters)?;
// Compute word prefix fid docids
progress.update_progress(PostProcessingWords::WordPrefixFieldIdDocids);
compute_word_prefix_fid_docids(wtxn, index, &modified, &deleted, grenad_parameters)?;
// Compute word prefix position docids
progress.update_progress(PostProcessingWords::WordPrefixPositionDocids);
compute_word_prefix_position_docids(wtxn, index, &modified, &deleted, grenad_parameters)
}
#[tracing::instrument(level = "trace", skip_all, target = "indexing")]
fn compute_word_fst(index: &Index, wtxn: &mut RwTxn) -> Result<Option<PrefixDelta>> {
fn compute_word_fst(
index: &Index,
wtxn: &mut RwTxn,
progress: &Progress,
) -> Result<Option<PrefixDelta>> {
let rtxn = index.read_txn()?;
progress.update_progress(PostProcessingWords::WordFst);
let words_fst = index.words_fst(&rtxn)?;
let mut word_fst_builder = WordFstBuilder::new(&words_fst)?;
let prefix_settings = index.prefix_settings(&rtxn)?;
@@ -94,7 +118,9 @@ fn compute_word_fst(index: &Index, wtxn: &mut RwTxn) -> Result<Option<PrefixDelt
}
let (word_fst_mmap, prefix_data) = word_fst_builder.build(index, &rtxn)?;
index.main.remap_types::<Str, Bytes>().put(wtxn, WORDS_FST_KEY, &word_fst_mmap)?;
if let Some(word_fst_mmap) = word_fst_mmap {
index.main.remap_types::<Str, Bytes>().put(wtxn, WORDS_FST_KEY, &word_fst_mmap)?;
}
if let Some(PrefixData { prefixes_fst_mmap, prefix_delta }) = prefix_data {
index.main.remap_types::<Str, Bytes>().put(
wtxn,
@@ -112,8 +138,10 @@ fn compute_facet_search_database(
index: &Index,
wtxn: &mut RwTxn,
global_fields_ids_map: GlobalFieldsIdsMap,
progress: &Progress,
) -> Result<()> {
let rtxn = index.read_txn()?;
progress.update_progress(PostProcessingFacets::FacetSearch);
// if the facet search is not enabled, we can skip the rest of the function
if !index.facet_search(wtxn)? {
@@ -171,10 +199,16 @@ fn compute_facet_level_database(
wtxn: &mut RwTxn,
mut facet_field_ids_delta: FacetFieldIdsDelta,
global_fields_ids_map: &mut GlobalFieldsIdsMap,
progress: &Progress,
) -> Result<()> {
let rtxn = index.read_txn()?;
let filterable_attributes_rules = index.filterable_attributes_rules(&rtxn)?;
for (fid, delta) in facet_field_ids_delta.consume_facet_string_delta() {
let mut deltas: Vec<_> = facet_field_ids_delta.consume_facet_string_delta().collect();
// We move all bulk deltas to the front and the incremental (other) ones to the end.
deltas.sort_by_key(|(_, delta)| if let FacetFieldIdDelta::Bulk = delta { 0 } else { 1 });
for (fid, delta) in deltas {
// skip field ids that should not be facet leveled
let Some(metadata) = global_fields_ids_map.metadata(fid) else {
continue;
@@ -187,11 +221,13 @@ fn compute_facet_level_database(
let _entered = span.enter();
match delta {
FacetFieldIdDelta::Bulk => {
progress.update_progress(PostProcessingFacets::StringsBulk);
tracing::debug!(%fid, "bulk string facet processing");
FacetsUpdateBulk::new_not_updating_level_0(index, vec![fid], FacetType::String)
.execute(wtxn)?
}
FacetFieldIdDelta::Incremental(delta_data) => {
progress.update_progress(PostProcessingFacets::StringsIncremental);
tracing::debug!(%fid, len=%delta_data.len(), "incremental string facet processing");
FacetsUpdateIncremental::new(
index,
@@ -207,16 +243,22 @@ fn compute_facet_level_database(
}
}
for (fid, delta) in facet_field_ids_delta.consume_facet_number_delta() {
let mut deltas: Vec<_> = facet_field_ids_delta.consume_facet_number_delta().collect();
// We move all bulk deltas to the front and the incremental (other) ones to the end.
deltas.sort_by_key(|(_, delta)| if let FacetFieldIdDelta::Bulk = delta { 0 } else { 1 });
for (fid, delta) in deltas {
let span = tracing::trace_span!(target: "indexing::facet_field_ids", "number");
let _entered = span.enter();
match delta {
FacetFieldIdDelta::Bulk => {
progress.update_progress(PostProcessingFacets::NumbersBulk);
tracing::debug!(%fid, "bulk number facet processing");
FacetsUpdateBulk::new_not_updating_level_0(index, vec![fid], FacetType::Number)
.execute(wtxn)?
}
FacetFieldIdDelta::Incremental(delta_data) => {
progress.update_progress(PostProcessingFacets::NumbersIncremental);
tracing::debug!(%fid, len=%delta_data.len(), "incremental number facet processing");
FacetsUpdateIncremental::new(
index,


@@ -7,6 +7,7 @@ use rand::SeedableRng as _;
use time::OffsetDateTime;
use super::super::channel::*;
use crate::database_stats::DatabaseStats;
use crate::documents::PrimaryKey;
use crate::fields_ids_map::metadata::FieldIdMapWithMetadata;
use crate::index::IndexEmbeddingConfig;
@@ -142,7 +143,6 @@ pub(super) fn update_index(
embedders: EmbeddingConfigs,
field_distribution: std::collections::BTreeMap<String, u64>,
document_ids: roaring::RoaringBitmap,
modified_docids: roaring::RoaringBitmap,
) -> Result<()> {
index.put_fields_ids_map(wtxn, new_fields_ids_map.as_fields_ids_map())?;
if let Some(new_primary_key) = new_primary_key {
@@ -153,7 +153,8 @@ pub(super) fn update_index(
index.put_field_distribution(wtxn, &field_distribution)?;
index.put_documents_ids(wtxn, &document_ids)?;
index.set_updated_at(wtxn, &OffsetDateTime::now_utc())?;
index.update_documents_stats(wtxn, modified_docids)?;
let stats = DatabaseStats::new(index.documents.remap_data_type(), wtxn)?;
index.put_documents_stats(wtxn, stats)?;
Ok(())
}


@@ -20,3 +20,23 @@ make_enum_progress! {
Finalizing,
}
}
make_enum_progress! {
pub enum PostProcessingFacets {
StringsBulk,
StringsIncremental,
NumbersBulk,
NumbersIncremental,
FacetSearch,
}
}
make_enum_progress! {
pub enum PostProcessingWords {
WordFst,
WordPrefixDocids,
ExactWordPrefixDocids,
WordPrefixFieldIdDocids,
WordPrefixPositionDocids,
}
}
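Presumably `make_enum_progress!` turns each variant into a named sub-step that knows its position among its siblings; the macro's expansion is not shown in this diff, so the following is only an illustrative hand-written approximation (names and method shapes are assumptions):

```rust
// A progress view could render e.g. "word prefix docids (2/5)".
#[derive(Clone, Copy)]
enum PostProcessingWords {
    WordFst,
    WordPrefixDocids,
    ExactWordPrefixDocids,
    WordPrefixFieldIdDocids,
    WordPrefixPositionDocids,
}

impl PostProcessingWords {
    fn name(&self) -> &'static str {
        match self {
            Self::WordFst => "word fst",
            Self::WordPrefixDocids => "word prefix docids",
            Self::ExactWordPrefixDocids => "exact word prefix docids",
            Self::WordPrefixFieldIdDocids => "word prefix field id docids",
            Self::WordPrefixPositionDocids => "word prefix position docids",
        }
    }

    fn current(&self) -> u32 {
        *self as u32
    }

    fn total() -> u32 {
        5
    }
}
```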


@@ -10,14 +10,14 @@ use crate::index::PrefixSettings;
use crate::update::del_add::DelAdd;
use crate::{InternalError, Prefix, Result};
pub struct WordFstBuilder<'a> {
word_fst_builder: FstMergerBuilder<'a>,
pub struct WordFstBuilder<'a, D: AsRef<[u8]>> {
word_fst_builder: FstMergerBuilder<'a, D>,
prefix_fst_builder: Option<PrefixFstBuilder>,
registered_words: usize,
}
impl<'a> WordFstBuilder<'a> {
pub fn new(words_fst: &'a Set<std::borrow::Cow<'a, [u8]>>) -> Result<Self> {
impl<'a, D: AsRef<[u8]>> WordFstBuilder<'a, D> {
pub fn new(words_fst: &'a Set<D>) -> Result<Self> {
Ok(Self {
word_fst_builder: FstMergerBuilder::new(Some(words_fst))?,
prefix_fst_builder: None,
@@ -50,7 +50,7 @@ impl<'a> WordFstBuilder<'a> {
mut self,
index: &crate::Index,
rtxn: &heed::RoTxn,
) -> Result<(Mmap, Option<PrefixData>)> {
) -> Result<(Option<Mmap>, Option<PrefixData>)> {
let words_fst_mmap = self.word_fst_builder.build(&mut |bytes, deladd, is_modified| {
if let Some(prefix_fst_builder) = &mut self.prefix_fst_builder {
prefix_fst_builder.insert_word(bytes, deladd, is_modified)


@@ -1331,8 +1331,21 @@ impl InnerIndexSettingsDiff {
let cache_exact_attributes = old_settings.exact_attributes != new_settings.exact_attributes;
let cache_user_defined_searchables = old_settings.user_defined_searchable_attributes
!= new_settings.user_defined_searchable_attributes;
// Check whether any searchable field has been added to or removed from the list;
// changing the order alone should not be considered a change requiring reindexing.
let cache_user_defined_searchables = match (
&old_settings.user_defined_searchable_attributes,
&new_settings.user_defined_searchable_attributes,
) {
(Some(old), Some(new)) => {
let old: BTreeSet<_> = old.iter().collect();
let new: BTreeSet<_> = new.iter().collect();
old != new
}
(None, None) => false,
_otherwise => true,
};
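Restated as a free function: only membership changes count, so reordering the searchable attributes no longer forces a reindex. This mirrors the match above directly:

```rust
use std::collections::BTreeSet;

fn searchables_changed(old: Option<&[String]>, new: Option<&[String]>) -> bool {
    match (old, new) {
        (Some(old), Some(new)) => {
            // Compare as sets: a pure reordering produces equal sets.
            let old: BTreeSet<_> = old.iter().collect();
            let new: BTreeSet<_> = new.iter().collect();
            old != new
        }
        (None, None) => false,
        // Switching between "all fields" (None) and an explicit list is a change.
        _otherwise => true,
    }
}
```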
// if the user-defined searchables changed, then we need to reindex prompts.
if cache_user_defined_searchables {