Send each chunk directly to the main thread instead of merging them at the end of the extraction

Remove append function
Compute chunk size based on the input data size and the number of indexing threads
2025-12-01 10:15:50 +00:00 · 2024-01-22 16:30:27 +01:00 · 2024-01-22 16:30:09 +01:00 · 2024-01-22 16:29:44 +01:00 · 2024-01-15 18:41:14 +00:00 · 2024-01-15 17:54:50 +00:00
27 changed files with 443 additions and 577 deletions
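The last commit message above describes deriving the chunk size from the input data size and the number of indexing threads. The hunk that implements it is not part of this excerpt, so the following is only a minimal sketch of the idea. The name `chunk_size_for`, the one-chunk-per-thread split, and the `min_chunk` floor are all assumptions made for illustration, not code from this PR.

```rust
/// Hypothetical helper (not from the PR): split `input_len` bytes into roughly
/// one chunk per indexing thread, never going below `min_chunk` bytes.
fn chunk_size_for(input_len: usize, indexing_threads: usize, min_chunk: usize) -> usize {
    // Treat a thread count of zero as a single thread to avoid dividing by zero.
    let threads = indexing_threads.max(1);
    // Ceiling division, so that `threads` chunks always cover the whole input.
    let per_thread = (input_len + threads - 1) / threads;
    // The floor avoids producing many tiny chunks for small inputs.
    per_thread.max(min_chunk)
}
```

With such a rule, small inputs end up in a single chunk while large inputs are spread evenly across the available threads.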
--- a/.github/workflows/sdks-tests.yml
+++ b/.github/workflows/sdks-tests.yml
@@ -22,7 +22,7 @@ jobs:
    outputs:
      docker-image: ${{ steps.define-image.outputs.docker-image }}
    steps:
-      - uses: actions/checkout@v3
+      - uses: actions/checkout@v4
      - name: Define the Docker image we need to use
        id: define-image
        run: |
@@ -46,11 +46,11 @@ jobs:
      MEILISEARCH_VERSION: ${{ needs.define-docker-image.outputs.docker-image }}

    steps:
-      - uses: actions/checkout@v3
+      - uses: actions/checkout@v4
        with:
          repository: meilisearch/meilisearch-dotnet
      - name: Setup .NET Core
-        uses: actions/setup-dotnet@v3
+        uses: actions/setup-dotnet@v4
        with:
          dotnet-version: "6.0.x"
      - name: Install dependencies
@@ -75,12 +75,12 @@ jobs:
        ports:
          - '7700:7700'
    steps:
-      - uses: actions/checkout@v3
+      - uses: actions/checkout@v4
        with:
          repository: meilisearch/meilisearch-dart
      - uses: dart-lang/setup-dart@v1
        with:
-          sdk: 3.1.1
+          sdk: 'latest'
      - name: Install dependencies
        run: dart pub get
      - name: Run integration tests
@@ -100,10 +100,10 @@ jobs:
          - '7700:7700'
    steps:
      - name: Set up Go
-        uses: actions/setup-go@v4
+        uses: actions/setup-go@v5
        with:
          go-version: stable
-      - uses: actions/checkout@v3
+      - uses: actions/checkout@v4
        with:
          repository: meilisearch/meilisearch-go
      - name: Get dependencies
@@ -129,11 +129,11 @@ jobs:
        ports:
          - '7700:7700'
    steps:
-      - uses: actions/checkout@v3
+      - uses: actions/checkout@v4
        with:
          repository: meilisearch/meilisearch-java
      - name: Set up Java
-        uses: actions/setup-java@v3
+        uses: actions/setup-java@v4
        with:
          java-version: 8
          distribution: 'zulu'
@@ -156,7 +156,7 @@ jobs:
        ports:
          - '7700:7700'
    steps:
-      - uses: actions/checkout@v3
+      - uses: actions/checkout@v4
        with:
          repository: meilisearch/meilisearch-js
      - name: Setup node
@@ -191,7 +191,7 @@ jobs:
        ports:
          - '7700:7700'
    steps:
-      - uses: actions/checkout@v3
+      - uses: actions/checkout@v4
        with:
          repository: meilisearch/meilisearch-php
      - name: Install PHP
@@ -220,11 +220,11 @@ jobs:
        ports:
          - '7700:7700'
    steps:
-      - uses: actions/checkout@v3
+      - uses: actions/checkout@v4
        with:
          repository: meilisearch/meilisearch-python
      - name: Set up Python
-        uses: actions/setup-python@v4
+        uses: actions/setup-python@v5
      - name: Install pipenv
        uses: dschep/install-pipenv-action@v1
      - name: Install dependencies
@@ -245,7 +245,7 @@ jobs:
        ports:
          - '7700:7700'
    steps:
-      - uses: actions/checkout@v3
+      - uses: actions/checkout@v4
        with:
          repository: meilisearch/meilisearch-ruby
      - name: Set up Ruby 3
@@ -270,7 +270,7 @@ jobs:
        ports:
          - '7700:7700'
    steps:
-      - uses: actions/checkout@v3
+      - uses: actions/checkout@v4
        with:
          repository: meilisearch/meilisearch-rust
      - name: Build
@@ -291,7 +291,7 @@ jobs:
        ports:
          - '7700:7700'
    steps:
-      - uses: actions/checkout@v3
+      - uses: actions/checkout@v4
        with:
          repository: meilisearch/meilisearch-swift
      - name: Run tests
@@ -314,7 +314,7 @@ jobs:
        ports:
          - '7700:7700'
    steps:
-      - uses: actions/checkout@v3
+      - uses: actions/checkout@v4
        with:
          repository: meilisearch/meilisearch-js-plugins
      - name: Setup node
@@ -345,7 +345,7 @@ jobs:
        ports:
          - '7700:7700'
    steps:
-      - uses: actions/checkout@v3
+      - uses: actions/checkout@v4
        with:
          repository: meilisearch/meilisearch-rails
      - name: Set up Ruby 3
@@ -369,7 +369,7 @@ jobs:
        ports:
          - '7700:7700'
    steps:
-      - uses: actions/checkout@v3
+      - uses: actions/checkout@v4
        with:
          repository: meilisearch/meilisearch-symfony
      - name: Install PHP
--- a/2
+++ b/2
@@ -1,6 +1,6 @@
 MIT License

-Copyright (c) 2019-2022 Meili SAS
+Copyright (c) 2019-2024 Meili SAS

 Permission is hereby granted, free of charge, to any person obtaining a copy
 of this software and associated documentation files (the "Software"), to deal
--- a/README.md
+++ b/README.md
@@ -42,7 +42,7 @@ Meilisearch helps you shape a delightful search experience in a snap, offering f

 - **Search-as-you-type:** find search results in less than 50 milliseconds
 - **[Typo tolerance](https://www.meilisearch.com/docs/learn/getting_started/customizing_relevancy?utm_campaign=oss&utm_source=github&utm_medium=meilisearch&utm_content=features#typo-tolerance):** get relevant matches even when queries contain typos and misspellings
- **[Filtering](https://www.meilisearch.com/docs/learn/fine_tuning_results/filtering?utm_campaign=oss&utm_source=github&utm_medium=meilisearch&utm_content=features) and [faceted search](https://www.meilisearch.com/docs/learn/fine_tuning_results/faceted_search?utm_campaign=oss&utm_source=github&utm_medium=meilisearch&utm_content=features):** enhance your user's search experience with custom filters and build a faceted search interface in a few lines of code
+- **[Filtering](https://www.meilisearch.com/docs/learn/fine_tuning_results/filtering?utm_campaign=oss&utm_source=github&utm_medium=meilisearch&utm_content=features) and [faceted search](https://www.meilisearch.com/docs/learn/fine_tuning_results/faceted_search?utm_campaign=oss&utm_source=github&utm_medium=meilisearch&utm_content=features):** enhance your users' search experience with custom filters and build a faceted search interface in a few lines of code
 - **[Sorting](https://www.meilisearch.com/docs/learn/fine_tuning_results/sorting?utm_campaign=oss&utm_source=github&utm_medium=meilisearch&utm_content=features):** sort results based on price, date, or pretty much anything else your users need
 - **[Synonym support](https://www.meilisearch.com/docs/learn/getting_started/customizing_relevancy?utm_campaign=oss&utm_source=github&utm_medium=meilisearch&utm_content=features#synonyms):** configure synonyms to include more relevant content in your search results
 - **[Geosearch](https://www.meilisearch.com/docs/learn/fine_tuning_results/geosearch?utm_campaign=oss&utm_source=github&utm_medium=meilisearch&utm_content=features):** filter and sort documents based on geographic data
--- a/index-scheduler/src/batch.rs
+++ b/index-scheduler/src/batch.rs
@@ -60,7 +60,7 @@ pub(crate) enum Batch {
        /// The list of tasks that were processing when this task cancelation appeared.
        previous_processing_tasks: RoaringBitmap,
    },
-    TaskDeletion(Task),
+    TaskDeletions(Vec<Task>),
    SnapshotCreation(Vec<Task>),
    Dump(Task),
    IndexOperation {
@@ -146,13 +146,12 @@ impl Batch {
    pub fn ids(&self) -> Vec<TaskId> {
        match self {
            Batch::TaskCancelation { task, .. }
-            | Batch::TaskDeletion(task)
            | Batch::Dump(task)
            | Batch::IndexCreation { task, .. }
            | Batch::IndexUpdate { task, .. } => vec![task.uid],
-            Batch::SnapshotCreation(tasks) | Batch::IndexDeletion { tasks, .. } => {
-                tasks.iter().map(|task| task.uid).collect()
-            }
+            Batch::SnapshotCreation(tasks)
+            | Batch::TaskDeletions(tasks)
+            | Batch::IndexDeletion { tasks, .. } => tasks.iter().map(|task| task.uid).collect(),
            Batch::IndexOperation { op, .. } => match op {
                IndexOperation::DocumentOperation { tasks, .. }
                | IndexOperation::Settings { tasks, .. }
@@ -180,7 +179,7 @@ impl Batch {
        use Batch::*;
        match self {
            TaskCancelation { .. }
-            | TaskDeletion(_)
+            | TaskDeletions(_)
            | SnapshotCreation(_)
            | Dump(_)
            | IndexSwap { .. } => None,
@@ -199,7 +198,7 @@ impl fmt::Display for Batch {
        let tasks = self.ids();
        match self {
            Batch::TaskCancelation { .. } => f.write_str("TaskCancelation")?,
-            Batch::TaskDeletion(_) => f.write_str("TaskDeletion")?,
+            Batch::TaskDeletions(_) => f.write_str("TaskDeletion")?,
            Batch::SnapshotCreation(_) => f.write_str("SnapshotCreation")?,
            Batch::Dump(_) => f.write_str("Dump")?,
            Batch::IndexOperation { op, .. } => write!(f, "{op}")?,
@@ -539,9 +538,9 @@ impl IndexScheduler {

        // 2. we get the next task to delete
        let to_delete = self.get_kind(rtxn, Kind::TaskDeletion)? & enqueued;
-        if let Some(task_id) = to_delete.min() {
-            let task = self.get_task(rtxn, task_id)?.ok_or(Error::CorruptedTaskQueue)?;
-            return Ok(Some(Batch::TaskDeletion(task)));
+        if !to_delete.is_empty() {
+            let tasks = self.get_existing_tasks(rtxn, to_delete)?;
+            return Ok(Some(Batch::TaskDeletions(tasks)));
        }

        // 3. we batch the snapshot.
@@ -681,31 +680,43 @@ impl IndexScheduler {

                Ok(vec![task])
            }
-            Batch::TaskDeletion(mut task) => {
+            Batch::TaskDeletions(mut tasks) => {
                // 1. Retrieve the tasks that matched the query at enqueue-time.
-                let matched_tasks =
+                let mut matched_tasks = RoaringBitmap::new();
+
+                for task in tasks.iter() {
                    if let KindWithContent::TaskDeletion { tasks, query: _ } = &task.kind {
-                        tasks
+                        matched_tasks |= tasks;
                    } else {
                        unreachable!()
+                    }
+                }
+
+                let mut wtxn = self.env.write_txn()?;
+                let mut deleted_tasks = self.delete_matched_tasks(&mut wtxn, &matched_tasks)?;
+                wtxn.commit()?;
+
+                for task in tasks.iter_mut() {
+                    task.status = Status::Succeeded;
+                    let KindWithContent::TaskDeletion { tasks, query: _ } = &task.kind else {
+                        unreachable!()
                    };

-                let mut wtxn = self.env.write_txn()?;
-                let deleted_tasks_count = self.delete_matched_tasks(&mut wtxn, matched_tasks)?;
+                    let deleted_tasks_count = deleted_tasks.intersection_len(tasks);
+                    deleted_tasks -= tasks;

-                task.status = Status::Succeeded;
-                match &mut task.details {
-                    Some(Details::TaskDeletion {
-                        matched_tasks: _,
-                        deleted_tasks,
-                        original_filter: _,
-                    }) => {
-                        *deleted_tasks = Some(deleted_tasks_count);
+                    match &mut task.details {
+                        Some(Details::TaskDeletion {
+                            matched_tasks: _,
+                            deleted_tasks,
+                            original_filter: _,
+                        }) => {
+                            *deleted_tasks = Some(deleted_tasks_count);
+                        }
+                        _ => unreachable!(),
                    }
-                    _ => unreachable!(),
                }
-                wtxn.commit()?;
-                Ok(vec![task])
+                Ok(tasks)
            }
            Batch::SnapshotCreation(mut tasks) => {
                fs::create_dir_all(&self.snapshots_path)?;
@@ -936,8 +947,8 @@ impl IndexScheduler {
                };

                // the index operation can take a long time, so save this handle to make it available to the search for the duration of the tick
-                *self.currently_updating_index.write().unwrap() =
-                    Some((index_uid.clone(), index.clone()));
+                self.index_mapper
+                    .set_currently_updating_index(Some((index_uid.clone(), index.clone())));

                let mut index_wtxn = index.write_txn()?;
                let tasks = self.apply_index_operation(&mut index_wtxn, &index, op)?;
@@ -1435,7 +1446,11 @@ impl IndexScheduler {
    /// Delete each given task from all the databases (if it is deleteable).
    ///
    /// Return the number of tasks that were actually deleted.
-    fn delete_matched_tasks(&self, wtxn: &mut RwTxn, matched_tasks: &RoaringBitmap) -> Result<u64> {
+    fn delete_matched_tasks(
+        &self,
+        wtxn: &mut RwTxn,
+        matched_tasks: &RoaringBitmap,
+    ) -> Result<RoaringBitmap> {
        // 1. Remove from this list the tasks that we are not allowed to delete
        let enqueued_tasks = self.get_status(wtxn, Status::Enqueued)?;
        let processing_tasks = &self.processing_tasks.read().unwrap().processing.clone();
@@ -1500,7 +1515,7 @@ impl IndexScheduler {
            }
        }

-        Ok(to_delete_tasks.len())
+        Ok(to_delete_tasks)
    }

    /// Cancel each given task from all the databases (if it is cancelable).
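The accounting in the `Batch::TaskDeletions` arm above can be easier to follow in isolation. The sketch below is a simplified stand-in, not the scheduler code: `BTreeSet<u32>` replaces `RoaringBitmap`, and `credit_deletions` is a hypothetical name. It shows how each deleted task is credited to the first deletion task that matched it, so a second deletion task targeting the same ids reports a lower count.

```rust
use std::collections::BTreeSet;

/// `requests[i]` is the set of task ids matched by the i-th deletion task and
/// `deleted` is the set of ids that were actually removed for the whole batch.
/// Returns, per deletion task, how many deleted tasks are credited to it.
fn credit_deletions(requests: &[BTreeSet<u32>], deleted: &BTreeSet<u32>) -> Vec<u64> {
    let mut remaining = deleted.clone();
    requests
        .iter()
        .map(|matched| {
            // Stand-in for `deleted_tasks.intersection_len(tasks)`.
            let count = remaining.intersection(matched).count() as u64;
            // Stand-in for `deleted_tasks -= tasks`: each deleted id is credited only once.
            remaining.retain(|id| !matched.contains(id));
            count
        })
        .collect()
}
```

Under these assumptions, two deletion tasks that both match ids `{0, 1}` are credited `2` and `0` respectively, while both still end up marked as succeeded in the same batch.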
--- a/index-scheduler/src/index_mapper/mod.rs
+++ b/index-scheduler/src/index_mapper/mod.rs
@@ -69,6 +69,10 @@ pub struct IndexMapper {
    /// Whether we open a meilisearch index with the MDB_WRITEMAP option or not.
    enable_mdb_writemap: bool,
    pub indexer_config: Arc<IndexerConfig>,
+
+    /// A few types of long running batches of tasks that act on a single index set this field
+    /// so that a handle to the index is available from other threads (search) in an optimized manner.
+    currently_updating_index: Arc<RwLock<Option<(String, Index)>>>,
 }

 /// Whether the index is available for use or is forbidden to be inserted back in the index map
@@ -151,6 +155,7 @@ impl IndexMapper {
            index_growth_amount,
            enable_mdb_writemap,
            indexer_config: Arc::new(indexer_config),
+            currently_updating_index: Default::default(),
        })
    }

@@ -303,6 +308,14 @@ impl IndexMapper {

    /// Return an index, may open it if it wasn't already opened.
    pub fn index(&self, rtxn: &RoTxn, name: &str) -> Result<Index> {
+        if let Some((current_name, current_index)) =
+            self.currently_updating_index.read().unwrap().as_ref()
+        {
+            if current_name == name {
+                return Ok(current_index.clone());
+            }
+        }
+
        let uuid = self
            .index_mapping
            .get(rtxn, name)?
@@ -474,4 +487,8 @@ impl IndexMapper {
    pub fn indexer_config(&self) -> &IndexerConfig {
        &self.indexer_config
    }
+
+    pub fn set_currently_updating_index(&self, index: Option<(String, Index)>) {
+        *self.currently_updating_index.write().unwrap() = index;
+    }
 }
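The fast path moved into `IndexMapper::index` above follows a common pattern: check a shared, lock-protected slot before falling back to the slower lookup. The sketch below models only that pattern. `Handle`, `Mapper`, and the `open` closure are hypothetical stand-ins for `Index`, `IndexMapper`, and the regular lookup, chosen so the example stays self-contained.

```rust
use std::sync::{Arc, RwLock};

/// Stand-in for `Index`; cloning it is assumed to be cheap.
#[derive(Clone, Debug, PartialEq)]
struct Handle(u32);

#[derive(Default)]
struct Mapper {
    currently_updating: Arc<RwLock<Option<(String, Handle)>>>,
}

impl Mapper {
    /// Publish (or clear, with `None`) the handle of the index being updated.
    fn set_currently_updating(&self, index: Option<(String, Handle)>) {
        *self.currently_updating.write().unwrap() = index;
    }

    /// Return the published handle when the name matches, else fall back to `open`.
    fn index(&self, name: &str, open: impl FnOnce(&str) -> Handle) -> Handle {
        if let Some((current_name, current)) = self.currently_updating.read().unwrap().as_ref() {
            if current_name == name {
                return current.clone();
            }
        }
        open(name)
    }
}
```

Keeping the slot next to the lookup means every caller of `index` benefits from the shortcut, rather than only the callers that go through the scheduler's wrapper.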
--- a/index-scheduler/src/insta_snapshot.rs
+++ b/index-scheduler/src/insta_snapshot.rs
@@ -42,7 +42,6 @@ pub fn snapshot_index_scheduler(scheduler: &IndexScheduler) -> String {
        test_breakpoint_sdr: _,
        planned_failures: _,
        run_loop_iteration: _,
-        currently_updating_index: _,
        embedders: _,
    } = scheduler;

--- a/index-scheduler/src/lib.rs
+++ b/index-scheduler/src/lib.rs
@@ -351,10 +351,6 @@ pub struct IndexScheduler {
    /// The path to the version file of Meilisearch.
    pub(crate) version_file_path: PathBuf,

-    /// A few types of long running batches of tasks that act on a single index set this field
-    /// so that a handle to the index is available from other threads (search) in an optimized manner.
-    currently_updating_index: Arc<RwLock<Option<(String, Index)>>>,
-
    embedders: Arc<RwLock<HashMap<EmbedderOptions, Arc<Embedder>>>>,

    // ================= test
@@ -403,7 +399,6 @@ impl IndexScheduler {
            version_file_path: self.version_file_path.clone(),
            webhook_url: self.webhook_url.clone(),
            webhook_authorization_header: self.webhook_authorization_header.clone(),
-            currently_updating_index: self.currently_updating_index.clone(),
            embedders: self.embedders.clone(),
            #[cfg(test)]
            test_breakpoint_sdr: self.test_breakpoint_sdr.clone(),
@@ -504,7 +499,6 @@ impl IndexScheduler {
            version_file_path: options.version_file_path,
            webhook_url: options.webhook_url,
            webhook_authorization_header: options.webhook_authorization_header,
-            currently_updating_index: Arc::new(RwLock::new(None)),
            embedders: Default::default(),

            #[cfg(test)]
@@ -688,13 +682,6 @@ impl IndexScheduler {
    /// If you need to fetch information from or perform an action on all indexes,
    /// see the `try_for_each_index` function.
    pub fn index(&self, name: &str) -> Result<Index> {
-        if let Some((current_name, current_index)) =
-            self.currently_updating_index.read().unwrap().as_ref()
-        {
-            if current_name == name {
-                return Ok(current_index.clone());
-            }
-        }
        let rtxn = self.env.read_txn()?;
        self.index_mapper.index(&rtxn, name)
    }
@@ -1175,7 +1162,7 @@ impl IndexScheduler {
        };

        // Reset the currently updating index to relinquish the index handle
-        *self.currently_updating_index.write().unwrap() = None;
+        self.index_mapper.set_currently_updating_index(None);

        #[cfg(test)]
        self.maybe_fail(tests::FailureLocation::AcquiringWtxn)?;
@@ -2257,10 +2244,7 @@ mod tests {
                .unwrap();
            index_scheduler.assert_internally_consistent();
        }
-        for _ in 0..2 {
-            handle.advance_one_successful_batch();
-            index_scheduler.assert_internally_consistent();
-        }
+        handle.advance_one_successful_batch();

        snapshot!(snapshot_index_scheduler(&index_scheduler), name: "task_deletion_processed");
    }
--- a/index-scheduler/src/snapshots/lib.rs/task_deletion_delete_same_task_twice/task_deletion_processed.snap
+++ b/index-scheduler/src/snapshots/lib.rs/task_deletion_delete_same_task_twice/task_deletion_processed.snap
@@ -34,12 +34,10 @@ catto: { number_of_documents: 1, field_distribution: {"id": 1} }
 [timestamp] [3,]
 ----------------------------------------------------------------------
 ### Started At:
-[timestamp] [2,]
-[timestamp] [3,]
+[timestamp] [2,3,]
 ----------------------------------------------------------------------
 ### Finished At:
-[timestamp] [2,]
-[timestamp] [3,]
+[timestamp] [2,3,]
 ----------------------------------------------------------------------
 ### File Store:
 00000000-0000-0000-0000-000000000001
--- a/meilisearch-types/src/settings.rs
+++ b/meilisearch-types/src/settings.rs
@@ -600,11 +600,12 @@ pub fn settings(
        ),
    };

-    let embedders = index
+    let embedders: BTreeMap<_, _> = index
        .embedding_configs(rtxn)?
        .into_iter()
        .map(|(name, config)| (name, Setting::Set(config.into())))
        .collect();
+    let embedders = if embedders.is_empty() { Setting::NotSet } else { Setting::Set(embedders) };

    Ok(Settings {
        displayed_attributes: match displayed_attributes {
@@ -631,7 +632,7 @@ pub fn settings(
        typo_tolerance: Setting::Set(typo_tolerance),
        faceting: Setting::Set(faceting),
        pagination: Setting::Set(pagination),
-        embedders: Setting::Set(embedders),
+        embedders,
        _kind: PhantomData,
    })
 }
--- a/meilisearch/src/routes/indexes/settings.rs
+++ b/meilisearch/src/routes/indexes/settings.rs
@@ -458,7 +458,7 @@ make_setting_route!(
            json!({
                "proximity_precision": {
                    "set": precision.is_some(),
-                    "value": precision,
+                    "value": precision.unwrap_or_default(),
                }
            }),
            Some(req),
@@ -690,7 +690,8 @@ pub async fn update_all(
                "set": new_settings.distinct_attribute.as_ref().set().is_some()
            },
            "proximity_precision": {
-                "set": new_settings.proximity_precision.as_ref().set().is_some()
+                "set": new_settings.proximity_precision.as_ref().set().is_some(),
+                "value": new_settings.proximity_precision.as_ref().set().copied().unwrap_or_default()
            },
            "typo_tolerance": {
                "enabled": new_settings.typo_tolerance
--- a/meilisearch/src/search.rs
+++ b/meilisearch/src/search.rs
@@ -735,6 +735,9 @@ pub fn perform_facet_search(
    if let Some(facet_query) = &facet_query {
        facet_search.query(facet_query);
    }
+    if let Some(max_facets) = index.max_values_per_facet(&rtxn)? {
+        facet_search.max_values(max_facets as usize);
+    }

    Ok(FacetSearchResult {
        facet_hits: facet_search.execute()?,
@@ -897,6 +900,14 @@ fn format_fields<'a>(
    let mut matches_position = compute_matches.then(BTreeMap::new);
    let mut document = document.clone();

+    // reduce the formatted option list to the attributes that should be formatted,
+    // instead of all the attributes to display.
+    let formatting_fields_options: Vec<_> = formatted_options
+        .iter()
+        .filter(|(_, option)| option.should_format())
+        .map(|(fid, option)| (field_ids_map.name(*fid).unwrap(), option))
+        .collect();
+
    // select the attributes to retrieve
    let displayable_names =
        displayable_ids.iter().map(|&fid| field_ids_map.name(fid).expect("Missing field name"));
@@ -905,13 +916,15 @@ fn format_fields<'a>(
        // to the value and merge them together. eg. If a user said he wanted to highlight `doggo`
        // and crop `doggo.name`. `doggo.name` needs to be highlighted + cropped while `doggo.age` is only
        // highlighted.
-        let format = formatted_options
+        // Warn: The time to compute the format list scales with the number of fields to format;
+        // combined with map_leaf_values, which iterates over all the nested fields, this gives a quadratic complexity:
+        // d*f where d is the total number of fields to display and f is the total number of fields to format.
+        let format = formatting_fields_options
            .iter()
-            .filter(|(field, _option)| {
-                let name = field_ids_map.name(**field).unwrap();
+            .filter(|(name, _option)| {
                milli::is_faceted_by(name, key) || milli::is_faceted_by(key, name)
            })
-            .map(|(_, option)| *option)
+            .map(|(_, option)| **option)
            .reduce(|acc, option| acc.merge(option));
        let mut infos = Vec::new();

@@ -1008,7 +1021,7 @@ fn format_value<'a>(
                    let value = matcher.format(format_options);
                    Value::String(value.into_owned())
                }
-                None => Value::Number(number),
+                None => Value::String(s),
            }
        }
        value => value,
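The `format_fields` change above filters the option list once with `should_format`, then merges the surviving options per field. The sketch below reproduces that flow on plain string names. `FormatOptions` mirrors the struct shown in the diff, while `is_faceted_by` is a simplified stand-in for `milli::is_faceted_by`, and `formatting_options` and `format_for` are hypothetical helper names.

```rust
#[derive(Copy, Clone, Default, Debug, PartialEq)]
struct FormatOptions {
    highlight: bool,
    crop: Option<usize>,
}

impl FormatOptions {
    fn merge(self, other: Self) -> Self {
        Self { highlight: self.highlight || other.highlight, crop: self.crop.or(other.crop) }
    }

    fn should_format(&self) -> bool {
        self.highlight || self.crop.is_some()
    }
}

/// Simplified stand-in: true when `field` equals `facet` or is nested under it
/// (for example `doggo.name` is faceted by `doggo`).
fn is_faceted_by(field: &str, facet: &str) -> bool {
    field.starts_with(facet)
        && (field.len() == facet.len() || field[facet.len()..].starts_with('.'))
}

/// Keep only the options that request formatting, once, before looping over fields.
fn formatting_options<'a>(options: &[(&'a str, FormatOptions)]) -> Vec<(&'a str, FormatOptions)> {
    options.iter().copied().filter(|(_, option)| option.should_format()).collect()
}

/// Merge every pre-filtered option that applies to `key`.
fn format_for(key: &str, filtered: &[(&str, FormatOptions)]) -> Option<FormatOptions> {
    filtered
        .iter()
        .filter(|(name, _)| is_faceted_by(name, key) || is_faceted_by(key, name))
        .map(|(_, option)| *option)
        .reduce(|acc, option| acc.merge(option))
}
```

In this sketch a field with no applicable option yields `None`, which corresponds to the `None` arm of `format_value` in the hunk above.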
--- a/meilisearch/tests/dumps/mod.rs
+++ b/meilisearch/tests/dumps/mod.rs
@@ -77,8 +77,7 @@ async fn import_dump_v1_movie_raw() {
      },
      "pagination": {
        "maxTotalHits": 1000
-      },
-      "embedders": {}
+      }
    }
    "###
    );
@@ -239,8 +238,7 @@ async fn import_dump_v1_movie_with_settings() {
      },
      "pagination": {
        "maxTotalHits": 1000
-      },
-      "embedders": {}
+      }
    }
    "###
    );
@@ -387,8 +385,7 @@ async fn import_dump_v1_rubygems_with_settings() {
      },
      "pagination": {
        "maxTotalHits": 1000
-      },
-      "embedders": {}
+      }
    }
    "###
    );
@@ -521,8 +518,7 @@ async fn import_dump_v2_movie_raw() {
      },
      "pagination": {
        "maxTotalHits": 1000
-      },
-      "embedders": {}
+      }
    }
    "###
    );
@@ -667,8 +663,7 @@ async fn import_dump_v2_movie_with_settings() {
      },
      "pagination": {
        "maxTotalHits": 1000
-      },
-      "embedders": {}
+      }
    }
    "###
    );
@@ -812,8 +807,7 @@ async fn import_dump_v2_rubygems_with_settings() {
      },
      "pagination": {
        "maxTotalHits": 1000
-      },
-      "embedders": {}
+      }
    }
    "###
    );
@@ -946,8 +940,7 @@ async fn import_dump_v3_movie_raw() {
      },
      "pagination": {
        "maxTotalHits": 1000
-      },
-      "embedders": {}
+      }
    }
    "###
    );
@@ -1092,8 +1085,7 @@ async fn import_dump_v3_movie_with_settings() {
      },
      "pagination": {
        "maxTotalHits": 1000
-      },
-      "embedders": {}
+      }
    }
    "###
    );
@@ -1237,8 +1229,7 @@ async fn import_dump_v3_rubygems_with_settings() {
      },
      "pagination": {
        "maxTotalHits": 1000
-      },
-      "embedders": {}
+      }
    }
    "###
    );
@@ -1371,8 +1362,7 @@ async fn import_dump_v4_movie_raw() {
      },
      "pagination": {
        "maxTotalHits": 1000
-      },
-      "embedders": {}
+      }
    }
    "###
    );
@@ -1517,8 +1507,7 @@ async fn import_dump_v4_movie_with_settings() {
      },
      "pagination": {
        "maxTotalHits": 1000
-      },
-      "embedders": {}
+      }
    }
    "###
    );
@@ -1662,8 +1651,7 @@ async fn import_dump_v4_rubygems_with_settings() {
      },
      "pagination": {
        "maxTotalHits": 1000
-      },
-      "embedders": {}
+      }
    }
    "###
    );
@@ -1907,8 +1895,7 @@ async fn import_dump_v6_containing_experimental_features() {
      },
      "pagination": {
        "maxTotalHits": 1000
-      },
-      "embedders": {}
+      }
    }
    "###);

--- a/meilisearch/tests/search/facet_search.rs
+++ b/meilisearch/tests/search/facet_search.rs
@@ -105,6 +105,24 @@ async fn more_advanced_facet_search() {
    snapshot!(response["facetHits"].as_array().unwrap().len(), @"1");
 }

+#[actix_rt::test]
+async fn simple_facet_search_with_max_values() {
+    let server = Server::new().await;
+    let index = server.index("test");
+
+    let documents = DOCUMENTS.clone();
+    index.update_settings_faceting(json!({ "maxValuesPerFacet": 1 })).await;
+    index.update_settings_filterable_attributes(json!(["genres"])).await;
+    index.add_documents(documents, None).await;
+    index.wait_task(2).await;
+
+    let (response, code) =
+        index.facet_search(json!({"facetName": "genres", "facetQuery": "a"})).await;
+
+    assert_eq!(code, 200, "{}", response);
+    assert_eq!(dbg!(response)["facetHits"].as_array().unwrap().len(), 1);
+}
+
 #[actix_rt::test]
 async fn non_filterable_facet_search_error() {
    let server = Server::new().await;
--- a/meilisearch/tests/settings/get_settings.rs
+++ b/meilisearch/tests/settings/get_settings.rs
@@ -54,7 +54,7 @@ async fn get_settings() {
    let (response, code) = index.settings().await;
    assert_eq!(code, 200);
    let settings = response.as_object().unwrap();
-    assert_eq!(settings.keys().len(), 16);
+    assert_eq!(settings.keys().len(), 15);
    assert_eq!(settings["displayedAttributes"], json!(["*"]));
    assert_eq!(settings["searchableAttributes"], json!(["*"]));
    assert_eq!(settings["filterableAttributes"], json!([]));
@@ -83,7 +83,6 @@ async fn get_settings() {
            "maxTotalHits": 1000,
        })
    );
-    assert_eq!(settings["embedders"], json!({}));
    assert_eq!(settings["proximityPrecision"], json!("byWord"));
 }

--- a/milli/src/search/mod.rs
+++ b/milli/src/search/mod.rs
@@ -27,8 +27,8 @@ static LEVDIST0: Lazy<LevBuilder> = Lazy::new(|| LevBuilder::new(0, true));
 static LEVDIST1: Lazy<LevBuilder> = Lazy::new(|| LevBuilder::new(1, true));
 static LEVDIST2: Lazy<LevBuilder> = Lazy::new(|| LevBuilder::new(2, true));

-/// The maximum number of facets returned by the facet search route.
-const MAX_NUMBER_OF_FACETS: usize = 100;
+/// The maximum number of values per facet returned by the facet search route.
+const DEFAULT_MAX_NUMBER_OF_VALUES_PER_FACET: usize = 100;

 pub mod facet;
 mod fst_utils;
@@ -306,6 +306,7 @@ pub struct SearchForFacetValues<'a> {
    query: Option<String>,
    facet: String,
    search_query: Search<'a>,
+    max_values: usize,
    is_hybrid: bool,
 }

@@ -315,7 +316,13 @@ impl<'a> SearchForFacetValues<'a> {
        search_query: Search<'a>,
        is_hybrid: bool,
    ) -> SearchForFacetValues<'a> {
-        SearchForFacetValues { query: None, facet, search_query, is_hybrid }
+        SearchForFacetValues {
+            query: None,
+            facet,
+            search_query,
+            max_values: DEFAULT_MAX_NUMBER_OF_VALUES_PER_FACET,
+            is_hybrid,
+        }
    }

    pub fn query(&mut self, query: impl Into<String>) -> &mut Self {
@@ -323,6 +330,11 @@ impl<'a> SearchForFacetValues<'a> {
        self
    }

+    pub fn max_values(&mut self, max: usize) -> &mut Self {
+        self.max_values = max;
+        self
+    }
+
    fn one_original_value_of(
        &self,
        field_id: FieldId,
@@ -462,7 +474,7 @@ impl<'a> SearchForFacetValues<'a> {
                            .unwrap_or_else(|| left_bound.to_string());
                        results.push(FacetValueHit { value, count });
                    }
-                    if results.len() >= MAX_NUMBER_OF_FACETS {
+                    if results.len() >= self.max_values {
                        break;
                    }
                }
@@ -507,7 +519,7 @@ impl<'a> SearchForFacetValues<'a> {
                    .unwrap_or_else(|| query.to_string());
                results.push(FacetValueHit { value, count });
            }
-            if results.len() >= MAX_NUMBER_OF_FACETS {
+            if results.len() >= self.max_values {
                return Ok(ControlFlow::Break(()));
            }
        }
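The `max_values` setter added above replaces a hard-coded constant with a per-search cap that defaults to 100. The sketch below models only that cap. `FacetValueSearch` and its `execute` method are hypothetical stand-ins for `SearchForFacetValues`, and the candidate list is supplied directly instead of being read from the index.

```rust
const DEFAULT_MAX_VALUES: usize = 100;

/// Simplified stand-in: only the cap-related parts are modelled.
struct FacetValueSearch {
    max_values: usize,
}

impl FacetValueSearch {
    fn new() -> Self {
        Self { max_values: DEFAULT_MAX_VALUES }
    }

    /// Builder-style setter, chainable like the `query` setter in the diff.
    fn max_values(&mut self, max: usize) -> &mut Self {
        self.max_values = max;
        self
    }

    /// Collect candidates until the cap is reached, checking after each push.
    fn execute<'a>(&self, candidates: &[&'a str]) -> Vec<&'a str> {
        let mut results = Vec::new();
        for candidate in candidates {
            results.push(*candidate);
            if results.len() >= self.max_values {
                break;
            }
        }
        results
    }
}
```

Because the length check runs after the push, a cap of `0` in this sketch would still yield one value; whether that matters in practice depends on which values the `maxValuesPerFacet` setting accepts.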
--- a/milli/src/search/new/bucket_sort.rs
+++ b/milli/src/search/new/bucket_sort.rs
@@ -15,6 +15,7 @@ pub struct BucketSortOutput {

 // TODO: would probably be good to regroup some of these inside of a struct?
 #[allow(clippy::too_many_arguments)]
+#[logging_timer::time]
 pub fn bucket_sort<'ctx, Q: RankingRuleQueryTrait>(
    ctx: &mut SearchContext<'ctx>,
    mut ranking_rules: Vec<BoxRankingRule<'ctx, Q>>,
--- a/milli/src/search/new/matches/mod.rs
+++ b/milli/src/search/new/matches/mod.rs
@@ -72,7 +72,7 @@ impl<'m> MatcherBuilder<'m> {
    }
 }

-#[derive(Copy, Clone, Default)]
+#[derive(Copy, Clone, Default, Debug)]
 pub struct FormatOptions {
    pub highlight: bool,
    pub crop: Option<usize>,
@@ -82,6 +82,10 @@ impl FormatOptions {
    pub fn merge(self, other: Self) -> Self {
        Self { highlight: self.highlight || other.highlight, crop: self.crop.or(other.crop) }
    }
+
+    pub fn should_format(&self) -> bool {
+        self.highlight || self.crop.is_some()
+    }
 }

 #[derive(Clone, Debug)]
--- a/milli/src/search/new/mod.rs
+++ b/milli/src/search/new/mod.rs
@@ -191,6 +191,7 @@ fn resolve_maximally_reduced_query_graph(
    Ok(docids)
 }

+#[logging_timer::time]
 fn resolve_universe(
    ctx: &mut SearchContext,
    initial_universe: &RoaringBitmap,
@@ -556,6 +557,7 @@ pub fn execute_vector_search(
 }

 #[allow(clippy::too_many_arguments)]
+#[logging_timer::time]
 pub fn execute_search(
    ctx: &mut SearchContext,
    query: Option<&str>,
--- a/milli/src/search/new/query_term/parse_query.rs
+++ b/milli/src/search/new/query_term/parse_query.rs
@@ -5,6 +5,7 @@ use super::*;
 use crate::{Result, SearchContext, MAX_WORD_LENGTH};

 /// Convert the tokenised search query into a list of located query terms.
+#[logging_timer::time]
 pub fn located_query_terms_from_tokens(
    ctx: &mut SearchContext,
    query: NormalizedTokenIter,
--- a/milli/src/update/index_documents/extract/extract_docid_word_positions.rs
+++ b/milli/src/update/index_documents/extract/extract_docid_word_positions.rs
@@ -26,7 +26,7 @@ pub fn extract_docid_word_positions<R: io::Read + io::Seek>(
    obkv_documents: grenad::Reader<R>,
    indexer: GrenadParameters,
    searchable_fields: &Option<HashSet<FieldId>>,
-    stop_words: Option<&fst::Set<&[u8]>>,
+    stop_words: Option<&fst::Set<Vec<u8>>>,
    allowed_separators: Option<&[&str]>,
    dictionary: Option<&[&str]>,
    max_positions_per_attributes: Option<u32>,
@@ -181,11 +181,11 @@ fn searchable_fields_changed(

 /// Factorize tokenizer building.
 fn tokenizer_builder<'a>(
-    stop_words: Option<&'a fst::Set<&[u8]>>,
+    stop_words: Option<&'a fst::Set<Vec<u8>>>,
    allowed_separators: Option<&'a [&str]>,
    dictionary: Option<&'a [&str]>,
    script_language: Option<&'a HashMap<Script, Vec<Language>>>,
-) -> TokenizerBuilder<'a, &'a [u8]> {
+) -> TokenizerBuilder<'a, Vec<u8>> {
    let mut tokenizer_builder = TokenizerBuilder::new();
    if let Some(stop_words) = stop_words {
        tokenizer_builder.stop_words(stop_words);
@@ -211,7 +211,7 @@ fn lang_safe_tokens_from_document<'a>(
    obkv: &KvReader<FieldId>,
    searchable_fields: &Option<HashSet<FieldId>>,
    tokenizer: &Tokenizer,
-    stop_words: Option<&fst::Set<&[u8]>>,
+    stop_words: Option<&fst::Set<Vec<u8>>>,
    allowed_separators: Option<&[&str]>,
    dictionary: Option<&[&str]>,
    max_positions_per_attributes: u32,
--- a/milli/src/update/index_documents/extract/mod.rs
+++ b/milli/src/update/index_documents/extract/mod.rs
@@ -14,7 +14,6 @@ use std::fs::File;
 use std::io::BufReader;

 use crossbeam_channel::Sender;
-use log::debug;
 use rayon::prelude::*;

 use self::extract_docid_word_positions::extract_docid_word_positions;
@@ -29,10 +28,7 @@ use self::extract_vector_points::{
 use self::extract_word_docids::extract_word_docids;
 use self::extract_word_pair_proximity_docids::extract_word_pair_proximity_docids;
 use self::extract_word_position_docids::extract_word_position_docids;
-use super::helpers::{
-    as_cloneable_grenad, merge_deladd_cbo_roaring_bitmaps, CursorClonableMmap, GrenadParameters,
-    MergeFn, MergeableReader,
-};
+use super::helpers::{as_cloneable_grenad, CursorClonableMmap, GrenadParameters};
 use super::{helpers, TypedChunk};
 use crate::proximity::ProximityPrecision;
 use crate::vector::EmbeddingConfigs;
@@ -51,7 +47,7 @@ pub(crate) fn data_from_obkv_documents(
    primary_key_id: FieldId,
    geo_fields_ids: Option<(FieldId, FieldId)>,
    field_id_map: FieldsIdsMap,
-    stop_words: Option<fst::Set<&[u8]>>,
+    stop_words: Option<fst::Set<Vec<u8>>>,
    allowed_separators: Option<&[&str]>,
    dictionary: Option<&[&str]>,
    max_positions_per_attributes: Option<u32>,
@@ -61,218 +57,170 @@ pub(crate) fn data_from_obkv_documents(
 ) -> Result<()> {
    puffin::profile_function!();

-    original_obkv_chunks
-        .par_bridge()
-        .map(|original_documents_chunk| {
-            send_original_documents_data(
-                original_documents_chunk,
-                indexer,
-                lmdb_writer_sx.clone(),
-                field_id_map.clone(),
-                embedders.clone(),
-            )
-        })
-        .collect::<Result<()>>()?;
-
-    #[allow(clippy::type_complexity)]
-    let result: Result<(Vec<_>, (Vec<_>, (Vec<_>, (Vec<_>, (Vec<_>, Vec<_>)))))> =
-        flattened_obkv_chunks
-            .par_bridge()
-            .map(|flattened_obkv_chunks| {
-                send_and_extract_flattened_documents_data(
-                    flattened_obkv_chunks,
-                    indexer,
-                    lmdb_writer_sx.clone(),
-                    &searchable_fields,
-                    &faceted_fields,
-                    primary_key_id,
-                    geo_fields_ids,
-                    &stop_words,
-                    &allowed_separators,
-                    &dictionary,
-                    max_positions_per_attributes,
-                )
-            })
-            .collect();
-
-    let (
-        docid_word_positions_chunks,
-        (
-            fid_docid_facet_numbers_chunks,
-            (
-                fid_docid_facet_strings_chunks,
-                (
-                    facet_is_null_docids_chunks,
-                    (facet_is_empty_docids_chunks, facet_exists_docids_chunks),
-                ),
-            ),
-        ),
-    ) = result?;
-
-    // merge facet_exists_docids and send them as a typed chunk
-    {
-        let lmdb_writer_sx = lmdb_writer_sx.clone();
-        rayon::spawn(move || {
-            debug!("merge {} database", "facet-id-exists-docids");
-            match facet_exists_docids_chunks.merge(merge_deladd_cbo_roaring_bitmaps, &indexer) {
-                Ok(reader) => {
-                    let _ = lmdb_writer_sx.send(Ok(TypedChunk::FieldIdFacetExistsDocids(reader)));
-                }
-                Err(e) => {
-                    let _ = lmdb_writer_sx.send(Err(e));
-                }
-            }
-        });
-    }
-
-    // merge facet_is_null_docids and send them as a typed chunk
-    {
-        let lmdb_writer_sx = lmdb_writer_sx.clone();
-        rayon::spawn(move || {
-            debug!("merge {} database", "facet-id-is-null-docids");
-            match facet_is_null_docids_chunks.merge(merge_deladd_cbo_roaring_bitmaps, &indexer) {
-                Ok(reader) => {
-                    let _ = lmdb_writer_sx.send(Ok(TypedChunk::FieldIdFacetIsNullDocids(reader)));
-                }
-                Err(e) => {
-                    let _ = lmdb_writer_sx.send(Err(e));
-                }
-            }
-        });
-    }
-
-    // merge facet_is_empty_docids and send them as a typed chunk
-    {
-        let lmdb_writer_sx = lmdb_writer_sx.clone();
-        rayon::spawn(move || {
-            debug!("merge {} database", "facet-id-is-empty-docids");
-            match facet_is_empty_docids_chunks.merge(merge_deladd_cbo_roaring_bitmaps, &indexer) {
-                Ok(reader) => {
-                    let _ = lmdb_writer_sx.send(Ok(TypedChunk::FieldIdFacetIsEmptyDocids(reader)));
-                }
-                Err(e) => {
-                    let _ = lmdb_writer_sx.send(Err(e));
-                }
-            }
-        });
-    }
-
-    if proximity_precision == ProximityPrecision::ByWord {
-        spawn_extraction_task::<_, _, Vec<grenad::Reader<BufReader<File>>>>(
-            docid_word_positions_chunks.clone(),
-            indexer,
-            lmdb_writer_sx.clone(),
-            extract_word_pair_proximity_docids,
-            merge_deladd_cbo_roaring_bitmaps,
-            TypedChunk::WordPairProximityDocids,
-            "word-pair-proximity-docids",
-        );
-    }
-
-    spawn_extraction_task::<_, _, Vec<grenad::Reader<BufReader<File>>>>(
-        docid_word_positions_chunks.clone(),
-        indexer,
-        lmdb_writer_sx.clone(),
-        extract_fid_word_count_docids,
-        merge_deladd_cbo_roaring_bitmaps,
-        TypedChunk::FieldIdWordCountDocids,
-        "field-id-wordcount-docids",
-    );
-
-    spawn_extraction_task::<
-        _,
-        _,
-        Vec<(
-            grenad::Reader<BufReader<File>>,
-            grenad::Reader<BufReader<File>>,
-            grenad::Reader<BufReader<File>>,
-        )>,
-    >(
-        docid_word_positions_chunks.clone(),
-        indexer,
-        lmdb_writer_sx.clone(),
-        move |doc_word_pos, indexer| extract_word_docids(doc_word_pos, indexer, &exact_attributes),
-        merge_deladd_cbo_roaring_bitmaps,
-        |(word_docids_reader, exact_word_docids_reader, word_fid_docids_reader)| {
-            TypedChunk::WordDocids {
-                word_docids_reader,
-                exact_word_docids_reader,
-                word_fid_docids_reader,
-            }
+    let (original_pipeline_result, flattened_pipeline_result): (Result<_>, Result<_>) = rayon::join(
+        || {
+            original_obkv_chunks
+                .par_bridge()
+                .map(|original_documents_chunk| {
+                    send_original_documents_data(
+                        original_documents_chunk,
+                        indexer,
+                        lmdb_writer_sx.clone(),
+                        field_id_map.clone(),
+                        embedders.clone(),
+                    )
+                })
+                .collect::<Result<()>>()
+        },
+        || {
+            flattened_obkv_chunks
+                .par_bridge()
+                .map(|flattened_obkv_chunks| {
+                    send_and_extract_flattened_documents_data(
+                        flattened_obkv_chunks,
+                        indexer,
+                        lmdb_writer_sx.clone(),
+                        &searchable_fields,
+                        &faceted_fields,
+                        primary_key_id,
+                        geo_fields_ids,
+                        &stop_words,
+                        &allowed_separators,
+                        &dictionary,
+                        max_positions_per_attributes,
+                    )
+                })
+                .inspect(|result| {
+                    if proximity_precision == ProximityPrecision::ByWord {
+                        if let Ok((docid_word_positions_chunk, _)) = result {
+                            run_extraction_task::<_, _, grenad::Reader<BufReader<File>>>(
+                                docid_word_positions_chunk.clone(),
+                                indexer,
+                                lmdb_writer_sx.clone(),
+                                extract_word_pair_proximity_docids,
+                                TypedChunk::WordPairProximityDocids,
+                                "word-pair-proximity-docids",
+                            );
+                        }
+                    }
+                })
+                .inspect(|result| {
+                    if let Ok((docid_word_positions_chunk, _)) = result {
+                        run_extraction_task::<_, _, grenad::Reader<BufReader<File>>>(
+                            docid_word_positions_chunk.clone(),
+                            indexer,
+                            lmdb_writer_sx.clone(),
+                            extract_fid_word_count_docids,
+                            TypedChunk::FieldIdWordCountDocids,
+                            "field-id-wordcount-docids",
+                        );
+                    }
+                })
+                .inspect(|result| {
+                    if let Ok((docid_word_positions_chunk, _)) = result {
+                        let exact_attributes = exact_attributes.clone();
+                        run_extraction_task::<
+                            _,
+                            _,
+                            (
+                                grenad::Reader<BufReader<File>>,
+                                grenad::Reader<BufReader<File>>,
+                                grenad::Reader<BufReader<File>>,
+                            ),
+                        >(
+                            docid_word_positions_chunk.clone(),
+                            indexer,
+                            lmdb_writer_sx.clone(),
+                            move |doc_word_pos, indexer| {
+                                extract_word_docids(doc_word_pos, indexer, &exact_attributes)
+                            },
+                            |(
+                                word_docids_reader,
+                                exact_word_docids_reader,
+                                word_fid_docids_reader,
+                            )| {
+                                TypedChunk::WordDocids {
+                                    word_docids_reader,
+                                    exact_word_docids_reader,
+                                    word_fid_docids_reader,
+                                }
+                            },
+                            "word-docids",
+                        );
+                    }
+                })
+                .inspect(|result| {
+                    if let Ok((docid_word_positions_chunk, _)) = result {
+                        run_extraction_task::<_, _, grenad::Reader<BufReader<File>>>(
+                            docid_word_positions_chunk.clone(),
+                            indexer,
+                            lmdb_writer_sx.clone(),
+                            extract_word_position_docids,
+                            TypedChunk::WordPositionDocids,
+                            "word-position-docids",
+                        );
+                    }
+                })
+                .inspect(|result| {
+                    if let Ok((_, (_, fid_docid_facet_strings_chunk))) = result {
+                        run_extraction_task::<_, _, grenad::Reader<BufReader<File>>>(
+                            fid_docid_facet_strings_chunk.clone(),
+                            indexer,
+                            lmdb_writer_sx.clone(),
+                            extract_facet_string_docids,
+                            TypedChunk::FieldIdFacetStringDocids,
+                            "field-id-facet-string-docids",
+                        );
+                    }
+                })
+                .inspect(|result| {
+                    if let Ok((_, (fid_docid_facet_numbers_chunk, _))) = result {
+                        run_extraction_task::<_, _, grenad::Reader<BufReader<File>>>(
+                            fid_docid_facet_numbers_chunk.clone(),
+                            indexer,
+                            lmdb_writer_sx.clone(),
+                            extract_facet_number_docids,
+                            TypedChunk::FieldIdFacetNumberDocids,
+                            "field-id-facet-number-docids",
+                        );
+                    }
+                })
+                .map(|r| r.map(|_| ()))
+                .collect::<Result<()>>()
        },
-        "word-docids",
    );

-    spawn_extraction_task::<_, _, Vec<grenad::Reader<BufReader<File>>>>(
-        docid_word_positions_chunks.clone(),
-        indexer,
-        lmdb_writer_sx.clone(),
-        extract_word_position_docids,
-        merge_deladd_cbo_roaring_bitmaps,
-        TypedChunk::WordPositionDocids,
-        "word-position-docids",
-    );
-
-    spawn_extraction_task::<_, _, Vec<grenad::Reader<BufReader<File>>>>(
-        fid_docid_facet_strings_chunks,
-        indexer,
-        lmdb_writer_sx.clone(),
-        extract_facet_string_docids,
-        merge_deladd_cbo_roaring_bitmaps,
-        TypedChunk::FieldIdFacetStringDocids,
-        "field-id-facet-string-docids",
-    );
-
-    spawn_extraction_task::<_, _, Vec<grenad::Reader<BufReader<File>>>>(
-        fid_docid_facet_numbers_chunks,
-        indexer,
-        lmdb_writer_sx,
-        extract_facet_number_docids,
-        merge_deladd_cbo_roaring_bitmaps,
-        TypedChunk::FieldIdFacetNumberDocids,
-        "field-id-facet-number-docids",
-    );
-
-    Ok(())
+    original_pipeline_result.and(flattened_pipeline_result)
 }

 /// Spawn a new task to extract data for a specific DB using extract_fn.
 /// Generated grenad chunks are merged using the merge_fn.
 /// The result of merged chunks is serialized as TypedChunk using the serialize_fn
 /// and sent into lmdb_writer_sx.
-fn spawn_extraction_task<FE, FS, M>(
-    chunks: Vec<grenad::Reader<CursorClonableMmap>>,
+fn run_extraction_task<FE, FS, M>(
+    chunk: grenad::Reader<CursorClonableMmap>,
    indexer: GrenadParameters,
    lmdb_writer_sx: Sender<Result<TypedChunk>>,
    extract_fn: FE,
-    merge_fn: MergeFn,
    serialize_fn: FS,
    name: &'static str,
 ) where
-    FE: Fn(grenad::Reader<CursorClonableMmap>, GrenadParameters) -> Result<M::Output>
+    FE: Fn(grenad::Reader<CursorClonableMmap>, GrenadParameters) -> Result<M>
        + Sync
        + Send
        + 'static,
-    FS: Fn(M::Output) -> TypedChunk + Sync + Send + 'static,
-    M: MergeableReader + FromParallelIterator<M::Output> + Send + 'static,
-    M::Output: Send,
+    FS: Fn(M) -> TypedChunk + Sync + Send + 'static,
+    M: Send,
 {
-    rayon::spawn(move || {
-        puffin::profile_scope!("extract_multiple_chunks", name);
-        let chunks: Result<M> =
-            chunks.into_par_iter().map(|chunk| extract_fn(chunk, indexer)).collect();
-        rayon::spawn(move || match chunks {
-            Ok(chunks) => {
-                debug!("merge {} database", name);
-                puffin::profile_scope!("merge_multiple_chunks", name);
-                let reader = chunks.merge(merge_fn, &indexer);
-                let _ = lmdb_writer_sx.send(reader.map(serialize_fn));
-            }
-            Err(e) => {
-                let _ = lmdb_writer_sx.send(Err(e));
-            }
-        })
-    });
+    puffin::profile_scope!("extract_chunk", name);
+    match extract_fn(chunk, indexer) {
+        Ok(chunk) => {
+            let _ = lmdb_writer_sx.send(Ok(serialize_fn(chunk)));
+        }
+        Err(e) => {
+            let _ = lmdb_writer_sx.send(Err(e));
+        }
+    }
 }

 /// Extract chunked data and send it into lmdb_writer_sx sender:
@@ -350,22 +298,13 @@ fn send_and_extract_flattened_documents_data(
    faceted_fields: &HashSet<FieldId>,
    primary_key_id: FieldId,
    geo_fields_ids: Option<(FieldId, FieldId)>,
-    stop_words: &Option<fst::Set<&[u8]>>,
+    stop_words: &Option<fst::Set<Vec<u8>>>,
    allowed_separators: &Option<&[&str]>,
    dictionary: &Option<&[&str]>,
    max_positions_per_attributes: Option<u32>,
 ) -> Result<(
    grenad::Reader<CursorClonableMmap>,
-    (
-        grenad::Reader<CursorClonableMmap>,
-        (
-            grenad::Reader<CursorClonableMmap>,
-            (
-                grenad::Reader<BufReader<File>>,
-                (grenad::Reader<BufReader<File>>, grenad::Reader<BufReader<File>>),
-            ),
-        ),
-    ),
+    (grenad::Reader<CursorClonableMmap>, grenad::Reader<CursorClonableMmap>),
 )> {
    let flattened_documents_chunk =
        flattened_documents_chunk.and_then(|c| unsafe { as_cloneable_grenad(&c) })?;
@@ -436,16 +375,17 @@ fn send_and_extract_flattened_documents_data(
                    fid_docid_facet_strings_chunk.clone(),
                )));

-                Ok((
-                    fid_docid_facet_numbers_chunk,
-                    (
-                        fid_docid_facet_strings_chunk,
-                        (
-                            fid_facet_is_null_docids_chunk,
-                            (fid_facet_is_empty_docids_chunk, fid_facet_exists_docids_chunk),
-                        ),
-                    ),
-                ))
+                let _ = lmdb_writer_sx
+                    .send(Ok(TypedChunk::FieldIdFacetIsNullDocids(fid_facet_is_null_docids_chunk)));
+
+                let _ = lmdb_writer_sx.send(Ok(TypedChunk::FieldIdFacetIsEmptyDocids(
+                    fid_facet_is_empty_docids_chunk,
+                )));
+
+                let _ = lmdb_writer_sx
+                    .send(Ok(TypedChunk::FieldIdFacetExistsDocids(fid_facet_exists_docids_chunk)));
+
+                Ok((fid_docid_facet_numbers_chunk, fid_docid_facet_strings_chunk))
            },
        );

--- a/milli/src/update/index_documents/helpers/grenad_helpers.rs
+++ b/milli/src/update/index_documents/helpers/grenad_helpers.rs
@@ -82,90 +82,6 @@ pub unsafe fn as_cloneable_grenad(
    Ok(reader)
 }

-pub trait MergeableReader
-where
-    Self: Sized,
-{
-    type Output;
-
-    fn merge(self, merge_fn: MergeFn, indexer: &GrenadParameters) -> Result<Self::Output>;
-}
-
-impl MergeableReader for Vec<grenad::Reader<BufReader<File>>> {
-    type Output = grenad::Reader<BufReader<File>>;
-
-    fn merge(self, merge_fn: MergeFn, params: &GrenadParameters) -> Result<Self::Output> {
-        let mut merger = MergerBuilder::new(merge_fn);
-        self.into_iter().try_for_each(|r| merger.push(r))?;
-        merger.finish(params)
-    }
-}
-
-impl MergeableReader for Vec<(grenad::Reader<BufReader<File>>, grenad::Reader<BufReader<File>>)> {
-    type Output = (grenad::Reader<BufReader<File>>, grenad::Reader<BufReader<File>>);
-
-    fn merge(self, merge_fn: MergeFn, params: &GrenadParameters) -> Result<Self::Output> {
-        let mut m1 = MergerBuilder::new(merge_fn);
-        let mut m2 = MergerBuilder::new(merge_fn);
-        for (r1, r2) in self.into_iter() {
-            m1.push(r1)?;
-            m2.push(r2)?;
-        }
-        Ok((m1.finish(params)?, m2.finish(params)?))
-    }
-}
-
-impl MergeableReader
-    for Vec<(
-        grenad::Reader<BufReader<File>>,
-        grenad::Reader<BufReader<File>>,
-        grenad::Reader<BufReader<File>>,
-    )>
-{
-    type Output = (
-        grenad::Reader<BufReader<File>>,
-        grenad::Reader<BufReader<File>>,
-        grenad::Reader<BufReader<File>>,
-    );
-
-    fn merge(self, merge_fn: MergeFn, params: &GrenadParameters) -> Result<Self::Output> {
-        let mut m1 = MergerBuilder::new(merge_fn);
-        let mut m2 = MergerBuilder::new(merge_fn);
-        let mut m3 = MergerBuilder::new(merge_fn);
-        for (r1, r2, r3) in self.into_iter() {
-            m1.push(r1)?;
-            m2.push(r2)?;
-            m3.push(r3)?;
-        }
-        Ok((m1.finish(params)?, m2.finish(params)?, m3.finish(params)?))
-    }
-}
-
-struct MergerBuilder<R>(grenad::MergerBuilder<R, MergeFn>);
-
-impl<R: io::Read + io::Seek> MergerBuilder<R> {
-    fn new(merge_fn: MergeFn) -> Self {
-        Self(grenad::MergerBuilder::new(merge_fn))
-    }
-
-    fn push(&mut self, reader: grenad::Reader<R>) -> Result<()> {
-        self.0.push(reader.into_cursor()?);
-        Ok(())
-    }
-
-    fn finish(self, params: &GrenadParameters) -> Result<grenad::Reader<BufReader<File>>> {
-        let merger = self.0.build();
-        let mut writer = create_writer(
-            params.chunk_compression_type,
-            params.chunk_compression_level,
-            tempfile::tempfile()?,
-        );
-        merger.write_into_stream_writer(&mut writer)?;
-
-        writer_into_reader(writer)
-    }
-}
-
 #[derive(Debug, Clone, Copy)]
 pub struct GrenadParameters {
    pub chunk_compression_type: CompressionType,
--- a/milli/src/update/index_documents/helpers/mod.rs
+++ b/milli/src/update/index_documents/helpers/mod.rs
@@ -10,7 +10,7 @@ use fst::{IntoStreamer, Streamer};
 pub use grenad_helpers::{
    as_cloneable_grenad, create_sorter, create_writer, grenad_obkv_into_chunks,
    merge_ignore_values, sorter_into_reader, write_sorter_into_database, writer_into_reader,
-    GrenadParameters, MergeableReader,
+    GrenadParameters,
 };
 pub use merge_functions::{
    keep_first, keep_latest_obkv, merge_btreeset_string, merge_cbo_roaring_bitmaps,
--- a/milli/src/update/index_documents/mod.rs
+++ b/milli/src/update/index_documents/mod.rs
@@ -5,12 +5,13 @@ mod transform;
 mod typed_chunk;

 use std::collections::{HashMap, HashSet};
-use std::io::{Cursor, Read, Seek};
+use std::io::{Read, Seek};
 use std::iter::FromIterator;
 use std::num::NonZeroU32;
 use std::result::Result as StdResult;

 use crossbeam_channel::{Receiver, Sender};
+use grenad::{Merger, MergerBuilder};
 use heed::types::Str;
 use heed::Database;
 use log::debug;
@@ -313,9 +314,6 @@ where
            }
        };

-        let original_documents = grenad::Reader::new(original_documents)?;
-        let flattened_documents = grenad::Reader::new(flattened_documents)?;
-
        // create LMDB writer channel
        let (lmdb_writer_sx, lmdb_writer_rx): (
            Sender<Result<TypedChunk>>,
@@ -354,11 +352,7 @@ where

        let stop_words = self.index.stop_words(self.wtxn)?;
        let separators = self.index.allowed_separators(self.wtxn)?;
-        let separators: Option<Vec<_>> =
-            separators.as_ref().map(|x| x.iter().map(String::as_str).collect());
        let dictionary = self.index.dictionary(self.wtxn)?;
-        let dictionary: Option<Vec<_>> =
-            dictionary.as_ref().map(|x| x.iter().map(String::as_str).collect());
        let exact_attributes = self.index.exact_attributes_ids(self.wtxn)?;
        let proximity_precision = self.index.proximity_precision(self.wtxn)?.unwrap_or_default();

@@ -368,55 +362,77 @@ where
            max_memory: self.indexer_config.max_memory,
            max_nb_chunks: self.indexer_config.max_nb_chunks, // default value, may be chosen.
        };
-        let documents_chunk_size =
-            self.indexer_config.documents_chunk_size.unwrap_or(1024 * 1024 * 4); // 4MiB
+        let documents_chunk_size = match self.indexer_config.documents_chunk_size {
+            Some(chunk_size) => chunk_size,
+            None => {
+                let default_chunk_size = 1024 * 1024 * 4; // 4MiB
+                let min_chunk_size = 1024 * 512; // 512KiB
+
+                // compute the chunk size from the number of available threads and the input data size.
+                let total_size = flattened_documents.metadata().map(|m| m.len());
+                let current_num_threads = pool.current_num_threads();
+                total_size
+                    .map_or(default_chunk_size, |size| (size as usize) / current_num_threads)
+                    .max(min_chunk_size)
+            }
+        };
+
+        let original_documents = grenad::Reader::new(original_documents)?;
+        let flattened_documents = grenad::Reader::new(flattened_documents)?;
+
        let max_positions_per_attributes = self.indexer_config.max_positions_per_attributes;

        let cloned_embedder = self.embedders.clone();

        // Run extraction pipeline in parallel.
        pool.install(|| {
-            puffin::profile_scope!("extract_and_send_grenad_chunks");
-            // split obkv file into several chunks
-            let original_chunk_iter =
-                grenad_obkv_into_chunks(original_documents, pool_params, documents_chunk_size);
+            let stop_words = stop_words.map(|sw| sw.map_data(Vec::from).unwrap());
+            rayon::spawn(move || {
+                puffin::profile_scope!("extract_and_send_grenad_chunks");
+                // split obkv file into several chunks
+                let original_chunk_iter =
+                    grenad_obkv_into_chunks(original_documents, pool_params, documents_chunk_size);

-            // split obkv file into several chunks
-            let flattened_chunk_iter =
-                grenad_obkv_into_chunks(flattened_documents, pool_params, documents_chunk_size);
+                // split obkv file into several chunks
+                let flattened_chunk_iter =
+                    grenad_obkv_into_chunks(flattened_documents, pool_params, documents_chunk_size);

-            let result = original_chunk_iter.and_then(|original_chunk| {
-                let flattened_chunk = flattened_chunk_iter?;
-                // extract all databases from the chunked obkv douments
-                extract::data_from_obkv_documents(
-                    original_chunk,
-                    flattened_chunk,
-                    pool_params,
-                    lmdb_writer_sx.clone(),
-                    searchable_fields,
-                    faceted_fields,
-                    primary_key_id,
-                    geo_fields_ids,
-                    field_id_map,
-                    stop_words,
-                    separators.as_deref(),
-                    dictionary.as_deref(),
-                    max_positions_per_attributes,
-                    exact_attributes,
-                    proximity_precision,
-                    cloned_embedder,
-                )
+                let separators: Option<Vec<_>> =
+                    separators.as_ref().map(|x| x.iter().map(String::as_str).collect());
+                let dictionary: Option<Vec<_>> =
+                    dictionary.as_ref().map(|x| x.iter().map(String::as_str).collect());
+                let result = original_chunk_iter.and_then(|original_chunk| {
+                    let flattened_chunk = flattened_chunk_iter?;
+                    // extract all databases from the chunked obkv documents
+                    extract::data_from_obkv_documents(
+                        original_chunk,
+                        flattened_chunk,
+                        pool_params,
+                        lmdb_writer_sx.clone(),
+                        searchable_fields,
+                        faceted_fields,
+                        primary_key_id,
+                        geo_fields_ids,
+                        field_id_map,
+                        stop_words,
+                        separators.as_deref(),
+                        dictionary.as_deref(),
+                        max_positions_per_attributes,
+                        exact_attributes,
+                        proximity_precision,
+                        cloned_embedder,
+                    )
+                });
+
+                if let Err(e) = result {
+                    let _ = lmdb_writer_sx.send(Err(e));
+                }
+
+                // needs to be dropped to avoid channel waiting lock.
+                drop(lmdb_writer_sx);
            });
-
-            if let Err(e) = result {
-                let _ = lmdb_writer_sx.send(Err(e));
-            }
-
-            // needs to be dropped to avoid channel waiting lock.
-            drop(lmdb_writer_sx);
        });

-        let index_is_empty = self.index.number_of_documents(self.wtxn)? == 0;
        let mut final_documents_ids = RoaringBitmap::new();

        let mut databases_seen = 0;
@@ -444,12 +460,21 @@ where
                    word_fid_docids_reader,
                } => {
                    let cloneable_chunk = unsafe { as_cloneable_grenad(&word_docids_reader)? };
-                    word_docids = Some(cloneable_chunk);
+                    let word_docids = word_docids.get_or_insert_with(|| {
+                        MergerBuilder::new(merge_deladd_cbo_roaring_bitmaps as MergeFn)
+                    });
+                    word_docids.push(cloneable_chunk.into_cursor()?);
                    let cloneable_chunk =
                        unsafe { as_cloneable_grenad(&exact_word_docids_reader)? };
-                    exact_word_docids = Some(cloneable_chunk);
+                    let exact_word_docids = exact_word_docids.get_or_insert_with(|| {
+                        MergerBuilder::new(merge_deladd_cbo_roaring_bitmaps as MergeFn)
+                    });
+                    exact_word_docids.push(cloneable_chunk.into_cursor()?);
                    let cloneable_chunk = unsafe { as_cloneable_grenad(&word_fid_docids_reader)? };
-                    word_fid_docids = Some(cloneable_chunk);
+                    let word_fid_docids = word_fid_docids.get_or_insert_with(|| {
+                        MergerBuilder::new(merge_deladd_cbo_roaring_bitmaps as MergeFn)
+                    });
+                    word_fid_docids.push(cloneable_chunk.into_cursor()?);
                    TypedChunk::WordDocids {
                        word_docids_reader,
                        exact_word_docids_reader,
@@ -458,7 +483,10 @@ where
                }
                TypedChunk::WordPositionDocids(chunk) => {
                    let cloneable_chunk = unsafe { as_cloneable_grenad(&chunk)? };
-                    word_position_docids = Some(cloneable_chunk);
+                    let word_position_docids = word_position_docids.get_or_insert_with(|| {
+                        MergerBuilder::new(merge_deladd_cbo_roaring_bitmaps as MergeFn)
+                    });
+                    word_position_docids.push(cloneable_chunk.into_cursor()?);
                    TypedChunk::WordPositionDocids(chunk)
                }
                TypedChunk::VectorPoints {
@@ -481,7 +509,7 @@ where
            };

            let (docids, is_merged_database) =
-                write_typed_chunk_into_index(typed_chunk, self.index, self.wtxn, index_is_empty)?;
+                write_typed_chunk_into_index(typed_chunk, self.index, self.wtxn)?;
            if !docids.is_empty() {
                final_documents_ids |= docids;
                let documents_seen_count = final_documents_ids.len();
@@ -538,10 +566,10 @@ where
        }

        self.execute_prefix_databases(
-            word_docids,
-            exact_word_docids,
-            word_position_docids,
-            word_fid_docids,
+            word_docids.map(MergerBuilder::build),
+            exact_word_docids.map(MergerBuilder::build),
+            word_position_docids.map(MergerBuilder::build),
+            word_fid_docids.map(MergerBuilder::build),
        )?;

        Ok(number_of_documents)
@@ -550,10 +578,10 @@ where
    #[logging_timer::time("IndexDocuments::{}")]
    pub fn execute_prefix_databases(
        self,
-        word_docids: Option<grenad::Reader<CursorClonableMmap>>,
-        exact_word_docids: Option<grenad::Reader<CursorClonableMmap>>,
-        word_position_docids: Option<grenad::Reader<CursorClonableMmap>>,
-        word_fid_docids: Option<grenad::Reader<CursorClonableMmap>>,
+        word_docids: Option<Merger<CursorClonableMmap, MergeFn>>,
+        exact_word_docids: Option<Merger<CursorClonableMmap, MergeFn>>,
+        word_position_docids: Option<Merger<CursorClonableMmap, MergeFn>>,
+        word_fid_docids: Option<Merger<CursorClonableMmap, MergeFn>>,
    ) -> Result<()>
    where
        FP: Fn(UpdateIndexingStep) + Sync,
@@ -728,7 +756,7 @@ where
 #[allow(clippy::too_many_arguments)]
 fn execute_word_prefix_docids(
    txn: &mut heed::RwTxn,
-    reader: grenad::Reader<Cursor<ClonableMmap>>,
+    merger: Merger<CursorClonableMmap, MergeFn>,
    word_docids_db: Database<Str, CboRoaringBitmapCodec>,
    word_prefix_docids_db: Database<Str, CboRoaringBitmapCodec>,
    indexer_config: &IndexerConfig,
@@ -738,13 +766,12 @@ fn execute_word_prefix_docids(
 ) -> Result<()> {
    puffin::profile_function!();

-    let cursor = reader.into_cursor()?;
    let mut builder = WordPrefixDocids::new(txn, word_docids_db, word_prefix_docids_db);
    builder.chunk_compression_type = indexer_config.chunk_compression_type;
    builder.chunk_compression_level = indexer_config.chunk_compression_level;
    builder.max_nb_chunks = indexer_config.max_nb_chunks;
    builder.max_memory = indexer_config.max_memory;
-    builder.execute(cursor, new_prefix_fst_words, common_prefix_fst_words, del_prefix_fst_words)?;
+    builder.execute(merger, new_prefix_fst_words, common_prefix_fst_words, del_prefix_fst_words)?;
    Ok(())
 }

--- a/milli/src/update/index_documents/typed_chunk.rs
+++ b/milli/src/update/index_documents/typed_chunk.rs
@@ -7,7 +7,7 @@ use bytemuck::allocation::pod_collect_to_vec;
 use charabia::{Language, Script};
 use grenad::MergerBuilder;
 use heed::types::Bytes;
-use heed::{PutFlags, RwTxn};
+use heed::RwTxn;
 use obkv::{KvReader, KvWriter};
 use roaring::RoaringBitmap;

@@ -119,7 +119,6 @@ pub(crate) fn write_typed_chunk_into_index(
    typed_chunk: TypedChunk,
    index: &Index,
    wtxn: &mut RwTxn,
-    index_is_empty: bool,
 ) -> Result<(RoaringBitmap, bool)> {
    puffin::profile_function!(typed_chunk.to_debug_string());

@@ -172,11 +171,10 @@ pub(crate) fn write_typed_chunk_into_index(
            index.put_documents_ids(wtxn, &docids)?;
        }
        TypedChunk::FieldIdWordCountDocids(fid_word_count_docids_iter) => {
-            append_entries_into_database(
+            write_entries_into_database(
                fid_word_count_docids_iter,
                &index.field_id_word_count_docids,
                wtxn,
-                index_is_empty,
                deladd_serialize_add_side,
                merge_deladd_cbo_roaring_bitmaps_into_cbo_roaring_bitmap,
            )?;
@@ -188,31 +186,28 @@ pub(crate) fn write_typed_chunk_into_index(
            word_fid_docids_reader,
        } => {
            let word_docids_iter = unsafe { as_cloneable_grenad(&word_docids_reader) }?;
-            append_entries_into_database(
+            write_entries_into_database(
                word_docids_iter.clone(),
                &index.word_docids,
                wtxn,
-                index_is_empty,
                deladd_serialize_add_side,
                merge_deladd_cbo_roaring_bitmaps_into_cbo_roaring_bitmap,
            )?;

            let exact_word_docids_iter = unsafe { as_cloneable_grenad(&exact_word_docids_reader) }?;
-            append_entries_into_database(
+            write_entries_into_database(
                exact_word_docids_iter.clone(),
                &index.exact_word_docids,
                wtxn,
-                index_is_empty,
                deladd_serialize_add_side,
                merge_deladd_cbo_roaring_bitmaps_into_cbo_roaring_bitmap,
            )?;

            let word_fid_docids_iter = unsafe { as_cloneable_grenad(&word_fid_docids_reader) }?;
-            append_entries_into_database(
+            write_entries_into_database(
                word_fid_docids_iter,
                &index.word_fid_docids,
                wtxn,
-                index_is_empty,
                deladd_serialize_add_side,
                merge_deladd_cbo_roaring_bitmaps_into_cbo_roaring_bitmap,
            )?;
@@ -230,11 +225,10 @@ pub(crate) fn write_typed_chunk_into_index(
            is_merged_database = true;
        }
        TypedChunk::WordPositionDocids(word_position_docids_iter) => {
-            append_entries_into_database(
+            write_entries_into_database(
                word_position_docids_iter,
                &index.word_position_docids,
                wtxn,
-                index_is_empty,
                deladd_serialize_add_side,
                merge_deladd_cbo_roaring_bitmaps_into_cbo_roaring_bitmap,
            )?;
@@ -251,44 +245,40 @@ pub(crate) fn write_typed_chunk_into_index(
            is_merged_database = true;
        }
        TypedChunk::FieldIdFacetExistsDocids(facet_id_exists_docids) => {
-            append_entries_into_database(
+            write_entries_into_database(
                facet_id_exists_docids,
                &index.facet_id_exists_docids,
                wtxn,
-                index_is_empty,
                deladd_serialize_add_side,
                merge_deladd_cbo_roaring_bitmaps_into_cbo_roaring_bitmap,
            )?;
            is_merged_database = true;
        }
        TypedChunk::FieldIdFacetIsNullDocids(facet_id_is_null_docids) => {
-            append_entries_into_database(
+            write_entries_into_database(
                facet_id_is_null_docids,
                &index.facet_id_is_null_docids,
                wtxn,
-                index_is_empty,
                deladd_serialize_add_side,
                merge_deladd_cbo_roaring_bitmaps_into_cbo_roaring_bitmap,
            )?;
            is_merged_database = true;
        }
        TypedChunk::FieldIdFacetIsEmptyDocids(facet_id_is_empty_docids) => {
-            append_entries_into_database(
+            write_entries_into_database(
                facet_id_is_empty_docids,
                &index.facet_id_is_empty_docids,
                wtxn,
-                index_is_empty,
                deladd_serialize_add_side,
                merge_deladd_cbo_roaring_bitmaps_into_cbo_roaring_bitmap,
            )?;
            is_merged_database = true;
        }
        TypedChunk::WordPairProximityDocids(word_pair_proximity_docids_iter) => {
-            append_entries_into_database(
+            write_entries_into_database(
                word_pair_proximity_docids_iter,
                &index.word_pair_proximity_docids,
                wtxn,
-                index_is_empty,
                deladd_serialize_add_side,
                merge_deladd_cbo_roaring_bitmaps_into_cbo_roaring_bitmap,
            )?;
@@ -541,7 +531,6 @@ fn write_entries_into_database<R, K, V, FS, FM>(
    data: grenad::Reader<R>,
    database: &heed::Database<K, V>,
    wtxn: &mut RwTxn,
-    index_is_empty: bool,
    serialize_value: FS,
    merge_values: FM,
 ) -> Result<()>
@@ -559,13 +548,9 @@ where
    while let Some((key, value)) = cursor.move_on_next()? {
        if valid_lmdb_key(key) {
            buffer.clear();
-            let value = if index_is_empty {
-                Some(serialize_value(value, &mut buffer)?)
-            } else {
-                match database.get(wtxn, key)? {
-                    Some(prev_value) => merge_values(value, prev_value, &mut buffer)?,
-                    None => Some(serialize_value(value, &mut buffer)?),
-                }
+            let value = match database.get(wtxn, key)? {
+                Some(prev_value) => merge_values(value, prev_value, &mut buffer)?,
+                None => Some(serialize_value(value, &mut buffer)?),
            };
            match value {
                Some(value) => database.put(wtxn, key, value)?,
@@ -578,58 +563,3 @@ where

    Ok(())
 }
-
-/// Write provided entries in database using serialize_value function.
-/// merge_values function is used if an entry already exist in the database.
-/// All provided entries must be ordered.
-/// If the index is not empty, write_entries_into_database is called instead.
-fn append_entries_into_database<R, K, V, FS, FM>(
-    data: grenad::Reader<R>,
-    database: &heed::Database<K, V>,
-    wtxn: &mut RwTxn,
-    index_is_empty: bool,
-    serialize_value: FS,
-    merge_values: FM,
-) -> Result<()>
-where
-    R: io::Read + io::Seek,
-    FS: for<'a> Fn(&'a [u8], &'a mut Vec<u8>) -> Result<&'a [u8]>,
-    FM: for<'a> Fn(&[u8], &[u8], &'a mut Vec<u8>) -> Result<Option<&'a [u8]>>,
-    K: for<'a> heed::BytesDecode<'a>,
-{
-    puffin::profile_function!(format!("number of entries: {}", data.len()));
-
-    if !index_is_empty {
-        return write_entries_into_database(
-            data,
-            database,
-            wtxn,
-            false,
-            serialize_value,
-            merge_values,
-        );
-    }
-
-    let mut buffer = Vec::new();
-    let mut database = database.iter_mut(wtxn)?.remap_types::<Bytes, Bytes>();
-
-    let mut cursor = data.into_cursor()?;
-    while let Some((key, value)) = cursor.move_on_next()? {
-        if valid_lmdb_key(key) {
-            debug_assert!(
-                K::bytes_decode(key).is_ok(),
-                "Couldn't decode key with the database decoder, key length: {} - key bytes: {:x?}",
-                key.len(),
-                &key
-            );
-            buffer.clear();
-            let value = serialize_value(value, &mut buffer)?;
-            unsafe {
-                // safety: We do not keep a reference to anything that lives inside the database
-                database.put_current_with_options::<Bytes>(PutFlags::APPEND, key, value)?
-            };
-        }
-    }
-
-    Ok(())
-}
--- a/milli/src/update/word_prefix_docids.rs
+++ b/milli/src/update/word_prefix_docids.rs
@@ -42,7 +42,7 @@ impl<'t, 'i> WordPrefixDocids<'t, 'i> {
    #[logging_timer::time("WordPrefixDocids::{}")]
    pub fn execute(
        self,
-        mut new_word_docids_iter: grenad::ReaderCursor<CursorClonableMmap>,
+        new_word_docids: grenad::Merger<CursorClonableMmap, MergeFn>,
        new_prefix_fst_words: &[String],
        common_prefix_fst_words: &[&[String]],
        del_prefix_fst_words: &HashSet<Vec<u8>>,
@@ -63,7 +63,8 @@ impl<'t, 'i> WordPrefixDocids<'t, 'i> {
        if !common_prefix_fst_words.is_empty() {
            let mut current_prefixes: Option<&&[String]> = None;
            let mut prefixes_cache = HashMap::new();
-            while let Some((word, data)) = new_word_docids_iter.move_on_next()? {
+            let mut new_word_docids_iter = new_word_docids.into_stream_merger_iter()?;
+            while let Some((word, data)) = new_word_docids_iter.next()? {
                current_prefixes = match current_prefixes.take() {
                    Some(prefixes) if word.starts_with(prefixes[0].as_bytes()) => Some(prefixes),
                    _otherwise => {
--- a/milli/src/update/words_prefix_integer_docids.rs
+++ b/milli/src/update/words_prefix_integer_docids.rs
@@ -47,7 +47,7 @@ impl<'t, 'i> WordPrefixIntegerDocids<'t, 'i> {
    #[logging_timer::time("WordPrefixIntegerDocids::{}")]
    pub fn execute(
        self,
-        new_word_integer_docids: grenad::Reader<CursorClonableMmap>,
+        new_word_integer_docids: grenad::Merger<CursorClonableMmap, MergeFn>,
        new_prefix_fst_words: &[String],
        common_prefix_fst_words: &[&[String]],
        del_prefix_fst_words: &HashSet<Vec<u8>>,
@@ -64,14 +64,14 @@ impl<'t, 'i> WordPrefixIntegerDocids<'t, 'i> {
            self.max_memory,
        );

-        let mut new_word_integer_docids_iter = new_word_integer_docids.into_cursor()?;
-
        if !common_prefix_fst_words.is_empty() {
            // We fetch all the new common prefixes between the previous and new prefix fst.
            let mut buffer = Vec::new();
            let mut current_prefixes: Option<&&[String]> = None;
            let mut prefixes_cache = HashMap::new();
-            while let Some((key, data)) = new_word_integer_docids_iter.move_on_next()? {
+            let mut new_word_integer_docids_iter =
+                new_word_integer_docids.into_stream_merger_iter()?;
+            while let Some((key, data)) = new_word_integer_docids_iter.next()? {
                let (word, pos) =
                    StrBEU16Codec::bytes_decode(key).map_err(heed::Error::Decoding)?;
Author	SHA1	Message	Date
ManyTheFish	60bfd3aef1	Send each chunk directly to the main thread instead of merging them at the end of extraction	2024-01-22 16:30:27 +01:00
ManyTheFish	5027eea1a8	Remove append function	2024-01-22 16:30:09 +01:00
ManyTheFish	5079fb4b14	Compute chunk size based on the input data size and the number of indexing threads	2024-01-22 16:29:44 +01:00
meili-bors[bot]	8e016fbfeb	Merge #4319 4319: Update README r=curquiza a=codesmith-emmy # Pull Request ## Related issue Fixes #<issue_number> ## What does this PR do? - ... ## PR checklist Please check if your PR fulfills the following requirements: - [ ] Does this PR fix an existing issue, or have you listed the changes applied in the PR description (and why they are needed)? - [ ] Have you read the contributing guidelines? - [ ] Have you made sure that the title is accurate and descriptive of the changes? Thank you so much for contributing to Meilisearch! Co-authored-by: emmanuel <154705254+codesmith-emmy@users.noreply.github.com>	2024-01-15 18:41:14 +00:00
meili-bors[bot]	1ccde9bf0b	Merge #4316 4316: Autobatch the task deletions r=curquiza a=irevoire # Pull Request ## Related issue Fix part of https://github.com/meilisearch/meilisearch-support/issues/69 Fix #4315 ## What does this PR do? - Autobatch the task deletions Co-authored-by: Tamo <tamo@meilisearch.com>	2024-01-15 17:54:50 +00:00
meili-bors[bot]	34e814f400	Merge #4327 4327: Bring back changes from `release-v1.6.0` to `main` r=dureuill a=curquiza Co-authored-by: Paul Sanders <psanders1@gmail.com> Co-authored-by: meili-bors[bot] <89034592+meili-bors[bot]@users.noreply.github.com> Co-authored-by: Louis Dureuil <louis.dureuil@xinra.net> Co-authored-by: Tamo <tamo@meilisearch.com> Co-authored-by: Clément Renault <clement@meilisearch.com> Co-authored-by: Louis Dureuil <louis@meilisearch.com> Co-authored-by: Morgane Dubus <30866152+mdubus@users.noreply.github.com>	2024-01-15 16:52:05 +00:00
meili-bors[bot]	a6fa0b97ec	Merge #4318 4318: Hide embedders r=ManyTheFish a=dureuill Hides `embedders` when it is an empty dictionary. Manual tests: - getting settings with empty embedders: not displayed - getting settings with non-empty embedders: displayed like before - dump with empty embedders: can be imported - dump with non-empty embedders: can be imported Co-authored-by: Louis Dureuil <louis@meilisearch.com>	2024-01-15 09:37:31 +00:00
emmanuel	552127021f	Update	2024-01-12 16:03:23 +01:00
Louis Dureuil	38abfec611	Fix tests	2024-01-11 21:35:30 +01:00
Louis Dureuil	84a5c304fc	Don't display the embedders setting when it is an empty dict	2024-01-11 21:35:06 +01:00
meili-bors[bot]	e93d36d5b9	Merge #4313 4313: Fix document formatting performances r=Kerollmops a=ManyTheFish reduce the formatted option list to the attributes that should be formatted, instead of all the attributes to display. The time to compute the `format` list scales with the number of fields to format; cumulated with `map_leaf_values` that iterates over all the nested fields, it gives a quadratic complexity: `d*f` where `d` is the total number of fields to display and `f` is the total number of fields to format. Co-authored-by: ManyTheFish <many@meilisearch.com>	2024-01-11 14:19:44 +00:00
ManyTheFish	95f8e21533	fix typos	2024-01-11 15:07:08 +01:00
Tamo	b4d7d80ad9	autobatch the task deletions	2024-01-11 14:58:07 +01:00
meili-bors[bot]	68f197624e	Merge #4314 4314: Fix proximity precision telemetry r=Kerollmops a=ManyTheFish The proximity precision telemetry was partially missing in the global setting route. This PR adds the missing field and return the default value when the value is not set. Co-authored-by: ManyTheFish <many@meilisearch.com>	2024-01-11 13:50:03 +00:00
ManyTheFish	b79b03d4e2	Fix proximity precision telemetry	2024-01-11 13:24:26 +01:00
ManyTheFish	86270e6878	Transform fields contained into _format into strings	2024-01-11 12:44:56 +01:00
ManyTheFish	81b6128b29	Update tests	2024-01-11 12:28:32 +01:00
ManyTheFish	5f5a486895	Reduce formatting time	2024-01-11 11:36:41 +01:00
ManyTheFish	5f4fc6c955	Add timer logs	2024-01-11 09:44:16 +01:00
meili-bors[bot]	1f5e8fc072	Merge #4311 4311: Limit the number of values returned by the facet search r=dureuill a=Kerollmops This PR fixes a bug where the number of values per facet returned by the `indexes/{index}/facet-search` route was not taking the `faceting.maxValuePerFacet` setting into account. It also adds a test. Co-authored-by: Clément Renault <clement@meilisearch.com>	2024-01-10 16:04:06 +00:00
Clément Renault	3f3462ab62	Limit the number of values returned by the facet search	2024-01-10 16:54:08 +01:00
meili-bors[bot]	93363b0201	Merge #4308 4308: Fix hang on `/indexes` and `/stats` routes r=Kerollmops a=dureuill # Pull Request ## Related issue Fixes #4218 ## Context - A previous fix added a field to the `IndexScheduler` to memorize the `currently_updating_index`, so that accessing it through the search would return the handle without trying to open it. This resolved a hang on the search, but #4218 reported further hangs on the `/indexes` and `/stats` routes - These routes were shunting the `IndexScheduler` and using internal `IndexMapper` logic to access the indexes, again trying to reopen the updating index. ## What does this PR do? - Moves the logic relative to the `currently_updating_index` from the `IndexScheduler` to the `IndexMapper`, so that any index request to the `IndexMapper` can benefit from it. ## Test 1. Follow reproducer from #4218 2. Before this PR, notice a hang on `/stats` and `/indexes`, but not on `/indexes/<updating_index>/search` 3. After this PR, notice no hang on either of `/stats`, `/indexes` or `/indexes/<updating_index>/search` Co-authored-by: Louis Dureuil <louis@meilisearch.com>	2024-01-10 10:46:20 +00:00
Louis Dureuil	97bb1ff9e2	Move `currently_updating_index` to IndexMapper	2024-01-09 15:37:27 +01:00
meili-bors[bot]	5204c0b60b	Merge #4297 4297: Update license for 2024 r=curquiza a=meili-bot _This PR is auto-generated._ Co-authored-by: meili-bot <74670311+meili-bot@users.noreply.github.com>	2024-01-03 13:54:19 +00:00
meili-bot	e73cd692db	Update LICENSE	2024-01-03 14:32:41 +01:00
meili-bors[bot]	29b453346b	Merge #4293 4293: Update SDK test dependencies r=curquiza a=curquiza Replace dependabot updates. The changes have little impact on the engine team's velocity because they concern a CI that does not run during release deployment and does not run to merge a PR. It is only a weekly scheduled CI to check the breaking changes we introduced in the integrations. I updated the dependencies based on what we do on the integration CIs. For example, for Dart, I looked at what we have in the [Dart CI](`63fd758882/.github/workflows/tests.yml (L16-L54)`) and I updated our CI in this repo accordingly. I did the same for each repository. This ensures we test the same things. Co-authored-by: curquiza <clementine@meilisearch.com>	2024-01-03 13:26:50 +00:00
meili-bors[bot]	c4bb435374	Merge #4295 4295: fix compilation warnings on main r=curquiza a=irevoire # Pull Request ## Related issue Fixes https://github.com/meilisearch/meilisearch/issues/4292 ## What does this PR do? - Removed unused imports #4294 fixes the issue for the release v1.6 Co-authored-by: Tamo <tamo@meilisearch.com>	2024-01-02 15:33:06 +00:00
Tamo	2bcff2ea46	fix warning	2024-01-02 15:19:00 +01:00
curquiza	1275e72e0b	Update SDK test dependencies	2024-01-02 09:59:46 +01:00
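
Note on the "Remove append function" change: with `append_entries_into_database` and the `index_is_empty` flag gone, every entry goes through the same read-then-merge path in `write_entries_into_database`. The sketch below only illustrates that write pattern and is not the milli implementation. It assumes a `BTreeMap` as a stand-in for the LMDB database and a `BTreeSet<u32>` as a stand-in for the roaring bitmap of document ids. The names `Db`, `write_entries` and `example` are hypothetical.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Hypothetical stand-in for an LMDB database:
/// byte keys mapped to a set of document ids.
type Db = BTreeMap<Vec<u8>, BTreeSet<u32>>;

/// Write every entry into `db`. If the key already exists, the new ids are
/// merged into the previous value; otherwise the value is inserted as-is.
/// There is no special case for an empty database.
fn write_entries<I>(db: &mut Db, entries: I)
where
    I: IntoIterator<Item = (Vec<u8>, BTreeSet<u32>)>,
{
    for (key, value) in entries {
        // `entry` covers both branches of the original `match`:
        // an existing value is extended (merge), a missing one starts empty.
        db.entry(key).or_default().extend(value);
    }
}

/// Two chunks written one after the other, as if they arrived from
/// different extraction threads.
fn example() -> Db {
    let mut db = Db::new();
    write_entries(&mut db, vec![(b"hello".to_vec(), BTreeSet::from([1, 2]))]);
    write_entries(
        &mut db,
        vec![
            (b"hello".to_vec(), BTreeSet::from([2, 3])),
            (b"world".to_vec(), BTreeSet::from([7])),
        ],
    );
    db
}
```

Because chunks are now sent to the main thread as they are produced, a key may already be present even on the first indexing run, which is presumably why an append-only fast path no longer applies.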