Log when an entry is too large to fit in the BBQueue

Merge #5124
5124: Optimize Prefixes and Merges r=ManyTheFish a=Kerollmops

In this PR, we plan to optimize LMDB reads by reading the entries in lexicographic order, which makes better use of the OS cache behind the memory map:

- Optimize the prefix generation for word position docids (`@manythefish`)
- Optimize the parallel merging of the caches by sorting the entries before merging the caches (`@kerollmops`)

## Benchmarks on 1 CPU, 2 GB, gpo3 (5k IOPS)

Before, on the tag `meilisearch-v1.12.0-rc.3`:

```
word_position_docids:merge_and_send_docids: 988s
compute_word_fst: 23.3s
word_pair_proximity_docids:merge_and_send_docids: 428s
compute_word_prefix_fid_docids:recompute_modified_prefixes: 76.3s
compute_word_prefix_position_docids:recompute_modified_prefixes:from_prefixes: 429s
```

After sorting the whole `HashMap`s in a `Vec` on this branch:

```
word_position_docids:merge_and_send_docids: 202s
compute_word_fst: 20.4s
word_pair_proximity_docids:merge_and_send_docids: 427s
compute_word_prefix_fid_docids:recompute_modified_prefixes: 65.5s
compute_word_prefix_position_docids:recompute_modified_prefixes:from_prefixes: 62.5s
```

Co-authored-by: ManyTheFish <many@meilisearch.com>
Co-authored-by: Kerollmops <clement@meilisearch.com>
2025-12-05 04:05:42 +00:00 · 2024-12-05 11:24:16 +01:00 · 2024-12-05 09:35:52 +00:00 · 2024-12-05 10:03:05 +01:00 · 2024-12-05 09:01:02 +00:00 · 2024-12-04 17:04:14 +00:00
17 changed files with 579 additions and 111 deletions
--- a/Cargo.lock
+++ b/Cargo.lock
@@ -492,7 +492,7 @@ checksum = "72b3254f16251a8381aa12e40e3c4d2f0199f8c6508fbecb9d91f575e0fbb8c6"
 [[package]]
 name = "bbqueue"
 version = "0.5.1"
-source = "git+https://github.com/kerollmops/bbqueue#cbb87cc707b5af415ef203bdaf2443e06ba0d6d4"
+source = "git+https://github.com/meilisearch/bbqueue#cbb87cc707b5af415ef203bdaf2443e06ba0d6d4"

 [[package]]
 name = "benchmarks"
--- a/crates/index-scheduler/src/lib.rs
+++ b/crates/index-scheduler/src/lib.rs
@@ -1440,7 +1440,7 @@ impl IndexScheduler {

        // if the task doesn't delete anything and 50% of the task queue is full, we must refuse to enqueue the incomming task
        if !matches!(&kind, KindWithContent::TaskDeletion { tasks, .. } if !tasks.is_empty())
-            && (self.env.non_free_pages_size()? * 100) / self.env.info().map_size as u64 > 50
+            && (self.env.non_free_pages_size()? * 100) / self.env.info().map_size as u64 > 40
        {
            return Err(Error::NoSpaceLeftInTaskQueue);
        }
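The enqueue guard above refuses a new task once non-free pages occupy more than 40% of the LMDB map, unless the incoming task is a task deletion with a non-empty task list. Note that the unchanged comment in that hunk still says 50%. The sketch below reproduces only the arithmetic; `must_refuse_task` and its parameters are hypothetical stand-ins for `env.non_free_pages_size()` and `env.info().map_size`, not part of the index scheduler's API.

```rust
/// Fill percentage of the LMDB map above which new tasks are refused.
const MAX_TASK_QUEUE_FILL_PERCENT: u64 = 40;

/// Hypothetical helper mirroring the enqueue guard.
/// `non_free_bytes` stands in for `env.non_free_pages_size()`,
/// `map_size` for `env.info().map_size`.
fn must_refuse_task(non_free_bytes: u64, map_size: u64, is_nonempty_task_deletion: bool) -> bool {
    // Integer division truncates, exactly like the original expression,
    // so 40.9% still counts as 40% and is accepted.
    !is_nonempty_task_deletion && (non_free_bytes * 100) / map_size > MAX_TASK_QUEUE_FILL_PERCENT
}
```

Deletions are exempt so that a nearly full queue can still be cleaned up.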
--- a/crates/milli/Cargo.toml
+++ b/crates/milli/Cargo.toml
@@ -98,7 +98,7 @@ allocator-api2 = "0.2.18"
 rustc-hash = "2.0.0"
 uell = "0.1.0"
 enum-iterator = "2.1.0"
-bbqueue = { git = "https://github.com/kerollmops/bbqueue" }
+bbqueue = { git = "https://github.com/meilisearch/bbqueue" }
 flume = { version = "0.11.1", default-features = false }

 [dev-dependencies]
--- a/crates/milli/src/update/new/channel.rs
+++ b/crates/milli/src/update/new/channel.rs
@@ -7,7 +7,7 @@ use std::num::NonZeroU16;
 use std::ops::Range;
 use std::time::Duration;

-use bbqueue::framed::{FrameGrantR, FrameGrantW, FrameProducer};
+use bbqueue::framed::{FrameGrantR, FrameProducer};
 use bbqueue::BBBuffer;
 use bytemuck::{checked, CheckedBitPattern, NoUninit};
 use flume::{RecvTimeoutError, SendError};
@@ -454,14 +454,10 @@ impl<'b> ExtractorBbqueueSender<'b> {
        }

        // Spin loop to have a frame the size we requested.
-        let mut grant = reserve_grant(&mut producer, total_length, &self.sender)?;
-        payload_header.serialize_into(&mut grant);
-
-        // We only send a wake up message when the channel is empty
-        // so that we don't fill the channel with too many WakeUps.
-        if self.sender.is_empty() {
-            self.sender.send(ReceiverAction::WakeUp).unwrap();
-        }
+        reserve_and_write_grant(&mut producer, total_length, &self.sender, |grant| {
+            payload_header.serialize_into(grant);
+            Ok(())
+        })?;

        Ok(())
    }
@@ -484,6 +480,8 @@ impl<'b> ExtractorBbqueueSender<'b> {
        let payload_header = EntryHeader::ArroySetVectors(arroy_set_vector);
        let total_length = EntryHeader::total_set_vectors_size(embeddings.len(), dimensions);
        if total_length > capacity {
+            tracing::trace!("We are spilling a large vector that is {total_length} bytes which is larger than the capacity of {capacity} bytes");
+
            let mut value_file = tempfile::tempfile().map(BufWriter::new)?;
            for embedding in embeddings {
                let mut embedding_bytes = bytemuck::cast_slice(embedding);
@@ -500,24 +498,20 @@ impl<'b> ExtractorBbqueueSender<'b> {
        }

        // Spin loop to have a frame the size we requested.
-        let mut grant = reserve_grant(&mut producer, total_length, &self.sender)?;
+        reserve_and_write_grant(&mut producer, total_length, &self.sender, |grant| {
+            let header_size = payload_header.header_size();
+            let (header_bytes, remaining) = grant.split_at_mut(header_size);
+            payload_header.serialize_into(header_bytes);

-        let header_size = payload_header.header_size();
-        let (header_bytes, remaining) = grant.split_at_mut(header_size);
-        payload_header.serialize_into(header_bytes);
-
-        if dimensions != 0 {
-            let output_iter = remaining.chunks_exact_mut(dimensions * mem::size_of::<f32>());
-            for (embedding, output) in embeddings.iter().zip(output_iter) {
-                output.copy_from_slice(bytemuck::cast_slice(embedding));
+            if dimensions != 0 {
+                let output_iter = remaining.chunks_exact_mut(dimensions * mem::size_of::<f32>());
+                for (embedding, output) in embeddings.iter().zip(output_iter) {
+                    output.copy_from_slice(bytemuck::cast_slice(embedding));
+                }
            }
-        }

-        // We only send a wake up message when the channel is empty
-        // so that we don't fill the channel with too many WakeUps.
-        if self.sender.is_empty() {
-            self.sender.send(ReceiverAction::WakeUp).unwrap();
-        }
+            Ok(())
+        })?;

        Ok(())
    }
@@ -556,6 +550,8 @@ impl<'b> ExtractorBbqueueSender<'b> {
        let payload_header = EntryHeader::DbOperation(operation);
        let total_length = EntryHeader::total_key_value_size(key_length, value_length);
        if total_length > capacity {
+            tracing::trace!("We are spilling a large entry that is {total_length} bytes which is larger than the capacity of {capacity} bytes");
+
            let mut key_buffer = vec![0; key_length.get() as usize].into_boxed_slice();
            let value_file = tempfile::tempfile()?;
            value_file.set_len(value_length.try_into().unwrap())?;
@@ -575,19 +571,13 @@ impl<'b> ExtractorBbqueueSender<'b> {
        }

        // Spin loop to have a frame the size we requested.
-        let mut grant = reserve_grant(&mut producer, total_length, &self.sender)?;
-
-        let header_size = payload_header.header_size();
-        let (header_bytes, remaining) = grant.split_at_mut(header_size);
-        payload_header.serialize_into(header_bytes);
-        let (key_buffer, value_buffer) = remaining.split_at_mut(key_length.get() as usize);
-        key_value_writer(key_buffer, value_buffer)?;
-
-        // We only send a wake up message when the channel is empty
-        // so that we don't fill the channel with too many WakeUps.
-        if self.sender.is_empty() {
-            self.sender.send(ReceiverAction::WakeUp).unwrap();
-        }
+        reserve_and_write_grant(&mut producer, total_length, &self.sender, |grant| {
+            let header_size = payload_header.header_size();
+            let (header_bytes, remaining) = grant.split_at_mut(header_size);
+            payload_header.serialize_into(header_bytes);
+            let (key_buffer, value_buffer) = remaining.split_at_mut(key_length.get() as usize);
+            key_value_writer(key_buffer, value_buffer)
+        })?;

        Ok(())
    }
@@ -629,37 +619,44 @@ impl<'b> ExtractorBbqueueSender<'b> {
        }

        // Spin loop to have a frame the size we requested.
-        let mut grant = reserve_grant(&mut producer, total_length, &self.sender)?;
-
-        let header_size = payload_header.header_size();
-        let (header_bytes, remaining) = grant.split_at_mut(header_size);
-        payload_header.serialize_into(header_bytes);
-        key_writer(remaining)?;
-
-        // We only send a wake up message when the channel is empty
-        // so that we don't fill the channel with too many WakeUps.
-        if self.sender.is_empty() {
-            self.sender.send(ReceiverAction::WakeUp).unwrap();
-        }
+        reserve_and_write_grant(&mut producer, total_length, &self.sender, |grant| {
+            let header_size = payload_header.header_size();
+            let (header_bytes, remaining) = grant.split_at_mut(header_size);
+            payload_header.serialize_into(header_bytes);
+            key_writer(remaining)
+        })?;

        Ok(())
    }
 }

-/// Try to reserve a frame grant of `total_length` by spin looping
-/// on the BBQueue buffer and panics if the receiver has been disconnected.
-fn reserve_grant<'b>(
-    producer: &mut FrameProducer<'b>,
+/// Try to reserve a frame grant of `total_length` by spin
+/// looping on the BBQueue buffer, panics if the receiver
+/// has been disconnected or send a WakeUp message if necessary.
+fn reserve_and_write_grant<F>(
+    producer: &mut FrameProducer,
    total_length: usize,
    sender: &flume::Sender<ReceiverAction>,
-) -> crate::Result<FrameGrantW<'b>> {
+    f: F,
+) -> crate::Result<()>
+where
+    F: FnOnce(&mut [u8]) -> crate::Result<()>,
+{
    loop {
        for _ in 0..10_000 {
            match producer.grant(total_length) {
                Ok(mut grant) => {
                    // We could commit only the used memory.
-                    grant.to_commit(total_length);
-                    return Ok(grant);
+                    f(&mut grant)?;
+                    grant.commit(total_length);
+
+                    // We only send a wake up message when the channel is empty
+                    // so that we don't fill the channel with too many WakeUps.
+                    if sender.is_empty() {
+                        sender.send(ReceiverAction::WakeUp).unwrap();
+                    }
+
+                    return Ok(());
                }
                Err(bbqueue::Error::InsufficientSize) => continue,
                Err(e) => unreachable!("{e:?}"),
@@ -668,6 +665,11 @@ fn reserve_grant<'b>(
        if sender.is_disconnected() {
            return Err(Error::InternalError(InternalError::AbortedIndexation));
        }
+
+        // We prefer to yield and allow the writing thread
+        // to do its job, especially beneficial when there
+        // is only one CPU core available.
+        std::thread::yield_now();
    }
 }

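`reserve_and_write_grant` replaces the old `reserve_grant`: instead of returning a grant that the caller fills, it takes a closure that writes into the grant, commits it, and only then sends the `WakeUp`. It also yields the thread between rounds of spinning. One plausible reading is that the old code sent `WakeUp` while the grant was still alive (it was marked with `to_commit` and committed on drop), so the receiver could wake before the frame was visible. The sketch below models that control flow without bbqueue: the `TryGrant` trait, `WriteError`, and the `AtomicBool` disconnection flag are hypothetical stand-ins for `FrameProducer`, `crate::Error`, and `flume::Sender::is_disconnected`.

```rust
use std::sync::atomic::{AtomicBool, Ordering};

/// Hypothetical stand-in for a bounded frame producer (bbqueue's
/// `FrameProducer` in the real code): `try_grant` hands out a writable
/// frame, or `None` when there is not yet room for `len` bytes.
trait TryGrant {
    fn try_grant(&mut self, len: usize) -> Option<Vec<u8>>;
    /// Publish a filled frame to the consumer.
    fn commit(&mut self, frame: Vec<u8>);
}

#[derive(Debug, PartialEq)]
enum WriteError {
    /// The consumer side has gone away (the indexation was aborted).
    Disconnected,
    /// The caller-supplied writer failed.
    Writer(String),
}

/// Spin until a frame of `len` bytes is granted, let `write` fill it,
/// commit it, and only then call `wake_up`, so the consumer is never
/// woken for a frame that is not yet committed.
fn reserve_and_write<P, W>(
    producer: &mut P,
    len: usize,
    disconnected: &AtomicBool,
    mut wake_up: impl FnMut(),
    write: W,
) -> Result<(), WriteError>
where
    P: TryGrant,
    W: FnOnce(&mut [u8]) -> Result<(), WriteError>,
{
    loop {
        for _ in 0..10_000 {
            if let Some(mut frame) = producer.try_grant(len) {
                // On error the frame is dropped without being committed.
                write(frame.as_mut_slice())?;
                producer.commit(frame);
                wake_up();
                return Ok(());
            }
        }
        if disconnected.load(Ordering::Relaxed) {
            return Err(WriteError::Disconnected);
        }
        // Let the consumer thread drain the buffer; this matters most
        // when only one CPU core is available.
        std::thread::yield_now();
    }
}
```

Passing the writer as an `FnOnce` keeps the reserve, write, commit, and notify steps in one place, which is why the four call sites in the diff shrink to a single closure each.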
--- a/crates/milli/src/update/new/extract/cache.rs
+++ b/crates/milli/src/update/new/extract/cache.rs
@@ -466,12 +466,13 @@ pub fn transpose_and_freeze_caches<'a, 'extractor>(
    Ok(bucket_caches)
 }

-/// Merges the caches that must be all associated to the same bucket.
+/// Merges the caches that must be all associated to the same bucket
+/// but make sure to sort the different buckets before performing the merges.
 ///
 /// # Panics
 ///
 /// - If the bucket IDs in these frozen caches are not exactly the same.
-pub fn merge_caches<F>(frozen: Vec<FrozenCache>, mut f: F) -> Result<()>
+pub fn merge_caches_sorted<F>(frozen: Vec<FrozenCache>, mut f: F) -> Result<()>
 where
    F: for<'a> FnMut(&'a [u8], DelAddRoaringBitmap) -> Result<()>,
 {
@@ -543,12 +544,12 @@ where

    // Then manage the content on the HashMap entries that weren't taken (mem::take).
    while let Some(mut map) = maps.pop() {
-        for (key, bbbul) in map.iter_mut() {
-            // Make sure we don't try to work with entries already managed by the spilled
-            if bbbul.is_empty() {
-                continue;
-            }
+        // Make sure we don't try to work with entries already managed by the spilled
+        let mut ordered_entries: Vec<_> =
+            map.iter_mut().filter(|(_, bbbul)| !bbbul.is_empty()).collect();
+        ordered_entries.sort_unstable_by_key(|(key, _)| *key);

+        for (key, bbbul) in ordered_entries {
            let mut output = DelAddRoaringBitmap::empty();
            output.union_and_clear_bbbul(bbbul);

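The last loop of `merge_caches_sorted` no longer walks the `HashMap` in its arbitrary iteration order: it collects the non-empty entries into a `Vec` and sorts them by key, so the subsequent `database.get` calls visit LMDB keys in ascending order. A minimal sketch of that pattern, assuming `Vec<u8>` keys and plain `Vec<V>` values in place of the crate's bump-allocated keys and bbbul buffers (`for_each_sorted` is a hypothetical name):

```rust
use std::collections::HashMap;

/// Visit the non-empty entries of `map` in ascending key order.
/// HashMap iteration order is arbitrary, so the entries are collected
/// into a Vec and sorted first; lookups driven by `f` then scan the
/// key space forward instead of jumping around randomly.
fn for_each_sorted<V, F>(map: &mut HashMap<Vec<u8>, Vec<V>>, mut f: F)
where
    F: FnMut(&[u8], &mut Vec<V>),
{
    // Skip entries that were already emptied (the spilled ones in the real code).
    let mut ordered: Vec<_> = map.iter_mut().filter(|(_, values)| !values.is_empty()).collect();
    // Keys in a map are unique, so an unstable sort gives a deterministic order.
    ordered.sort_unstable_by_key(|(key, _)| *key);
    for (key, values) in ordered {
        f(key.as_slice(), values);
    }
}
```

The extra allocation is one `Vec` of references per map, which the benchmarks in the description suggest is cheap compared to the random page faults it avoids.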
--- a/crates/milli/src/update/new/extract/geo/mod.rs
+++ b/crates/milli/src/update/new/extract/geo/mod.rs
@@ -1,6 +1,6 @@
 use std::cell::RefCell;
 use std::fs::File;
-use std::io::{self, BufReader, BufWriter, ErrorKind, Read, Write as _};
+use std::io::{self, BufReader, BufWriter, ErrorKind, Read, Seek as _, Write as _};
 use std::{iter, mem, result};

 use bumpalo::Bump;
@@ -97,30 +97,34 @@ pub struct FrozenGeoExtractorData<'extractor> {
 impl<'extractor> FrozenGeoExtractorData<'extractor> {
    pub fn iter_and_clear_removed(
        &mut self,
-    ) -> impl IntoIterator<Item = io::Result<ExtractedGeoPoint>> + '_ {
-        mem::take(&mut self.removed)
+    ) -> io::Result<impl IntoIterator<Item = io::Result<ExtractedGeoPoint>> + '_> {
+        Ok(mem::take(&mut self.removed)
            .iter()
            .copied()
            .map(Ok)
-            .chain(iterator_over_spilled_geopoints(&mut self.spilled_removed))
+            .chain(iterator_over_spilled_geopoints(&mut self.spilled_removed)?))
    }

    pub fn iter_and_clear_inserted(
        &mut self,
-    ) -> impl IntoIterator<Item = io::Result<ExtractedGeoPoint>> + '_ {
-        mem::take(&mut self.inserted)
+    ) -> io::Result<impl IntoIterator<Item = io::Result<ExtractedGeoPoint>> + '_> {
+        Ok(mem::take(&mut self.inserted)
            .iter()
            .copied()
            .map(Ok)
-            .chain(iterator_over_spilled_geopoints(&mut self.spilled_inserted))
+            .chain(iterator_over_spilled_geopoints(&mut self.spilled_inserted)?))
    }
 }

 fn iterator_over_spilled_geopoints(
    spilled: &mut Option<BufReader<File>>,
-) -> impl IntoIterator<Item = io::Result<ExtractedGeoPoint>> + '_ {
+) -> io::Result<impl IntoIterator<Item = io::Result<ExtractedGeoPoint>> + '_> {
    let mut spilled = spilled.take();
-    iter::from_fn(move || match &mut spilled {
+    if let Some(spilled) = &mut spilled {
+        spilled.rewind()?;
+    }
+
+    Ok(iter::from_fn(move || match &mut spilled {
        Some(file) => {
            let geopoint_bytes = &mut [0u8; mem::size_of::<ExtractedGeoPoint>()];
            match file.read_exact(geopoint_bytes) {
@@ -130,7 +134,7 @@ fn iterator_over_spilled_geopoints(
            }
        }
        None => None,
-    })
+    }))
 }

 impl<'extractor> Extractor<'extractor> for GeoExtractor {
@@ -157,7 +161,9 @@ impl<'extractor> Extractor<'extractor> for GeoExtractor {
        let mut data_ref = context.data.borrow_mut_or_yield();

        for change in changes {
-            if max_memory.map_or(false, |mm| context.extractor_alloc.allocated_bytes() >= mm) {
+            if data_ref.spilled_removed.is_none()
+                && max_memory.map_or(false, |mm| context.extractor_alloc.allocated_bytes() >= mm)
+            {
                // We must spill as we allocated too much memory
                data_ref.spilled_removed = tempfile::tempfile().map(BufWriter::new).map(Some)?;
                data_ref.spilled_inserted = tempfile::tempfile().map(BufWriter::new).map(Some)?;
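The geo extractor hunk fixes two spilling issues. First, `iterator_over_spilled_geopoints` now calls `rewind()` before reading: after the writes, the file position sits at the end, so the first `read_exact` would fail with `UnexpectedEof` and the spilled geopoints would be silently skipped. Second, the temp files are now created only when nothing has been spilled yet; re-creating them on every over-budget change appears to have dropped the previously spilled records. The sketch below demonstrates the rewind with an in-memory `Cursor` instead of a temp file; `RECORD_SIZE`, `spill`, and `read_spilled` are hypothetical, with a `u64` standing in for `ExtractedGeoPoint`.

```rust
use std::io::{self, BufWriter, Cursor, Read, Seek, Write};

/// Hypothetical fixed record size (a u64 in place of `ExtractedGeoPoint`).
const RECORD_SIZE: usize = 8;

/// Write the records through a BufWriter, as the extractor does.
fn spill(values: &[u64]) -> io::Result<Cursor<Vec<u8>>> {
    let mut writer = BufWriter::new(Cursor::new(Vec::new()));
    for value in values {
        writer.write_all(&value.to_le_bytes())?;
    }
    // into_inner flushes; the cursor is left at the end of the data.
    writer.into_inner().map_err(|e| e.into_error())
}

/// Read back every spilled record. Without the rewind, reading starts
/// past the last write and immediately hits end-of-file.
fn read_spilled<R: Read + Seek>(spilled: &mut R) -> io::Result<Vec<u64>> {
    spilled.rewind()?;
    let mut out = Vec::new();
    let mut buf = [0u8; RECORD_SIZE];
    loop {
        match spilled.read_exact(&mut buf) {
            Ok(()) => out.push(u64::from_le_bytes(buf)),
            Err(e) if e.kind() == io::ErrorKind::UnexpectedEof => return Ok(out),
            Err(e) => return Err(e),
        }
    }
}
```

Because `rewind` can fail, the iterator constructors now return `io::Result<impl IntoIterator<...>>`, which is why the callers in `merger.rs` gain a `?`.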
--- a/crates/milli/src/update/new/extract/mod.rs
+++ b/crates/milli/src/update/new/extract/mod.rs
@@ -6,7 +6,9 @@ mod searchable;
 mod vectors;

 use bumpalo::Bump;
-pub use cache::{merge_caches, transpose_and_freeze_caches, BalancedCaches, DelAddRoaringBitmap};
+pub use cache::{
+    merge_caches_sorted, transpose_and_freeze_caches, BalancedCaches, DelAddRoaringBitmap,
+};
 pub use documents::*;
 pub use faceted::*;
 pub use geo::*;
--- a/crates/milli/src/update/new/indexer/mod.rs
+++ b/crates/milli/src/update/new/indexer/mod.rs
@@ -86,9 +86,11 @@ where
        (grenad_parameters, 2 * minimum_capacity), // 100 MiB by thread by default
        |max_memory| {
            // 2% of the indexing memory
-            let total_bbbuffer_capacity = (max_memory / 100 / 2).min(minimum_capacity);
+            let total_bbbuffer_capacity = (max_memory / 100 / 2).max(minimum_capacity);
            let new_grenad_parameters = GrenadParameters {
-                max_memory: Some(max_memory - total_bbbuffer_capacity),
+                max_memory: Some(
+                    max_memory.saturating_sub(total_bbbuffer_capacity).max(100 * 1024 * 1024),
+                ),
                ..grenad_parameters
            };
            (new_grenad_parameters, total_bbbuffer_capacity)
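The capacity fix swaps `min` for `max`, so `minimum_capacity` becomes a floor for the BBQueue capacity rather than a ceiling, and it replaces the plain subtraction with `saturating_sub` plus a 100 MiB floor, so grenad's budget can no longer underflow when `max_memory` is smaller than the buffer capacity. Note that `max_memory / 100 / 2` is 0.5% of `max_memory`, not the 2% the comment states. A worked sketch of the arithmetic; `split_memory` is hypothetical, `u64` is used to avoid platform-dependent overflow, and `minimum_capacity` is a parameter because its real value is not shown in the diff.

```rust
const MIB: u64 = 1024 * 1024;

/// Returns (grenad max memory, total BBQueue capacity).
fn split_memory(max_memory: u64, minimum_capacity: u64) -> (u64, u64) {
    // 0.5% of max_memory, but never below the minimum capacity.
    let total_bbbuffer_capacity = (max_memory / 100 / 2).max(minimum_capacity);
    // saturating_sub clamps at 0 instead of underflowing;
    // grenad always gets at least 100 MiB.
    let grenad_max_memory = max_memory.saturating_sub(total_bbbuffer_capacity).max(100 * MIB);
    (grenad_max_memory, total_bbbuffer_capacity)
}
```

For example, with an illustrative 50 MiB minimum, 1 GiB of indexing memory yields a 50 MiB buffer and 974 MiB for grenad, while 10 MiB yields the 50 MiB buffer and the 100 MiB grenad floor.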
--- a/crates/milli/src/update/new/merger.rs
+++ b/crates/milli/src/update/new/merger.rs
@@ -9,8 +9,8 @@ use roaring::RoaringBitmap;

 use super::channel::*;
 use super::extract::{
-    merge_caches, transpose_and_freeze_caches, BalancedCaches, DelAddRoaringBitmap, FacetKind,
-    GeoExtractorData,
+    merge_caches_sorted, transpose_and_freeze_caches, BalancedCaches, DelAddRoaringBitmap,
+    FacetKind, GeoExtractorData,
 };
 use crate::{CboRoaringBitmapCodec, FieldId, GeoPoint, Index, InternalError, Result};

@@ -34,7 +34,7 @@ where
        }

        let mut frozen = data.into_inner().freeze()?;
-        for result in frozen.iter_and_clear_removed() {
+        for result in frozen.iter_and_clear_removed()? {
            let extracted_geo_point = result?;
            let removed = rtree.remove(&GeoPoint::from(extracted_geo_point));
            debug_assert!(removed.is_some());
@@ -42,7 +42,7 @@ where
            debug_assert!(removed);
        }

-        for result in frozen.iter_and_clear_inserted() {
+        for result in frozen.iter_and_clear_inserted()? {
            let extracted_geo_point = result?;
            rtree.insert(GeoPoint::from(extracted_geo_point));
            let inserted = faceted.insert(extracted_geo_point.docid);
@@ -78,7 +78,7 @@ where
        if must_stop_processing() {
            return Err(InternalError::AbortedIndexation.into());
        }
-        merge_caches(frozen, |key, DelAddRoaringBitmap { del, add }| {
+        merge_caches_sorted(frozen, |key, DelAddRoaringBitmap { del, add }| {
            let current = database.get(&rtxn, key)?;
            match merge_cbo_bitmaps(current, del, add)? {
                Operation::Write(bitmap) => {
@@ -107,7 +107,7 @@ pub fn merge_and_send_facet_docids<'extractor>(
        .map(|frozen| {
            let mut facet_field_ids_delta = FacetFieldIdsDelta::default();
            let rtxn = index.read_txn()?;
-            merge_caches(frozen, |key, DelAddRoaringBitmap { del, add }| {
+            merge_caches_sorted(frozen, |key, DelAddRoaringBitmap { del, add }| {
                let current = database.get_cbo_roaring_bytes_value(&rtxn, key)?;
                match merge_cbo_bitmaps(current, del, add)? {
                    Operation::Write(bitmap) => {
--- a/crates/milli/src/update/new/word_fst_builder.rs
+++ b/crates/milli/src/update/new/word_fst_builder.rs
@@ -1,4 +1,4 @@
-use std::collections::HashSet;
+use std::collections::BTreeSet;
 use std::io::BufWriter;

 use fst::{Set, SetBuilder, Streamer};
@@ -75,8 +75,8 @@ pub struct PrefixData {

 #[derive(Debug)]
 pub struct PrefixDelta {
-    pub modified: HashSet<Prefix>,
-    pub deleted: HashSet<Prefix>,
+    pub modified: BTreeSet<Prefix>,
+    pub deleted: BTreeSet<Prefix>,
 }

 struct PrefixFstBuilder {
@@ -86,7 +86,7 @@ struct PrefixFstBuilder {
    prefix_fst_builders: Vec<SetBuilder<Vec<u8>>>,
    current_prefix: Vec<Prefix>,
    current_prefix_count: Vec<usize>,
-    modified_prefixes: HashSet<Prefix>,
+    modified_prefixes: BTreeSet<Prefix>,
    current_prefix_is_modified: Vec<bool>,
 }

@@ -110,7 +110,7 @@ impl PrefixFstBuilder {
            prefix_fst_builders,
            current_prefix: vec![Prefix::new(); max_prefix_length],
            current_prefix_count: vec![0; max_prefix_length],
-            modified_prefixes: HashSet::new(),
+            modified_prefixes: BTreeSet::new(),
            current_prefix_is_modified: vec![false; max_prefix_length],
        })
    }
@@ -180,7 +180,7 @@ impl PrefixFstBuilder {
        let prefix_fst_mmap = unsafe { Mmap::map(&prefix_fst_file)? };
        let new_prefix_fst = Set::new(&prefix_fst_mmap)?;
        let old_prefix_fst = index.words_prefixes_fst(rtxn)?;
-        let mut deleted_prefixes = HashSet::new();
+        let mut deleted_prefixes = BTreeSet::new();
        {
            let mut deleted_prefixes_stream = old_prefix_fst.op().add(&new_prefix_fst).difference();
            while let Some(prefix) = deleted_prefixes_stream.next() {
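Replacing `HashSet<Prefix>` with `BTreeSet<Prefix>` in `PrefixDelta` and the prefix-docids functions keeps the same `contains`/iteration API but guarantees ascending iteration, so prefixes are recomputed and deleted in lexicographic order, matching the LMDB key order. A minimal illustration, assuming `String` in place of the crate's `Prefix` type (`prefixes_in_order` is a hypothetical helper):

```rust
use std::collections::BTreeSet;

/// Collect prefixes into a BTreeSet and return them in iteration order,
/// which for a BTreeSet is always ascending and deduplicated.
fn prefixes_in_order<'a>(prefixes: impl IntoIterator<Item = &'a str>) -> Vec<String> {
    let set: BTreeSet<String> = prefixes.into_iter().map(str::to_owned).collect();
    set.into_iter().collect()
}
```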
--- a/crates/milli/src/update/new/words_prefix_docids.rs
+++ b/crates/milli/src/update/new/words_prefix_docids.rs
@@ -1,5 +1,5 @@
 use std::cell::RefCell;
-use std::collections::HashSet;
+use std::collections::BTreeSet;
 use std::io::{BufReader, BufWriter, Read, Seek, Write};

 use hashbrown::HashMap;
@@ -37,8 +37,8 @@ impl WordPrefixDocids {
    fn execute(
        self,
        wtxn: &mut heed::RwTxn,
-        prefix_to_compute: &HashSet<Prefix>,
-        prefix_to_delete: &HashSet<Prefix>,
+        prefix_to_compute: &BTreeSet<Prefix>,
+        prefix_to_delete: &BTreeSet<Prefix>,
    ) -> Result<()> {
        delete_prefixes(wtxn, &self.prefix_database, prefix_to_delete)?;
        self.recompute_modified_prefixes(wtxn, prefix_to_compute)
@@ -48,7 +48,7 @@ impl WordPrefixDocids {
    fn recompute_modified_prefixes(
        &self,
        wtxn: &mut RwTxn,
-        prefixes: &HashSet<Prefix>,
+        prefixes: &BTreeSet<Prefix>,
    ) -> Result<()> {
        // We fetch the docids associated to the newly added word prefix fst only.
        // And collect the CboRoaringBitmaps pointers in an HashMap.
@@ -127,7 +127,7 @@ impl<'a, 'rtxn> FrozenPrefixBitmaps<'a, 'rtxn> {
    pub fn from_prefixes(
        database: Database<Bytes, CboRoaringBitmapCodec>,
        rtxn: &'rtxn RoTxn,
-        prefixes: &'a HashSet<Prefix>,
+        prefixes: &'a BTreeSet<Prefix>,
    ) -> heed::Result<Self> {
        let database = database.remap_data_type::<Bytes>();

@@ -173,8 +173,8 @@ impl WordPrefixIntegerDocids {
    fn execute(
        self,
        wtxn: &mut heed::RwTxn,
-        prefix_to_compute: &HashSet<Prefix>,
-        prefix_to_delete: &HashSet<Prefix>,
+        prefix_to_compute: &BTreeSet<Prefix>,
+        prefix_to_delete: &BTreeSet<Prefix>,
    ) -> Result<()> {
        delete_prefixes(wtxn, &self.prefix_database, prefix_to_delete)?;
        self.recompute_modified_prefixes(wtxn, prefix_to_compute)
@@ -184,7 +184,7 @@ impl WordPrefixIntegerDocids {
    fn recompute_modified_prefixes(
        &self,
        wtxn: &mut RwTxn,
-        prefixes: &HashSet<Prefix>,
+        prefixes: &BTreeSet<Prefix>,
    ) -> Result<()> {
        // We fetch the docids associated to the newly added word prefix fst only.
        // And collect the CboRoaringBitmaps pointers in an HashMap.
@@ -262,7 +262,7 @@ impl<'a, 'rtxn> FrozenPrefixIntegerBitmaps<'a, 'rtxn> {
    pub fn from_prefixes(
        database: Database<Bytes, CboRoaringBitmapCodec>,
        rtxn: &'rtxn RoTxn,
-        prefixes: &'a HashSet<Prefix>,
+        prefixes: &'a BTreeSet<Prefix>,
    ) -> heed::Result<Self> {
        let database = database.remap_data_type::<Bytes>();

@@ -291,7 +291,7 @@ unsafe impl<'a, 'rtxn> Sync for FrozenPrefixIntegerBitmaps<'a, 'rtxn> {}
 fn delete_prefixes(
    wtxn: &mut RwTxn,
    prefix_database: &Database<Bytes, CboRoaringBitmapCodec>,
-    prefixes: &HashSet<Prefix>,
+    prefixes: &BTreeSet<Prefix>,
 ) -> Result<()> {
    // We remove all the entries that are no more required in this word prefix docids database.
    for prefix in prefixes {
@@ -309,8 +309,8 @@ fn delete_prefixes(
 pub fn compute_word_prefix_docids(
    wtxn: &mut RwTxn,
    index: &Index,
-    prefix_to_compute: &HashSet<Prefix>,
-    prefix_to_delete: &HashSet<Prefix>,
+    prefix_to_compute: &BTreeSet<Prefix>,
+    prefix_to_delete: &BTreeSet<Prefix>,
    grenad_parameters: GrenadParameters,
 ) -> Result<()> {
    WordPrefixDocids::new(
@@ -325,8 +325,8 @@ pub fn compute_word_prefix_docids(
 pub fn compute_exact_word_prefix_docids(
    wtxn: &mut RwTxn,
    index: &Index,
-    prefix_to_compute: &HashSet<Prefix>,
-    prefix_to_delete: &HashSet<Prefix>,
+    prefix_to_compute: &BTreeSet<Prefix>,
+    prefix_to_delete: &BTreeSet<Prefix>,
    grenad_parameters: GrenadParameters,
 ) -> Result<()> {
    WordPrefixDocids::new(
@@ -341,8 +341,8 @@ pub fn compute_exact_word_prefix_docids(
 pub fn compute_word_prefix_fid_docids(
    wtxn: &mut RwTxn,
    index: &Index,
-    prefix_to_compute: &HashSet<Prefix>,
-    prefix_to_delete: &HashSet<Prefix>,
+    prefix_to_compute: &BTreeSet<Prefix>,
+    prefix_to_delete: &BTreeSet<Prefix>,
    grenad_parameters: GrenadParameters,
 ) -> Result<()> {
    WordPrefixIntegerDocids::new(
@@ -357,8 +357,8 @@ pub fn compute_word_prefix_fid_docids(
 pub fn compute_word_prefix_position_docids(
    wtxn: &mut RwTxn,
    index: &Index,
-    prefix_to_compute: &HashSet<Prefix>,
-    prefix_to_delete: &HashSet<Prefix>,
+    prefix_to_compute: &BTreeSet<Prefix>,
+    prefix_to_delete: &BTreeSet<Prefix>,
    grenad_parameters: GrenadParameters,
 ) -> Result<()> {
    WordPrefixIntegerDocids::new(
--- a/crates/xtask/src/bench/mod.rs
+++ b/crates/xtask/src/bench/mod.rs
@@ -82,6 +82,10 @@ pub struct BenchDeriveArgs {
    /// Reason for the benchmark invocation
    #[arg(short, long)]
    reason: Option<String>,
+
+    /// The maximum time in seconds we allow for fetching the task queue before timing out.
+    #[arg(long, default_value_t = 60)]
+    tasks_queue_timeout_secs: u64,
 }

 pub fn run(args: BenchDeriveArgs) -> anyhow::Result<()> {
@@ -127,7 +131,7 @@ pub fn run(args: BenchDeriveArgs) -> anyhow::Result<()> {
    let meili_client = Client::new(
        Some("http://127.0.0.1:7700".into()),
        args.master_key.as_deref(),
-        Some(std::time::Duration::from_secs(60)),
+        Some(std::time::Duration::from_secs(args.tasks_queue_timeout_secs)),
    )?;

    // enter runtime
--- a/crates/xtask/src/main.rs
+++ b/crates/xtask/src/main.rs
@@ -16,6 +16,7 @@ struct ListFeaturesDeriveArgs {
 #[command(author, version, about, long_about)]
 #[command(name = "cargo xtask")]
 #[command(bin_name = "cargo xtask")]
+#[allow(clippy::large_enum_variant)] // please, that's enough...
 enum Command {
    ListFeatures(ListFeaturesDeriveArgs),
    Bench(BenchDeriveArgs),
--- a/workloads/hackernews-add-new-documents.json
+++ b/workloads/hackernews-add-new-documents.json
@@ -0,0 +1,105 @@
+{
+  "name": "hackernews.add_new_documents",
+  "run_count": 3,
+  "extra_cli_args": [],
+  "assets": {
+      "hackernews-01.ndjson": {
+        "local_location": null,
+        "remote_location": "https://milli-benchmarks.fra1.digitaloceanspaces.com/bench/datasets/hackernews/modification/01.ndjson",
+        "sha256": "cd3627b86c064d865b6754848ed0e73ef1d8142752a25e5f0765c3a1296dd3ae"
+      },
+      "hackernews-02.ndjson": {
+        "local_location": null,
+        "remote_location": "https://milli-benchmarks.fra1.digitaloceanspaces.com/bench/datasets/hackernews/modification/02.ndjson",
+        "sha256": "5d533b83bcf992201dace88b4d0c0be8b4df5225c6c4b763582d986844bcc23b"
+      },
+      "hackernews-03.ndjson": {
+        "local_location": null,
+        "remote_location": "https://milli-benchmarks.fra1.digitaloceanspaces.com/bench/datasets/hackernews/modification/03.ndjson",
+        "sha256": "f5f351a0d04a8a83643ace12cafa2b7ec8ca8cb7d46fd268e5126492a6c66f2a"
+      },
+      "hackernews-04.ndjson": {
+        "local_location": null,
+        "remote_location": "https://milli-benchmarks.fra1.digitaloceanspaces.com/bench/datasets/hackernews/modification/04.ndjson",
+        "sha256": "ac1915ee7ce53a6718548c255a6cc59969784b2570745dc5b739f714beda291a"
+      },
+      "hackernews-05.ndjson": {
+        "local_location": null,
+        "remote_location": "https://milli-benchmarks.fra1.digitaloceanspaces.com/bench/datasets/hackernews/modification/05.ndjson",
+        "sha256": "be31d5632602f798e62d1c10c83bdfda2b4deaa068477eacde05fdd247572b82"
+      }
+  },
+  "precommands": [
+    {
+      "route": "indexes/movies/settings",
+      "method": "PATCH",
+      "body": {
+        "inline": {
+          "displayedAttributes": [
+            "title",
+            "by",
+            "score",
+            "time",
+            "text"
+          ],
+          "searchableAttributes": [
+            "title",
+            "text"
+          ],
+          "filterableAttributes": [
+            "by",
+            "kids",
+            "parent"
+          ],
+          "sortableAttributes": [
+            "score",
+            "time"
+          ]
+        }
+      },
+      "synchronous": "WaitForTask"
+    },
+    {
+      "route": "indexes/movies/documents",
+      "method": "POST",
+      "body": {
+        "asset": "hackernews-01.ndjson"
+      },
+      "synchronous": "WaitForResponse"
+    },
+    {
+      "route": "indexes/movies/documents",
+      "method": "POST",
+      "body": {
+        "asset": "hackernews-02.ndjson"
+      },
+      "synchronous": "WaitForResponse"
+    },
+    {
+      "route": "indexes/movies/documents",
+      "method": "POST",
+      "body": {
+        "asset": "hackernews-03.ndjson"
+      },
+      "synchronous": "WaitForResponse"
+    },
+    {
+      "route": "indexes/movies/documents",
+      "method": "POST",
+      "body": {
+        "asset": "hackernews-04.ndjson"
+      },
+      "synchronous": "WaitForTask"
+    }
+  ],
+  "commands": [
+      {
+        "route": "indexes/movies/documents",
+        "method": "POST",
+        "body": {
+          "asset": "hackernews-05.ndjson"
+        },
+        "synchronous": "WaitForTask"
+      }
+  ]
+}
--- a/workloads/hackernews-modify-facet-numbers.json
+++ b/workloads/hackernews-modify-facet-numbers.json
@@ -0,0 +1,111 @@
+{
+  "name": "hackernews.modify_facet_numbers",
+  "run_count": 3,
+  "extra_cli_args": [],
+  "assets": {
+    "hackernews-01.ndjson": {
+      "local_location": null,
+      "remote_location": "https://milli-benchmarks.fra1.digitaloceanspaces.com/bench/datasets/hackernews/modification/01.ndjson",
+      "sha256": "cd3627b86c064d865b6754848ed0e73ef1d8142752a25e5f0765c3a1296dd3ae"
+    },
+    "hackernews-02.ndjson": {
+      "local_location": null,
+      "remote_location": "https://milli-benchmarks.fra1.digitaloceanspaces.com/bench/datasets/hackernews/modification/02.ndjson",
+      "sha256": "5d533b83bcf992201dace88b4d0c0be8b4df5225c6c4b763582d986844bcc23b"
+    },
+    "hackernews-03.ndjson": {
+      "local_location": null,
+      "remote_location": "https://milli-benchmarks.fra1.digitaloceanspaces.com/bench/datasets/hackernews/modification/03.ndjson",
+      "sha256": "f5f351a0d04a8a83643ace12cafa2b7ec8ca8cb7d46fd268e5126492a6c66f2a"
+    },
+    "hackernews-04.ndjson": {
+      "local_location": null,
+      "remote_location": "https://milli-benchmarks.fra1.digitaloceanspaces.com/bench/datasets/hackernews/modification/04.ndjson",
+      "sha256": "ac1915ee7ce53a6718548c255a6cc59969784b2570745dc5b739f714beda291a"
+    },
+    "hackernews-05.ndjson": {
+      "local_location": null,
+      "remote_location": "https://milli-benchmarks.fra1.digitaloceanspaces.com/bench/datasets/hackernews/modification/05.ndjson",
+      "sha256": "be31d5632602f798e62d1c10c83bdfda2b4deaa068477eacde05fdd247572b82"
+    },
+    "hackernews-02-modified-filters.ndjson": {
+      "local_location": null,
+      "remote_location": "https://milli-benchmarks.fra1.digitaloceanspaces.com/bench/datasets/hackernews/modification/02-modified-filters.ndjson",
+      "sha256": "7272cbfd41110d32d7fe168424a0000f07589bfe40f664652b34f4f20aaf3802"
+    }
+  },
+  "precommands": [
+    {
+      "route": "indexes/movies/settings",
+      "method": "PATCH",
+      "body": {
+        "inline": {
+          "displayedAttributes": [
+            "title",
+            "by",
+            "score",
+            "time",
+            "text"
+          ],
+          "searchableAttributes": [
+            "title",
+            "text"
+          ],
+          "filterableAttributes": [
+            "by",
+            "kids",
+            "parent"
+          ],
+          "sortableAttributes": [
+            "score",
+            "time"
+          ]
+        }
+      },
+      "synchronous": "WaitForTask"
+    },
+    {
+      "route": "indexes/movies/documents",
+      "method": "POST",
+      "body": {
+        "asset": "hackernews-01.ndjson"
+      },
+      "synchronous": "WaitForResponse"
+    },
+    {
+      "route": "indexes/movies/documents",
+      "method": "POST",
+      "body": {
+        "asset": "hackernews-02.ndjson"
+      },
+      "synchronous": "WaitForResponse"
+    },
+    {
+      "route": "indexes/movies/documents",
+      "method": "POST",
+      "body": {
+        "asset": "hackernews-03.ndjson"
+      },
+      "synchronous": "WaitForResponse"
+    },
+    {
+      "route": "indexes/movies/documents",
+      "method": "POST",
+      "body": {
+        "asset": "hackernews-04.ndjson"
+      },
+      "synchronous": "WaitForTask"
+    }
+  ],
+  "commands": [
+      {
+        "route": "indexes/movies/documents",
+        "method": "POST",
+        "body": {
+          "asset": "hackernews-02-modified-filters.ndjson"
+        },
+        "synchronous": "WaitForTask"
+      }
+  ]
+}
+  
--- a/workloads/hackernews-modify-facet-strings.json
+++ b/workloads/hackernews-modify-facet-strings.json
@@ -0,0 +1,111 @@
+{
+  "name": "hackernews.modify_facet_strings",
+  "run_count": 3,
+  "extra_cli_args": [],
+  "assets": {
+    "hackernews-01.ndjson": {
+      "local_location": null,
+      "remote_location": "https://milli-benchmarks.fra1.digitaloceanspaces.com/bench/datasets/hackernews/modification/01.ndjson",
+      "sha256": "cd3627b86c064d865b6754848ed0e73ef1d8142752a25e5f0765c3a1296dd3ae"
+    },
+    "hackernews-02.ndjson": {
+      "local_location": null,
+      "remote_location": "https://milli-benchmarks.fra1.digitaloceanspaces.com/bench/datasets/hackernews/modification/02.ndjson",
+      "sha256": "5d533b83bcf992201dace88b4d0c0be8b4df5225c6c4b763582d986844bcc23b"
+    },
+    "hackernews-03.ndjson": {
+      "local_location": null,
+      "remote_location": "https://milli-benchmarks.fra1.digitaloceanspaces.com/bench/datasets/hackernews/modification/03.ndjson",
+      "sha256": "f5f351a0d04a8a83643ace12cafa2b7ec8ca8cb7d46fd268e5126492a6c66f2a"
+    },
+    "hackernews-04.ndjson": {
+      "local_location": null,
+      "remote_location": "https://milli-benchmarks.fra1.digitaloceanspaces.com/bench/datasets/hackernews/modification/04.ndjson",
+      "sha256": "ac1915ee7ce53a6718548c255a6cc59969784b2570745dc5b739f714beda291a"
+    },
+    "hackernews-05.ndjson": {
+      "local_location": null,
+      "remote_location": "https://milli-benchmarks.fra1.digitaloceanspaces.com/bench/datasets/hackernews/modification/05.ndjson",
+      "sha256": "be31d5632602f798e62d1c10c83bdfda2b4deaa068477eacde05fdd247572b82"
+    },
+    "hackernews-01-modified-filters.ndjson": {
+      "local_location": null,
+      "remote_location": "https://milli-benchmarks.fra1.digitaloceanspaces.com/bench/datasets/hackernews/modification/01-modified-filters.ndjson",
+      "sha256": "b80c245ce1b1df80b9b38800f677f3bd11947ebc62716fb108269d50e796c35c"
+    }
+  },
+  "precommands": [
+    {
+      "route": "indexes/movies/settings",
+      "method": "PATCH",
+      "body": {
+        "inline": {
+          "displayedAttributes": [
+            "title",
+            "by",
+            "score",
+            "time",
+            "text"
+          ],
+          "searchableAttributes": [
+            "title",
+            "text"
+          ],
+          "filterableAttributes": [
+            "by",
+            "kids",
+            "parent"
+          ],
+          "sortableAttributes": [
+            "score",
+            "time"
+          ]
+        }
+      },
+      "synchronous": "WaitForTask"
+    },
+    {
+      "route": "indexes/movies/documents",
+      "method": "POST",
+      "body": {
+        "asset": "hackernews-01.ndjson"
+      },
+      "synchronous": "WaitForResponse"
+    },
+    {
+      "route": "indexes/movies/documents",
+      "method": "POST",
+      "body": {
+        "asset": "hackernews-02.ndjson"
+      },
+      "synchronous": "WaitForResponse"
+    },
+    {
+      "route": "indexes/movies/documents",
+      "method": "POST",
+      "body": {
+        "asset": "hackernews-03.ndjson"
+      },
+      "synchronous": "WaitForResponse"
+    },
+    {
+      "route": "indexes/movies/documents",
+      "method": "POST",
+      "body": {
+        "asset": "hackernews-04.ndjson"
+      },
+      "synchronous": "WaitForTask"
+    }
+  ],
+  "commands": [
+      {
+        "route": "indexes/movies/documents",
+        "method": "POST",
+        "body": {
+          "asset": "hackernews-01-modified-filters.ndjson"
+        },
+        "synchronous": "WaitForTask"
+      }
+  ]
+}
+ 
--- a/workloads/hackernews-modify-searchables.json
+++ b/workloads/hackernews-modify-searchables.json
@@ -0,0 +1,123 @@
+{
+  "name": "hackernews.modify_searchables",
+  "run_count": 3,
+  "extra_cli_args": [],
+  "assets": {
+    "hackernews-01.ndjson": {
+      "local_location": null,
+      "remote_location": "https://milli-benchmarks.fra1.digitaloceanspaces.com/bench/datasets/hackernews/modification/01.ndjson",
+      "sha256": "cd3627b86c064d865b6754848ed0e73ef1d8142752a25e5f0765c3a1296dd3ae"
+    },
+    "hackernews-02.ndjson": {
+      "local_location": null,
+      "remote_location": "https://milli-benchmarks.fra1.digitaloceanspaces.com/bench/datasets/hackernews/modification/02.ndjson",
+      "sha256": "5d533b83bcf992201dace88b4d0c0be8b4df5225c6c4b763582d986844bcc23b"
+    },
+    "hackernews-03.ndjson": {
+      "local_location": null,
+      "remote_location": "https://milli-benchmarks.fra1.digitaloceanspaces.com/bench/datasets/hackernews/modification/03.ndjson",
+      "sha256": "f5f351a0d04a8a83643ace12cafa2b7ec8ca8cb7d46fd268e5126492a6c66f2a"
+    },
+    "hackernews-04.ndjson": {
+      "local_location": null,
+      "remote_location": "https://milli-benchmarks.fra1.digitaloceanspaces.com/bench/datasets/hackernews/modification/04.ndjson",
+      "sha256": "ac1915ee7ce53a6718548c255a6cc59969784b2570745dc5b739f714beda291a"
+    },
+    "hackernews-05.ndjson": {
+      "local_location": null,
+      "remote_location": "https://milli-benchmarks.fra1.digitaloceanspaces.com/bench/datasets/hackernews/modification/05.ndjson",
+      "sha256": "be31d5632602f798e62d1c10c83bdfda2b4deaa068477eacde05fdd247572b82"
+    },
+    "hackernews-01-modified-searchables.ndjson": {
+      "local_location": null,
+      "remote_location": "https://milli-benchmarks.fra1.digitaloceanspaces.com/bench/datasets/hackernews/modification/01-modified-searchables.ndjson",
+      "sha256": "e5c08710c6af70031ac7212e0ba242c72ef29c8d4e1fce66c789544641452a7c"
+    },
+    "hackernews-02-modified-searchables.ndjson": {
+      "local_location": null,
+      "remote_location": "https://milli-benchmarks.fra1.digitaloceanspaces.com/bench/datasets/hackernews/modification/02-modified-searchables.ndjson",
+      "sha256": "098b029851117087b1e26ccb7ac408eda9bba54c3008213a2880d6fab607346e"
+    }
+  },
+  "precommands": [
+    {
+      "route": "indexes/movies/settings",
+      "method": "PATCH",
+      "body": {
+        "inline": {
+          "displayedAttributes": [
+            "title",
+            "by",
+            "score",
+            "time",
+            "text"
+          ],
+          "searchableAttributes": [
+            "title",
+            "text"
+          ],
+          "filterableAttributes": [
+            "by",
+            "kids",
+            "parent"
+          ],
+          "sortableAttributes": [
+            "score",
+            "time"
+          ]
+        }
+      },
+      "synchronous": "WaitForTask"
+    },
+    {
+      "route": "indexes/movies/documents",
+      "method": "POST",
+      "body": {
+        "asset": "hackernews-01.ndjson"
+      },
+      "synchronous": "WaitForResponse"
+    },
+    {
+      "route": "indexes/movies/documents",
+      "method": "POST",
+      "body": {
+        "asset": "hackernews-02.ndjson"
+      },
+      "synchronous": "WaitForResponse"
+    },
+    {
+      "route": "indexes/movies/documents",
+      "method": "POST",
+      "body": {
+        "asset": "hackernews-03.ndjson"
+      },
+      "synchronous": "WaitForResponse"
+    },
+    {
+      "route": "indexes/movies/documents",
+      "method": "POST",
+      "body": {
+        "asset": "hackernews-04.ndjson"
+      },
+      "synchronous": "WaitForTask"
+    }
+  ],
+  "commands": [
+      {
+        "route": "indexes/movies/documents",
+        "method": "POST",
+        "body": {
+          "asset": "hackernews-01-modified-searchables.ndjson"
+        },
+        "synchronous": "WaitForTask"
+      },
+      {
+        "route": "indexes/movies/documents",
+        "method": "POST",
+        "body": {
+          "asset": "hackernews-02-modified-searchables.ndjson"
+        },
+        "synchronous": "WaitForTask"
+      }
+  ]
+}
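The workload files in this diff declare their datasets under `assets` (each with a `remote_location` and a `sha256`), and the `precommands` and `commands` then refer to them by name through `"body": { "asset": ... }`. The commit list includes follow-up fixes ("fix asset name", "Fix dat hash"), which shows how easily a name or hash can drift. The Rust sketch below is a hypothetical consistency check and is not part of the xtask bench runner. It assumes a workload has already been parsed into two slices of names, and it checks names only, since verifying the hashes would need a SHA-256 implementation. The helper `check_assets` is an illustrative name.

```rust
use std::collections::BTreeSet;

/// Hypothetical check for a workload file: every asset named by a precommand
/// or command must be declared under "assets".
/// Returns the declared-but-unreferenced assets, or an error for the first
/// referenced asset that is not declared.
fn check_assets<'a>(declared: &[&'a str], referenced: &[&'a str]) -> Result<Vec<&'a str>, String> {
    let declared_set: BTreeSet<&str> = declared.iter().copied().collect();
    for name in referenced {
        if !declared_set.contains(name) {
            return Err(format!("asset `{name}` is referenced but not declared"));
        }
    }
    let referenced_set: BTreeSet<&str> = referenced.iter().copied().collect();
    // Keep the declaration order so the report matches the JSON file.
    Ok(declared
        .iter()
        .copied()
        .filter(|d| !referenced_set.contains(d))
        .collect())
}
```

Applied to `hackernews.modify_facet_numbers`, this check passes and reports `hackernews-05.ndjson` as declared but never referenced. The same holds for the other two workloads shown above.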
Author	SHA1	Message	Date
Kerollmops	63b6229984	Log when an entry is too large to fit in the BBQueue	2024-12-05 11:24:16 +01:00
meili-bors[bot]	cac355bfa7	Merge #5124 5124: Optimize Prefixes and Merges r=ManyTheFish a=Kerollmops In this PR, we plan to optimize the read of LMDB to use read the entries in lexicographic order and better use the memory-mapping OS cache: - Optimize the prefix generation for word position docids (`@manythefish)` - Optimize the parallel merging of the caches to sort entries before merging the caches (`@kerollmops)` ## Benchmarks on 1cpu 2gb gpo3 (5k IOps) Before on the tag meilisearch-v1.12.0-rc.3. ``` word_position_docids:merge_and_send_docids: 988s compute_word_fst: 23.3s word_pair_proximity_docids:merge_and_send_docids: 428s compute_word_prefix_fid_docids:recompute_modified_prefixes: 76.3s compute_word_prefix_position_docids:recompute_modified_prefixes:from_prefixes: 429s ``` After sorting the whole `HashMap`s in a `Vec` on this branch. ``` word_position_docids:merge_and_send_docids: 202s compute_word_fst: 20.4s word_pair_proximity_docids:merge_and_send_docids: 427s compute_word_prefix_fid_docids:recompute_modified_prefixes: 65.5s compute_word_prefix_position_docids:recompute_modified_prefixes:from_prefixes: 62.5s ``` Co-authored-by: ManyTheFish <many@meilisearch.com> Co-authored-by: Kerollmops <clement@meilisearch.com>	2024-12-05 09:35:52 +00:00
Kerollmops	52843123d4	Clean up and remove the non-sorted merge_caches function	2024-12-05 10:03:05 +01:00
meili-bors[bot]	6298db5bea	Merge #5113 5113: Fix the Minimum BBQueue channel threshold r=Kerollmops a=Kerollmops Co-authored-by: Kerollmops <clement@meilisearch.com> Co-authored-by: Louis Dureuil <louis@meilisearch.com>	2024-12-05 09:01:02 +00:00
meili-bors[bot]	a003a0934a	Merge #5121 5121: Make the tasks pulling timeout configurable r=dureuill a=Kerollmops Co-authored-by: Kerollmops <clement@meilisearch.com>	2024-12-04 17:04:14 +00:00
Louis Dureuil	3a11e39c01	Force max_memory to a min of 100MiB	2024-12-04 17:53:30 +01:00
Louis Dureuil	5f896b1050	Fix geo when spilling	2024-12-04 17:51:12 +01:00
Kerollmops	d0c4e6da6b	Make clippy happy	2024-12-04 17:39:10 +01:00
Kerollmops	2da5584bb5	Make the tasks pulling timeout configurable	2024-12-04 17:39:07 +01:00
meili-bors[bot]	b7eb802ae6	Merge #5120 5120: Add cross tasks r=Kerollmops a=ManyTheFish Add 4 xtask bench workloads: - `hackernews-add-new-documents`: adds new documents on a db already containing documents - `hackernews-modify-facet-numbers`: modify filterable fields containing numbers of documents on a db already containing documents - `hackernews-modify-facet-strings`: modify filterable fields containing strings of documents on a db already containing documents - `hackernews-modify-searchables`: modify searchable fields of documents on a db already containing documents Co-authored-by: ManyTheFish <many@meilisearch.com>	2024-12-04 16:16:57 +00:00
Kerollmops	2e32d0474c	Lexicographically sort all the map to merge	2024-12-04 17:05:11 +01:00
Kerollmops	cb99ac6f7e	Consume vec instead of draining	2024-12-04 17:00:22 +01:00
Kerollmops	be411435f5	Use the merge_caches_alt function in the docids merging	2024-12-04 16:37:29 +01:00
Kerollmops	29ef164530	Introduce a new semi ordered merge function	2024-12-04 16:33:35 +01:00
ManyTheFish	739c52a3cd	Replace HashSets by BTreeSets for the prefixes	2024-12-04 16:16:48 +01:00
ManyTheFish	8388698993	Fix dat hash	2024-12-04 15:09:10 +01:00
ManyTheFish	7458f0386c	fix asset name	2024-12-04 14:44:57 +01:00
meili-bors[bot]	3ded069042	Merge #5122 5122: Yield the BBQueue writing loop r=ManyTheFish a=Kerollmops We prefer yielding to let the writing thread do its job instead of spin looping. Co-authored-by: Kerollmops <clement@meilisearch.com>	2024-12-04 13:33:51 +00:00
Kerollmops	261d2ceb06	Yield the BBQueue writer instead of spin looping	2024-12-04 14:16:40 +01:00
ManyTheFish	1a17e2e572	fix formating	2024-12-04 13:57:06 +01:00
meili-bors[bot]	5b8cd68abe	Merge #5110 5110: Increase margin on deletion of task r=dureuill a=irevoire # Pull Request ## Related issue Fixes https://github.com/meilisearch/meilisearch/issues/5077 ## What does this PR do? - Increase the margin we keep to enqueue task deletion The issue was that we had not enough space on the reserved memory to write both the batch and the deletion task we just enqueued. We could fix it only for this test as it’s not an issue in production where we have 10GiB of margin, but I thought it wasn’t a bad idea either to increase our margin a bit since we’re effectively writing more to lmdb. Co-authored-by: Tamo <tamo@meilisearch.com>	2024-12-04 12:54:48 +00:00
ManyTheFish	5ce9acb0b9	Add workloads	2024-12-04 12:19:19 +01:00
meili-bors[bot]	54341c2e80	Merge #5118 5118: Change the reserve and grant function to accept a closure r=ManyTheFish a=Kerollmops This simplifies the usage of the grant and commits it at the right time, just after having written in it. Co-authored-by: Kerollmops <clement@meilisearch.com>	2024-12-04 10:12:39 +00:00
Kerollmops	96831ed9bb	Send the WakeUp message if necessary in the reserve function	2024-12-04 11:03:01 +01:00
Kerollmops	0459b1a242	Change the reserve and grant function to accept a closure	2024-12-04 10:32:25 +01:00
Kerollmops	8ecb726683	Fix the minimun BBQueue channel threshold	2024-12-03 15:49:11 +01:00
meili-bors[bot]	297e72e262	Merge #5111 5111: Update BBQueue repo to point to the Meilisearch org r=curquiza a=Kerollmops This PR updates the milli dependencies to make BBQueue point to the Meilisearch org repo. Co-authored-by: Clément Renault <clement@meilisearch.com>	2024-12-03 14:27:04 +00:00
Clément Renault	0ad2f57a92	Update bbqueue repo to point to the meilisearch org	2024-12-03 12:00:04 +01:00
Tamo	71d53f413f	increase the margin allowed to delete task	2024-12-03 11:07:03 +01:00
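PR #5124 describes sorting the whole `HashMap`s in a `Vec` before merging the caches, so that entries are visited in lexicographic key order. This matches LMDB's key order and makes better use of the memory-mapping OS cache (see the commits "Lexicographically sort all the map to merge" and "Consume vec instead of draining"). The sketch below illustrates that idea using only the standard library and is not the code from the PR. Each per-thread cache is consumed into a key-sorted `Vec`, and the sorted runs are then k-way merged with a `BinaryHeap`, unioning the document ids of equal keys. The name `merge_sorted_caches` is hypothetical, and `BTreeSet<u32>` stands in for a roaring bitmap of document ids.

```rust
use std::cmp::Reverse;
use std::collections::{BTreeSet, BinaryHeap, HashMap};

type Key = Vec<u8>;
/// Stand-in for a roaring bitmap of document ids.
type DocIds = BTreeSet<u32>;

/// Hypothetical merge: consumes each unordered cache into a key-sorted Vec,
/// then k-way merges the runs so keys come out in lexicographic order.
fn merge_sorted_caches(caches: Vec<HashMap<Key, DocIds>>) -> Vec<(Key, DocIds)> {
    // Consume (rather than drain) every map into a Vec sorted by key.
    let mut runs: Vec<std::vec::IntoIter<(Key, DocIds)>> = caches
        .into_iter()
        .map(|map| {
            let mut entries: Vec<(Key, DocIds)> = map.into_iter().collect();
            entries.sort_unstable_by(|a, b| a.0.cmp(&b.0));
            entries.into_iter()
        })
        .collect();

    // The heap holds (key, run index) for the head of each run; `Reverse`
    // turns Rust's max-heap into a min-heap. Values wait in `heads`.
    let mut heads: Vec<Option<DocIds>> = vec![None; runs.len()];
    let mut heap = BinaryHeap::new();
    for (i, run) in runs.iter_mut().enumerate() {
        if let Some((key, ids)) = run.next() {
            heads[i] = Some(ids);
            heap.push(Reverse((key, i)));
        }
    }

    let mut out: Vec<(Key, DocIds)> = Vec::new();
    while let Some(Reverse((key, i))) = heap.pop() {
        let ids = heads[i].take().expect("every heap entry has a head value");
        // Refill the heap from the run that was just consumed.
        if let Some((next_key, next_ids)) = runs[i].next() {
            heads[i] = Some(next_ids);
            heap.push(Reverse((next_key, i)));
        }
        // Equal keys from different caches pop consecutively: union them.
        if out.last().map_or(false, |(last, _)| *last == key) {
            out.last_mut().unwrap().1.extend(ids);
        } else {
            out.push((key, ids));
        }
    }
    out
}
```

For `n` entries spread over `k` caches, the merge performs `O(n log k)` heap operations on top of sorting each cache. The output is already in the order in which an LMDB cursor would visit the keys.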
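Several commits in this list change how the indexer writes into the BBQueue channel. #5118 makes the reserve-and-grant function accept a closure, so the grant is committed right after it is written. #5122 yields the writer instead of spin looping while it waits for space. The head commit logs when an entry is too large to fit. The Rust sketch below models that control flow under stated assumptions and does not reproduce the `bbqueue` crate's API. `ByteQueue`, `reserve_and_write`, and `EntryTooLarge` are hypothetical names, and a `Mutex<VecDeque<_>>` replaces the lock-free ring buffer.

```rust
use std::collections::VecDeque;
use std::sync::Mutex;
use std::thread;

/// Returned when an entry can never fit, whatever the consumer does.
#[derive(Debug, PartialEq)]
struct EntryTooLarge {
    len: usize,
    capacity: usize,
}

struct State {
    frames: VecDeque<Vec<u8>>, // committed frames, oldest first
    used: usize,               // bytes held by committed frames
}

/// Hypothetical bounded byte queue modelling a BBQueue-like channel.
struct ByteQueue {
    capacity: usize,
    state: Mutex<State>,
}

impl ByteQueue {
    fn new(capacity: usize) -> Self {
        ByteQueue { capacity, state: Mutex::new(State { frames: VecDeque::new(), used: 0 }) }
    }

    /// Reserves `len` bytes, lets `write` fill them, then commits the frame
    /// immediately, so the caller cannot forget the commit.
    fn reserve_and_write<F: FnOnce(&mut [u8])>(&self, len: usize, write: F) -> Result<(), EntryTooLarge> {
        if len > self.capacity {
            // Waiting can never help here: log and report instead of looping forever.
            eprintln!("entry of {len} bytes is too large for a {}-byte queue", self.capacity);
            return Err(EntryTooLarge { len, capacity: self.capacity });
        }
        loop {
            {
                let mut state = self.state.lock().unwrap();
                if self.capacity - state.used >= len {
                    let mut frame = vec![0u8; len];
                    write(frame.as_mut_slice());
                    state.used += len;
                    state.frames.push_back(frame);
                    return Ok(());
                }
            } // the lock is released here
            // Not enough free space yet: yield so the consumer thread can
            // drain the queue, instead of spin looping on the lock.
            thread::yield_now();
        }
    }

    /// Consumer side: removes the oldest committed frame and frees its bytes.
    fn pop(&self) -> Option<Vec<u8>> {
        let mut state = self.state.lock().unwrap();
        let frame = state.frames.pop_front()?;
        state.used -= frame.len();
        Some(frame)
    }
}
```

In BBQueue itself, a write grant is a slice of a shared ring buffer, so no per-frame allocation takes place. The sketch only shows the ordering of checks: it rejects impossible entries up front, waits cooperatively for space, and commits as soon as the closure returns.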