Refactor Settings Indexing process

**Changes:**
The transform structure now relies on FieldIdMapWithMetadata and AttributePatterns to prepare the obkv documents during a settings reindexing.
The InnerIndexSettingsDiff and InnerIndexSettings structs now rely on FieldIdMapWithMetadata, FilterableAttributesRule and AttributePatterns to define the fields and the databases that should be reindexed.
The faceted_fields_ids, localized_searchable_fields_ids and localized_faceted_fields_ids have been removed in favor of the FieldIdMapWithMetadata.
We now rely on the FieldIdMapWithMetadata to retain vectors_fids from the facets and the searchables.

The searchable database computing now relies on the FieldIdMapWithMetadata to know if a field is searchable and to retrieve the locales.

The facet database computing now relies on the FieldIdMapWithMetadata to compute the facet databases and the facet-search, and to retrieve the locales.

The facet level database computing now relies on the FieldIdMapWithMetadata, and the facet level databases are cleared depending on the settings differences (clear_facet_levels_based_on_settings_diff).
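
As an illustration of the facet level clearing, here is a minimal sketch, not milli's actual implementation. It models the facet database as an in-memory `BTreeMap` keyed by `(field_id, level, left_bound)`, an assumption made so the example is self-contained. It deletes the same key range that the per-field `clear_levels` helper removed by this commit covered: levels 1 through `u8::MAX`, leaving level 0 untouched. The `Key` alias and the function signature below are hypothetical.

```rust
use std::collections::BTreeMap;

// Hypothetical stand-in for a facet group key: (field_id, level, left_bound).
// Tuple ordering is assumed to sort like the real encoded keys:
// by field id, then level, then left bound.
type Key = (u16, u8, Vec<u8>);

/// Removes every entry above level 0 for each given field id.
/// Level 0 entries and other fields are left untouched.
fn clear_facet_levels(db: &mut BTreeMap<Key, u32>, field_ids: &[u16]) {
    for &field_id in field_ids {
        // Same bounds as the removed `clear_levels`: level 1 up to u8::MAX,
        // both with an empty left bound, as an inclusive range.
        let left: Key = (field_id, 1, Vec::new());
        let right: Key = (field_id, u8::MAX, Vec::new());

        // Collect the keys first: the map cannot be mutated while it is iterated.
        let doomed: Vec<Key> = db.range(left..=right).map(|(k, _)| k.clone()).collect();
        for key in doomed {
            db.remove(&key);
        }
    }
}
```

Taking the whole slice of field ids in one call mirrors the change visible in the diff, where the per-field loop around `clear_levels` is replaced by a single `clear_facet_levels` call.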

The vector point extraction uses the FieldIdMapWithMetadata instead of FieldsIdsMapWithMetadata.

**Impact:**
- Dump import
- Settings update
ManyTheFish
2025-03-03 10:32:02 +01:00
parent 286d310287
commit 659855c88e
12 changed files with 375 additions and 272 deletions


@@ -6,7 +6,7 @@ use heed::types::Bytes;
 use heed::{BytesDecode, BytesEncode, Error, PutFlags, RoTxn, RwTxn};
 use roaring::RoaringBitmap;
-use super::{FACET_GROUP_SIZE, FACET_MIN_LEVEL_SIZE};
+use super::{clear_facet_levels, FACET_GROUP_SIZE, FACET_MIN_LEVEL_SIZE};
 use crate::facet::FacetType;
 use crate::heed_codec::facet::{
     FacetGroupKey, FacetGroupKeyCodec, FacetGroupValue, FacetGroupValueCodec,
@@ -97,9 +97,7 @@ pub(crate) struct FacetsUpdateBulkInner<R: std::io::Read + std::io::Seek> {
 impl<R: std::io::Read + std::io::Seek> FacetsUpdateBulkInner<R> {
     pub fn update(mut self, wtxn: &mut RwTxn<'_>, field_ids: &[u16]) -> Result<()> {
         self.update_level0(wtxn)?;
-        for &field_id in field_ids.iter() {
-            self.clear_levels(wtxn, field_id)?;
-        }
+        clear_facet_levels(wtxn, &self.db.remap_data_type(), field_ids)?;
         for &field_id in field_ids.iter() {
             let level_readers = self.compute_levels_for_field_id(field_id, wtxn)?;
@@ -114,14 +112,6 @@ impl<R: std::io::Read + std::io::Seek> FacetsUpdateBulkInner<R> {
         Ok(())
     }
-    fn clear_levels(&self, wtxn: &mut heed::RwTxn<'_>, field_id: FieldId) -> Result<()> {
-        let left = FacetGroupKey::<&[u8]> { field_id, level: 1, left_bound: &[] };
-        let right = FacetGroupKey::<&[u8]> { field_id, level: u8::MAX, left_bound: &[] };
-        let range = left..=right;
-        self.db.delete_range(wtxn, &range).map(drop)?;
-        Ok(())
-    }
     fn update_level0(&mut self, wtxn: &mut RwTxn<'_>) -> Result<()> {
         let delta_data = match self.delta_data.take() {
             Some(x) => x,
@@ -365,8 +355,6 @@ impl<R: std::io::Read + std::io::Seek> FacetsUpdateBulkInner<R> {
 mod tests {
     use std::iter::once;
-    use big_s::S;
-    use maplit::hashset;
     use roaring::RoaringBitmap;
     use crate::documents::mmap_from_objects;
@@ -374,7 +362,7 @@ mod tests {
     use crate::heed_codec::StrRefCodec;
     use crate::index::tests::TempIndex;
     use crate::update::facet::test_helpers::{ordered_string, FacetIndex};
-    use crate::{db_snap, milli_snap};
+    use crate::{db_snap, milli_snap, FilterableAttributesRule};
     #[test]
     fn insert() {
@@ -474,7 +462,8 @@ mod tests {
         index
             .update_settings(|settings| {
                 settings.set_primary_key("id".to_owned());
-                settings.set_filterable_fields(hashset! { S("id") });
+                settings
+                    .set_filterable_fields(vec![FilterableAttributesRule::Field("id".to_string())]);
             })
             .unwrap();