Commit Graph

102 Commits

Author SHA1 Message Date
Clément Renault
ab2c83f868 Use the disk less when computing prefixes 2024-11-21 10:45:37 +01:00
Louis Dureuil
6e6acfcf1b Merge branch 'main' into indexer-edition-2024 2024-11-20 16:59:58 +01:00
Louis Dureuil
e0864f1b21 Separate side effect and debug asserts 2024-11-20 16:25:17 +01:00
Clément Renault
a38344acb3 Replace eprintlns by tracing 2024-11-20 15:29:51 +01:00
ManyTheFish
4d616f8794 Parse every attributes and filter before tokenization 2024-11-20 15:15:25 +01:00
Louis Dureuil
ff9c92c409 rename documents -> substep 2024-11-20 15:12:02 +01:00
Clément Renault
8380ddbdcd Fix progress of into_changes 2024-11-20 15:10:09 +01:00
Louis Dureuil
867138f166 Add SP to into_changes 2024-11-20 15:07:05 +01:00
Clément Renault
567bd4538b Fxi the into_changes stop processing 2024-11-20 14:58:25 +01:00
Louis Dureuil
84600a10d1 Add MSP to document_update.into_changes() 2024-11-20 14:53:37 +01:00
Louis Dureuil
7d64e8dbd3 Fix Windows compilation 2024-11-20 14:40:38 +01:00
Louis Dureuil
cae8c89467 "fix" last warnings 2024-11-20 14:03:52 +01:00
Clément Renault
7cb8732b45 Introduce a new bincode internal error 2024-11-20 13:23:11 +01:00
ManyTheFish
fe5d50969a Fix filed selector in extrators 2024-11-20 13:16:44 +01:00
Clément Renault
56c7c5d5f0 Fix comments 2024-11-20 13:16:44 +01:00
Louis Dureuil
4cdfdddd6d Fix one more 2024-11-20 13:16:43 +01:00
Louis Dureuil
2afa33011a Fix tokenize_document 2024-11-20 13:16:43 +01:00
Louis Dureuil
61feca1f41 More tests pass 2024-11-20 13:16:43 +01:00
Louis Dureuil
f893b5153e Don't mark [""] as empty facet 2024-11-20 13:16:42 +01:00
Louis Dureuil
ca779c21f9 facets: Handle boolean and skip empty strings 2024-11-20 13:16:42 +01:00
Louis Dureuil
477077bdc2 Remove _vectors from fid map when there are no vectors in sight 2024-11-20 13:16:42 +01:00
ManyTheFish
b1f8aec348 Fix index_documents_check_exists_database 2024-11-20 13:16:41 +01:00
ManyTheFish
ba7f091db3 Use tokenizer on numbers and booleans 2024-11-20 13:16:41 +01:00
Louis Dureuil
8049df125b Add depth to facet extraction so that null inside an array doesn't mark the entire field as null 2024-11-20 13:16:40 +01:00
Clément Renault
50d1bd01df We no longer index geo lat and lng 2024-11-20 13:16:40 +01:00
Louis Dureuil
a28d4f5d0c Fix setup_search_index_with_criteria 2024-11-20 13:16:40 +01:00
Louis Dureuil
fc14f4bc66 Attempt to fix setup_search_index_with_criteria 2024-11-20 13:16:39 +01:00
Clément Renault
5f8a82d6f5 Improve test 2024-11-20 13:16:39 +01:00
Clément Renault
fe04e51a49 One more 2024-11-20 13:16:38 +01:00
Clément Renault
01b27e40ad Fix a bit of the placeholder search tests 2024-11-20 13:16:38 +01:00
Louis Dureuil
8076d98544 Fix stats_should_not_return_deleted_documents 2024-11-20 13:16:37 +01:00
Louis Dureuil
9e951baad5 One more test passing 2024-11-20 13:16:37 +01:00
Louis Dureuil
52f2fc4c46 Fail in case of user error in tests 2024-11-20 13:16:37 +01:00
Clément Renault
3957917e0b Correctly count indexed documents 2024-11-20 13:16:36 +01:00
Louis Dureuil
651c30899e Allow fetching embedders from inside tests 2024-11-20 13:16:36 +01:00
Clément Renault
2c7a7fe4e8 Count the number of documents correctly 2024-11-20 13:16:35 +01:00
Clément Renault
23f0c2c29b Generate internal ids only when needed 2024-11-20 13:16:35 +01:00
Louis Dureuil
6641c3f59b Remove all autogenerated tests 2024-11-20 13:16:34 +01:00
Louis Dureuil
07a72824b7 Subfields of _vectors are no longer part of the fid map 2024-11-20 13:16:34 +01:00
Louis Dureuil
000eb55c4e fix one 2024-11-20 13:16:34 +01:00
Louis Dureuil
1aef0e4037 documents! macro accepts a single object again 2024-11-20 13:16:33 +01:00
Clément Renault
32d0e50a75 Fix all the benchmark compilation errors 2024-11-20 13:16:32 +01:00
Louis Dureuil
df5884b0c1 Fix settings test 2024-11-20 13:16:32 +01:00
Louis Dureuil
9e0eb5ebb0 Removed some warnings 2024-11-20 13:16:32 +01:00
Clément Renault
3cf1352ae1 Fix the benchmark tests 2024-11-20 13:16:31 +01:00
Clément Renault
aba8a0e9e0 Fix some tests but not all of them 2024-11-20 13:16:31 +01:00
Clément Renault
670aff5553 Remove useless Transform methods 2024-11-20 13:16:08 +01:00
Tamo
229fa0f902 implements the batch details 2024-11-20 10:51:06 +01:00
Lukas Kalbertodt
057fcb3993 Add indices field to _matchesPosition to specify where in an array a match comes from (#5005)
* Remove unreachable code

* Add `indices` field to `MatchBounds`

For matches inside arrays, this field holds the indices of the array
elements that matched. For example, searching for `cat` inside
`{ "a": ["dog", "cat", "fox"] }` would return `indices: [1]`. For nested
arrays, this contains multiple indices, starting with the one for the
top-most array. For matches in fields without arrays, `indices` is not
serialized (does not exist) to save space.
2024-11-20 01:00:43 +01:00
ManyTheFish
41dbdd2d18 Fix filtered_placeholder_search_should_not_return_deleted_documents and word_scale_set_and_reset 2024-11-19 16:08:25 +01:00
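
The `indices` behaviour described in the body of commit 057fcb3993 can be illustrated with a small sketch. This is not the code from that commit: the `Value` enum and the `match_indices` function below are hypothetical names, plain substring matching stands in for the real tokenized search, and only the first match is reported. The sketch only shows how an index path is built outermost-first for nested arrays, and how an empty path corresponds to a match in a field without arrays (the case where `indices` is not serialized).

```rust
/// Hypothetical, minimal JSON-like value used only for this illustration.
#[derive(Debug, Clone, PartialEq)]
enum Value {
    Str(String),
    Array(Vec<Value>),
}

/// Returns the array-index path of the first string containing `needle`,
/// starting with the index in the top-most array.
///
/// * `None`            : nothing matched.
/// * `Some(vec![])`    : the match is in a field without arrays, so there
///                       would be no `indices` to serialize.
/// * `Some(vec![1])`   : the match is in element 1 of an array.
/// * `Some(vec![1, 0])`: the match is in element 0 of the array stored at
///                       element 1 of the top-most array.
fn match_indices(value: &Value, needle: &str) -> Option<Vec<usize>> {
    match value {
        // A plain string matches with an empty path.
        Value::Str(s) => s.contains(needle).then(Vec::new),
        // For an array, search each element and prepend its index to the
        // path found inside it, so the outermost index comes first.
        Value::Array(items) => items.iter().enumerate().find_map(|(i, item)| {
            match_indices(item, needle).map(|mut rest| {
                rest.insert(0, i);
                rest
            })
        }),
    }
}
```

With the array from the commit message, `["dog", "cat", "fox"]`, calling `match_indices` with `"cat"` yields `Some(vec![1])`, which mirrors the `indices: [1]` example given there.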