Commit Graph

10342 Commits

Author SHA1 Message Date
F. Levi
65e3d61a95 Make use of helper function in one more place 2024-09-13 13:35:58 +03:00
F. Levi
cc6a2aec06 Improve changes to Matcher 2024-09-13 13:31:07 +03:00
Clément Renault
f7652186e1 WIP geo fields 2024-09-12 18:01:02 +02:00
Louis Dureuil
23e14138bb facet distribution: implement Display for OrderBy 2024-09-12 17:43:50 +02:00
Louis Dureuil
e44325683a Facet distribution: fix issue where truncated facet distribution would have a wrong order 2024-09-12 17:43:49 +02:00
F. Levi
e7af499314 Improve changes to Matcher 2024-09-12 16:58:13 +03:00
Clément Renault
b2f4e67c9a Do not store useless updates 2024-09-12 15:38:31 +02:00
Clément Renault
ff5d3b59f5 Move the document id extraction to the primary key code 2024-09-12 12:01:42 +02:00
ManyTheFish
aa69308e45 Use a bufWriter to build word FSTs 2024-09-12 11:48:00 +02:00
ManyTheFish
eb9a20ff0b Fix fid_word_docids extraction 2024-09-12 11:08:18 +02:00
F. Levi
edcb4c60ba Change Matcher so that phrases are counted as one instead of word by word 2024-09-12 09:46:08 +03:00
Clément Renault
0d868f36d7 Make sure we always use a BufWriter to write the update files 2024-09-11 18:38:04 +02:00
Clément Renault
e7d9db078f Use the right key name when convertir from CSV to NDJSON 2024-09-11 18:27:00 +02:00
Clément Renault
3e9198ebaa Support guessing primary key again 2024-09-11 17:25:40 +02:00
Clément Renault
2a0ad0982f Fix the document counter 2024-09-11 15:59:36 +02:00
ManyTheFish
2b317c681b Build mergers in parallel 2024-09-11 11:49:26 +02:00
ManyTheFish
39b5990f64 Mutualize tokenization 2024-09-11 10:22:38 +02:00
Clément Renault
3848adf5a2 Improve error management and simplify JSON read 2024-09-11 10:10:51 +02:00
Clément Renault
b4de06259e Better CSV support 2024-09-11 10:02:00 +02:00
meili-bors[bot]
02c2b660f8 Merge #4920
4920: Change OpenAI default model r=dureuill a=dureuill

# Pull Request

## Related issue
Fixes #4856

See also [public usage](https://meilisearch.notion.site/v1-11-AI-search-changes-0e37727193884a70999f254fa953ce6e#b4685a48c4784262a149ec307ec58671)

## What does this PR do?
- make the `text-embedding-3-small` the default model for OpenAI instead of `text-embedding-ada-002`. Existing embedders are not impacted


Co-authored-by: Louis Dureuil <louis@meilisearch.com>
2024-09-11 07:08:39 +00:00
Clément Renault
8287c2644f Support CSV again 2024-09-10 21:10:28 +01:00
Clément Renault
c1c44a0b81 Impl serialize on TopLevelMap 2024-09-10 19:32:03 +01:00
Clément Renault
04596f3616 Move the TopLevelMap into a dedicated module 2024-09-10 18:01:17 +01:00
Clément Renault
24cb5839ad Move the document changes sorting logic to a new trait 2024-09-10 17:37:52 +01:00
Clément Renault
8d97b7b28c Support JSON payloads again (not perfectly though) 2024-09-10 17:09:49 +01:00
ManyTheFish
f69688e8f7 Fix several warnings in extractors and remove unreachable macros 2024-09-09 14:52:50 +02:00
Louis Dureuil
f18e9cb7b3 Change openai default model 2024-09-09 13:09:35 +02:00
Clément Renault
8fd0afaaaa Make sure we iterate over the payload documents in order 2024-09-06 08:09:08 +02:00
Clément Renault
72c6a21a30 Use raw JSON to read the payloads 2024-09-05 20:08:23 +02:00
Clément Renault
8412be4a7d Cleanup CowStr and TopLevelMap struct 2024-09-05 18:32:55 +02:00
Louis Dureuil
10f09c531f add some commented code to read from json with raw values 2024-09-05 18:22:16 +02:00
ManyTheFish
8fd99b111b Add tracing timers logs 2024-09-05 18:00:22 +02:00
Clément Renault
f6b3d1f9a5 Increase some channel sizes 2024-09-05 15:12:07 +02:00
meili-bors[bot]
db0cf3b2ed Merge #4912
4912: Allow Meilitool to dumplessly, offline upgrade v1.9 -> v1.10 in some conditions r=Kerollmops a=dureuill

- bail early if the DB contains at least 1 REST embedder, providing the list of detected REST embedders, and without modifying the DB
- Might depend on the feature set that meilitool was compiled with and the featureset that the Meilisearch that created the DB was compiled with 💀. In case of runtime error, try again with a different feature set (passing or not passing `-p meilitool` when building after a `cargo clean`)

Co-authored-by: Louis Dureuil <louis@meilisearch.com>
prototype-upgrade-offline-v1_9-v1_10-0
2024-09-05 09:11:23 +00:00
Clément Renault
73ce67862d Use the word pair proximity and fid word count docids extractors
Co-authored-by: ManyTheFish <many@meilisearch.com>
2024-09-05 10:56:22 +02:00
Louis Dureuil
f6abf01d2c Check REST embedders before touching the DB 2024-09-05 10:49:59 +02:00
Clément Renault
0fc02f7351 Move the facet extraction to dedicated modules 2024-09-05 10:32:27 +02:00
ManyTheFish
34f11e3380 Implement word count and word pair proximity extractors 2024-09-05 10:30:39 +02:00
Louis Dureuil
28da759f11 meilitool: Support dumpless upgrade from v1.9 to v1.10 when there are no REST embedders 2024-09-05 10:08:38 +02:00
Louis Dureuil
ea96d19525 Change versioning in meili 2024-09-05 10:08:06 +02:00
Louis Dureuil
d352b1ee83 Add serde to meilitool 2024-09-05 10:07:33 +02:00
Clément Renault
27308eaab1 Import the facet extractors 2024-09-04 17:58:15 +02:00
Clément Renault
b33ec9ba3f Introduce the FieldIdFacetIsNullDocidsExtractor 2024-09-04 17:50:08 +02:00
Clément Renault
9c0a1cd9fd Introduce the FieldIdFacetExistsDocidsExtractor 2024-09-04 17:48:49 +02:00
Clément Renault
0b061f1e70 Introduce the FieldIdFacetIsEmptyDocidsExtractor 2024-09-04 17:40:24 +02:00
Clément Renault
19d937ab21 Introduce the facet extractors 2024-09-04 17:03:54 +02:00
Clément Renault
1d59c19cd2 Send the WordsFst by using an Mmap 2024-09-04 14:30:09 +02:00
Clément Renault
98e48371c3 Factorize some stuff 2024-09-04 12:17:13 +02:00
Clément Renault
6d74fb0229 Introduce the WordFidWordDocids database 2024-09-04 11:40:55 +02:00
ManyTheFish
1eb75a1040 remove milli/src/update/new/extract/tokenize_document.rs 2024-09-04 11:40:26 +02:00