Commit Graph

2595 Commits

Author SHA1 Message Date
b6b73fe41c Update milli/src/update/settings.rs
Co-authored-by: Louis Dureuil <louis@meilisearch.com>
2024-09-19 15:41:14 +02:00
6dde41cc46 stop using a local version of arroy and instead point to the git repo with the rev 2024-09-19 15:25:38 +02:00
163f8023a1 remove debug println 2024-09-19 12:13:25 +02:00
633537ccd7 fix updating documents without updating the settings 2024-09-19 12:00:58 +02:00
3f6301dbc9 fix the missing embedder name in the error message when trying to disable the binary quantization 2024-09-19 12:00:58 +02:00
2b6952eda1 rename the ArroyReader to an ArroyWrapper since it can read and write 2024-09-19 12:00:58 +02:00
79f29eed3c fix the tests and the arroy_readers method 2024-09-19 12:00:58 +02:00
cc45e264ca implement the binary quantization in meilisearch 2024-09-19 12:00:56 +02:00
5f474a640d Merge #4938
4938: Remove default embedder r=ManyTheFish a=dureuill

# Pull Request

## Related issue
Fixes #4738 

## What does this PR do?

[See public usage](https://meilisearch.notion.site/v1-11-AI-search-changes-0e37727193884a70999f254fa953ce6e#1044b06b651f80edb9d4ef6dc367bad0)

- Remove `hybrid.embedder` boolean from analytics because embedder is now mandatory and so the boolean would always be `true`
- Rework search kind so that a search without query but with vector is a vector search regardless of (non-zero) semantic ratio


Co-authored-by: Louis Dureuil <louis@meilisearch.com>
2024-09-19 09:17:14 +00:00
bbaee3dbc6 Add Swedish pipeline in all-tokenization feature 2024-09-19 08:34:51 +02:00
ff523a2357 Merge #4939
4939: Introduce the `STARTS WITH` filter operator r=irevoire a=Kerollmops

This PR fixes #4872 by introducing the `STARTS WITH` filter operator and gating it under the _contains filter_ experimental feature along with the `CONTAINS` one. I also updated [the experimental feature discussion page](https://github.com/orgs/meilisearch/discussions/763).

Co-authored-by: Clément Renault <clement@meilisearch.com>
2024-09-18 10:19:48 +00:00
9f1fb4b425 Introduce the STARTS WITH filter operator gated under an experimental feature 2024-09-17 16:44:11 +02:00
3c5e363554 Remove default embedders 2024-09-17 16:30:43 +02:00
f4ab1f168e Prefer using Rc<str> than String when cloning a lot 2024-09-16 15:41:29 +02:00
1a0e962299 Replace hashmap by vectors in wpp 2024-09-16 15:01:20 +02:00
f13e076b8a Use hashmap instead of Btree in wpp extractor 2024-09-16 14:40:40 +02:00
7ba49b849e Extract and write facet databases 2024-09-16 09:35:16 +02:00
f7652186e1 WIP geo fields 2024-09-12 18:01:02 +02:00
23e14138bb facet distribution: implement Display for OrderBy 2024-09-12 17:43:50 +02:00
e44325683a Facet distribution: fix issue where truncated facet distribution would have a wrong order 2024-09-12 17:43:49 +02:00
b2f4e67c9a Do not store useless updates 2024-09-12 15:38:31 +02:00
ff5d3b59f5 Move the document id extraction to the primary key code 2024-09-12 12:01:42 +02:00
aa69308e45 Use a bufWriter to build word FSTs 2024-09-12 11:48:00 +02:00
eb9a20ff0b Fix fid_word_docids extraction 2024-09-12 11:08:18 +02:00
3e9198ebaa Support guessing primary key again 2024-09-11 17:25:40 +02:00
2a0ad0982f Fix the document counter 2024-09-11 15:59:36 +02:00
2b317c681b Build mergers in parallel 2024-09-11 11:49:26 +02:00
39b5990f64 Mutualize tokenization 2024-09-11 10:22:38 +02:00
8287c2644f Support CSV again 2024-09-10 21:10:28 +01:00
c1c44a0b81 Impl serialize on TopLevelMap 2024-09-10 19:32:03 +01:00
04596f3616 Move the TopLevelMap into a dedicated module 2024-09-10 18:01:17 +01:00
24cb5839ad Move the document changes sorting logic to a new trait 2024-09-10 17:37:52 +01:00
f69688e8f7 Fix several warnings in extractors and remove unreachable macros 2024-09-09 14:52:50 +02:00
f18e9cb7b3 Change openai default model 2024-09-09 13:09:35 +02:00
8fd0afaaaa Make sure we iterate over the payload documents in order 2024-09-06 08:09:08 +02:00
72c6a21a30 Use raw JSON to read the payloads 2024-09-05 20:08:23 +02:00
8412be4a7d Cleanup CowStr and TopLevelMap struct 2024-09-05 18:32:55 +02:00
10f09c531f add some commented code to read from json with raw values 2024-09-05 18:22:16 +02:00
8fd99b111b Add tracing timers logs 2024-09-05 18:00:22 +02:00
f6b3d1f9a5 Increase some channel sizes 2024-09-05 15:12:07 +02:00
73ce67862d Use the word pair proximity and fid word count docids extractors
Co-authored-by: ManyTheFish <many@meilisearch.com>
2024-09-05 10:56:22 +02:00
0fc02f7351 Move the facet extraction to dedicated modules 2024-09-05 10:32:27 +02:00
34f11e3380 Implement word count and word pair proximity extractors 2024-09-05 10:30:39 +02:00
27308eaab1 Import the facet extractors 2024-09-04 17:58:15 +02:00
b33ec9ba3f Introduce the FieldIdFacetIsNullDocidsExtractor 2024-09-04 17:50:08 +02:00
9c0a1cd9fd Introduce the FieldIdFacetExistsDocidsExtractor 2024-09-04 17:48:49 +02:00
0b061f1e70 Introduce the FieldIdFacetIsEmptyDocidsExtractor 2024-09-04 17:40:24 +02:00
19d937ab21 Introduce the facet extractors 2024-09-04 17:03:54 +02:00
1d59c19cd2 Send the WordsFst by using an Mmap 2024-09-04 14:30:09 +02:00
98e48371c3 Factorize some stuff 2024-09-04 12:17:13 +02:00