158 Commits

Author SHA1 Message Date
15d478cf4d Merge #635
635: Use an unstable algorithm for `grenad::Sorter` when possible r=Kerollmops a=loiclec

# Pull Request
## What does this PR do?

Use an unstable algorithm to sort the internal vector used by `grenad::Sorter` whenever possible to speed up indexing.

In practice, every time the merge function creates a `RoaringBitmap`, we use an unstable sort. For every other merge function, such as `keep_first`, `keep_last`, etc., a stable sort is used.
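
A minimal sketch of why the choice of sort matters, using only the standard library. It is not the `grenad::Sorter` code: `DocIds` is a `BTreeSet` standing in for `RoaringBitmap`, and `sort_and_merge`, `merge_union`, `keep_first`, and `docids` are hypothetical helpers. A union merge gives the same result whatever the order of equal keys, so an unstable sort is safe; `keep_first` depends on insertion order, so it needs a stable sort.

```rust
use std::collections::BTreeSet;

/// Stand-in for `RoaringBitmap` (assumption: only set semantics matter here).
type DocIds = BTreeSet<u32>;

/// Hypothetical helper that builds a set of document ids.
fn docids(ids: &[u32]) -> DocIds {
    ids.iter().copied().collect()
}

/// Order-independent merge: the union of all values sharing a key.
fn merge_union(values: &[DocIds]) -> DocIds {
    values.iter().flatten().copied().collect()
}

/// Order-dependent merge: keeps the value that was inserted first.
fn keep_first(values: &[String]) -> String {
    values[0].clone()
}

/// Sorts entries by key, then merges each run of equal keys.
fn sort_and_merge<V, M>(mut entries: Vec<(String, V)>, stable: bool, merge: M) -> Vec<(String, V)>
where
    V: Clone,
    M: Fn(&[V]) -> V,
{
    if stable {
        // Equal keys keep their insertion order.
        entries.sort_by(|a, b| a.0.cmp(&b.0));
    } else {
        // Equal keys may be reordered; usually faster and allocation-free.
        entries.sort_unstable_by(|a, b| a.0.cmp(&b.0));
    }
    let mut out = Vec::new();
    let mut i = 0;
    while i < entries.len() {
        // Find the end of the run of entries sharing the key at index `i`.
        let mut j = i + 1;
        while j < entries.len() && entries[j].0 == entries[i].0 {
            j += 1;
        }
        let values: Vec<V> = entries[i..j].iter().map(|(_, v)| v.clone()).collect();
        out.push((entries[i].0.clone(), merge(&values)));
        i = j;
    }
    out
}

fn main() {
    let postings = vec![
        ("hello".to_string(), docids(&[1, 2])),
        ("world".to_string(), docids(&[7])),
        ("hello".to_string(), docids(&[2, 3])),
    ];
    // The union does not care how equal keys are ordered.
    let stable = sort_and_merge(postings.clone(), true, merge_union);
    let unstable = sort_and_merge(postings, false, merge_union);
    assert_eq!(stable, unstable);

    // `keep_first` is only well defined with a stable sort.
    let ids = vec![
        ("id".to_string(), "first".to_string()),
        ("id".to_string(), "second".to_string()),
    ];
    println!("{:?}", sort_and_merge(ids, true, keep_first));
}
```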


Co-authored-by: Loïc Lecrenier <loic@meilisearch.com>
2022-09-14 12:00:52 +00:00
753e76d451 Update version for the next release (v0.33.4) in Cargo.toml files 2022-09-13 13:55:50 +00:00
3794962330 Use an unstable algorithm for grenad::Sorter when possible 2022-09-13 14:49:53 +02:00
8cd5200f48 Make charabia languages configurable 2022-09-08 12:21:43 +02:00
5e07ea79c2 Make charabia default feature optional 2022-09-07 20:54:31 +02:00
077dcd2002 Update version for the next release (v0.33.3) in Cargo.toml files 2022-09-07 15:48:53 +00:00
97a04887a3 Update version for next release (v0.33.2) in Cargo.toml 2022-09-01 11:47:23 +02:00
c3363706c5 Update version for next release (v0.33.1) in Cargo.toml 2022-08-31 11:37:27 +02:00
a79ff8a1a9 Merge #611
611: Upgrade charabia v0.6.0 r=curquiza a=ManyTheFish

# Pull Request

## What does this PR do?

- Update `log`
- Upgrade `charabia`

related to https://github.com/meilisearch/meilisearch/issues/2686


Co-authored-by: ManyTheFish <many@meilisearch.com>
2022-08-23 10:17:29 +00:00
9ed7324995 Update version for next release (v0.33.0) 2022-08-23 11:47:48 +02:00
ba5ca8a362 Upgrade charabia v0.6.0 2022-08-22 14:38:00 +02:00
4aae07d5f5 expose the size methods 2022-08-17 17:07:38 +02:00
e96b852107 bump heed 2022-08-17 17:05:50 +02:00
4b7fd4dfae Update insta version 2022-08-10 15:53:46 +02:00
ef889ade5d Refactor snapshot tests 2022-08-10 15:53:46 +02:00
334098a7e0 Add index snapshot test helper function 2022-08-10 15:53:46 +02:00
d5e9b7305b Update version for next release (v0.32.0) 2022-07-21 13:20:02 +04:00
eb63af1f10 Update grenad to 0.4.2 2022-07-12 14:52:55 +02:00
1bfdcfc84f Bump uuid to 1.1.2 2022-07-05 16:23:36 +02:00
446439e8be bump charabia 2022-07-05 12:19:30 +02:00
cc48992e79 Bump the milli version to 0.31.1 2022-06-22 17:05:51 +02:00
f5c3b951bc Bump the milli version to 0.31.0 2022-06-22 12:08:16 +02:00
31f749b5d8 Update version for next release (v0.30.0) 2022-06-20 12:09:57 +02:00
676187ba43 bump milli version 2022-06-09 16:53:32 +02:00
56ee9cc21f Bump the version to 0.29.2 2022-06-08 16:00:06 +02:00
478dbfa45a Update version for next release (v0.29.1) 2022-06-07 18:59:33 +02:00
6ce1c6487a Update version for next release (v0.29.0) 2022-06-02 18:07:55 +02:00
192e024ada Add Charabia in Cargo.toml 2022-06-02 16:59:07 +02:00
c19c17eddb Update version to v0.28.1 2022-06-01 18:31:02 +02:00
895f5d8a26 Bump milli version 2022-05-18 10:37:12 +02:00
484a9ddb27 Simplify the error creation with thiserror and a smol friendly macro 2022-05-04 17:24:00 +02:00
d138b3c704 Update version 2022-04-25 18:43:46 +02:00
8d630a6f62 Update version for the next release (v0.26.1) 2022-04-14 11:44:06 +02:00
399fba16bb only flatten an object if it's nested 2022-04-14 11:14:08 +02:00
ee64f4a936 Use smartstring to store the external id in our hashmap
We need to store all the external ids (primary keys) in a hashmap
associated to their internal id during.
The smartstring removes heap allocation / memory usage and should
improve the cache locality.
2022-04-13 21:22:07 +02:00
9ac2fd1c37 Merge #487
487: Update version (v0.26.0) r=Kerollmops a=curquiza

breaking because of #458 

Co-authored-by: Clémentine Urquizar <clementine@meilisearch.com>
2022-04-07 17:10:24 +00:00
bab898ce86 move the flatten-serde-json crate inside of milli 2022-04-07 18:20:44 +02:00
4f3ce6d9cd nested fields 2022-04-07 16:58:46 +02:00
ee1d627803 Update version (v0.26.0) 2022-04-07 15:56:10 +02:00
9eec44dd98 Update version (v0.25.0) 2022-04-05 12:06:42 +02:00
ddf78a735b Update version (v0.24.1) 2022-03-24 16:39:45 +01:00
86dd88698d bump tokenizer 2022-03-23 14:25:58 +01:00
5dc464b9a7 rollback meilisearch-tokenizer version 2022-03-21 17:29:10 +01:00
08a06b49f0 Bump version to 0.23.1 2022-03-15 15:50:28 +01:00
63682c2c9a Upgrade the dependencies 2022-03-15 11:17:44 +01:00
288a879411 Remove three useless dependencies 2022-03-15 11:17:44 +01:00
d9ed9de2b0 Update heed link in cargo toml 2022-03-01 19:45:29 +01:00
25123af3b8 Merge #436
436: Speed up the word prefix databases computation time r=Kerollmops a=Kerollmops

This PR depends on the fixes done in #431 and must be merged after it.

In this PR we will bring the `WordPrefixPairProximityDocids`, `WordPrefixDocids`, and `WordPrefixPositionDocids` update structures to a new era, a better era, where computing the word prefix pair proximities costs far fewer CPU cycles, an era where this update structure can use the previously computed set of new word docids from the newly indexed batch of documents.

---

The `WordPrefixPairProximityDocids` is an update structure, which means that it is an object that we feed with some parameters and which modifies the LMDB database of an index when asked to. This structure specifically computes the list of word prefix pair proximities, which corresponds to a list of pairs of words associated with a proximity (the distance between both words) where the second word is not a word but a prefix, e.g. `s`, `se`, `a`. Each word prefix pair proximity is associated with the list of document ids that contain the word and prefix pair at the given proximity.
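
As an illustration only, the mapping described above can be sketched with standard collections. This is not the milli implementation or its LMDB key encoding: `DocIds` is a `BTreeSet` standing in for a bitmap, and `word_prefix_pair_proximities` is a hypothetical function that derives the prefix entries from an in-memory map of word pair proximities.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Stand-in for a bitmap of document ids (assumption).
type DocIds = BTreeSet<u32>;
/// (first word, second word or prefix, proximity).
type Key = (String, String, u8);

/// For every (word1, word2, proximity) entry, adds its document ids to each
/// (word1, prefix, proximity) entry whose prefix starts `word2`.
fn word_prefix_pair_proximities(
    pairs: &BTreeMap<Key, DocIds>,
    prefixes: &[&str],
) -> BTreeMap<Key, DocIds> {
    let mut out: BTreeMap<Key, DocIds> = BTreeMap::new();
    for ((word1, word2, proximity), docids) in pairs {
        for &prefix in prefixes {
            if word2.starts_with(prefix) {
                out.entry((word1.clone(), prefix.to_string(), *proximity))
                    .or_insert_with(BTreeSet::new)
                    .extend(docids.iter().copied());
            }
        }
    }
    out
}

fn main() {
    let mut pairs: BTreeMap<Key, DocIds> = BTreeMap::new();
    pairs.insert(("hello".into(), "search".into(), 1), vec![1, 2].into_iter().collect());
    pairs.insert(("hello".into(), "see".into(), 1), vec![3].into_iter().collect());
    pairs.insert(("hello".into(), "world".into(), 2), vec![4].into_iter().collect());

    // Only `search` and `see` match the prefixes `s` and `se`.
    for (key, docids) in word_prefix_pair_proximities(&pairs, &["s", "se"]) {
        println!("{:?} -> {:?}", key, docids);
    }
}
```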

The origin of the performance issue that this struct brings is that it starts its job from the beginning: it clears the LMDB database before rewriting everything from scratch, using the other LMDB databases to achieve that. I hope you understand that this is absolutely not an optimized way of doing things.
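
The difference between the two strategies can be sketched as follows. This is a simplified model under stated assumptions, not the code of this PR: the database is an in-memory `BTreeMap`, the set of prefixes does not change, no document is deleted, and `rebuild_from_scratch`, `update_incrementally`, `merge_pairs_into`, and `pair` are hypothetical names.

```rust
use std::collections::{BTreeMap, BTreeSet};

type DocIds = BTreeSet<u32>;
/// (first word, second word or prefix, proximity).
type Key = (String, String, u8);
/// In-memory stand-in for an LMDB database (assumption).
type Db = BTreeMap<Key, DocIds>;

/// Hypothetical helper that builds one word pair proximity entry.
fn pair(word1: &str, word2: &str, proximity: u8, ids: &[u32]) -> (Key, DocIds) {
    ((word1.to_string(), word2.to_string(), proximity), ids.iter().copied().collect())
}

/// Adds the prefix entries derived from `pairs` to `db`.
fn merge_pairs_into(db: &mut Db, pairs: &Db, prefixes: &[&str]) {
    for ((word1, word2, proximity), docids) in pairs {
        for &prefix in prefixes {
            if word2.starts_with(prefix) {
                db.entry((word1.clone(), prefix.to_string(), *proximity))
                    .or_insert_with(BTreeSet::new)
                    .extend(docids.iter().copied());
            }
        }
    }
}

/// Old behaviour: clear everything, then re-derive it from all word pairs.
fn rebuild_from_scratch(db: &mut Db, all_pairs: &Db, prefixes: &[&str]) {
    db.clear();
    merge_pairs_into(db, all_pairs, prefixes);
}

/// Cheaper alternative: only visit the pairs coming from the new batch.
fn update_incrementally(db: &mut Db, new_pairs: &Db, prefixes: &[&str]) {
    merge_pairs_into(db, new_pairs, prefixes);
}

fn main() {
    let prefixes = ["s", "se"];
    let old: Db = vec![pair("hello", "search", 1, &[1, 2])].into_iter().collect();
    let new: Db = vec![pair("hello", "see", 1, &[3])].into_iter().collect();
    let mut all = old.clone();
    all.extend(new.clone());

    let mut full = Db::new();
    rebuild_from_scratch(&mut full, &all, &prefixes);

    let mut incremental = Db::new();
    rebuild_from_scratch(&mut incremental, &old, &prefixes);
    update_incrementally(&mut incremental, &new, &prefixes);

    // Both strategies end in the same state; the second touches fewer entries.
    assert_eq!(full, incremental);
}
```

Under these assumptions the incremental path does work proportional to the new batch rather than to the whole index.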

Co-authored-by: Clément Renault <clement@meilisearch.com>
Co-authored-by: Kerollmops <clement@meilisearch.com>
2022-02-16 15:41:14 +00:00
f367cc2e75 Finally bump grenad to v0.4.1 2022-02-16 15:28:48 +01:00
0defeb268c bump milli 2022-02-16 13:27:41 +01:00