meilisearch

mirror of https://github.com/meilisearch/meilisearch.git synced 2025-10-29 15:06:42 +00:00

Author	SHA1	Message	Date
bors[bot]	8efac33b53	Merge #467 467: optimize prefix database r=Kerollmops a=MarinPostma This pr introduces two optimizations that greatly improve the speed of computing prefix databases. - The time that it takes to create the prefix FST has been divided by 5 by inverting the way we iterated over the words FST. - We unconditionally and needlessly checked for documents to remove in `word_prefix_pair`, which caused an iteration over the whole database. Co-authored-by: ad hoc <postma.marin@protonmail.com>	2022-03-15 16:14:35 +00:00
ad hoc	d127c57f2d	review edits	2022-03-15 17:12:48 +01:00
ad hoc	d633ac5b9d	optimize word prefix pair	2022-03-15 16:37:22 +01:00
ad hoc	d68fe2b3c7	optimize word prefix fst	2022-03-15 16:36:48 +01:00
bors[bot]	d87e8b63a9	Merge #465 465: Update dependencies r=ManyTheFish a=Kerollmops This PR upgrades and updates this crate's dependencies, but first it removes three dependencies that we don't use anymore. I used [cargo udeps](https://github.com/est31/cargo-udeps) to upgrade them ⬆️ Co-authored-by: Kerollmops <clement@meilisearch.com> Co-authored-by: Clément Renault <clement@meilisearch.com>	2022-03-15 13:49:17 +00:00
Clément Renault	0c5f4ed7de	Apply suggestions Co-authored-by: Many <many@meilisearch.com>	2022-03-15 14:18:29 +01:00
Kerollmops	21ec334dcc	Fix the compilation error of the dependency versions	2022-03-15 11:17:45 +01:00
Kerollmops	63682c2c9a	Upgrade the dependencies	2022-03-15 11:17:44 +01:00
Kerollmops	288a879411	Remove three useless dependencies	2022-03-15 11:17:44 +01:00
bors[bot]	712bf035a7	Merge #464 464: exporting heed to avoid having different versions of Heed in Meilisearch r=curquiza a=psvnlsaikumar # Pull Request ## What does this PR do? Fixes the issue in meilisearch https://github.com/meilisearch/meilisearch/issues/2210 ## PR checklist Please check if your PR fulfills the following requirements: - [x] Does this PR fix an existing issue? - [x] Have you read the contributing guidelines? - [x] Have you made sure that the title is accurate and descriptive of the changes? Thank you so much for contributing to Meilisearch! Co-authored-by: psvnl sai kumar <psvnlsaikumar@gmail.com>	2022-03-15 09:51:56 +00:00
psvnl sai kumar	5e08fac729	fixes for rustfmt pass	2022-03-14 19:22:41 +05:30
psvnl sai kumar	92e2e09434	exporting heed to avoid having different versions of Heed in Meilisearch	2022-03-14 01:01:58 +05:30
bors[bot]	290a29b5fb	Merge #457 457: Avoid iterating on big databases when useless r=Kerollmops a=Kerollmops This PR makes the prefix database updates avoid iterating on big grenad files when it is unnecessary. We introduced this regression in #436 but it went unnoticed. --- According to the following benchmark results, we take more time when we index documents in one run than before #436. It is probably due to the fact that, instead of computing the prefix databases by iterating on the LMDB, we now iterate directly on the grenad file. Those could be slower to iterate on and could be the cause of the slowdown. I just pushed a commit that tests this branch with the new unreleased version of grenad where some work was done to speed up the iteration on grenad files. [The benchmarks for this last commit](https://github.com/meilisearch/milli/actions/runs/1927187408) are currently running. You can [see the diff](https://github.com/meilisearch/grenad/compare/v0.4.1...main) between the v0.4 and the unreleased v0.5 version of grenad. ```diff group indexing_benchmark-multi-batch-indexing-before-speed-up_45f52620 indexing_stop-iterating-on-big-grenad-files_ac8b85c4 ----- ---------------------------------------------------------------- ---------------------------------------------------- + indexing/Indexing songs in three batches with default settings 1.12 57.7±2.14s ? ?/sec 1.00 51.3±2.76s ? ?/sec - indexing/Indexing wiki 1.00 917.3±30.01s ? ?/sec 1.10 1008.4±38.27s ? ?/sec + indexing/Indexing wiki in three batches 1.10 1091.2±32.73s ? ?/sec 1.00 995.5±24.33s ? ?/sec ``` Co-authored-by: Kerollmops <clement@meilisearch.com>	2022-03-09 16:46:34 +00:00
Kerollmops	1ae13c1374	Avoid iterating on big databases when useless	2022-03-09 15:43:54 +01:00
bors[bot]	a8d28e364d	Merge #461 461: Add a new error message when the `valid_fields` is empty r=curquiza a=brunoocasali I've created a test case to handle the new error formatting behavior, but I'm not sure if: - this is the right place to add the test? - this is the best way to test this behavior? And I'm not sure also regarding the `match` implementation, is this something required? Or maybe just an `if` statement is ok as well? I left the two messages literally without "reusing the prefix" in the implementation because I think this could help the "searchability" of the error in the future. # Pull Request ## What does this PR do? Fixes https://github.com/meilisearch/meilisearch/issues/2140 ## PR checklist Please check if your PR fulfills the following requirements: - [x] Does this PR fix an existing issue? - [ ] Have you read the contributing guidelines? - [x] Have you made sure that the title is accurate and descriptive of the changes? Thank you so much for contributing to Meilisearch! Co-authored-by: Bruno Casali <brunoocasali@gmail.com>	2022-03-08 09:55:58 +00:00
bors[bot]	2ef5751795	Merge #463 463: Allow setting the primary-key in the cli r=irevoire a=irevoire Co-authored-by: Tamo <tamo@meilisearch.com>	2022-03-07 14:11:40 +00:00
Tamo	8bb45956d4	allow to set the primary key in the cli	2022-03-07 14:56:49 +01:00
bors[bot]	3cbadf92b6	Merge #462 462: cli improvements r=Kerollmops a=MarinPostma a few improvements: - use bufreader to load documents, so the loading of the document doesn't appear on flamegraphs - set default db path to current directory so the `-i` flag can be omitted. Co-authored-by: ad hoc <postma.marin@protonmail.com>	2022-03-07 09:39:01 +00:00
ad hoc	db3a1905de	default db path	2022-03-07 10:30:47 +01:00
ad hoc	6cf82ba993	bufread documents	2022-03-07 10:29:52 +01:00
Bruno Casali	66c6d5e1ef	Add a new error message when the `valid_fields` is empty > "Attribute `{}` is not sortable. This index doesn't have configured sortable attributes." > "Attribute `{}` is not sortable. Available sortable attributes are: `{}`." coexist in the error handling	2022-03-05 10:38:18 -03:00
bors[bot]	df518d8b0b	Merge #459 459: Update heed link in cargo toml r=Kerollmops a=curquiza Since grenad and heed have been moved to the meilisearch orga, this PR changes the link. This is a minor change since GitHub automatically handles the redirection. This PR is only for consistency. Co-authored-by: Clémentine Urquizar <clementine@meilisearch.com>	2022-03-01 19:47:14 +00:00
Clémentine Urquizar	d9ed9de2b0	Update heed link in cargo toml	2022-03-01 19:45:29 +01:00
bors[bot]	51cf44d6fd	Merge #456 456: Remove useless grenad merging r=Kerollmops a=Kerollmops This PR must be merged after #454. This PR removes the part of the code that merged all of the grenad Readers, which we don't need because the indexer should already have merged them and, therefore, we should only have one final grenad Reader. This avoids CPU usage and memory pressure that served no purpose. `@ManyTheFish` are you sure I can skip merging the `word_docids` database? Here is the benchmark comparison with the previously merged PR #454: ``` group indexing_reintroduce-appending-sorted-values_c05e42a8 indexing_remove-useless-grenad-merging_d5b8b5a2 ----- ----------------------------------------------------- ----------------------------------------------- indexing/Indexing movies with default settings 1.06 16.6±1.04s ? ?/sec 1.00 15.7±0.93s ? ?/sec indexing/Indexing songs with default settings 1.16 60.1±7.07s ? ?/sec 1.00 51.7±5.98s ? ?/sec indexing/Indexing songs without faceted numbers 1.06 55.4±6.14s ? ?/sec 1.00 52.2±4.13s ? ?/sec ``` And the comparison with multi-batch indexing before #436, we can see that we gain time for benchmarks that index datasets in multiple batches but there is _so much_ variance that it's not clear. ``` group indexing_benchmark-multi-batch-indexing-before-speed-up_45f52620 indexing_remove-useless-grenad-merging_d5b8b5a2 ----- ---------------------------------------------------------------- ----------------------------------------------- indexing/Indexing geo_point 1.07 6.6±0.08s ? ?/sec 1.00 6.2±0.11s ? ?/sec indexing/Indexing songs in three batches with default settings 1.12 57.7±2.14s ? ?/sec 1.00 51.5±3.80s ? ?/sec indexing/Indexing songs with default settings 1.00 47.5±2.52s ? ?/sec 1.09 51.7±5.98s ? ?/sec indexing/Indexing songs without any facets 1.00 43.5±1.43s ? ?/sec 1.12 48.8±3.73s ? ?/sec indexing/Indexing songs without faceted numbers 1.00 47.1±2.23s ? ?/sec 1.11 52.2±4.13s ? ?/sec indexing/Indexing wiki 1.00 917.3±30.01s ? ?/sec 1.09 998.7±38.92s ? ?/sec indexing/Indexing wiki in three batches 1.09 1091.2±32.73s ? ?/sec 1.00 996.5±15.70s ? ?/sec ``` What do you think, `@irevoire`? Should we change the benchmarks to make them do more runs? Co-authored-by: Kerollmops <clement@meilisearch.com>	2022-03-01 16:48:08 +00:00
Kerollmops	d5b8b5a2f8	Replace the ugly unwraps by clean if let Somes	2022-02-28 16:31:33 +01:00
Kerollmops	8d26f3040c	Remove a useless grenad file merging	2022-02-28 16:31:33 +01:00
bors[bot]	21898ffc60	Merge #454 454: Reintroduce appending sorted entries when possible r=Kerollmops a=Kerollmops This PR modifies the `sorter_into_lmdb_database` function to append values into the database instead of get-put-merging them, it should improve the indexation speed for when the database is empty. ```txt group indexing_main_25123af3 indexing_reintroduce-appending-sorted-values_c05e42a8 ----- ---------------------- ----------------------------------------------------- indexing/Indexing movies with default settings 1.07 17.8±0.99s ? ?/sec 1.00 16.6±1.04s ? ?/sec indexing/Indexing songs with default settings 1.00 57.0±6.01s ? ?/sec 1.05 60.1±7.07s ? ?/sec indexing/Indexing songs without any facets 1.10 51.8±5.36s ? ?/sec 1.00 47.3±3.30s ? ?/sec ``` Co-authored-by: Clément Renault <clement@meilisearch.com>	2022-02-28 14:55:37 +00:00
Clément Renault	04b1bbf932	Reintroduce appending sorted entries when possible	2022-02-24 14:50:45 +01:00
bors[bot]	382be56d36	Merge #453 453: Benchmark multi batch indexing r=Kerollmops a=Kerollmops Hey `@irevoire`, could you please add the new benchmarks into influx? Co-authored-by: Kerollmops <clement@meilisearch.com> Co-authored-by: Clément Renault <clement@meilisearch.com>	2022-02-24 12:33:13 +00:00
Clément Renault	acfc96525c	Apply GitHub suggestions	2022-02-23 16:20:29 +01:00
Clément Renault	a820aa11e6	Add a new movies benchmark to test multi batch indexing	2022-02-23 16:20:29 +01:00
Kerollmops	8d2e3e4aba	Add a new wiki benchmark to test multi batch indexing	2022-02-23 16:20:29 +01:00
Kerollmops	ab5247dc64	Add a new songs benchmark to test multi batch indexing	2022-02-23 16:20:28 +01:00
bors[bot]	acd9535588	Merge #455 455: Raise the GitHub CI timeout limit to 72h r=irevoire a=Kerollmops Co-authored-by: Kerollmops <clement@meilisearch.com>	2022-02-23 14:33:31 +00:00
Kerollmops	19bfb2649b	Raise the GitHub CI timeout limit to 72h	2022-02-23 15:27:51 +01:00
bors[bot]	25123af3b8	Merge #436 436: Speed up the word prefix databases computation time r=Kerollmops a=Kerollmops This PR depends on the fixes done in #431 and must be merged after it. In this PR we will bring the `WordPrefixPairProximityDocids`, `WordPrefixDocids`, and `WordPrefixPositionDocids` update structures to a new era, a better era, where computing the word prefix pair proximities costs far fewer CPU cycles, an era where this update structure can use the previously computed set of new word docids from the newly indexed batch of documents. --- The `WordPrefixPairProximityDocids` is an update structure, which means that it is an object that we feed with some parameters and which modifies the LMDB database of an index when asked to. This structure specifically computes the list of word prefix pair proximities, which corresponds to a list of pairs of words associated with a proximity (the distance between both words) where the second word is not a word but a prefix, e.g. `s`, `se`, `a`. Each word prefix pair proximity is associated with the list of document ids that contain the word and the prefix at the given proximity. The performance issue with this struct comes from the fact that it starts its job from the beginning: it clears the LMDB database before rewriting everything from scratch, using the other LMDB databases to achieve that. I hope you understand that this is absolutely not an optimized way of doing things. Co-authored-by: Clément Renault <clement@meilisearch.com> Co-authored-by: Kerollmops <clement@meilisearch.com>	2022-02-16 15:41:14 +00:00
Clément Renault	ff8d7a810d	Change the behavior of the as_cloneable_grenad by taking a ref	2022-02-16 15:40:08 +01:00
Clément Renault	f367cc2e75	Finally bump grenad to v0.4.1	2022-02-16 15:28:48 +01:00
bors[bot]	f2984f66e6	Merge #452 452: bump milli r=curquiza a=irevoire Co-authored-by: Irevoire <tamo@meilisearch.com>	2022-02-16 13:49:14 +00:00
Irevoire	0defeb268c	bump milli	2022-02-16 13:27:41 +01:00
bors[bot]	030064da25	Merge #451 451: Update LICENSE with Meili SAS name r=Kerollmops a=curquiza Check with thomas, we must put the real name of the company Co-authored-by: Clémentine Urquizar - curqui <clementine@meilisearch.com>	2022-02-15 16:18:47 +00:00
Clémentine Urquizar - curqui	84035a27f5	Update LICENSE	2022-02-15 15:52:50 +01:00
bors[bot]	0885fcf973	Merge #450 450: Get rid of chrono in favor of time r=Kerollmops a=irevoire We only use `chrono` as a wrapper around `time`, and since there has been an [open CVE on `chrono` for at least 3 months now](https://github.com/chronotope/chrono/pull/632) and the repo seems to be [struggling with maintenance](https://github.com/chronotope/chrono/pull/639), I think we should use `time` directly which is way more active and sufficient for our use case. EDIT: Actually the CVE status has been known for more than 6 months: https://github.com/chronotope/chrono/issues/602 Co-authored-by: Irevoire <tamo@meilisearch.com>	2022-02-15 10:54:46 +00:00
Irevoire	48542ac8fd	get rid of chrono in favor of time	2022-02-15 11:41:55 +01:00
bors[bot]	ea15ad6c34	Merge #447 447: Update version for the next release (v0.22.1) r=curquiza a=curquiza Co-authored-by: Clémentine Urquizar <clementine@meilisearch.com>	2022-02-07 17:44:09 +00:00
Clémentine Urquizar	d03b3ceb58	Update version for the next release (v0.22.1)	2022-02-07 18:39:29 +01:00
bors[bot]	5d58cb7449	Merge #442 442: fix phrase search r=curquiza a=MarinPostma Run the exact match search on 7-word windows instead of only two. This makes false positives very unlikely, and impossible on phrase queries that are shorter than seven words. Co-authored-by: ad hoc <postma.marin@protonmail.com>	2022-02-07 16:18:20 +00:00
bors[bot]	c5a996aa78	Merge #446 446: Update LICENSE r=Kerollmops a=curquiza Co-authored-by: Clémentine Urquizar - curqui <clementine@meilisearch.com>	2022-02-07 09:47:39 +00:00
Clémentine Urquizar - curqui	1279c38ac9	Update LICENSE	2022-02-05 18:29:11 +01:00
bors[bot]	267d14c28d	Merge #445 445: allow null values in csv r=Kerollmops a=MarinPostma This PR allows null values in csv: - if the field is of type string, then an empty field is considered null (`,,`), anything else is turned into a string (i.e. `, ,` is a single whitespace string) - if the field is of type number, when the trimmed field is empty, we consider the value null (i.e. `,,`, `, ,` are both null numbers), otherwise we try to parse the number. Co-authored-by: ad hoc <postma.marin@protonmail.com>	2022-02-03 15:11:32 +00:00
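
The CSV null rules described in #445 can be sketched as below. This is only an illustration of the behavior stated in the commit message, not the code from that PR: the names `FieldType`, `Value`, and `parse_csv_field` are hypothetical, numbers are assumed to be represented as `f64`, and parsing the trimmed text (rather than the raw field) is an assumption, since the message only says that the emptiness check happens on the trimmed field.

```rust
/// Hypothetical column types for this sketch (not milli's actual types).
#[derive(Debug, Clone, Copy, PartialEq)]
enum FieldType {
    String,
    Number,
}

/// Hypothetical parsed value for this sketch.
#[derive(Debug, Clone, PartialEq)]
enum Value {
    Null,
    String(String),
    Number(f64),
}

/// Applies the rules from the #445 message to one raw CSV field.
fn parse_csv_field(raw: &str, ty: FieldType) -> Result<Value, std::num::ParseFloatError> {
    match ty {
        // String column: only a strictly empty field (`,,`) is null.
        FieldType::String if raw.is_empty() => Ok(Value::Null),
        // Anything else, whitespace included (`, ,`), is kept as a string.
        FieldType::String => Ok(Value::String(raw.to_string())),
        // Number column: a field that is empty once trimmed is null.
        FieldType::Number => {
            let trimmed = raw.trim();
            if trimmed.is_empty() {
                Ok(Value::Null)
            } else {
                // Assumption: the trimmed text is what gets parsed.
                trimmed.parse::<f64>().map(Value::Number)
            }
        }
    }
}
```

Under these assumptions, `parse_csv_field(" ", FieldType::String)` yields a one-space string, while `parse_csv_field(" ", FieldType::Number)` yields `Value::Null`. This is the asymmetry between the two column types that the message describes.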


1552 Commits