2059: change indexed doc count on error r=irevoire a=MarinPostma
Change `indexed_documents` and `deleted_documents` to return 0 instead of null when the task has failed.
close #2053
Co-authored-by: Marin Postma <postma.marin@protonmail.com>
2056: Allow any header for CORS r=curquiza a=curquiza
Bug fix: a CORS error was triggered when trying to send the `User-Agent` header from the browser.
`@bidoubiwa` thanks for the bug report!
Co-authored-by: Clémentine Urquizar <clementine@meilisearch.com>
2036: chore(ci): Enable rust_backtrace in the ci r=curquiza a=irevoire
This should help us understand the unreproducible panics that happen in the CI all the time.
Co-authored-by: Tamo <tamo@meilisearch.com>
2035: Use self hosted GitHub runner r=curquiza a=curquiza
Checked with `@tpayet`: we have created a self-hosted GitHub runner to save time when pushing the Docker images.
Co-authored-by: Clémentine Urquizar <clementine@meilisearch.com>
2033: Bug(FS): Consider empty pre-created directory as unexisting DB r=curquiza a=ManyTheFish
When the database directory was pre-created, we considered the DB invalid. We now accept creating a database in an empty, pre-created directory.
Co-authored-by: Maxime Legendre <maximelegendre@mbp-de-maxime.home>
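A minimal sketch of the check described in #2033, written in Python for brevity. The helper name `db_is_missing` and the file name `data.mdb` are assumptions for illustration, not the actual implementation.

```python
import os
import tempfile


def db_is_missing(db_path: str) -> bool:
    """Return True when no database exists at db_path yet.

    A missing path and an empty, pre-created directory are both treated
    as "no database", so a fresh one may be created there.
    """
    if not os.path.exists(db_path):
        return True
    return os.path.isdir(db_path) and not os.listdir(db_path)


with tempfile.TemporaryDirectory() as root:
    pre_created = os.path.join(root, "data.ms")
    os.mkdir(pre_created)
    empty_is_missing = db_is_missing(pre_created)
    absent_is_missing = db_is_missing(os.path.join(root, "absent"))
    # Once the directory holds a file, it counts as an existing database.
    open(os.path.join(pre_created, "data.mdb"), "w").close()
    populated_is_missing = db_is_missing(pre_created)
```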
2026: Bug(auth): Parse YMD date r=curquiza a=ManyTheFish
Use `NaiveDate` instead of `NaiveDateTime` to parse YMD dates.
fix #2017
Co-authored-by: Maxime Legendre <maximelegendre@mbp-de-maxime.home>
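The difference behind #2026 can be illustrated with Python's standard library, as an analogy only: a date-time parser rejects a date-only string, while a date parser accepts it. Both helper names below are hypothetical.

```python
from datetime import date, datetime


def parse_ymd(value: str) -> date:
    """Parse a date-only `YYYY-MM-DD` string."""
    return datetime.strptime(value, "%Y-%m-%d").date()


def parse_ymd_hms(value: str) -> datetime:
    """Parse a full date-time; date-only input raises ValueError."""
    return datetime.strptime(value, "%Y-%m-%d %H:%M:%S")


parsed = parse_ymd("2021-12-15")
try:
    parse_ymd_hms("2021-12-15")
    datetime_parser_accepts_ymd = True
except ValueError:
    datetime_parser_accepts_ymd = False
```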
2025: Fix security index creation r=ManyTheFish a=ManyTheFish
Forbid index creation on alternate routes when the action `index.create` is not given.
fix #2024
Co-authored-by: Maxime Legendre <maximelegendre@MacBook-Pro-de-Maxime.local>
2008: bug(lib): fix get dumps bad error code r=curquiza a=MarinPostma
Fix the bad error code returned when getting a dump status, and add a test.
close #1994
Co-authored-by: Marin Postma <postma.marin@protonmail.com>
2006: chore(http): rename task types r=curquiza a=MarinPostma
Rename
- documentsAddition into documentAddition
- documentsPartial into documentPartial
- documentsDeletion into documentDeletion
close #1999
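For clients that still handle the old names, the rename in #2006 amounts to a small lookup table. This is a sketch; the function name is hypothetical.

```python
# Old task type name -> new task type name, as listed in #2006.
TASK_TYPE_RENAMES = {
    "documentsAddition": "documentAddition",
    "documentsPartial": "documentPartial",
    "documentsDeletion": "documentDeletion",
}


def rename_task_type(name: str) -> str:
    """Return the new name for a renamed type; other names pass through."""
    return TASK_TYPE_RENAMES.get(name, name)
```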
2007: bug(lib): ignore primary if already set on document addition r=curquiza a=MarinPostma
Ignore the primary key if it is already set on document updates. Add a test to verify the behaviour.
close #2002
Co-authored-by: Marin Postma <postma.marin@protonmail.com>
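A toy model of the behaviour in #2007, assuming a simplified in-memory index. The `Index` class and its method are hypothetical and only show the rule that an already-set primary key wins over one supplied later.

```python
from typing import Optional


class Index:
    """Toy in-memory index keyed by its primary key."""

    def __init__(self, primary_key: Optional[str] = None):
        self.primary_key = primary_key
        self.documents = {}

    def add_documents(self, docs, primary_key: Optional[str] = None):
        # Adopt the supplied key only when the index has none yet;
        # otherwise the supplied key is ignored.
        if self.primary_key is None:
            if primary_key is None:
                raise ValueError("no primary key available")
            self.primary_key = primary_key
        for doc in docs:
            self.documents[doc[self.primary_key]] = doc


index = Index(primary_key="id")
index.add_documents([{"id": 1, "sku": "a"}], primary_key="sku")
```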
1989: Extend API keys r=curquiza a=ManyTheFish
# Pull Request
## What does this PR do?
- Add API keys in snapshots
- Add API keys in dumps
- Rename action indexes.add to indexes.create
- fix QA #1979
fix #1979, fix #1995, fix #2001, fix #2003
related to #1890
Co-authored-by: many <maxime@meilisearch.com>
1982: Set fail-fast to false in publish-binaries CI r=curquiza a=curquiza
This prevents the other jobs from failing if one of the jobs fails.
Co-authored-by: Clémentine Urquizar <clementine@meilisearch.com>
1984: Support boolean for the no-analytics flag r=Kerollmops a=Kerollmops
This PR fixes an issue with the `no-analytics` flag, which ignored the value passed to it: `no-analytics false` was understood as a bare `no-analytics` and disabled the analytics instead of enabling them. I found [a closed issue about this exact behavior on the structopt repository](https://github.com/TeXitoi/structopt/issues/468) and applied its solution here.
I don't think we should update the documentation as it must have worked like this from the start of this project. I tested it on my machine and it is working great now. Thank you `@nicolasvienot` for this issue report.
Fixes #1983.
Co-authored-by: bors[bot] <26634292+bors[bot]@users.noreply.github.com>
Co-authored-by: Clément Renault <clement@meilisearch.com>
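The intended semantics of the flag after #1984 can be sketched as follows. This is not how structopt parses arguments; `parse_bool_flag` is a hypothetical helper that only shows the expected outcome for each form of the flag.

```python
def parse_bool_flag(argv, flag="--no-analytics"):
    """Return the boolean value carried by `flag` in `argv`.

    Absent flag -> False, bare flag -> True, and an explicit
    `true` / `false` following the flag is honoured.
    """
    if flag not in argv:
        return False
    position = argv.index(flag)
    following = argv[position + 1] if position + 1 < len(argv) else None
    if following in ("true", "false"):
        return following == "true"
    return True
```

With this rule, `--no-analytics false` keeps the analytics enabled, while a bare `--no-analytics` still disables them.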
1978: Fix of `release-v0.25.0` branch into `main` r=curquiza a=curquiza
The fixes in #1976 should be on main to be taken into account by
```
curl -L https://install.meilisearch.com | sh
```
Co-authored-by: Yann Prono <yann.prono@nist.gov>
Co-authored-by: Clémentine Urquizar <clementine@meilisearch.com>
Co-authored-by: bors[bot] <26634292+bors[bot]@users.noreply.github.com>
1976: Fix download-latest.sh r=curquiza a=Mcdostone
# Pull Request
## What does this PR do?
Fixes #1975
The script was broken because `grep` matches the word **draft** in the changelog of [v0.25.0rc0](https://github.com/meilisearch/MeiliSearch/releases/tag/v0.25.0rc0)
> Misc
> Remove email address from the launch message (#1896) `@curquiza`
> Remove release drafter workflow (#1882) `@curquiza` ⚠️👀
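The failure mode can be reproduced with a small sketch. The release payload and the filter pattern below are assumptions made for illustration (the script's actual `grep` expression is not quoted here); the point is that a text filter also matches the changelog body, whereas reading the parsed `draft` field does not.

```python
import json
import re

# Hypothetical release payload; only the changelog line is taken from above.
release = {
    "tag_name": "v0.25.0rc0",
    "draft": False,
    "prerelease": True,
    "body": "Remove release drafter workflow (#1882)",
}
pretty = json.dumps(release, indent=2)

# A line-based filter expects three matching lines per release, but the
# body line matches too because it contains the word "drafter".
pattern = re.compile(r"tag_name|draft|prerelease")
matched_lines = [line for line in pretty.splitlines() if pattern.search(line)]

# Reading the parsed field is not affected by the changelog text.
is_draft = json.loads(pretty)["draft"]
```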
> Your product is awesome!
1977: Fix Dockerfile r=MarinPostma a=curquiza
Remove this error:
[Screenshot of the error, captured 2021-12-07]
Co-authored-by: Yann Prono <yann.prono@nist.gov>
Co-authored-by: Clémentine Urquizar <clementine@meilisearch.com>
1970: Use milli reexported tokenizer r=curquiza a=ManyTheFish
Use milli reexported tokenizer instead of importing meilisearch-tokenizer dependency.
fix #1888
Co-authored-by: many <maxime@meilisearch.com>
1965: Reintroduce engine version file r=MarinPostma a=irevoire
Right now if you boot up MeiliSearch and point it to a DB directory created with a previous version of MeiliSearch the existing indexes will be deleted. This [used to be](51d7c84e73) prevented by a startup check which would compare the current engine version vs what was stored in the DB directory's version file, but this functionality seems to have been lost after a few refactorings of the code.
In order to go back to the old behavior we'll need to reintroduce the `VERSION` file that used to be present; I considered reusing the `metadata.json` file used in the dumps feature, but this seemed like the simpler approach. As the intent is just to restore functionality, the implementation is quite basic. I imagine that in the future we could build on this and do things like compatibility across major/minor versions and even migrating between formats.
This PR was made thanks to `@mbStavola` and is basically a port of his PR #1860 after a big refactor of the code in #1796.
Closes #1840
Co-authored-by: Matt Stavola <m.freitas@offensive-security.com>
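A minimal sketch of such a startup check, in Python for brevity. The file name `VERSION` comes from the description above; the helper names, the plain-text file format, and the exact-equality comparison are assumptions, not the actual implementation.

```python
import os
import tempfile

VERSION_FILE_NAME = "VERSION"


def write_version_file(db_path: str, version: str) -> None:
    """Record the engine version that created the database directory."""
    with open(os.path.join(db_path, VERSION_FILE_NAME), "w") as f:
        f.write(version)


def check_version_file(db_path: str, current_version: str) -> None:
    """Refuse to start when the stored version differs from the engine's."""
    with open(os.path.join(db_path, VERSION_FILE_NAME)) as f:
        stored = f.read().strip()
    if stored != current_version:
        raise RuntimeError(
            f"database was created with {stored}, engine is {current_version}"
        )


with tempfile.TemporaryDirectory() as db_path:
    write_version_file(db_path, "0.24.0")
    check_version_file(db_path, "0.24.0")  # same version: no error
    try:
        check_version_file(db_path, "0.25.0")
        startup_refused = False
    except RuntimeError:
        startup_refused = True
```

Failing early like this leaves the existing indexes untouched instead of deleting them.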
implements:
https://github.com/meilisearch/specifications/blob/develop/text/0085-api-keys.md
- Add tests on API keys management route (meilisearch-http/tests/auth/api_keys.rs)
- Add tests checking authorizations on each meilisearch routes (meilisearch-http/tests/auth/authorization.rs)
- Implement API keys management routes (meilisearch-http/src/routes/api_key.rs)
- Create module to manage API keys and authorizations (meilisearch-auth)
- Reimplement GuardedData to extend authorizations (meilisearch-http/src/extractors/authentication/mod.rs)
- Replace X-MEILI-API-KEY with Authorization Bearer (meilisearch-http/src/extractors/authentication/mod.rs)
- Change meilisearch routes to fit the new authorization feature (meilisearch-http/src/routes/)
- close #1867
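The header change can be sketched as a small extraction helper. `extract_api_key` is a hypothetical name, and the strict matching of the header name and of the `Bearer` scheme is a simplifying assumption.

```python
from typing import Mapping, Optional


def extract_api_key(headers: Mapping[str, str]) -> Optional[str]:
    """Return the key carried by `Authorization: Bearer <key>`, if any."""
    value = headers.get("Authorization")
    if value is None:
        return None
    scheme, _, token = value.partition(" ")
    token = token.strip()
    if scheme != "Bearer" or not token:
        return None
    return token
```

A request that still sends only `X-MEILI-API-KEY` yields no key under this rule.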
1796: Feature branch: Task store r=irevoire a=MarinPostma
# Feature branch: Task Store
## Spec todo
https://github.com/meilisearch/specifications/blob/develop/text/0060-refashion-updates-apis.md
- [x] The update resource is renamed task. The names of existing API routes are also changed to reflect this change.
- [x] Tasks are now also accessible as an independent resource of an index. GET - /tasks; GET - /tasks/:taskUid
- [x] The task uid is not incremented by index anymore. The sequence is generated globally.
- [x] A task_not_found error is introduced.
- [x] The format of the task object is updated.
- [x] updateId becomes uid.
- [x] Attributes of an error appearing in a failed task are now contained in a dedicated error object.
- [x] type is no longer an object. It now becomes a string containing the values of its name field previously defined in the type object.
- [x] The possible values for the type field are reworked to be more clear and consistent with our naming rules.
- [x] A details object is added to contain specific information related to a task payload that was previously displayed in the type nested object. Previous number key is renamed numberOfDocuments.
- [x] An indexUid field is added to give information about the related index on which the task is performed.
- [x] duration format has been updated to express an ISO 8601 duration.
- [x] processed status changes to succeeded.
- [x] startedProcessingAt is updated to startedAt.
- [x] processedAt is updated to finishedAt.
- [x] 202 Accepted requests previously returning an updateId are now returning a summarized task object.
- [x] MEILI_MAX_UDB_SIZE env var is updated to MEILI_MAX_TASK_DB_SIZE.
- [x] --max-udb-size cli option is updated to --max-task-db-size.
- [x] task object lists are now returned under a results array.
- [x] Each operation on an index (creation, update, deletion) is now asynchronous and represented by a task.
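Several of the field changes listed above can be summarized by a conversion sketch from an old update object to the new task shape. The input layout and the sample values are assumptions; the sketch also copies `updateId` into `uid` and passes the type name through unchanged, whereas the real uid sequence is global and the type values were reworked.

```python
def update_to_task(update: dict, index_uid: str) -> dict:
    """Map an old-style update object onto the new task fields."""
    old_type = update["type"]
    details = {}
    if "number" in old_type:
        # The former `number` key is renamed `numberOfDocuments`.
        details["numberOfDocuments"] = old_type["number"]
    seconds = update.get("duration")
    status = update["status"]
    return {
        "uid": update["updateId"],
        "indexUid": index_uid,
        "status": "succeeded" if status == "processed" else status,
        "type": old_type["name"],  # now a plain string
        "details": details,
        # Duration expressed as an ISO 8601 duration.
        "duration": None if seconds is None else f"PT{seconds}S",
        "startedAt": update.get("startedProcessingAt"),
        "finishedAt": update.get("processedAt"),
    }


task = update_to_task(
    {
        "updateId": 3,
        "status": "processed",
        "type": {"name": "DocumentsAddition", "number": 10},
        "duration": 0.5,
        "startedProcessingAt": "2021-12-01T10:00:00Z",
        "processedAt": "2021-12-01T10:00:01Z",
    },
    index_uid="movies",
)
```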
## Todo tech
- [x] Restore Snapshots
- [x] Restore dumps of documents
- [x] Implements the dump of updates
- [x] Error handling
- [x] Fix stats
- [x] Restore the Analytics
- [x] [Add the new analytics](https://github.com/meilisearch/specifications/pull/92/files)
- [x] Fix tests
- [x] ~Deleting tasks when index is deleted (see below)~ see #1891 instead
- [x] Improve details for documents addition and deletion tasks
- [ ] Add integration test
- [ ] Test task store filtering
- [x] Rename `UuidStore` to `IndexMetaStore`, and simplify the trait.
- [x] Fix task store initialization: fill pending queue from hard state
- [x] Synchronously return error when creating an index with an invalid index_uid and add test
- [x] Task should be returned in decreasing uid + tests (on index task route)
- [x] Summarized task view
- [x] fix snapshot permissions
## Implementation
### Linked PRs
- #1889
- #1891
- #1892
- #1902
- #1906
- #1911
- #1914
- #1915
- #1916
- #1918
- #1924
- #1925
- #1926
- #1930
- #1936
- #1937
- #1942
- #1944
- #1945
- #1946
- #1947
- #1950
- #1951
- #1957
- #1959
- #1960
- #1961
- #1962
- #1964
### Linked PRs in milli:
- https://github.com/meilisearch/milli/pull/414
- https://github.com/meilisearch/milli/pull/409
- https://github.com/meilisearch/milli/pull/406
- https://github.com/meilisearch/milli/pull/418
### Issues
- close #1687
- close #1786
- close #1940
- close #1948
- close #1949
- close #1932
- close #1956
### Spec patches
- https://github.com/meilisearch/specifications/pull/90
Co-authored-by: Marin Postma <postma.marin@protonmail.com>
1893: Make matches work with numerical value r=MarinPostma a=Thearas
# Pull Request
## What does this PR do?
Implement #1883.
I have tested this PR with unit tests, and it appears to be working properly.
PTAL `@curquiza`
Co-authored-by: Thearas <thearas850@gmail.com>
1896: Remove email address from the message at the launch r=irevoire a=curquiza
I suggest removing this email address from the message at the launch since it can lead people to think this is an email address for support. Is this something we want, `@meilisearch/devrel-team`, since we mostly redirect them to the forum or Slack?
Co-authored-by: Clémentine Urquizar <clementine@meilisearch.com>
1897: Add ARM image for Docker to CI r=irevoire a=curquiza
Fixes #1315
- [x] Publish MeiliSearch's docker image for `arm64`
- [x] Add `workflow_dispatch` event in case we need to re-trigger it after a failure without creating a new release
- [x] Use our own server to run the github runner since this CI is really slow (1h instead of 4h)
- [x] Open an issue for a refactor by merging both files in one file (https://github.com/meilisearch/MeiliSearch/issues/1901)
Co-authored-by: Clémentine Urquizar <clementine@meilisearch.com>
1882: Remove release drafter r=curquiza a=curquiza
Remove release drafter since it's not used at the moment due to the specific release process of MeiliSearch.
Co-authored-by: Clémentine Urquizar <clementine@meilisearch.com>
1878: Add error object in task r=MarinPostma a=ManyTheFish
# Pull Request
## What does this PR do?
Fixes #1877
## PR checklist
Please check if your PR fulfills the following requirements:
- [x] Update error test
- [x] Remove flattening of errors during task serialization
Co-authored-by: many <maxime@meilisearch.com>
1875: Fix search post event and disk size analytics r=irevoire a=gmourier
- Branch POST search on the post_search aggregator
- Use largest disk `total_space` instead of `available_space`
1876: Update SEGMENT_API_KEY r=irevoire a=gmourier
Branch it on our Segment production stack
Co-authored-by: Guillaume Mourier <guillaume@meilisearch.com>
1800: Analytics r=irevoire a=irevoire
Closes #1784
Implements [this spec](https://github.com/meilisearch/specifications/blob/update-analytics-specs/text/0034-telemetry-policies.md)
# Anonymous Analytics Policy
## 1. Functional Specification
### I. Summary
This specification describes an exhaustive list of anonymous metrics collected by the MeiliSearch binary. It also describes the tools we use for this collection and how we identify a Meilisearch instance.
### II. Motivation
At MeiliSearch, our vision is to provide an easy-to-use search solution that meets the essential needs of our users. At all times, we strive to understand our users better and meet their expectations in the best possible way.
Although we can gather needs and understand our users through several channels such as Github, Slack, surveys, interviews or roadmap votes, we realize that this is not enough to have a complete view of MeiliSearch usage and features adoption. By cross-referencing our product discovery phases with aggregated quantitative data, we want to make the product much better than what it is today. Our decision-making will be taken a step further to make a product that users love.
### III. Explanation
#### General Data Protection Regulation (GDPR)
The metrics collected are non-sensitive, non-personal and do not identify an individual or a group of individuals using MeiliSearch. The data collected is secured and anonymized. We do not collect any data from the values stored in the documents.
We, the MeiliSearch team, provide an email address so that users can request the removal of their data: privacy@meilisearch.com.
Thanks to the unique identifier generated for their MeiliSearch installation (`Instance uuid` when launching MeiliSearch), we can remove the corresponding data from all the tools we describe below. Any questions regarding the management of the data collected can be sent to the email address as well.
#### Tools
##### Segment
The collected data is sent to [Segment](https://segment.com/). Segment is a platform for data collection and provides data management tools.
##### Amplitude
[Amplitude](https://amplitude.com/) is a tool for graphing and highlighting collected data. Segment feeds Amplitude so that we can build visualizations according to our needs.
-----------
# The `identify` call we send every hour:
## System Configuration `system`
This property allows us to gather essential information to better understand on which type of machine MeiliSearch is used. This allows us to better advise users on the machines to choose according to their data volume and their use-cases.
- [x] `system` => Never changes but is still sent every hour
- [x] distribution | On which distribution MeiliSearch is launched, eg: Arch Linux
- [x] kernel_version | On which kernel version MeiliSearch is launched, eg: 5.14.10-arch1-1
- [x] cores | How many cores does the machine have, eg: 24
- [x] ram_size | Total capacity of the machine's ram. Expressed in `Kb`, eg: 33604210
- [x] disk_size | Total capacity of the biggest disk. Expressed in `Kb`, eg: 336042103
- [x] server_provider | Users can tell us on which provider MeiliSearch is hosted by filling the `MEILI_SERVER_PROVIDER` env var. This is also filled by our providers deploy scripts. e.g. GCP [cloud-config.yaml](56a7c2630c/scripts/providers/gcp/cloud-config.yaml (L33)), eg: gcp
## MeiliSearch Configuration
- [x] `context.app.version`: MeiliSearch version, eg: 0.23.0
- [x] `env`: `production` / `development`, eg: `production`
- [x] `has_snapshot`: Does the MeiliSearch instance have snapshots activated, eg: `true`
## MeiliSearch Statistics `stats`
- [x] `stats`
- [x] `database_size`: Size of indexed data. Expressed in `Kb`, eg: 180230
- [x] `indexes_number`: Number of indexes, eg: 2
- [x] `documents_number`: Number of indexed documents, eg: 165847
- [x] `start_since_days`: How many days ago was the instance launched?, eg: 328
---------
- [x] Launched | This is the first event sent to mark that MeiliSearch is launched for the first time
---------
- [x] `Documents Searched POST`: The Documents Searched event is sent once an hour. The event's properties are averaged over all search operations during that time so as not to track everything and generate unnecessary noise.
- [x] `user-agent`: Represents all the user-agents encountered on this endpoint during one hour, eg: `["MeiliSearch Ruby (2.1)", "Ruby (3.0)"]`
- [x] `requests`
- [x] `99th_response_time`: The maximum latency, in ms, for the fastest 99% of requests, eg: `57ms`
- [x] `total_suceeded`: The total number of succeeded search requests, eg: `3456`
- [x] `total_failed`: The total number of failed search requests, eg: `24`
- [x] `total_received`: The total number of received search requests, eg: `3480`
- [x] `sort`
- [x] `with_geoPoint`: Has the built-in sort rule _geoPoint been used?, eg: `true` /`false`
- [x] `avg_criteria_number`: The average number of sort criteria among all the requests containing the sort parameter. "sort": [] equals to 0 while not sending sort does not influence the average, eg: `2`
- [x] `filter`
- [x] `with_geoRadius`: Has the built-in filter rule _geoRadius been used?, eg: `true` /`false`
- [x] `avg_criteria_number`: The average number of filter criteria among all the requests containing the filter parameter. "filter": [] equals to 0 while not sending filter does not influence the average, eg: `4`
- [x] `most_used_syntax`: The most used filter syntax among all the requests containing the filter parameter. `string` / `array` / `mixed`, eg: `mixed`
- [x] `q`
- [x] `avg_terms_number`: The average number of terms for the `q` parameter among all requests, eg: `5`
- [x] `pagination`:
- [x] `max_limit`: The maximum limit encountered among all requests, eg: `20`
- [x] `max_offset`: The maximum offset encountered among all requests, eg: `1000`
---
- [x] `Documents Searched GET`: The Documents Searched event is sent once an hour. The event's properties are averaged over all search operations during that time so as not to track everything and generate unnecessary noise.
- [x] `user-agent`: Represents all the user-agents encountered on this endpoint during one hour, eg: `["MeiliSearch Ruby (2.1)", "Ruby (3.0)"]`
- [x] `requests`
- [x] `99th_response_time`: The maximum latency, in ms, for the fastest 99% of requests, eg: `57ms`
- [x] `total_suceeded`: The total number of succeeded search requests, eg: `3456`
- [x] `total_failed`: The total number of failed search requests, eg: `24`
- [x] `total_received`: The total number of received search requests, eg: `3480`
- [x] `sort`
- [x] `with_geoPoint`: Has the built-in sort rule _geoPoint been used?, eg: `true` /`false`
- [x] `avg_criteria_number`: The average number of sort criteria among all the requests containing the sort parameter. "sort": [] equals to 0 while not sending sort does not influence the average, eg: `2`
- [x] `filter`
- [x] `with_geoRadius`: Has the built-in filter rule _geoRadius been used?, eg: `true` /`false`
- [x] `avg_criteria_number`: The average number of filter criteria among all the requests containing the filter parameter. "filter": [] equals to 0 while not sending filter does not influence the average, eg: `4`
- [x] `most_used_syntax`: The most used filter syntax among all the requests containing the filter parameter. `string` / `array` / `mixed`, eg: `mixed`
- [x] `q`
- [x] `avg_terms_number`: The average number of terms for the `q` parameter among all requests, eg: `5`
- [x] `pagination`:
- [x] `max_limit`: The maximum limit encountered among all requests, eg: `20`
- [x] `max_offset`: The maximum offset encountered among all requests, eg: `1000`
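Two of the hourly aggregates defined for the Documents Searched events can be sketched as below. The function names and the request representation (one dict per request) are assumptions; the definitions follow the wording of `99th_response_time` and `avg_criteria_number`.

```python
def p99_response_time(times_ms):
    """Maximum latency among the fastest 99% of requests."""
    ordered = sorted(times_ms)
    # Integer arithmetic avoids floating-point rounding on the cut-off.
    kept = max(1, len(ordered) * 99 // 100)
    return ordered[kept - 1]


def avg_criteria_number(requests, parameter):
    """Average criteria count over requests that contain `parameter`.

    An empty list counts as 0; requests without the parameter are
    left out of the average.
    """
    counts = [len(r[parameter]) for r in requests if parameter in r]
    return sum(counts) / len(counts) if counts else 0


latencies = list(range(1, 101))  # 1 ms .. 100 ms
requests = [{"sort": ["price:asc", "title:desc"]}, {"sort": []}, {"q": "hello"}]
```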
---
- [x] `Index Created`
- [x] `user-agent`: Represents the user-agent encountered for this API call, eg: ["MeiliSearch Ruby (2.1)", "Ruby (3.0)"]
- [x] `primary_key`: The name of the field used as primary key if set, otherwise `null`, eg: `id`
---
- [x] `Index Updated`
- [x] `user-agent`: Represents the user-agent encountered for this API call, eg: ["MeiliSearch Ruby (2.1)", "Ruby (3.0)"]
- [x] `primary_key`: The name of the field used as primary key if set, otherwise `null`, eg: `id`
---
- [x] `Documents Added`: The Documents Added event is sent once an hour. The event's properties are averaged over all POST /documents addition operations during that time so as not to track everything and generate unnecessary noise.
- [x] `user-agent`: Represents the user-agent encountered for this API call, eg: ["MeiliSearch Ruby (2.1)", "Ruby (3.0)"]
- [x] `payload_type`: Represents all the `payload_type` encountered on this endpoint during one hour, eg: [`text/csv`]
- [x] `primary_key`: The name of the field used as primary key if set, otherwise `null`, eg: `id`
- [x] `index_creation`: Did an index creation happen, eg: `false`
---
- [x] `Documents Updated`: The Documents Updated event is sent once an hour. The event's properties are averaged over all PUT /documents operations during that time so as not to track everything and generate unnecessary noise.
- [x] `user-agent`: Represents the user-agent encountered for this API call, eg: ["MeiliSearch Ruby (2.1)", "Ruby (3.0)"]
- [x] `payload_type`: Represents all the `payload_type` encountered on this endpoint during one hour, eg: [`application/json`]
- [x] `primary_key`: The name of the field used as primary key if set, otherwise `null`, eg: `id`
- [x] `index_creation`: Did an index creation happen, eg: `false`
---
- [x] Settings Updated
- [x] `user-agent`: Represents the user-agent encountered for this API call, eg: ["MeiliSearch Ruby (2.1)", "Ruby (3.0)"]
- [x] `ranking_rules`
- [x] `sort_position`: Position of the `sort` ranking rule if any, otherwise `null`, eg: `5`
- [x] `sortable_attributes`
- [x] `total`: Number of sortable attributes, eg: `3`
- [x] `has_geo`: Indicate if `_geo` is set as a sortable attribute, eg: `false`
- [x] `filterable_attributes`
- [x] `total`: Number of filterable attributes, eg: `3`
- [x] `has_geo`: Indicate if `_geo` is set as a filterable attribute, eg: `false`
---
- [x] `RankingRules Updated`
- [x] `user-agent`: Represents the user-agent encountered for this API call, eg: ["MeiliSearch Ruby (2.1)", "Ruby (3.0)"]
- [x] `sort_position`: Position of the `sort` ranking rule if any, otherwise `null`, eg: `5`
---
- [x] `SortableAttributes Updated`
- [x] `user-agent`: Represents the user-agent encountered for this API call, eg: ["MeiliSearch Ruby (2.1)", "Ruby (3.0)"]
- [x] `total`: Number of sortable attributes, eg: `3`
- [x] `has_geo`: Indicate if `_geo` is set as a sortable attribute, eg: `false`
---
- [x] `FilterableAttributes Updated`
- [x] `user-agent`: Represents the user-agent encountered for this API call, eg: ["MeiliSearch Ruby (2.1)", "Ruby (3.0)"]
- [x] `total`: Number of filterable attributes, eg: `3`
- [x] `has_geo`: Indicate if `_geo` is set as a filterable attribute, eg: `false`
---
- [x] Dump Created
- [x] `user-agent`: Represents the user-agent encountered for this API call, eg: ["MeiliSearch Ruby (2.1)", "Ruby (3.0)"]
---
Ensure the user-id file is well saved and loaded with:
- [x] the dumps
- [x] the snapshots
- [x] Ensure the CLI uuid is only shown if analytics are activated at launch **or it already exists** (=even if meilisearch was launched without analytics)
Co-authored-by: Tamo <tamo@meilisearch.com>
Co-authored-by: Irevoire <tamo@meilisearch.com>
1852: Add tests for mini-dashboard status and assets r=curquiza a=CuriousCorrelation
## Summary
Added tests for `mini-dashboard` status including assets.
## Ticket link
PR closes #1767
Co-authored-by: CuriousCorrelation <CuriousCorrelation@protonmail.com>
1847: Optimize document transform r=MarinPostma a=MarinPostma
Integrate the optimization from https://github.com/meilisearch/milli/pull/402.
Optimize the payload read by reading it into RAM first instead of streaming it. This means that the payload must fit into RAM, which should not be a problem.
Add BufWriter to the obkv writer to improve write speed.
I have measured a gain of 40-45% in speed after these optimizations.
Co-authored-by: marin postma <postma.marin@protonmail.com>
1830: Add MEILI_SERVER_PROVIDER to Dockerfile r=irevoire a=curquiza
Add docker information in `MEILI_SERVER_PROVIDER` env variable
It does not impact the telemetry spec since it's an already existing variable used on our side.
Co-authored-by: Clémentine Urquizar <clementine@meilisearch.com>
1811: Reducing ArmV8 binary build time with action-rs (cross build with Rust) r=curquiza a=patrickdung
This pull request is based on [discussion #1790](https://github.com/meilisearch/MeiliSearch/discussions/1790)
Note:
1) The binaries produced by this PR are in addition to the existing binaries.
The existing binaries are still produced (by the existing GitHub workflow/action):
meilisearch-linux-amd64
meilisearch-linux-armv8
meilisearch-macos-amd64
meilisearch-windows-amd64.exe
meilisearch.deb
2) This PR produces these binaries. The name 'meilisearch-linux-aarch64' is used to avoid a naming conflict with 'meilisearch-linux-armv8'.
meilisearch-linux-aarch64
meilisearch-linux-aarch64-musl
meilisearch-linux-aarch64-stripped
meilisearch-linux-amd64-musl
3) If it's fine (in next release), we should submit another PR to stop generating meilisearch-linux-armv8 (which could take two to three hours to build it)
Co-authored-by: Patrick Dung <38665827+patrickdung@users.noreply.github.com>
1822: Tiny improvements in download-latest.sh r=irevoire a=curquiza
- Add a check on `$latest` to verify that it is not empty. We currently have an issue on the swift SDK where the version number seems not to be retrieved, but we don't know why https://github.com/meilisearch/meilisearch-swift/pull/216
- Replace some `"` with `'`
- Rename `$BINARY_NAME` to `$binary_name` to make it consistent with the other variables that are filled all along the script
Co-authored-by: Clémentine Urquizar <clementine@meilisearch.com>
1824: Fix indexation perfomances on mounted disk r=ManyTheFish a=ManyTheFish
We were creating all of our tempfiles in data.ms directory, but when the database directory is stored in a mounted disk, tempfiles I/O throughput decreases, impacting the indexation time.
Now, only the persisting tempfiles will be created in the database directory. Other tempfiles will stay in the default tmpdir.
Co-authored-by: many <maxime@meilisearch.com>
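The placement rule from #1824 can be sketched with Python's `tempfile` module. The helper `create_tempfile` and its `persist` parameter are hypothetical; only files that must end up in the database are created inside the database directory.

```python
import os
import tempfile


def create_tempfile(db_path: str, persist: bool):
    """Create a temp file in the DB directory only when it will persist."""
    # dir=None lets tempfile fall back to the default temporary directory.
    directory = db_path if persist else None
    return tempfile.NamedTemporaryFile(dir=directory)


with tempfile.TemporaryDirectory() as db_path:
    with create_tempfile(db_path, persist=True) as kept:
        persisted_dir = os.path.realpath(os.path.dirname(kept.name))
    with create_tempfile(db_path, persist=False) as scratch:
        scratch_dir = os.path.realpath(os.path.dirname(scratch.name))
    db_dir = os.path.realpath(db_path)

default_tmp_dir = os.path.realpath(tempfile.gettempdir())
```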
1813: Apply highlight tags on numbers in the formatted search result output r=irevoire a=Jhnbrn90
This is my first ever Rust related PR.
As described in #1758, I've attempted to highlight numbers correctly under the `_formatted` key.
Additionally, I added a test which should assert appropriate highlighting.
I'm open to suggestions and improvements.
Co-authored-by: John Braun <john@brn.email>
1801: Update milli version to v0.17.3 to fix inference issue r=curquiza a=curquiza
Fixes #1798
Co-authored-by: Clémentine Urquizar <clementine@meilisearch.com>
1793: Remove memmap dependency r=curquiza a=palfrey
Fixes#1792. I was going to replace with [memmap2](https://github.com/RazrFalcon/memmap2-rs) which should be a drop-in replacement, but I couldn't actually find anything that actually directly used it. It ends up being a dependency in [milli](https://github.com/meilisearch/milli) so I'm going to go there next and fix that.
Co-authored-by: Tom Parker-Shemilt <palfrey@tevp.net>
1783: Fix too many open file error r=ManyTheFish a=ManyTheFish
- prepare_for_closing() function wasn't called when an index is deleted, we are now calling it
- Index wasn't deleted in the case where we couldn't insert `uid` in `index_uuid_store`, we are now cleaning it
Fix #1736
Co-authored-by: many <maxime@meilisearch.com>
1781: Optimize build size r=irevoire a=MarinPostma
Remove debug symbols from the release build, and strip the binaries.
We used to need debug symbols for Sentry, but since it was removed with #1616, we don't need them anymore.
Shrinks the binary size from ~300MB to ~50MB on linux.
Co-authored-by: mpostma <postma.marin@protonmail.com>
1769: Enforce `Content-Type` header for routes requiring a body payload r=MarinPostma a=irevoire
closes #1665
Co-authored-by: Tamo <tamo@meilisearch.com>
- remove the payload_error_handler in favor of a PayloadError::from
- merge the two match branches into one
- make the accepted content type a const instead of recalculating it for every error
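A sketch of the enforcement in #1769, assuming a fixed tuple of accepted types held in a constant. The accepted values, the helper name, and the error messages are illustrative assumptions rather than the actual ones.

```python
# Computed once rather than rebuilt for every error (assumed values).
ACCEPTED_CONTENT_TYPES = ("application/json",)


def check_content_type(headers: dict) -> str:
    """Return the media type, or raise when it is missing or not accepted."""
    raw = headers.get("Content-Type")
    if raw is None:
        raise ValueError("missing Content-Type header")
    # Drop parameters such as `; charset=utf-8` before comparing.
    media_type = raw.split(";", 1)[0].strip().lower()
    if media_type not in ACCEPTED_CONTENT_TYPES:
        accepted = ", ".join(ACCEPTED_CONTENT_TYPES)
        raise ValueError(f"invalid Content-Type {raw!r}, accepted: {accepted}")
    return media_type


def is_rejected(headers: dict) -> bool:
    """Convenience wrapper: True when the request would be refused."""
    try:
        check_content_type(headers)
        return False
    except ValueError:
        return True
```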
1763: Index tests r=MarinPostma a=MarinPostma
This PR aims to test index usage in the MeiliSearch database more thoroughly by writing unit tests.
work included:
- [x] Create index mock and stub methods
- [x] Test snapshot creation
- [x] Test Dumps
- [x] Test search
Co-authored-by: mpostma <postma.marin@protonmail.com>
1768: Fix auth error r=irevoire a=MarinPostma
Fix a small auth error that set the invalid token error's token to "hello". This was invisible to the user because the invalid token is not returned.
thank you hawk-eye `@irevoire`
Co-authored-by: mpostma <postma.marin@protonmail.com>
1755: Fix mini dashboard r=curquiza a=anirudhRowjee
This commit is a fix to issue #1750.
As a part of the changes to solve this issue, the following changes have
been made -
1. Route registration for static assets has been modified
2. the `mut` keyword on the `scope` has been removed.
Co-authored-by: Anirudh Rowjee <ani.rowjee@gmail.com>
1747: Add new error types for document additions r=curquiza a=MarinPostma
Adds the missing errors for the documents routes, as specified.
close #1691, close #1690
Co-authored-by: mpostma <postma.marin@protonmail.com>
1746: Do not commit transaction on failed updates r=irevoire a=Kerollmops
This PR fixes MeiliSearch that was always committing the transactions even when an update was invalid and the whole transaction should have been trashed. It was the source of a bug where an invalid update (with an invalid primary key) was creating an index with the specified primary key and should instead have failed and done nothing on the server.
Fixes#1735.
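The commit-only-on-success behaviour described above can be sketched as follows. This is a minimal illustration, not the actual storage code: `Store`, `Txn`, `apply_update` and the primary key validity rule are all hypothetical names and assumptions. The point is that returning early drops the transaction, so an invalid update leaves nothing behind.

```rust
use std::collections::HashMap;

/// Hypothetical in-memory store standing in for the real database.
#[derive(Default)]
struct Store {
    indexes: HashMap<String, String>, // index uid -> primary key
}

/// Hypothetical write transaction: changes are staged and applied only on `commit`.
struct Txn<'a> {
    store: &'a mut Store,
    staged: Vec<(String, String)>,
}

impl<'a> Txn<'a> {
    fn new(store: &'a mut Store) -> Self {
        Txn { store, staged: Vec::new() }
    }

    fn create_index(&mut self, uid: &str, primary_key: &str) {
        self.staged.push((uid.to_string(), primary_key.to_string()));
    }

    /// Applies every staged change. Dropping the transaction discards them instead.
    fn commit(self) {
        for (uid, primary_key) in self.staged {
            self.store.indexes.insert(uid, primary_key);
        }
    }
}

/// Commits only when the update is valid (validity rule assumed for illustration).
fn apply_update(store: &mut Store, uid: &str, primary_key: &str) -> Result<(), String> {
    let mut txn = Txn::new(store);
    txn.create_index(uid, primary_key);
    if primary_key.is_empty() || primary_key.contains(' ') {
        // Early return drops `txn`, so nothing is written to the store.
        return Err(format!("invalid primary key: {:?}", primary_key));
    }
    txn.commit();
    Ok(())
}
```

With this shape, a failed update cannot create an index as a side effect, which is the behaviour the fix restores.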
Co-authored-by: Kerollmops <clement@meilisearch.com>
1742: Create dumps v3 r=irevoire a=MarinPostma
The introduction of the obkv document format has changed the format of the updates by removing the need to record the document format of an addition (it is no longer necessary, since updates are stored in the obkv format). This broke the dumps, which this PR fixes by introducing a third version of the dumps.
A v2 compat layer has been created that supports importing v2 dumps into meilisearch. This made it possible to move the compat code that existed elsewhere in meilisearch into the v2 module. The asc/desc patching is now only done for forward compatibility when loading a v2 dump, and v3 writes the asc/desc rules to the dump with the new syntax.
Co-authored-by: mpostma <postma.marin@protonmail.com>
1697: Make exec binary for M1 mac available for download r=irevoire a=k-nasa
## Why
fix: https://github.com/meilisearch/MeiliSearch/issues/1661
Currently, `download-latest.sh` does not support fetching an executable for M1 Macs.
## What
Download the x86 binary when `download-latest.sh` runs on an M1 Mac, because M1 Macs can execute binaries targeting x86.
## Proof
I verified it as follows and got a working executable on an M1 Mac 💡
```sh
:) % arch
arm64
:) % ./download-latest.sh
Downloading MeiliSearch binary v0.21.1 for macos, architecture amd64...
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
100 631 100 631 0 0 2035 0 --:--:-- --:--:-- --:--:-- 2035
100 43.5M 100 43.5M 0 0 7826k 0 0:00:05 0:00:05 --:--:-- 9123k
MeiliSearch binary successfully downloaded as 'meilisearch' file.
Run it:
$ ./meilisearch
Usage:
$ ./meilisearch --help
:) % file ./meilisearch
./meilisearch: Mach-O 64-bit executable x86_64
:) % ./meilisearch --help # this is executable
meilisearch-http 0.21.1
...
...
```
Co-authored-by: k-nasa <htilcs1115@gmail.com>
Co-authored-by: nasa <htilcs1115@gmail.com>
1748: Add a link to join the cloud-hosted beta r=MarinPostma a=gmourier
The product team would like to add a link to communicate and invite users to fill out the form to test the closed beta of our cloud solution.
We have done the same thing on the documentation side https://github.com/meilisearch/documentation/pull/1148. 😇
Co-authored-by: Guillaume Mourier <guillaume@meilisearch.com>
1739: Fix add document Content-Type r=curquiza a=MarinPostma
change the `Content-Type` guards of the document addition routes to match the specification.
Co-authored-by: mpostma <postma.marin@protonmail.com>
1711: MeiliSearch refactor introducing OBKV format r=MarinPostma a=MarinPostma
This PR refactors multiple components of meilisearch and introduces the obkv document format.
- [x] Split meilisearch-http and meilisearch-lib
- [x] Replace `IndexActor` and `UuidResolver` with `IndexResolver`
- [x] Remove mentions to Actor
- [x] Remove Actor traits to simplify code
- [x] Integrate obkv document format
- [x] Remove `Data`
- [x] Restore all route
- [x] Replace `Box<dyn error>` with `anyhow::Error`
- [x] Introduce update file store
- [x] Update file store error handling
- [x] Fix dumps
- [x] Fix snapshots
- [x] Fix tests
- [x] Update module documentation
- [x] add csv support (feat `@ManyTheFish` #1729)
- [x] add jsonl support
- [x] integrate geosearch (feat `@irevoire` #1725)
partially implements #1691 and #1690. The error handling is very basic now, I will finish it in the next pr.
Some unit tests have been disabled, I will re-enable them ASAP, but they need a bit more work.
close#1531
P.S: sorry for this monstrous PR :'(
Co-authored-by: mpostma <postma.marin@protonmail.com>
Co-authored-by: Tamo <tamo@meilisearch.com>
Co-authored-by: many <maxime@meilisearch.com>
1703: Trigger CodeCoverage manually instead of on each PR r=irevoire a=curquiza
Since no one is using it now on the PRs, we would rather get a state of the code coverage once (triggered manually) rather than on each PR.
Co-authored-by: Clémentine Urquizar <clementine@meilisearch.com>
1724: Redo CONTRIBUTING.md r=curquiza a=curquiza
- Update `Development` section
- Update the `Git Guidelines` section
- Remove `Benchmarking & Profiling` -> done on the milli side at the moment
- Remove `Humans` -> synchronization job done by the manager of the core team at the moment
- Remove `Changelog` section -> done by the manager and the docs team
- Remove `Documentation` section -> job done by the manager to synchronize both teams.
Fixes#1723 at the same time
Co-authored-by: Clémentine Urquizar <clementine@meilisearch.com>
1659: deps: unify pest dependency r=MarinPostma a=happysalada
meilisearch depends on two different versions of pest.
This can be problematic for some build systems (e.g. NixOS).
Since the repo hasn't received an update in a while, in the meantime, use the later version of the two pest dependencies.
Context: this has been discussed previously https://github.com/meilisearch/MeiliSearch/issues/1273
meilisearch has been selected by ngi to be packaged for nixos. A patch can be applied to make the changes proposed in this PR. This PR intends to see how the maintainers of meilisearch would feel about the patch.
What was done.
- Add an override for the pest dependency in Cargo.toml.
- recreate the Cargo.lock with `cargo update`. This has had the side effect of updating some dependencies.
I ran the tests on darwin. My machine is quite old so I had 8 failures due to a timeout. None of the failures look like they are due to the new dependencies.
Checking the pest repo, it seems there are some recent commits, however no sure date of when there could be a new release.
If this gets accepted, there is no need to do a new release, nixos can just target the new commit.
If you feel it's too much pain for not enough gain, no worries at all!
Co-authored-by: happysalada <raphael@megzari.com>
1692: Use tikv-jemallocator instead of jemallocator r=curquiza a=felixonmars
`jemallocator` has been abandoned for nearly two years, and `rustc`
itself moved to use `tikv-jemallocator` instead:
3965773ae7
Let's switch to a better maintained version.
Co-authored-by: Felix Yan <felixonmars@archlinux.org>
1651: Use reset_sortable_fields r=Kerollmops a=shekhirin
Resolves https://github.com/meilisearch/MeiliSearch/issues/1635
1676: Add curl binary to final stage image r=curquiza a=ook
Reference: #1673
Changes:
* add the `curl` binary to the final MeiliSearch Docker image.
For the record, Docker's quirky layer management makes this addition shrink the image from 319MB to 315MB:
```
☁ MeiliSearch [feature/1673-add-curl-to-docker-image] ⚡ docker image ls
REPOSITORY TAG IMAGE ID CREATED SIZE
getmeili/meilisearch 0.22.0_ook_1673 938e239ad989 2 hours ago 315MB
getmeili/meilisearch latest 258fa3aa1230 6 days ago 319MB
```
1684: bump dependencies r=MarinPostma a=MarinPostma
Bump meilisearch dependencies.
We still depend on custom patches, which have been upgraded along the way.
Co-authored-by: Alexey Shekhirin <a.shekhirin@gmail.com>
Co-authored-by: Thomas Lecavelier <thomas@followanalytics.com>
Co-authored-by: mpostma <postma.marin@protonmail.com>
1682: Change the format of custom ranking rules when importing old dumps r=curquiza a=Kerollmops
This PR changes the format of the custom ranking rules from `asc(price)` to `title:asc` as the format changed between v0.21 and v0.22. The dumps are now correctly importing the custom ranking rules.
This PR also changes the previous default ranking rules (without sort) to the new default ranking rules (with the new sort).
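As a rough sketch of the format change described above, the conversion from the legacy `asc(field)` / `desc(field)` syntax to the `field:asc` / `field:desc` syntax could look like this. The function name `convert_ranking_rule` is hypothetical and the actual dump import code may be structured differently.

```rust
/// Converts a legacy custom ranking rule such as `asc(price)` to the
/// `price:asc` form. Any rule that does not match is returned unchanged.
fn convert_ranking_rule(rule: &str) -> String {
    for order in ["asc", "desc"] {
        let field = rule
            .strip_prefix(order)
            .and_then(|rest| rest.strip_prefix('('))
            .and_then(|rest| rest.strip_suffix(')'));
        if let Some(field) = field {
            return format!("{}:{}", field.trim(), order);
        }
    }
    rule.to_string()
}
```

Built-in rules such as `typo` pass through untouched, so the function can be mapped over the whole ranking rule list of an old dump.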
Co-authored-by: Kerollmops <clement@meilisearch.com>
1669: Fix windows integration tests r=MarinPostma a=ManyTheFish
Set max_memory value to unlimited during tests:
because tests run several meilisearch in parallel,
we overestimate the value for max_memory making the tests on Windows crash
Co-authored-by: many <maxime@meilisearch.com>
1658: Remove COMMIT_SHA and COMMIT_DATE build arg from the Docker CIs r=irevoire a=curquiza
Since `@irevoire` added the `.git` folder to the Dockerfile, there is no need to compute `COMMIT_SHA` and `COMMIT_DATE` in the CI.
Can you confirm, `@irevoire`?
Also, update some CIs using `checkout@v1` to `checkout@v2`
Co-authored-by: Clémentine Urquizar <clementine@meilisearch.com>
1652: Remove dependabot r=MarinPostma a=curquiza
Fixes#1649
Dependabot for vulnerability and security updates is still activated.
1654: Add Script for Windows r=MarinPostma a=singh08prashant
fixes#1570
changes:
1. added script for detecting windows os running git bash
2. appended `.exe` to `$release_file` for windows as listed [here](https://github.com/meilisearch/MeiliSearch/releases/)
3. removed global `$BINARY_NAME='meilisearch'` as windows require `.exe` file
1657: Bring vergen hotfix from `stable` to `main` r=MarinPostma a=curquiza
Co-authored-by: Clémentine Urquizar <clementine@meilisearch.com>
Co-authored-by: singh08prashant <singh08prashant@gmail.com>
Co-authored-by: Kerollmops <clement@meilisearch.com>
Co-authored-by: bors[bot] <26634292+bors[bot]@users.noreply.github.com>
1656: Remove unused Arc import r=MarinPostma a=Kerollmops
This PR removes a warning introduced by #1606 which removed Sentry that was using an `Arc` but forgot to remove the scope import, we remove it here.
Co-authored-by: Kerollmops <clement@meilisearch.com>
1636: Hotfix: Log but don't panic when vergen can't retrieve commit information r=curquiza a=Kerollmops
This pull request fixes an issue we discovered when we tried to publish meilisearch v0.21 on brew: brew uses the tarball downloaded directly from github, which doesn't contain the `.git` folder.
We use the `.git` folder with [vergen](https://docs.rs/vergen) to retrieve the commit and datetime information. Unfortunately, we were unwrapping the vergen result and it was crashing when the git folder was missing.
We no longer panic when vergen can't find the `.git` folder and just log any error returned by [the git2 library](https://docs.rs/git2). We then check whether the env variables are available at compile time and replace them with "unknown" if not.
### When the `.git` folder is available
```
xh localhost:7700/version
HTTP/1.1 200 OK
Content-Type: application/json
Date: Thu, 26 Aug 2021 13:44:23 GMT
Transfer-Encoding: chunked
{
"commitSha": "81a76eab69944de8a8d5006345b5aec7b02acf50",
"commitDate": "2021-08-26T13:41:30+00:00",
"pkgVersion": "0.21.0"
}
```
### When the `.git` folder is unavailable
```bash
cp -R meilisearch meilisearch-cpy
cd meilisearch-cpy
rm -rf .git
cargo clean
cargo run --release
<snip>
Compiling meilisearch-http v0.21.0 (/Users/clementrenault/Documents/meilisearch-cpy/meilisearch-http)
warning: vergen: could not find repository from '/Users/clementrenault/Documents/meilisearch-cpy/meilisearch-http'; class=Repository (6); code=NotFound (-3)
```
```
xh localhost:7700/version
HTTP/1.1 200 OK
Content-Type: application/json
Date: Thu, 26 Aug 2021 13:46:33 GMT
Transfer-Encoding: chunked
{
"commitSha": "unknown",
"commitDate": "unknown",
"pkgVersion": "0.21.0"
}
```
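The compile-time fallback to "unknown" can be sketched with the standard `option_env!` macro, which yields `None` when a variable is not set during compilation. The variable names `VERGEN_SHA` and `VERGEN_COMMIT_DATE` below are assumptions made for illustration; the names actually emitted depend on the vergen version in use.

```rust
/// Falls back to "unknown" when a build-time value is missing.
fn or_unknown(value: Option<&'static str>) -> &'static str {
    value.unwrap_or("unknown")
}

/// `option_env!` is resolved at compile time and is `None` when the
/// variable was not set. The variable name is assumed for illustration.
fn commit_sha() -> &'static str {
    or_unknown(option_env!("VERGEN_SHA"))
}

/// Same fallback for the commit date (variable name also assumed).
fn commit_date() -> &'static str {
    or_unknown(option_env!("VERGEN_COMMIT_DATE"))
}
```

Unlike `env!`, which aborts compilation when the variable is absent, `option_env!` lets a build from a tarball without `.git` succeed.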
Co-authored-by: Kerollmops <clement@meilisearch.com>
1605: Fix panic when decoding r=curquiza a=curquiza
Update milli to fix the panic during document deletion
Co-authored-by: Clémentine Urquizar <clementine@meilisearch.com>
1533: Update milli version to v0.8.0 r=MarinPostma a=curquiza
- Update milli, heed and obkv
- fix relevancy issue and the `facetsDistribution` display
Co-authored-by: Clémentine Urquizar <clementine@meilisearch.com>
1528: Update of the Date Time Format in commitDate r=MarinPostma a=irevoire
Since we were relying on a [super old version of `vergen`](https://docs.rs/crate/vergen/3.0.1), we could not get the `commit timestamp`, so I updated `vergen` to the latest version.
This also allows us to remove all the features we don't use.
closes#1522
Co-authored-by: Tamo <tamo@meilisearch.com>
1521: Sentry was never sending anything r=Kerollmops a=irevoire
@Kerollmops noticed that we had no log of this release in sentry, and it looks like I badly tested my code after ignoring the “No space left on device” errors.
Now it should be fixed.
Co-authored-by: Tamo <tamo@meilisearch.com>
1498: Show the filterable and not the faceted attributes in the settings r=Kerollmops a=Kerollmops
Fixes#1497
Co-authored-by: Clément Renault <clement@meilisearch.com>
1484: Add MeiliSearch version to issue template r=irevoire a=bidoubiwa
It is relevant to know the version of MeiliSearch before any other additional information that might be important to know.
We could also reduce the amount of information required from the user. I would like to suggest the following:
Instead of the `Desktop` and `Smartphone` sections, I would just improve the last section:
```
**Additional context**
Additional information that may be relevant to the issue.
[e.g. architecture, device, OS, browser]
```
By applying this, the template final look will be the following:
-----
**Describe the bug**
A clear and concise description of what the bug is.
**To Reproduce**
Steps to reproduce the behavior:
1. Go to '...'
2. Click on '....'
3. Scroll down to '....'
4. See error
**Expected behavior**
A clear and concise description of what you expected to happen.
**Screenshots**
If applicable, add screenshots to help explain your problem.
**MeiliSearch version:** [e.g. v0.20.0]
**Additional context**
Additional information that may be relevant to the issue.
[e.g. architecture, device, OS, browser]
Co-authored-by: Charlotte Vermandel <charlottevermandel@gmail.com>
1481: fix bug in index deletion r=Kerollmops a=MarinPostma
this bug was caused by a heed iterator entry being deleted while still holding a reference to it.
close#1333
Co-authored-by: mpostma <postma.marin@protonmail.com>
1478: refactor routes r=irevoire a=MarinPostma
refactor the route directory, so the module tree follows the route structure
Co-authored-by: mpostma <postma.marin@protonmail.com>
1457: Hotfix highlight on emojis panic r=Kerollmops a=ManyTheFish
When the highlight bound is in the middle of a character
or if we are out of bounds, we highlight the complete matching word.
note: we should enhance the tokenizer and the Highlighter to match char indices.
Fix#1368
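A minimal sketch of the fallback described above, assuming byte offsets into a UTF-8 word and `<em>` tags as the highlight markers. The `highlight` function is hypothetical and is not the actual Highlighter code; it only shows how `str::is_char_boundary` avoids slicing in the middle of a character.

```rust
/// Wraps the first `match_len` bytes of `word` in `<em>` tags. If that offset
/// is past the end of the word or falls inside a multi-byte character, the
/// whole word is highlighted instead of panicking on an invalid slice.
fn highlight(word: &str, match_len: usize) -> String {
    // `is_char_boundary` returns false for offsets greater than `word.len()`.
    if word.is_char_boundary(match_len) {
        let (matched, rest) = word.split_at(match_len);
        format!("<em>{}</em>{}", matched, rest)
    } else {
        format!("<em>{}</em>", word)
    }
}
```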
Co-authored-by: many <maxime@meilisearch.com>
1456: Fix update loop timeout r=Kerollmops a=Kerollmops
This PR fixes a wrong fix of the update loop introduced in #1429.
Co-authored-by: Kerollmops <clement@meilisearch.com>
259: Run rustfmt on the whole project and add it to the CI r=curquiza a=irevoire
Since there is currently no other PR modifying the code, I think it's a good time to reformat everything and add rustfmt to the ci.
Co-authored-by: Tamo <tamo@meilisearch.com>
258: Use rustls instead of openssl r=curquiza a=irevoire
I also removed all the `default-features` of reqwest since we are only using the JSON one.
Fix#255
Co-authored-by: Tamo <tamo@meilisearch.com>
246: Stop logging the no space left on device error r=curquiza a=irevoire
closes#208
@qdequele what do you think of that?
Are there any other errors we need to ignore?
As you can see in the code, once we are in `Sentry` the error has already been converted to a `String` so the only thing we can do to see if we need to send the error or not is to match the `String` against our error message.
If we have a lot of other logs we want to ignore I would suggest prefixing all the logs with something like:
```
User error: No space left on device
```
So in Sentry, we could just check if the log start by `User error:` and ignore all these errors at once
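The prefix check suggested above could be as small as the following sketch. The names `USER_ERROR_PREFIX` and `should_report` are hypothetical, and this is not wired into any Sentry API; it only shows the string match on an already stringified error.

```rust
/// Prefix proposed above for errors caused by the user's environment.
const USER_ERROR_PREFIX: &str = "User error:";

/// Returns `true` when the already stringified error should be forwarded.
fn should_report(message: &str) -> bool {
    !message.starts_with(USER_ERROR_PREFIX)
}
```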
Co-authored-by: Tamo <tamo@meilisearch.com>
252: Fix docker run r=curquiza a=curquiza
Not the most beautiful fix since I cannot update alpine to version 3.14 without being flooded with errors.
Co-authored-by: Clémentine Urquizar <clementine@meilisearch.com>
249: Use half of the computer threads for the indexing process by default r=Kerollmops a=irevoire
closes#241
By default, we use only half of the CPU threads when indexing documents; this allows the user to use the search while indexing. Also, the machine will not appear unresponsive when indexing a large batch of documents.
On the special case where a user only has one core, we use it entirely 😄
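The default described above amounts to halving the thread count with a floor of one. A minimal sketch, with hypothetical function names, and assuming the thread count is obtained through `std::thread::available_parallelism` (the PR may use a different source):

```rust
use std::thread;

/// Half of the available threads, but never fewer than one.
fn indexing_threads(available: usize) -> usize {
    (available / 2).max(1)
}

/// Reads the thread count from the OS, assuming one thread if it cannot be queried.
fn default_indexing_threads() -> usize {
    let available = thread::available_parallelism()
        .map(|n| n.get())
        .unwrap_or(1);
    indexing_threads(available)
}
```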
Co-authored-by: Tamo <tamo@meilisearch.com>
248: Unused borrow that must be used r=curquiza a=irevoire
I noticed #228 introduced a warning while compiling
Co-authored-by: Tamo <tamo@meilisearch.com>
228: Authentication rework r=curquiza a=MarinPostma
In an attempt to fix #201, I ended up completely rewriting the authentication system we use. This is because actix doesn't allow wrapping a single route in a middleware, so we initially put each route into its own service to use the authentication middleware. Routes are now grouped in resources, fixing #201.
As for the authentication, I decided to take a very different approach and ditch middleware altogether. Instead, I decided to use actix's [extractor](https://actix.rs/docs/extractors/). `Data` is now wrapped in a `GuardedData<P: Policy, T>` (where `T` is `Data`) in each route. The `Policy` trait, through its `authenticate` method, tells whether a request is authorized to access the resources in the route. Concretely, before the server starts, it is configured with an `AuthConfig` instance that can either be `AuthConfig::NoAuth` when no auth is required at runtime, or `AuthConfig::Auth(Policies)`, where `Policies` maps each `Policy` type to its singleton instance.
In the current implementation, to match the legacy meilisearch behaviour, each policy implementation contains a `HashSet` of tokens (`Vec<u8>` for now) that represents the users it can authenticate. When starting the program, each key (identified as a user) is given a set of `Policy`, representing its roles. The latter is facilitated by the `create_users` macro, like so:
```rust
create_users!(
policies,
master_key.as_bytes() => { Admin, Private, Public },
private_key.as_bytes() => { Private, Public },
public_key.as_bytes() => { Public }
);
```
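To make the description concrete, here is a simplified sketch of what a token-set policy could look like. The `authenticate` signature, the `add` helper and the struct layout are assumptions for illustration only; the real trait is tied to actix extractors and `GuardedData`, which are left out here.

```rust
use std::collections::HashSet;

/// Decides whether a request carrying `token` may access a route.
/// The signature is assumed; the actual trait may differ.
trait Policy {
    fn authenticate(&self, token: &[u8]) -> bool;
}

/// Example policy holding the token of every user it can authenticate.
#[derive(Default)]
struct Public {
    tokens: HashSet<Vec<u8>>,
}

impl Public {
    /// Registers a key as a user of this policy (hypothetical helper).
    fn add(&mut self, token: &[u8]) {
        self.tokens.insert(token.to_vec());
    }
}

impl Policy for Public {
    fn authenticate(&self, token: &[u8]) -> bool {
        self.tokens.contains(token)
    }
}
```

Giving the master key the `Admin`, `Private` and `Public` roles then simply means inserting its bytes into the token set of each of those three policies.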
This is some groundwork for later development on a full fledged authentication system for meilisearch.
fix#201
Co-authored-by: marin postma <postma.marin@protonmail.com>
240: Rework error messages r=irevoire a=MarinPostma
Simplify the error messages, and make them more compliant with legacy Meilisearch.
Basically, stop composing the messages, and simply forward the message of inner errors.
Co-authored-by: marin postma <postma.marin@protonmail.com>
230: Logs r=MarinPostma a=irevoire
closes#193
Since we can't really print the body of requests in actix-web, I logged the parameters of every request and what we were returning to the client.
Co-authored-by: Tamo <tamo@meilisearch.com>
1415: Fix README.md typos r=curquiza a=dichotommy
Just fixing some typos and such.
Kanji -> Hanzi
Kanji refers only to the Japanese versions of Chinese characters, and since we don't have a Japanese tokenization pipeline I think it could be misunderstood.
Co-authored-by: Tommy <68053732+dichotommy@users.noreply.github.com>
232: Fix payload size limit r=MarinPostma a=MarinPostma
Fix#223
This was due to the fact that `Payload` ignores the payload size limit. I fixed it by implementing my own `Payload` extractor that checks that the payload is not too large.
I also refactored the `create_app` a bit.
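The size check performed by such an extractor can be sketched independently of actix. In this simplified, synchronous version the body arrives as an iterator of chunks; `read_payload` and its error type are hypothetical, whereas the real extractor works on an asynchronous stream.

```rust
/// Collects body chunks, failing as soon as the accumulated size would
/// exceed `limit` bytes, so an oversized body is never fully buffered.
fn read_payload<I>(chunks: I, limit: usize) -> Result<Vec<u8>, String>
where
    I: IntoIterator<Item = Vec<u8>>,
{
    let mut body = Vec::new();
    for chunk in chunks {
        if body.len() + chunk.len() > limit {
            return Err(format!("payload exceeds the {} byte limit", limit));
        }
        body.extend_from_slice(&chunk);
    }
    Ok(body)
}
```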
Co-authored-by: marin postma <postma.marin@protonmail.com>
227: improve mini dashboard routing r=MarinPostma a=MarinPostma
The dependency we use to statically serve the mini-dashboard used globbing to serve the mini-dashboard files. This caused all unfound routes to be caught by the "/" serving the dashboard assets. This fix makes it so that the assets have a dedicated route, and any unfound route is caught by the default service and returns a 404.
Co-authored-by: marin postma <postma.marin@protonmail.com>
229: Add exhaustiveFacetsCount r=MarinPostma a=curquiza
I completely forgot this one 😅
Co-authored-by: Clémentine Urquizar <clementine@meilisearch.com>
226: Make facetsDistribution name iso r=MarinPostma a=curquiza
Even if there is an English mistake in `facets_distribution` (because of the `s`) @gmourier asked me to keep the typo: the name of `facetsDistribution` might change completely in the future, and he wants to avoid two breaking changes.
@gmourier can you confirm before we merge this PR?
Sorry I left this update in the code (I'm confused because no issue was opened to update `facetsDistribution`), there might have been a confusion with `fieldsDistribution` that has been renamed into `fieldDistribution`. Sorry!
Co-authored-by: Clémentine Urquizar <clementine@meilisearch.com>
220: Implement `matches` r=irevoire a=MarinPostma
Implement `_matchesInfo`. I initially thought we could factor it into the highlighting, but they are unrelated features after all and needed a dedicated pass to handle.
Co-authored-by: marin postma <postma.marin@protonmail.com>
213: Implement all the CLI options r=MarinPostma a=irevoire
closes#206
And I looked into #204, I fixed some default values and tried to test as many options as possible, and I think the cli is already mostly working.
If someone knows any issues about it, I would like to hear more 🙂
Co-authored-by: Tamo <tamo@meilisearch.com>
211: fix index deletion race condition r=MarinPostma a=MarinPostma
Make the update store block if the currently processing update is from an index we are trying to delete. This ensures that no write to the index can occur after it has been deleted.
218: Update milli version to v0.5.0 r=MarinPostma a=curquiza
Co-authored-by: marin postma <postma.marin@protonmail.com>
Co-authored-by: Clémentine Urquizar <clementine@meilisearch.com>
221: fix get search crop len r=irevoire a=MarinPostma
Fix bug where crop length was mandatory when performing a GET search.
Co-authored-by: marin postma <postma.marin@protonmail.com>
210: Error handling r=MarinPostma a=MarinPostma
This pr implements the error handling for meilisearch.
Rather than grouping errors by type, this implementation groups them by scope, each scope enclosing errors from a scope further down, or new errors within this scope. This makes tracking the origin of errors easier, and simplifies error handling at the module level.
All errors that are eventually returned to the user implement the `Into<ResponseError>` trait. `ResponseError` in turn implements the `ErrorCode` trait from `meilisearch-error`.
Some new errors have been introduced with the new engine for which we haven't defined error codes yet. It has been decided with @gmourier that those would return the `internal-error` code until the correct error code is specified.
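The scoping idea can be illustrated with a small sketch. Every name here except `ResponseError` is hypothetical, the `document_not_found` code is only an example, and the real `ResponseError` implements the `ErrorCode` trait rather than carrying a plain string field.

```rust
/// Innermost scope: errors raised while working on a single index.
#[derive(Debug)]
enum IndexError {
    DocumentNotFound(String),
}

/// Outer scope: encloses `IndexError` and adds errors of its own.
#[derive(Debug)]
enum ControllerError {
    Index(IndexError),
    MissingIndexUid,
}

impl From<IndexError> for ControllerError {
    fn from(err: IndexError) -> Self {
        ControllerError::Index(err)
    }
}

/// Error finally returned to the user (simplified stand-in).
#[derive(Debug)]
struct ResponseError {
    code: &'static str,
    message: String,
}

impl From<ControllerError> for ResponseError {
    fn from(err: ControllerError) -> Self {
        let code = match &err {
            ControllerError::Index(IndexError::DocumentNotFound(_)) => "document_not_found",
            // No dedicated code specified yet: fall back to the internal error code.
            ControllerError::MissingIndexUid => "internal-error",
        };
        ResponseError { code, message: format!("{:?}", err) }
    }
}

fn find(id: &str) -> Result<String, IndexError> {
    Err(IndexError::DocumentNotFound(id.to_string()))
}

/// `?` lifts the inner error into the enclosing scope through `From`.
fn get_document(id: &str) -> Result<String, ControllerError> {
    let document = find(id)?;
    Ok(document)
}
```

Because each scope wraps the one below it, the variant path (`ControllerError::Index(IndexError::...)`) records where an error originated.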
Co-authored-by: marin postma <postma.marin@protonmail.com>
209: Integrate amplitude r=MarinPostma a=irevoire
And merge the sentry and amplitude usage under one “Enable analytics” flag
closes#180
Co-authored-by: Tamo <tamo@meilisearch.com>
Co-authored-by: Irevoire <tamo@meilisearch.com>
197: Update milli (v0.3.1) with filterable attributes r=MarinPostma a=curquiza
Fixes#187 and #70
Also fixes#195
Co-authored-by: Clémentine Urquizar <clementine@meilisearch.com>
144: Concurrent update run loop (refactor) r=MarinPostma a=MarinPostma
This PR allows multiple requests to the update store to be performed concurrently (i.e., one can list updates while an update is being written to the update store).
173: Convert UpdateStatus to legacy meilisearch format r=MarinPostma a=MarinPostma
Returns the update statuses with the same format as legacy meilisearch.
The number of documents in a document addition/deletion is not known before processing, so it is only returned when the update is `processed`.
close#78
associated milli PR: https://github.com/meilisearch/milli/pull/178
Co-authored-by: marin postma <postma.marin@protonmail.com>
Co-authored-by: Marin Postma <postma.marin@protonmail.com>
179: Enable filter parameter during search r=MarinPostma a=MarinPostma
This PR makes the changes needed for transplant to comply with the specification on filters.
More precisely, it:
- Removes the `filters` parameter
- Renames `facetFilters` to `filter`
- Allows either a string or an array to be passed to the filter param.
It doesn't allow the mixed syntax, that needs to be handled by milli.
close#81close#140
Co-authored-by: Marin Postma <postma.marin@protonmail.com>
202: Add a github action to run cargo-flaky 1000 times r=curquiza a=irevoire
I don’t know how to ensure the CI works so it’s just a first version, do not hesitate to update the code
Co-authored-by: Irevoire <tamo@meilisearch.com>
1403: fix amount of time r=curquiza a=TheTechRobo
The new MeiliSearch sandbox website says "48 hours" rather than 72, so I updated the readme to reflect that
Co-authored-by: TheTechRobo <52163910+TheTechRobo@users.noreply.github.com>
1399: Update download-latest.sh r=curquiza a=94noni
Hey, PR of the weekend :)
Kidding, I began to use MeiliSearch recently for fun&personal usage, wishing you good luck for your next v0.21|v1.0 releases
Cheers
Co-authored-by: Antoine Makdessi <amakdessi@me.com>
158: Implements the dumps r=irevoire a=irevoire
closes#20
divergence from legacy meilisearch:
- dump v2 added, which supports loading pending updates (only works with dumps created from v2)
- added time stamps to the dump info
- Dump infos are only persisted in an internal data structure and are no longer fetched from the fs on demand. This was a potential security flaw. This means that the dump infos are flushed on every restart.
Co-authored-by: tamo <tamo@meilisearch.com>
Co-authored-by: Marin Postma <postma.marin@protonmail.com>
186: settings fix r=MarinPostma a=MarinPostma
Add type-checked settings validation. For now it only transforms the settings accepting wildcards.
Co-authored-by: Marin Postma <postma.marin@protonmail.com>
172: Fix cors authentication issue r=MarinPostma a=MarinPostma
The error was due to the middleware returning an error, instead of a response containing the error.
close#110
Co-authored-by: Marin Postma <postma.marin@protonmail.com>
170: Improve CI r=MarinPostma a=curquiza
Checked with @Kerollmops to improve (a little bit) the CI execution time.
Co-authored-by: Clémentine Urquizar <clementine@meilisearch.com>
171: Update mini-dashboard with version 0.1.2 r=MarinPostma a=mdubus
Update of the mini-dashboard sha1 & assets-url, due to a new release
Co-authored-by: Morgane Dubus <morgane.d@meilisearch.com>
143: Shared update store r=irevoire a=MarinPostma
This PR changes the updates process so that only one instance of an update store is shared among indexes.
This allows updates to always be processed sequentially without additional synchronization, and fixes the bug where the first pending update of every index was reported as processing whereas only one actually was.
EDIT:
I ended up having to rewrite the whole `UpdateStore` so that updates are really queued and processed sequentially in the order they were added. For that purpose I created a `pending_queue` that orders the updates by a global update id.
To find the next `update_id` to use, both globally and for each index, I have created another database that contains the next id to use.
Finally, all updates that have been processed (successfully or otherwise) are stored in an `updates` database.
The layout for the keys of these databases is such that it is easy to iterate over the elements for a particular index, and greatly reduces the amount of code to do so, compared to the former implementation.
I have also simplified the locking mechanism for the update store, thanks to the `StateLock` data structure, which allows both an arbitrary number of readers and a single writer to concurrently access the state. The current state can be either Idle, Processing, or Snapshotting. When an update or snapshotting is ongoing, the process holds the state lock until it is done processing its task. When it is done, it sets the state back to Idle.
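The ordering property of the pending queue can be illustrated with an in-memory sketch. The real queue is a database whose key layout is not reproduced here; `PendingQueue` and its methods are hypothetical, and only the global id ordering is shown (per-index ids are omitted).

```rust
use std::collections::BTreeMap;

/// Hypothetical pending queue keyed by a global update id, so that iteration
/// order equals insertion order across all indexes.
#[derive(Default)]
struct PendingQueue {
    next_global_id: u64,
    updates: BTreeMap<u64, (String, String)>, // global id -> (index uid, payload)
}

impl PendingQueue {
    /// Enqueues an update and returns the global id it was assigned.
    fn push(&mut self, index_uid: &str, payload: &str) -> u64 {
        let id = self.next_global_id;
        self.next_global_id += 1;
        self.updates
            .insert(id, (index_uid.to_string(), payload.to_string()));
        id
    }

    /// Removes and returns the oldest pending update, if any.
    fn pop(&mut self) -> Option<(u64, (String, String))> {
        let id = *self.updates.keys().next()?;
        self.updates.remove(&id).map(|update| (id, update))
    }
}
```

Because a single counter orders updates for every index, a single consumer popping from the front processes them strictly in arrival order.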
I have made other small improvements here and there, and have left some others for future work, such as:
- When creating an update file to hold a request's content, it would be preferable to first create a temporary file, and then atomically persist it when we have written to it. This would simplify the case when there is no data to be written to the file, since we wouldn't have to take care about cleaning after ourselves.
- The logic for content validation must be factored.
- Some more tests related to error handling in the process_pending_update function.
- The issue #159
close#114
Co-authored-by: Marin Postma <postma.marin@protonmail.com>
161: put mini-dashboard in out-dir r=MarinPostma a=MarinPostma
This PR puts the mini-dashboard in the `OUT_DIR` specified by cargo during the build. This allows the mini-dashboard artifacts to be cleaned when `cargo clean` is run, and keeps unwanted files out of the working directory.
Co-authored-by: Marin Postma <postma.marin@protonmail.com>
153: integrate mini dashboard r=MarinPostma a=MarinPostma
This PR integrates the [mini dashboard](https://github.com/meilisearch/mini-dashboard) into transplant.
It adds a build feature `mini-dashboard` to statically add the mini-dashboard to the MeiliSearch binary. The mini-dashboard build feature is enabled by default and can be disabled by building MeiliSearch with `cargo build --no-default-features`.
- [x] Fetch the mini-dashboard from the Github release
- [x] Check that the SHA1 on the downloaded payload matches the one in the metadata
- [x] Unpack the mini dashboard in `meilisearch-http/mini-dashboard`
- [x] serve the mini-dashboard if the `mini-dashboard` feature is enabled
- [x] Update CI to build MeiliSearch with mini-dashboard for releases
close#87
## Shasum check and build optimizations.
In order to make sure that the right bundle for the mini-dashboard is downloaded, its shasum is computed and compared to the one specified in the `Cargo.toml`. If the shasums match, then the shasum is written to the `.mini-dashboard.sha1` file for later comparison. On subsequent builds, the build script will check that both the mini-dashboard assets and the shasum file are found and that the shasum file content matches the one from the toml file. It will only perform a re-generation of the static dashboard files if it finds that either the dashboard is not present where it expects it to be, or that it is outdated, by comparing the shasums.
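The decision made on subsequent builds can be summarized by a small predicate. This is only a sketch: `needs_regeneration` and its parameters are hypothetical, and the actual build script reads the stored shasum from `.mini-dashboard.sha1` and the expected one from `Cargo.toml` before making this comparison.

```rust
/// Returns `true` when the dashboard assets must be downloaded and unpacked
/// again: either they are missing, the stored shasum file is missing, or the
/// stored shasum differs from the expected one.
fn needs_regeneration(
    assets_present: bool,
    stored_sha1: Option<&str>,
    expected_sha1: &str,
) -> bool {
    if !assets_present {
        return true;
    }
    match stored_sha1 {
        Some(stored) => stored.trim() != expected_sha1,
        None => true,
    }
}
```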
## Notes
I had to rely on a [custom patch](https://github.com/MarinPostma/actix-web-static-files/tree/actix-web-4) of actix-web-static-files, to support actix-web 4 beta6. there is currently a [pr on the official repo](https://github.com/kilork/actix-web-static-files/pull/35) to support actix-web 4, but it most likely won't be merged until actix is stabilized.
Co-authored-by: Marin Postma <postma.marin@protonmail.com>
146: Remove another unused legacy file r=MarinPostma a=irevoire
When doing #135 I missed an old useless file in the src/routes directory
Co-authored-by: tamo <tamo@meilisearch.com>
113: snapshots r=MarinPostma a=MarinPostma
This pr adds support for snapshotting.
The snapshotting process for an index requires that no other update is processing at the same time. A mutex lock has been added to prevent a snapshot from occurring at the same time as an update, while still permitting updates to be pushed.
The list of the indexes to snapshot is first retrieved from the `UuidResolver`, which also performs its own snapshot.
This list is passed to the update store, which attempts to acquire a lock on the update store while it snapshots itself and its associated index store.
This means that a snapshot can only be completed once all indexes have finished their ongoing update.
This pr also refactors the code to allow unit testing and mocking, and unit tests the snapshot creation.
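To illustrate the locking scheme described above, here is a minimal sketch using only the standard library. `UpdateStore` here is a simplified, hypothetical stand-in and not the real type: it only shows that pushing an update and processing one can be guarded by two different locks.

```rust
use std::collections::VecDeque;
use std::sync::Mutex;

/// Hypothetical, simplified stand-in for the update store.
pub struct UpdateStore {
    /// Pending updates; pushing only needs this short-lived lock.
    pending: Mutex<VecDeque<String>>,
    /// Held while an update is processed or a snapshot is taken.
    processing: Mutex<()>,
}

impl UpdateStore {
    pub fn new() -> Self {
        Self {
            pending: Mutex::new(VecDeque::new()),
            processing: Mutex::new(()),
        }
    }

    /// Registering an update never touches the processing lock.
    pub fn push(&self, update: &str) {
        self.pending.lock().unwrap().push_back(update.to_string());
    }

    /// Process one update, unless a snapshot (or another update) holds the lock.
    pub fn try_process_next(&self) -> Option<String> {
        let _guard = self.processing.try_lock().ok()?;
        self.pending.lock().unwrap().pop_front()
    }

    /// Run `snapshot` while holding the processing lock and return its result.
    pub fn with_snapshot_lock<T>(&self, snapshot: impl FnOnce(&Self) -> T) -> T {
        let _guard = self.processing.lock().unwrap();
        snapshot(self)
    }
}
```

While `with_snapshot_lock` runs, `push` still succeeds but `try_process_next` returns `None`, which mirrors the "updates can be pushed but not processed" behaviour.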
Co-authored-by: mpostma <postma.marin@protonmail.com>
Co-authored-by: tamo <irevoire@protonmail.ch>
Co-authored-by: marin <postma.marin@protonmail.com>
Co-authored-by: Marin Postma <postma.marin@protonmail.com>
1315: fix armv7 r=MarinPostma a=MarinPostma
fix armv7 build
This was caused by `usize` being 32 bits on armv7 and 64 bits on all other targeted architectures.
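The actual fix is not shown here, but the general pitfall can be sketched: a 64-bit value does not always fit in `usize` on a 32-bit target, so a checked conversion is safer than a cast. `to_usize` is a hypothetical helper used only for illustration.

```rust
use std::convert::TryFrom;

/// Convert a 64-bit value into `usize` without silent truncation.
/// Returns `None` on targets where `usize` is too small for `value`.
pub fn to_usize(value: u64) -> Option<usize> {
    usize::try_from(value).ok()
}
```

On a 64-bit target every `u64` converts; on armv7 large values yield `None` instead of wrapping.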
Co-authored-by: Marin Postma <postma.marin@protonmail.com>
1307: change ubuntu version r=MarinPostma a=MarinPostma
Change the CI ubuntu version from `latest` to `18.04` because `latest` uses a too recent version of glibc, preventing meilisearch from running on the debian version of the DO image
Co-authored-by: mpostma <postma.marin@protonmail.com>
109: Make updates atomic r=curquiza a=MarinPostma
Until now, the index_uid->uuid mapping was done before the update was written to disk in the case of automatic index creation. This was an issue when the update failed, because the index would still exist in the uuid resolver.
This pr fixes it by first creating the update with a uuid if the index does not exist, and only then registering this uuid with the uuid resolver.
This is preliminary work to the implementation of snapshots (#19).
This pr also changes the `resolve` method on the `UuidResolver` to `get` to make it clearer.
The `create_uuid` method may be bound to disappear when the index name resolution is handled by a remote machine.
Co-authored-by: mpostma <postma.marin@protonmail.com>
115: Add the exhaustiveNbHits in search response body (returns always false) r=curquiza a=irevoire
closes#103
Co-authored-by: tamo <irevoire@protonmail.ch>
Co-authored-by: Irevoire <irevoire@protonmail.ch>
108: use write senders for updates r=MarinPostma a=MarinPostma
Use write senders to send updates to the `IndexActor`, so updates are performed sequentially on all indexes.
Co-authored-by: mpostma <postma.marin@protonmail.com>
1291: Use 200 status code for healthcheck endpoint r=MarinPostma a=irevoire
closes #1282
Co-authored-by: tamo <tamo@meilisearch.com>
Co-authored-by: Irevoire <tamo@meilisearch.com>
1292: return a 200 on / when meilisearch is running in production r=MarinPostma a=irevoire
close#1235
Co-authored-by: tamo <tamo@meilisearch.com>
Co-authored-by: Irevoire <irevoire@protonmail.ch>
96: Check json payload on document addition r=curquiza a=MarinPostma
Check if the json payload in updates is valid. It uses a json validator to avoid allocation, and only serializes the json in case of error, to return a pretty message.
Co-authored-by: mpostma <postma.marin@protonmail.com>
88: restore name field in index meta r=MarinPostma a=MarinPostma
Makes the IndexMetadata payload iso with legacy meilisearch and closes#67
Co-authored-by: mpostma <postma.marin@protonmail.com>
1238: fix snapshot temp file r=curquiza a=MarinPostma
fix snapshot creating a temp file in /tmp, and create the temp file in the snapshot directory instead.
close#1237
Co-authored-by: mpostma <postma.marin@protonmail.com>
1286: Timestamp changelog r=curquiza a=sandstrom
A timestamped changelog makes it easier to track progress, understand velocity, see if something has recently changed, etc.
https://keepachangelog.com/en/1.0.0/
Co-authored-by: sandstrom <mail+github@a16m.se>
1280: Make sure that we do not use jemalloc on macos r=MarinPostma a=Kerollmops
We were wrongly compiling jemalloc on macOS even though we did use it only on Linux.
Fixes#1136.
Co-authored-by: Clément Renault <clement@meilisearch.com>
1206: fix running URL display r=curquiza a=fharper
by doing that you can just click on it in the terminal if you want
Co-authored-by: Frédéric Harper <hi@fred.dev>
1265: Inferring whether to show or Hide API Key box r=curquiza a=sanders41
Relates to #1261
This is one potential solution for inferring whether an instance has an API key and showing or hiding the text input box accordingly. When the page first loads, a request is sent to the server with no API key. If that request is successful, then no API key is needed, so the box is hidden. If the request returns with a 401 status, then the API key is needed and the box is shown.
Co-authored-by: Paul Sanders <psanders1@gmail.com>
1266: Simplify compile and run from sources r=curquiza a=tpayet
Related to #1136, I just saw that compile & run instructions from sources were not up to date
Co-authored-by: Thomas Payet <thomas@meilisearch.com>
1260: README.md: typos r=Kerollmops a=skerkour
Hey, I think I've noticed small typos. Feel free to close if I'm wrong :)
Co-authored-by: Sylvain Kerkour <6172808+skerkour@users.noreply.github.com>
1220: Update Contact section of README.md r=Kerollmops a=react-learner
- Remove reference to Crisp chatbox (currently deactivated on docs site and homepage)
- Remove bonjour @ meilisearch.com email address, in order to concentrate communications in visible locations such as Slack and forums. @fharper
Co-authored-by: Tommy <68053732+react-learner@users.noreply.github.com>
1224: fix synonyms normalization r=MarinPostma a=LegendreM
Synonyms need to be indexed in ascending order,
and the new normalization step for synonyms potentially changes this order,
which breaks the indexation process
because "Harry Potter" > "HP" but "harry potter" < "hp"
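A small sketch of the ordering problem described above: the keys are sorted before normalization, so they have to be sorted again afterwards. Lowercasing stands in for the real normalization step, and `normalize_and_sort` is a hypothetical helper, not the function used in the fix.

```rust
/// Lowercase every synonym key, then restore the ascending order
/// that the indexing step expects. Lowercasing is only a stand-in
/// for the real normalization (an assumption of this sketch).
pub fn normalize_and_sort(keys: &[&str]) -> Vec<String> {
    let mut normalized: Vec<String> = keys.iter().map(|k| k.to_lowercase()).collect();
    // Byte-wise ordering changes once the case changes, so sort again.
    normalized.sort();
    normalized.dedup();
    normalized
}
```

For the input `["HP", "Harry Potter"]`, which is ascending, the normalized keys come back as `["harry potter", "hp"]`.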
Co-authored-by: many <maxime@meilisearch.com>
1222: Ignore existing primary key r=Kerollmops a=MarinPostma
Fixing the bug in #1176 made it a hard error to try to re-set the primary key on a document addition. This PR makes Meilisearch ignore a primary key passed as an argument to a document addition. This has been decided after a discussion with @curquiza, in order to make the bug fix non-breaking.
It turns out it was a good catch too since, contrary to what I thought, the error was not caught asynchronously. Thank you @curquiza
Co-authored-by: mpostma <postma.marin@protonmail.com>
1172: Fix atomic snapshot creation r=MarinPostma a=raszi
Compress gzip files to a temporary file first and then do an atomic rename.
In our setup we have an indexer which does snapshotting for the instances serving the requests. Since the snapshotting mechanism currently replaces the file in place, the indexer could not share the snapshot with a live instance.
With this small patch we first create a new temporary file in the same directory as the snapshot dir and then do an atomic rename, so the snapshot path always contains a valid snapshot.
After applying this change it would be enough to simply restart the serving instances to pick up the new snapshot from a shared storage, without worrying that they die because of an incomplete snapshot.
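The write-then-rename pattern described above can be sketched with the standard library alone. This is an illustration under assumptions: `write_atomically` is a hypothetical helper, the `.tmp` suffix is arbitrary, and raw bytes are written instead of a gzip archive.

```rust
use std::fs;
use std::io::{self, Write};
use std::path::{Path, PathBuf};

/// Write `bytes` to `final_path` by way of a temporary sibling file.
pub fn write_atomically(final_path: &Path, bytes: &[u8]) -> io::Result<()> {
    // The temporary file lives next to the target, so the rename
    // never has to cross a filesystem boundary.
    let mut tmp_name = final_path.as_os_str().to_owned();
    tmp_name.push(".tmp");
    let tmp_path = PathBuf::from(tmp_name);

    let mut file = fs::File::create(&tmp_path)?;
    file.write_all(bytes)?;
    file.sync_all()?;

    // Readers see either the old file or the complete new one.
    fs::rename(&tmp_path, final_path)
}
```

A reader opening `final_path` at any moment gets a complete file, which is the property the patch relies on.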
Co-authored-by: KARASZI István <ikaraszi@gmail.com>
1176: fix race condition in document addition r=Kerollmops a=MarinPostma
As described in #1160, there was a race condition when updating settings and adding documents simultaneously. This was due to the schema being updated and document addition being processed in two different transactions. This PR moves the schema update logic for the primary key in the same transaction as the document addition, while maintaining the input checks for the validity of the primary key in the http route, in order not to break the error reporting for the document addition route.
close#1160.
Co-authored-by: mpostma <postma.marin@protonmail.com>
Co-authored-by: marin <postma.marin@protonmail.com>
1184: normalize synonyms during indexation r=MarinPostma a=LegendreM
fix #1135, #964
Normalizes the synonyms before indexing them, so they are not case sensitive anymore. The normalization also involves deunicoding in some cases, such as accents, so `été` and `ete` are considered equivalent in a search for synonyms.
Co-authored-by: many <maxime@meilisearch.com>
Co-authored-by: Many <legendre.maxime.isn@gmail.com>
1174: Limit query words number r=MarinPostma a=MarinPostma
This pr adds a limit to the number of words taken into account in a search query. Using query strings that are too long leads to huge performance hits and resource consumption, which occasionally crashes the machine. The limit has been hard set to 10, and tests have been added to make sure that it is taken into account.
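As a rough sketch of the limit described above: only the first 10 words of the query are kept. Splitting on whitespace is an assumption made for brevity (the engine uses its own tokenizer), and `limit_query_words` and `MAX_QUERY_WORDS` are hypothetical names.

```rust
/// Hard limit on the number of query words, as stated in the description.
pub const MAX_QUERY_WORDS: usize = 10;

/// Keep at most `MAX_QUERY_WORDS` words from `query`.
/// Whitespace splitting stands in for the real tokenizer.
pub fn limit_query_words(query: &str) -> Vec<&str> {
    query.split_whitespace().take(MAX_QUERY_WORDS).collect()
}
```

Words past the tenth are simply ignored rather than rejected, so long queries still return results.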
close#941
Co-authored-by: mpostma <postma.marin@protonmail.com>
1207: fix homebrew name r=MarinPostma a=fharper
brew is the command, the package manager name is homebrew
Co-authored-by: Frédéric Harper <hi@fred.dev>
1185: fix cors issue r=MarinPostma a=MarinPostma
This PR fixes a bug where foreign origins were not accepted.
This was due to an update to actix-cors.
It also fixes the cors bug when authentication failed, with the caveat that requests that are denied for permission reasons are not logged.
It introduces a bug described in #1186
Co-authored-by: mpostma <postma.marin@protonmail.com>
1167: Update dumps ci r=LegendreM a=MarinPostma
Now that the dump tests are re-entrant, they can be run from a multithreaded context, whereas they used to be run from a single threaded context, in a separate CI task.
Co-authored-by: mpostma <postma.marin@protonmail.com>
1091: New tokenizer r=LegendreM a=MarinPostma
Integration of the new tokenizer to meilisearch.
- Tokenize and normalizes the query string for better search results
- Language sensitive tokenization and normalization during indexation
- better support for Chinese thanks to jieba (when Chinese characters are detected)
To do in a later PR:
- Use a common tokenization instance
- use tokenization for synonyms
close#624
Co-authored-by: mpostma <postma.marin@protonmail.com>
Co-authored-by: many <maxime@meilisearch.com>
1163: remove benches r=LegendreM a=MarinPostma
remove unused benches, that did not compile either
Co-authored-by: mpostma <postma.marin@protonmail.com>
1100: [fix] Remove some clippy warnings r=MarinPostma a=woshilapin
fix#1099
I'm also wondering if I should add `-- --deny warnings` to the modified line in `test.yml`.
Co-authored-by: Jean SIMARD <woshilapin@tuziwo.info>
849: Update nbHits count with filtered documents r=MarinPostma a=balajisivaraman
Closes #764, close #1039
After discussing with @MarinPostma on Slack, this is my first attempt at implementing this for the basic flow that will go through `bucket_sort_with_distinct`.
A few thoughts here:
- For getting the count of filtered documents alone, I originally thought of using `filter_map.values().filter(|&&v| !v).count()`. In a few cases, this was the same as what I have now implemented. But I realised I couldn't do something similar for `distinct`. So for being consistent, I have implemented both in a similar fashion.
- I also needed the `contains_key` check to ensure we're not counting the same document ID twice.
@MarinPostma also mentioned that this will be an approximation since the sort is lazy. In the test example that I've updated, the actual filtered count will be just 19 (for `male` records), but due to the `limit` in play, it returns 32 (filtering out 11 records overall).
Please let me know if this is the kind of fix we are looking for, and I can implement it in the placeholder search also.
Co-authored-by: Balaji Sivaraman <balaji@balajisivaraman.com>
1089: Fix clear bug r=Kerollmops a=MarinPostma
close#1088
The placeholder data was not cleared when deleting all documents.
Co-authored-by: mpostma <postma.marin@protonmail.com>
1087: Add deploy on Platform.sh option to README r=Kerollmops a=chadwcarlson
We have had a lot of success using Meilisearch on our public documentation, and I've put together the "movies" demo to quickly show it off. Included in our template README are instructions for modifying the template deployment to make it production ready.
All the best.
As per CONTRIBUTING, related to https://github.com/meilisearch/MeiliSearch/issues/1086
Co-authored-by: chadcarlson <chad.carlson@platform.sh>
1077: Change movie gifs r=MarinPostma a=bidoubiwa
Remove old movie gif that showed some misleading information
- Typo on first letter
- `word` ranking rules implemented
Co-authored-by: Charlotte Vermandel <charlottevermandel@gmail.com>
1059: Bump serde from 1.0.116 to 1.0.117 r=MarinPostma a=dependabot[bot]
Bumps [serde](https://github.com/serde-rs/serde) from 1.0.116 to 1.0.117.
<details>
<summary>Release notes</summary>
<p><em>Sourced from <a href="https://github.com/serde-rs/serde/releases">serde's releases</a>.</em></p>
<blockquote>
<h2>v1.0.117</h2>
<ul>
<li>Allow serialization of std::net::SocketAddrV6 to include a scope id if present (based on <a href="https://github-redirect.dependabot.com/rust-lang/rust/pull/77426">rust-lang/rust#77426</a>)</li>
</ul>
</blockquote>
</details>
<details>
<summary>Commits</summary>
<ul>
<li><a href="fc3f104c4a"><code>fc3f104</code></a> Release 1.0.117</li>
<li><a href="4bec9ffd0f"><code>4bec9ff</code></a> Merge pull request <a href="https://github-redirect.dependabot.com/serde-rs/serde/issues/1906">#1906</a> from Mingun/fix-misprint</li>
<li><a href="e6d2322e68"><code>e6d2322</code></a> Fix misprint in the error message</li>
<li><a href="2b504099e4"><code>2b50409</code></a> Include room for SocketAddrV6 to serialize scope id</li>
<li><a href="be7d0e7eb2"><code>be7d0e7</code></a> Ignore map_err_ignore Clippy pedantic lint</li>
<li>See full diff in <a href="https://github.com/serde-rs/serde/compare/v1.0.116...v1.0.117">compare view</a></li>
</ul>
</details>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
1052: Revert "Merge #1001" r=Kerollmops a=MarinPostma
This reverts commit 690eab4a25, reversing
changes made to 086020e543.
After arbitration with @curquiza and @eskombro, this fix would introduce a relevancy bug that cannot be circumvented, whereas the previous bug was only a setting bug with a workaround. We need to discuss this issue further to provide a fix that meets our expectations.
related to #1050
This will be merged directly in the release branch, as a hotfix
Co-authored-by: mpostma <postma.marin@protonmail.com>
1045: Revert "Merge #1037" r=MarinPostma a=MarinPostma
This reverts commit 257f9fb2b2, reversing
changes made to 9bae7a35bf.
The reason for this is that de-unicoding is not always desirable (for example in the case of CJK documents). This cannot be handled correctly for now, and will necessitate work on the tokenizer.
Co-authored-by: mpostma <postma.marin@protonmail.com>
1037: Synonym unidecode r=Kerollmops a=MarinPostma
fix#964
- unidecodes all synonyms before adding them to the synonyms fst
- stores a copy of the original synonyms (unicoded) for later retrieval
Co-authored-by: mpostma <postma.marin@protonmail.com>
1032: Remove not maintained csv movies dataset r=MarinPostma a=bidoubiwa
Remove `movies.csv` from the dataset folder as it is not updated and not usable with MeiliSearch without converting it to json.
Co-authored-by: Charlotte Vermandel <charlottevermandel@gmail.com>
1040: Update movie posters r=Kerollmops a=bidoubiwa
This PR resolves 3 issues:
1. update posters URLs that changed
2. All posters point to a smaller image ( +- 20kb instead of 500kb+-) this was done by changing the width size from 1280 px to 500 px.
3. Remove films that are not in the tmdb database
Co-authored-by: Charlotte Vermandel <charlottevermandel@gmail.com>
1038: Add Sandbox section to README.md r=LegendreM a=eskombro
This PR adds a link to [MeiliSearch Sandbox](https://sandbox.meilisearch.com/) in the README.md
Co-authored-by: Samuel Jimenez <sjimenezre@gmail.com>
1028: Clean external contributions r=Kerollmops a=LegendreM
We accepted some imperfect external PRs, this one is here to clean them up:
- clean PR #946 (remove changelog line and add forgotten newline)
- remove useless function after health route refactor #1026
Co-authored-by: many <maxime@meilisearch.com>
Co-authored-by: Many <legendre.maxime.isn@gmail.com>
1034: Remove outdated settings file r=Kerollmops a=bidoubiwa
Unnecessary settings files in the dataset folder should be removed.
Co-authored-by: Charlotte Vermandel <charlottevermandel@gmail.com>
1. NEEDS to ensure that the service is completely up if it returns 204
2. DOES NOT block the service process (write transaction)
3. NEEDS to use as little network bandwidth as possible when it's triggered
4. NEEDS to use as few service resources as possible when it's triggered
5. DOES NOT NEED any authentication
6. MAY be named /health
997: fix(core): fix benchmark in core with types r=LegendreM a=neeldug
forces a dereference onto query and then creates an option to wrap the
query
Closes#994
Co-authored-by: nd419 <5161147+neeldug@users.noreply.github.com>
984: Add test search r=LegendreM a=LegendreM
- Get an error if the index does not exist
- Get an error if a parameter is not expected (e.g.: "lol")
- Check a basic search with no parameter
- Check a basic search with only a q parameter
issue #814
Co-authored-by: many <maxime@meilisearch.com>
946: Sort displayedAttributes field r=MarinPostma a=gorogoroumaru
Fix#943
displayedAttributes use the HashSet struct which is an unsorted structure, so I changed the implementation from HashSet into BTreeSet.
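The difference can be shown with a short sketch: a `BTreeSet` iterates in sorted order, whereas a `HashSet` gives no ordering guarantee. `sorted_attributes` is a hypothetical helper used only to illustrate the change of container.

```rust
use std::collections::BTreeSet;

/// Collect attribute names into a `BTreeSet` and return them in
/// iteration order, which is always ascending and free of duplicates.
pub fn sorted_attributes(attrs: &[&str]) -> Vec<String> {
    let set: BTreeSet<&str> = attrs.iter().copied().collect();
    set.into_iter().map(String::from).collect()
}
```

The output is stable across runs, which is what makes the field predictable for API consumers.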
Co-authored-by: gorogoroumaru <zokutyou2@gmail.com>
1007: fix clippy errors r=MarinPostma a=qdequele
I fixed clippy warnings and errors. It will allow us to avoid future issues when bors tries to merge a branch.
Co-authored-by: qdequele <quentin@dequelen.me>
976: Revert 944 r=MarinPostma a=MarinPostma
revert #944
@bidoubiwa @curquiza @eskombro, this was a misunderstanding from our side. Doing this would in fact be an error, and would prevent us from doing this: https://github.com/meilisearch/MeiliSearch/issues/945#issuecomment-685526678, which is what we are really after. We are resetting this to its default behaviour before it goes into production. Sorry for the confusion.
Co-authored-by: mpostma <postma.marin@protonmail.com>
* trigger backup importation via http route
* follow backup advancement with status route
* import backup via a command line
* let user choose batch size of documents to import (command lines)
closes #884, closes #840
963: upgrade actix-web to v3 r=Kerollmops a=robjtede
Test failures are the same before and after upgrade.
Co-authored-by: Rob Ede <robjtede@icloud.com>
960: bump version and update changelog r=MarinPostma a=LegendreM
* bump to 0.14.1
* update CHANGELOG.md file
Co-authored-by: many <maxime@meilisearch.com>
959: add version guard in copy_and_compact_to_path function r=MarinPostma a=LegendreM
fix#958
need to create 0.14.1
Co-authored-by: many <maxime@meilisearch.com>
926: Update genre field with genres r=MarinPostma a=bidoubiwa
Most code samples are made with the assumption that the `genres` field takes an `s`. I'm updating the dataset to match those code-samples.
Co-authored-by: Charlotte Vermandel <charlottevermandel@gmail.com>
910: Fix typo in error message r=MarinPostma a=curquiza
Thanks to @ppamorim for reporting the typos to me!
Co-authored-by: Clementine Urquizar <clementine@meilisearch.com>
829: implement snapshoting r=MarinPostma a=LegendreM
related to #551.
This pull request lets users periodically create a snapshot of the MeiliSearch database via a command line option, and launch MeiliSearch from a snapshot with another option.
## Documentation
### schedule a snapshot
`--snapshot-path <DIRECTORY_PATH>`:
this will periodically create a snapshot `<DB_NAME>.tar.gz` in the specified directory
### change period between two snapshot creations
`--snapshot-interval-sec <GAP_IN_SEC>`
choose the time gap between two snapshots
### start meilisearch from a snapshot
`--load-from-snapshot <FILE_PATH>`
this will use the snapshot stored at `<FILE_PATH>` to initialize MeiliSearch database,
`--ignore-snapshot-if-db-exists` if set and if a db already exists,
this will skip snapshot importation and continue process with actual db instead of quitting process by returning an Error
`--ignore-missing-snapshot` if set and if no snapshot exists at provided path,
this will skip snapshot importation and continue process with actual db instead of quitting process by returning an Error
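The interaction of `--load-from-snapshot` with the two `--ignore-*` flags can be sketched as a small decision function. This is a hedged illustration: `plan_startup`, `Startup` and `StartupError` are hypothetical names, and the order in which the two conditions are checked is an assumption.

```rust
#[derive(Debug, PartialEq)]
pub enum Startup {
    /// Initialize the database from the snapshot.
    ImportSnapshot,
    /// Skip the importation and continue with the actual db.
    SkipImport,
}

#[derive(Debug, PartialEq)]
pub enum StartupError {
    DbAlreadyExists,
    SnapshotMissing,
}

/// Decide what to do when `--load-from-snapshot` is given.
pub fn plan_startup(
    db_exists: bool,
    snapshot_exists: bool,
    ignore_snapshot_if_db_exists: bool,
    ignore_missing_snapshot: bool,
) -> Result<Startup, StartupError> {
    if db_exists {
        // An existing db is an error unless the user opted out.
        return if ignore_snapshot_if_db_exists {
            Ok(Startup::SkipImport)
        } else {
            Err(StartupError::DbAlreadyExists)
        };
    }
    if !snapshot_exists {
        // A missing snapshot is an error unless the user opted out.
        return if ignore_missing_snapshot {
            Ok(Startup::SkipImport)
        } else {
            Err(StartupError::SnapshotMissing)
        };
    }
    Ok(Startup::ImportSnapshot)
}
```

Both flags turn what would otherwise be a fatal error into a silent skip, which is convenient for restart scripts.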
Co-authored-by: many <maxime@meilisearch.com>
889: Fix clippy warnings r=MarinPostma a=TaKO8Ki
Good day!
Since `cargo clippy` showed two warnings like the following, I've fixed them. This is a small PR.
```sh
warning: use of `ok_or` followed by a function call
--> meilisearch-core/src/database.rs:185:18
|
185 | .ok_or(Error::VersionMismatch("bad VERSION file".to_string()))?;
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ help: try this: `ok_or_else(|| Error::VersionMismatch("bad VERSION file".to_string()))`
|
= note: `#[warn(clippy::or_fun_call)]` on by default
= help: for further information visit https://rust-lang.github.io/rust-clippy/master/index.html#or_fun_call
warning: useless use of `format!`
--> meilisearch-core/src/database.rs:208:59
|
208 | return Err(Error::VersionMismatch(format!("<0.12.0")));
| ^^^^^^^^^^^^^^^^^^ help: consider using `.to_string()`: `"<0.12.0".to_string()`
|
= note: `#[warn(clippy::useless_format)]` on by default
= help: for further information visit https://rust-lang.github.io/rust-clippy/master/index.html#useless_format
warning: 2 warnings emitted
```
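For reference, the two suggestions from the clippy output above look like this once applied. The `Error` enum below is a minimal stand-in written for this example, not the real error type, and the two functions are hypothetical wrappers around the affected expressions.

```rust
/// Minimal stand-in for the real error type (assumption).
#[derive(Debug, PartialEq)]
pub enum Error {
    VersionMismatch(String),
}

/// `ok_or_else` builds the error lazily, only when the value is missing,
/// which is what `clippy::or_fun_call` asks for.
pub fn require_version(raw: Option<&str>) -> Result<&str, Error> {
    raw.ok_or_else(|| Error::VersionMismatch("bad VERSION file".to_string()))
}

/// A plain literal needs `.to_string()` rather than `format!`,
/// which is what `clippy::useless_format` asks for.
pub fn too_old() -> Error {
    Error::VersionMismatch("<0.12.0".to_string())
}
```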
Co-authored-by: Takayuki Maeda <41065217+TaKO8Ki@users.noreply.github.com>
888: Remove schema mention in error message r=MarinPostma a=curquiza
We avoid mentioning the schema since MeiliSearch is schemaless for the user 🙂
Co-authored-by: Clementine Urquizar <clementine@meilisearch.com>
638: Update requitites for source build(rust version) r=MarinPostma a=djKooks
Hello,
I just found that compiling from source fails with the following errors:
```
error[E0658]: the `#[non_exhaustive]` attribute is an experimental feature
--> /Users/kwangin.jung/.cargo/registry/src/github.com-1ecc6299db9ec823/whoami-0.8.1/src/lib.rs:40:1
|
40 | #[non_exhaustive]
| ^^^^^^^^^^^^^^^^^
|
= note: for more information, see https://github.com/rust-lang/rust/issues/44109
error[E0658]: the `#[non_exhaustive]` attribute is an experimental feature
--> /Users/kwangin.jung/.cargo/registry/src/github.com-1ecc6299db9ec823/whoami-0.8.1/src/lib.rs:102:1
|
102 | #[non_exhaustive]
| ^^^^^^^^^^^^^^^^^
|
= note: for more information, see https://github.com/rust-lang/rust/issues/44109
```
It seems `#[non_exhaustive]` is a new feature in Rust 1.40.0, so I added it as a prerequisite.
828: Cleanup readme r=MarinPostma a=tpayet
Closes#613
865: Update movie dataset with genre field r=MarinPostma a=bidoubiwa
Updated the movie dataset by adding the `genre` field to each movie where the genre could be fetched.
The `genre` was fetched for each movie by making a search request on the bigger movie dataset (200mb) using MeiliSearch.
I make this proposition to make testing and trying more accessible.
```json
[
  {
    "id": "323661",
    "title": "Mune: Guardian of the Moon",
    "poster": "https://image.tmdb.org/t/p/w1280/4vzqow7mVUahqA4hHoe2UpQOxy.jpg",
    "overview": "When a faun named Mune becomes the Guardian of the Moon, little did he had unprepared experience with the Moon and an accident that could put both the Moon and the Sun in danger, including a corrupt titan named Necross who wants the Sun for himself and placing the balance of night and day in great peril. Now with the help of a wax-child named Glim and the warrior, Sohone who also became the Sun Guardian, they go out on an exciting journey to get the Sun back and restore the Moon to their rightful place in the sky.",
    "release_date": 1423094400,
    "genre": [
      "Animation",
      "Family",
      "Adventure",
      "Fantasy",
      "Comedy"
    ]
  },
  {
    "id": "306",
    "title": "Beverly Hills Cop III",
    "poster": "https://image.tmdb.org/t/p/w1280/tw9gAhqQcBFX0X0XfVbWqUsmzoU.jpg",
    "overview": "Back in sunny southern California and on the trail of two murderers, Axel Foley again teams up with LA cop Billy Rosewood. Soon, they discover that an amusement park is being used as a front for a massive counterfeiting ring – and it's run by the same gang that shot Billy's boss.",
    "release_date": 769741200,
    "genre": [
      "Action",
      "Comedy",
      "Crime"
    ]
  }
]
```
Co-authored-by: kwangin.jung <inylove82@gmail.com>
Co-authored-by: Thomas Payet <thomas@meilisearch.com>
Co-authored-by: Charlotte Vermandel <charlottevermandel@gmail.com>
873: Update CI for new workflow r=MarinPostma a=MarinPostma
This pr implements the necessary automation for our new release workflow.
## Pre-releases
Whenever something is pushed to a branch `release-v*`, tests are triggered. If all tests pass, the current reference is checked to see if it's a release branch. If it's a release branch, a pre-release is created for this branch and assets are automatically generated for this branch. The pre-release has the tag `vx.x.xrcn` where `x.x.x` is the version extracted from the branch name, and `n` is the number of commits since the branch was forked from master (starting from rc0).
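The tag naming rule above can be sketched as a small function. This is only an illustration of the format, not the CI script itself: `prerelease_tag` is a hypothetical helper, and the commit count is assumed to be computed elsewhere.

```rust
/// Build a pre-release tag `vx.x.xrcn` from a `release-vx.x.x` branch name
/// and the number of commits since the branch was forked from master.
/// Returns `None` when the branch is not a release branch.
pub fn prerelease_tag(branch: &str, commits_since_fork: u32) -> Option<String> {
    let version = branch.strip_prefix("release-v")?;
    if version.is_empty() {
        return None;
    }
    Some(format!("v{}rc{}", version, commits_since_fork))
}
```

For example, the first push on `release-v0.13.0` would give `v0.13.0rc0`.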
## Releases
Whenever something is pushed to stable and tagged `vx.x.x` where `x.x.x` is the version, tests are run and a release is generated containing the assets, and binaries are published to docker, brew, apt, etc.
Co-authored-by: mpostma <postma.marin@protonmail.com>
846: Change settings behavior r=LegendreM a=MarinPostma
partially implements #824.
Returning the field distribution for all known fields is more complicated than anticipated, see https://github.com/meilisearch/MeiliSearch/issues/824#issuecomment-657656561
If we decide to do it anyway, and find a reasonable solution, I will make another PR.
fix #853 by resetting displayed and searchable attributes to wildcard when attributes are set to `[]` in the all settings route. @curquiza @bidoubiwa can you confirm that this is the expected behavior?
Co-authored-by: mpostma <postma.marin@protonmail.com>
794: Check database version mismatch r=MarinPostma a=MarinPostma
Checks if the versions of the database and the engine are compatible.
The database and the engine are compatible if they share the same major and minor version.
The engine will refuse to start if there is a mismatch.
@bidoubiwa do we need to document this?
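The compatibility rule above (same major and minor version) can be sketched as follows. The `major.minor.patch` string format and both function names are assumptions made for this example.

```rust
/// Parse a `major.minor.patch` string; any other shape yields `None`.
pub fn parse_version(version: &str) -> Option<(u32, u32, u32)> {
    let mut parts = version.trim().split('.');
    let major = parts.next()?.parse().ok()?;
    let minor = parts.next()?.parse().ok()?;
    let patch = parts.next()?.parse().ok()?;
    if parts.next().is_some() {
        return None;
    }
    Some((major, minor, patch))
}

/// The database and the engine are compatible when they share the same
/// major and minor version; the patch number is ignored.
pub fn is_compatible(db_version: &str, engine_version: &str) -> bool {
    match (parse_version(db_version), parse_version(engine_version)) {
        (Some((db_major, db_minor, _)), Some((major, minor, _))) => {
            db_major == major && db_minor == minor
        }
        // An unreadable version is treated as a mismatch.
        _ => false,
    }
}
```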
Co-authored-by: mpostma <postma.marin@protonmail.com>
791: Create tests for error codes r=LegendreM a=MarinPostma
- create tests for error codes
- fix primary key error that returned internal error instead of the correct error
- bits of documentation for error
- change a bunch of error type, for better accuracy, @curquiza, @eskombro, @bidoubiwa you may want to take a look at `meilisearch-error/src/lib.rs`
- fix#836
Co-authored-by: mpostma <postma.marin@protonmail.com>
842: bors setup r=LegendreM a=MarinPostma
set up bors to run the tests and merge automatically.
the tests are now run only on staging and trying branches
you can use `bors r+` to test and merge the branch into master if the tests succeed
or
you can just use `bors try` to run the test on the trying branch (synced with master)
Co-authored-by: mpostma <postma.marin@protonmail.com>
By removing the hardcoded value the sentry client will fall back to pulling
it from the SENTRY_DSN environment variable. The hardcoded value has been
moved to the default value of the commandline options so the default
behavior will be the same.
A `--no-sentry` and `MEILI_NO_SENTRY` option has also been introduced
that effectively disables sentry reporting.
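A sketch of how the final DSN could be resolved under this scheme. The default value below is a placeholder and not the real DSN, and `resolve_sentry_dsn` is a hypothetical helper; `configured` stands for whatever the command line option or environment variable provided.

```rust
/// Placeholder only; the real default DSN is not reproduced here.
pub const DEFAULT_SENTRY_DSN: &str = "https://public@sentry.example.invalid/1";

/// Returns the DSN to report to, or `None` when reporting is disabled.
pub fn resolve_sentry_dsn(no_sentry: bool, configured: Option<&str>) -> Option<String> {
    // `--no-sentry` / `MEILI_NO_SENTRY` wins over any configured DSN.
    if no_sentry {
        return None;
    }
    // Otherwise fall back to the default when nothing was configured.
    Some(configured.unwrap_or(DEFAULT_SENTRY_DSN).to_string())
}
```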
I have an issue where "speakers" is split into "speaker" and "s":
when I compute the distances for the Typo criterion,
it takes "s" into account and puts a distance of zero in bucket 0
(the "speakers" bucket), therefore it reports any document matching "s"
without typos as best results.
I need to make sure to ignore "s" when its associated part "speaker"
doesn't even exist in the document and is not in the place
it should be ("speaker" followed by "s").
It is hard to think that it will add much computation time to
the Typo criterion, like in the previous algorithm where I computed
the real query/words indexes and removed the invalid ones
before sending the documents to the bucket sort.
Removing the fields_count fetching cut the search time in half; we should look at lazily pulling them from the criterions when needed
ugly-test: Make the fields_count fetching lazy
Just before running the exactness criterion
Add gh actions for cargo check using rust nightly
Add readme about actions workflows
Add basic Dockerfile
Add action workflow for docker publish
Change check action to test action
Update workflow readme without rust nightly
Rename test action file
Add gh actions to push latest docker image from master
Update github action for publish docker image
Add 2 steps dockerfile based on alpine
Update readme badges to match new CI
First, thank you for contributing to MeiliSearch! The goal of this document is to provide everything you need to start contributing to MeiliSearch.
- [Assumptions](#assumptions)
- [How to Contribute](#how-to-contribute)
- [Development Workflow](#development-workflow)
- [Git Guidelines](#git-guidelines)
## Assumptions
1. **You're familiar with [Github](https://github.com) and the [Pull Requests](https://help.github.com/en/github/collaborating-with-issues-and-pull-requests/about-pull-requests) (PR) workflow.**
2. **You've read the MeiliSearch [documentation](https://docs.meilisearch.com).**
3. **You know about the [MeiliSearch community](https://docs.meilisearch.com/learn/what_is_meilisearch/contact.html). Please use this for help.**
## How to Contribute
1. Ensure your change has an issue! Find an
[existing issue](https://github.com/meilisearch/meilisearch/issues/) or [open a new issue](https://github.com/meilisearch/meilisearch/issues/new).
* This is where you can get a feel if the change will be accepted or not.
2. Once approved, [fork the MeiliSearch repository](https://help.github.com/en/github/getting-started-with-github/fork-a-repo) in your own Github account.
3. [Create a new Git branch](https://help.github.com/en/github/collaborating-with-issues-and-pull-requests/creating-and-deleting-branches-within-your-repository)
4. Review the [Development Workflow](#development-workflow) section that describes the steps to maintain the repository.
5. Make your changes on your branch.
6. [Submit the branch as a Pull Request](https://help.github.com/en/github/collaborating-with-issues-and-pull-requests/creating-a-pull-request-from-a-fork) pointing to the `main` branch of the MeiliSearch repository. A maintainer should comment and/or review your Pull Request within a few days. Although depending on the circumstances, it may take longer.
## Development Workflow
### Setup and run MeiliSearch
```bash
cargo run --release
```
We recommend using the `--release` flag to test the full performance of MeiliSearch.
### Test
```bash
cargo test
```
If you get a "Too many open files" error you might want to increase the open file limit using this command:
```bash
ulimit -Sn 3000
```
## Git Guidelines
### Git Branches
All changes must be made in a branch and submitted as a PR.
We do not enforce any branch naming style, but please use something descriptive of your changes.
### Git Commits
At a minimum, your commit message should:
- be capitalized
- not end with a period or any other punctuation character (!, ?)
- start with a verb so that we can read your commit message this way: "This commit will ...", where "..." is the commit message.
e.g.: "Fix the home page button" or "Add more tests for create_index method"
We don't follow any other convention, but if you want to use one, we recommend [the Chris Beams one](https://chris.beams.io/posts/git-commit/).
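The mechanical parts of these rules can be checked automatically. The sketch below is only an illustration: `check_commit_subject` is a hypothetical helper, not part of the MeiliSearch tooling, and it does not attempt to verify that the message starts with a verb.

```rust
/// Returns the problems found in a commit message subject line.
/// Hypothetical helper: only capitalization and trailing punctuation are
/// checked; "starts with a verb" is left to human reviewers.
fn check_commit_subject(subject: &str) -> Vec<&'static str> {
    let mut problems = Vec::new();
    match subject.chars().next() {
        None => {
            problems.push("subject is empty");
            return problems;
        }
        Some(first) if !first.is_uppercase() => {
            problems.push("subject should be capitalized")
        }
        Some(_) => {}
    }
    // Any trailing ASCII punctuation (., !, ?, ...) is reported.
    let ends_with_punctuation = subject
        .chars()
        .last()
        .map_or(false, |c| c.is_ascii_punctuation());
    if ends_with_punctuation {
        problems.push("subject should not end with punctuation");
    }
    problems
}
```

With this sketch, "Fix the home page button" yields no problems, while "fix the button." is reported for both capitalization and trailing punctuation.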
### GitHub Pull Requests
Some notes on GitHub PRs:
- All PRs must be reviewed and approved by at least one maintainer.
- The PR title should be accurate and descriptive of the changes.
- [Convert your PR to a draft](https://help.github.com/en/github/collaborating-with-issues-and-pull-requests/changing-the-stage-of-a-pull-request) if your changes are a work in progress: no one will review it until you mark your PR as ready for review.<br>
  Draft PRs are recommended when you want to show that you are working on something and make your work visible.
- The branch related to the PR must be **up-to-date with `main`** before merging. Fortunately, this project uses [Bors](https://github.com/bors-ng/bors-ng) to automatically enforce this requirement without the PR author having to rebase manually.
<hr>
Thank you again for reading this through. If you made your way through this contributing guide, we cannot wait to begin working with you ❤️
The Software is provided to you by the Licensor under the License, as defined below, subject to the following condition.
Copyright (c) 2019-2021 Meili SAS
Without limiting other conditions in the License, the grant of rights under the License will not include, and the License does not grant to you, the right to Sell the Software.
Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:
For purposes of the foregoing, “Sell” means practicing any or all of the rights granted to you under the License to provide to third parties, for a fee or other consideration (including without limitation fees for hosting or consulting/ support services related to the Software), a product or service whose value derives, entirely or substantially, from the functionality of the Software. Any license notice or attribution required by the License must also include this Commons Clause License Condition notice.
The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.
Software: MeiliDB
License: MIT
Licensor: MEILI SAS
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
- Provides [6 default ranking criteria](https://github.com/meilisearch/MeiliDB/blob/dc5c42821e1340e96cb90a3da472264624a26326/meilidb-core/src/criterion/mod.rs#L107-L113) used to [bucket sort](https://en.wikipedia.org/wiki/Bucket_sort) documents
- Accepts [custom criteria](https://github.com/meilisearch/MeiliDB/blob/dc5c42821e1340e96cb90a3da472264624a26326/meilidb-core/src/criterion/mod.rs#L24-L33) and can apply them in any custom order
- Supports [ranged queries](https://github.com/meilisearch/MeiliDB/blob/dc5c42821e1340e96cb90a3da472264624a26326/meilidb-core/src/query_builder.rs#L283), useful for paginating results
- Can [distinct](https://github.com/meilisearch/MeiliDB/blob/dc5c42821e1340e96cb90a3da472264624a26326/meilidb-core/src/query_builder.rs#L265-L270) and [filter](https://github.com/meilisearch/MeiliDB/blob/dc5c42821e1340e96cb90a3da472264624a26326/meilidb-core/src/query_builder.rs#L246-L259) returned documents based on context-defined rules
- Searches for [concatenated](https://github.com/meilisearch/MeiliDB/pull/164) and [split query words](https://github.com/meilisearch/MeiliDB/pull/232) to improve the search quality
- Can store complete documents or only [user schema specified fields](https://github.com/meilisearch/MeiliDB/blob/dc5c42821e1340e96cb90a3da472264624a26326/meilidb-schema/src/lib.rs#L265-L279)
- The [default tokenizer](https://github.com/meilisearch/MeiliDB/blob/dc5c42821e1340e96cb90a3da472264624a26326/meilidb-tokenizer/src/lib.rs) can index Latin- and kanji-based languages
- Returns [the matching text areas](https://github.com/meilisearch/MeiliDB/blob/dc5c42821e1340e96cb90a3da472264624a26326/meilidb-core/src/lib.rs#L66-L88), useful to highlight matched words in results
- Accepts query time search config like the [searchable attributes](https://github.com/meilisearch/MeiliDB/blob/dc5c42821e1340e96cb90a3da472264624a26326/meilidb-core/src/query_builder.rs#L272-L275)
<p align="center">⚡ Lightning Fast, Ultra Relevant, and Typo-Tolerant Search Engine 🔍</p>
**MeiliSearch** is a powerful, fast, open-source, easy to use and deploy search engine. Both searching and indexing are highly customizable. Features such as typo-tolerance, filters, and synonyms are provided out-of-the-box.
For more information about features go to [our documentation](https://docs.meilisearch.com/).
It uses [LMDB](https://en.wikipedia.org/wiki/Lightning_Memory-Mapped_Database) as the internal key-value store. The key-value store allows us to handle updates and queries with small memory and CPU overheads. The whole ranking system is [data oriented](https://github.com/meilisearch/MeiliDB/issues/82) and provides great performance.
You can [read the deep dive](deep-dive.md) if you want more information on the engine; it describes the whole process of generating updates and handling queries. You can also take a look at the [typos and ranking rules](typos-ranking-rules.md) if you want to know the default rules used to sort the documents.
## Getting started
We will be proud if you submit issues and pull requests. You can help to grow this project and start contributing by checking [issues tagged "good-first-issue"](https://github.com/meilisearch/MeiliDB/issues?q=is%3Aissue+is%3Aopen+label%3A%22good+first+issue%22). It is a good start!
### Deploy the Server
The project is only a library for now, which means that no binary is provided yet. To get started, you can check the examples, which are made to work with the data located in the `datasets/` folder.
MeiliDB will be a binary in the near future, so you will be able to use it as a database out-of-the-box. We should be able to query it using HTTP. This is our current goal, [see the milestones](https://github.com/meilisearch/MeiliDB/milestones). In the end, the binary will be a bunch of network protocols and wrappers around the library, which will also be published on [crates.io](https://crates.io). Both the binary and the library will follow the same update cycle.
## Performances
On a database of _100 353_ documents, each with _352_ attributes of which _3_ are indexed (more than _300 000_ indexed fields out of _35 million_ stored), we can handle more than _2.8k req/sec_ with an average response time of _9 ms_ on an Intel i7-7700 (8) @ 4.2GHz.
Requests are made using [wrk](https://github.com/wg/wrk) and scripted to simulate real user queries.
```
Running 10s test @ http://localhost:2230
2 threads and 25 connections
Thread Stats Avg Stdev Max +/- Stdev
Latency 9.52ms 7.61ms 99.25ms 84.58%
Req/Sec 1.41k 119.11 1.78k 64.50%
28080 requests in 10.01s, 7.42MB read
Requests/sec: 2806.46
Transfer/sec: 759.17KB
```
### Notes
With Rust 1.32 the allocator has been [changed to use the system allocator](https://blog.rust-lang.org/2019/01/17/Rust-1.32.0.html#jemalloc-is-removed-by-default).
We have seen much better performance when [using jemalloc as the global allocator](https://github.com/alexcrichton/jemallocator#documentation).
## Usage and examples
Currently MeiliDB does not provide an HTTP server, but you can run the example binary.
The _index_ subcommand has been made to create an index and inject documents into it. Using the command line below, the index will be named _movies_ and the _19 700_ movies of the `datasets/` will be injected in MeiliDB.
```bash
cargo run --release --example from_file -- \
    index example.mdb datasets/movies/data.csv \
    --schema datasets/movies/schema.toml
```
#### Homebrew (Mac OS)
```bash
brew update && brew install meilisearch
meilisearch
```
Once the first command is done, you can query the freshly created _movies_ index using the _search_ subcommand. In this example we filtered the dataset to only show _non-adult_ movies using the non-definitive `!adult` syntax filter.
```bash
cargo run --release --example from_file -- \
    search example.mdb \
    --number 4 \
    --filter '!adult' \
    id popularity adult original_title
```
#### Docker
```bash
docker run -p 7700:7700 -v "$(pwd)/data.ms:/data.ms" getmeili/meilisearch
```
Let's create an index! If you need a sample dataset, use [this movie database](https://www.notion.so/meilisearch/A-movies-dataset-to-test-Meili-1cbf7c9cfa4247249c40edfa22d7ca87#b5ae399b81834705ba5420ac70358a65). You can also find it in the `datasets/` directory.
```bash
curl -L 'https://bit.ly/2PAcw9l' -o movies.json
```
Now, you're ready to index some data.
```bash
curl -i -X POST 'http://127.0.0.1:7700/indexes/movies/documents' \
  --header 'content-type: application/json' \
  --data-binary @movies.json
```
### Search for Documents
#### In command line
The search engine is now aware of your documents and can serve those via an HTTP server.
The [`jq` command-line tool](https://stedolan.github.io/jq/) can greatly help you read the server responses.
```json
...
"overview":"Along with crime-fighting partner Robin and new recruit Batgirl, Batman battles the dual threat of frosty genius Mr. Freeze and homicidal horticulturalist Poison Ivy. Freeze plans to put Gotham City on ice, while Ivy tries to drive a wedge between the dynamic duo.",
...
"overview":"Adam West and Burt Ward returns to their iconic roles of Batman and Robin. Featuring the voices of Adam West, Burt Ward, and Julie Newmar, the film sees the superheroes going up against classic villains like The Joker, The Riddler, The Penguin and Catwoman, both in Gotham City… and in space.",
"release_date":1475888400
}
],
"nbHits":8,
"exhaustiveNbHits":false,
"query":"botman robin",
"limit":2,
"offset":0,
"processingTimeMs":2
}
```
#### Use the Web Interface
We also deliver an **out-of-the-box [web interface](https://github.com/meilisearch/mini-dashboard)** in which you can test MeiliSearch interactively.
You can access the web interface in your web browser at the root of the server. The default URL is [http://127.0.0.1:7700](http://127.0.0.1:7700). All you need to do is open your web browser and enter MeiliSearch’s address to visit it. This will lead you to a web page with a search bar that will allow you to search in the selected index.
| [See the gif above](#demo)
## Documentation
Now that your MeiliSearch server is up and running, you can learn more about how to tune your search engine in [the documentation](https://docs.meilisearch.com).
## Contributing
Hey! We're glad you're thinking about contributing to MeiliSearch! Feel free to pick an [issue labeled as `good first issue`](https://github.com/meilisearch/MeiliSearch/issues?q=is%3Aissue+is%3Aopen+label%3A%22good+first+issue%22), and to ask any questions you have. Some points might not be clear and we are available to help you!
Also, we recommend following the [CONTRIBUTING](./CONTRIBUTING.md) guide to create your PR.
## Core engine and tokenizer
The code in this repository is only concerned with managing multiple indexes, handling the update store, and exposing an HTTP API.
Search and indexation are the domain of our core engine, [`milli`](https://github.com/meilisearch/milli), while tokenization is handled by [our `tokenizer` library](https://github.com/meilisearch/tokenizer/).
## Telemetry
MeiliSearch collects anonymous data regarding general usage.
This helps us better understand developers' usage of MeiliSearch features.
To find out more on what information we're retrieving, please see our documentation on [Telemetry](https://docs.meilisearch.com/learn/what_is_meilisearch/telemetry.html).
This program is optional; you can disable these analytics by using the `MEILI_NO_ANALYTICS` env variable.
## Feature request
Feature requests are not managed in this repository. Please visit our [dedicated repository](https://github.com/meilisearch/product) to see our work on the MeiliSearch product.
If you have a feature request or any feedback about an existing feature, please open [a discussion](https://github.com/meilisearch/product/discussions).
Also, feel free to participate in the current discussions; we are looking forward to reading your comments.
MeiliSearch is developed by [Meili](https://www.meilisearch.com), a young company. To learn more about us, you can [read our blog](https://blog.meilisearch.com). Any suggestion or feedback is highly appreciated. Thank you for your support!
MeiliSearch takes the security of our software products and services seriously.
If you believe you have found a security vulnerability in any MeiliSearch-owned repository, please report it to us as described below.
## Supported versions
As long as we are pre-v1.0, only the latest version of MeiliSearch will be supported with security updates.
## Reporting security issues
⚠️ Please do not report security vulnerabilities through public GitHub issues. ⚠️
Instead, please email us at security@meilisearch.com.
Please include the requested information listed below (as much as you can provide) to help us better understand the nature and scope of the possible issue:
- Type of issue (e.g. buffer overflow, SQL injection, cross-site scripting, etc.)
- Full paths of source file(s) related to the manifestation of the issue
- The location of the affected source code (tag/branch/commit or direct URL)
- Any special configuration required to reproduce the issue
- Step-by-step instructions to reproduce the issue
- Proof-of-concept or exploit code (if possible)
- Impact of the issue, including how an attacker might exploit the issue
This information will help us triage your report more quickly.
You will receive a response from us within 72 hours. If the issue is confirmed, we will release a patch as soon as possible depending on complexity.
MeiliDB is a full-text search engine based on a finite-state transducer library named [fst](https://github.com/BurntSushi/fst) and a key-value store named [sled](https://github.com/spacejam/sled). The goal of a search engine is to store data and to respond to queries as accurately and quickly as possible. To achieve this, it must save the matching words in an [inverted index](https://en.wikipedia.org/wiki/Inverted_index).
<!-- MarkdownTOC autolink="true" -->
- [Where is the data stored?](#where-is-the-data-stored)
- [What does the key-value store contains?](#what-does-the-key-value-store-contains)
- [The inverted word index](#the-inverted-word-index)
- [A final state transducer](#a-final-state-transducer)
- [Document indexes](#document-indexes)
- [The schema](#the-schema)
- [Document attributes](#document-attributes)
- [How is a request processed?](#how-is-a-request-processed)
- [Query lexemes](#query-lexemes)
- [Automatons and query index](#automatons-and-query-index)
- [Sort by criteria](#sort-by-criteria)
<!-- /MarkdownTOC -->
## Where is the data stored?
MeiliDB is entirely backed by a key-value store, like any good database (e.g. Postgres, MySQL). This brings great flexibility in the way documents can be stored and updates handled over time.
[sled will bring some](https://github.com/spacejam/sled/tree/434533332a3f485e6d2e467023be0a0b55d3a1af#plans) of the [A.C.I.D. properties](https://en.wikipedia.org/wiki/ACID_(computer_science)) to help us be sure the saved data is consistent.
## What does the key-value store contains?
It contains the inverted word index, the schema and the document fields.
### The inverted word index
[The inverted word index](https://github.com/meilisearch/MeiliDB/blob/3db823de002243004612e36a19b4578d800dab97/meilidb-data/src/database/words_index.rs) is a sled Tree dedicated to storing and giving access to all documents that contain a specific word. The information stored under the word is simply a big ordered array of where in the document the word has been found. In other words, a big list of [`DocIndex`](https://github.com/meilisearch/MeiliDB/blob/3db823de002243004612e36a19b4578d800dab97/meilidb-core/src/lib.rs#L35-L51).
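To make the idea concrete, here is a minimal in-memory sketch of an inverted word index. It is an illustration only: the `DocIndex` fields below are a simplified, hypothetical stand-in for the real structure, a `BTreeMap` replaces the sled Tree, and words are split on whitespace rather than by the real tokenizer.

```rust
use std::collections::BTreeMap;

/// Simplified, hypothetical stand-in for the engine's `DocIndex`:
/// where a word was found (document, attribute, position in the attribute).
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
struct DocIndex {
    document_id: u64,
    attribute: u16,
    word_index: u16,
}

/// Maps each word to the ordered list of places where it occurs.
#[derive(Default)]
struct InvertedIndex {
    postings: BTreeMap<String, Vec<DocIndex>>,
}

impl InvertedIndex {
    /// Indexes the words of one attribute of one document.
    fn insert(&mut self, document_id: u64, attribute: u16, text: &str) {
        for (position, word) in text.split_whitespace().enumerate() {
            let entry = self.postings.entry(word.to_lowercase()).or_default();
            entry.push(DocIndex { document_id, attribute, word_index: position as u16 });
            // Keep the list ordered, as described above.
            entry.sort_unstable();
        }
    }

    /// Returns every place where `word` was found (empty if unknown).
    fn get(&self, word: &str) -> &[DocIndex] {
        match self.postings.get(word) {
            Some(list) => list.as_slice(),
            None => &[],
        }
    }
}
```

Looking up a word is then a single key lookup that returns all of its occurrences across documents.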
#### A final state transducer
_...also abbreviated fst_
This is the first entry point of the engine. You can read more about how it works in the beautiful blog post by @BurntSushi, [Index 1,600,000,000 Keys with Automata and Rust](https://blog.burntsushi.net/transducers/).
In short, it is a powerful way to store all the words that are present in the indexed documents. You construct it by giving it all the words you want to index. When you want to search in it, you can provide any automaton you want; in MeiliDB, [a custom Levenshtein automaton](https://github.com/tantivy-search/levenshtein-automata/) is used.
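The effect of searching the word set with a Levenshtein automaton can be approximated with a plain edit-distance filter. The sketch below deliberately swaps in the classic dynamic-programming distance over a word list; it is not an automaton and does not use the fst crate, so it shows what is matched, not how the engine does it efficiently. The function names are hypothetical.

```rust
/// Classic dynamic-programming edit distance between two words.
fn levenshtein(a: &str, b: &str) -> usize {
    let a: Vec<char> = a.chars().collect();
    let b: Vec<char> = b.chars().collect();
    // previous[j] = distance between the processed prefix of `a` and b[..j].
    let mut previous: Vec<usize> = (0..=b.len()).collect();
    for (i, ca) in a.iter().enumerate() {
        let mut current = vec![i + 1];
        for (j, cb) in b.iter().enumerate() {
            let substitution = previous[j] + usize::from(ca != cb);
            let insertion = current[j] + 1;
            let deletion = previous[j + 1] + 1;
            current.push(substitution.min(insertion).min(deletion));
        }
        previous = current;
    }
    previous[b.len()]
}

/// Returns the dictionary words within `max_distance` edits of `query`.
fn matching_words<'a>(dictionary: &[&'a str], query: &str, max_distance: usize) -> Vec<&'a str> {
    dictionary
        .iter()
        .copied()
        .filter(|word| levenshtein(word, query) <= max_distance)
        .collect()
}
```

With a dictionary containing "batman", "robin" and "batgirl", the misspelled query "botman" with a maximum distance of 1 matches only "batman".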
#### Document indexes
The `fst` will only return the words that match the search automaton, but the goal of the search engine is to retrieve all matches in all the documents when a query is made. You want it to return some sort of position in an attribute in a document: information about where the given word matched.
To make this possible, we retrieve all of the `DocIndex` corresponding to all the matching words in the fst. We use the [`WordsIndex`](https://github.com/meilisearch/MeiliDB/blob/3db823de002243004612e36a19b4578d800dab97/meilidb-data/src/database/words_index.rs#L11-L21) Tree to get the `DocIndexes` corresponding to the words.
### The schema
The schema is a data structure that represents which document attributes should be stored and which should be indexed. It is stored under the [`MainIndex`](https://github.com/meilisearch/MeiliDB/blob/3db823de002243004612e36a19b4578d800dab97/meilidb-data/src/database/main_index.rs#L12) Tree and given to MeiliDB only at the creation of an index.
Each document attribute is associated with a unique 16-bit number named [`SchemaAttr`](https://github.com/meilisearch/MeiliDB/blob/3db823de002243004612e36a19b4578d800dab97/meilidb-data/src/schema.rs#L186).
In the future, this schema type could be given along with updates, so that the database could handle a new schema and reindex itself according to the new one.
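As an illustration of the mapping described above, the sketch below assigns each attribute name a 16-bit identifier and records whether it is stored and/or indexed. Only the name `SchemaAttr` and its 16-bit width come from the text; the `Schema` and `AttrProps` types and their methods are hypothetical simplifications.

```rust
use std::collections::HashMap;

/// A 16-bit attribute identifier, as described in the text.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
struct SchemaAttr(u16);

/// Hypothetical per-attribute flags.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct AttrProps {
    stored: bool,
    indexed: bool,
}

/// Hypothetical schema: attribute names mapped to ids and flags.
#[derive(Default)]
struct Schema {
    ids: HashMap<String, SchemaAttr>,
    props: Vec<AttrProps>,
}

impl Schema {
    /// Registers an attribute and returns its id; ids follow insertion order.
    /// Registering an existing name returns the id it already has.
    fn add(&mut self, name: &str, stored: bool, indexed: bool) -> SchemaAttr {
        if let Some(&attr) = self.ids.get(name) {
            return attr;
        }
        let attr = SchemaAttr(self.props.len() as u16);
        self.ids.insert(name.to_string(), attr);
        self.props.push(AttrProps { stored, indexed });
        attr
    }

    /// Looks up the id of an attribute by name.
    fn attribute(&self, name: &str) -> Option<SchemaAttr> {
        self.ids.get(name).copied()
    }

    /// Returns the flags of a registered attribute.
    fn properties(&self, attr: SchemaAttr) -> AttrProps {
        self.props[attr.0 as usize]
    }
}
```

Using a small integer instead of the attribute name keeps each index entry compact, which matters when one is stored for every word occurrence.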
### Document attributes
When the engine handles a query, the result that the requester wants is a document, not only the [`Matches`](https://github.com/meilisearch/MeiliDB/blob/3db823de002243004612e36a19b4578d800dab97/meilidb-core/src/lib.rs#L62-L88) associated with it: fields of the original document must be returned too.
So MeiliDB again uses the power of the underlying key-value store and saves the document attributes marked as _STORE_ in the schema. The dedicated Tree for this information is the [`DocumentsIndex`](https://github.com/meilisearch/MeiliDB/blob/3db823de002243004612e36a19b4578d800dab97/meilidb-data/src/database/documents_index.rs#L11).
When a document field is saved in the key-value store, its value is binary encoded using [MessagePack](https://github.com/3Hren/msgpack-rust), so a document must be serializable using serde.
## How is a request processed?
Now that we have our inverted index we are able to return results based on a query. In the MeiliDB universe a query is a simple string containing words.
### Query lexemes
The first step to be able to call the underlying structures is to split the query into words. For that we use a [custom tokenizer](https://github.com/meilisearch/MeiliDB/blob/3db823de002243004612e36a19b4578d800dab97/meilidb-tokenizer/src/lib.rs#L82-L84). Note that a tokenizer is specialized for a human language; this is the hard part.
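A deliberately naive sketch of this step is shown below. It assumes that splitting on non-alphanumeric characters and lowercasing is good enough, which only holds for space-separated languages; the real tokenizer is more involved, and `tokenize` is a hypothetical name.

```rust
/// Splits a query into lowercase words, dropping punctuation and whitespace.
/// Naive assumption: words are separated by non-alphanumeric characters.
fn tokenize(query: &str) -> Vec<String> {
    query
        .split(|c: char| !c.is_alphanumeric())
        .filter(|word| !word.is_empty())
        .map(|word| word.to_lowercase())
        .collect()
}
```

For example, the query "Botman & Robin!" yields the two lexemes "botman" and "robin".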
### Automatons and query index
To query the fst we need an automaton. In MeiliDB we use a [Levenshtein automaton](https://en.wikipedia.org/wiki/Levenshtein_automaton), which is constructed from a string and a maximum distance. Following [Algolia's blog post](https://blog.algolia.com/inside-the-algolia-engine-part-3-query-processing/#algolia%e2%80%99s-way-of-searching-for-alternatives), we [created the DFAs](https://github.com/meilisearch/MeiliDB/blob/3db823de002243004612e36a19b4578d800dab97/meilidb-core/src/automaton.rs#L59-L78) with different settings.
Thanks to the power of the fst library, [it is possible to union multiple automatons](https://docs.rs/fst/0.3.2/fst/map/struct.OpBuilder.html#method.union) on the same fst set. The `Stream` is able to return all the matching words. We use these words to find the whole list of associated `DocIndexes`.
With all this information it is possible [to reconstruct a list of all the `DocIndexes` associated](https://github.com/meilisearch/MeiliDB/blob/3db823de002243004612e36a19b4578d800dab97/meilidb-core/src/query_builder.rs#L103-L130) with the words queried.
### Sort by criteria
Now that we are able to get a big list of [DocIndexes](https://github.com/Kerollmops/MeiliDB/blob/550dc1e99224e386516877450320f694947332d4/src/lib.rs#L21-L36), that alone is not enough to sort them by criteria: we need more information, like the Levenshtein distance or whether a query word exactly matches the word stored in the fst. So [we stuff it a little bit](https://github.com/Kerollmops/MeiliDB/blob/550dc1e99224e386516877450320f694947332d4/src/rank/query_builder.rs#L86-L93) and aggregate all these [Matches](https://github.com/Kerollmops/MeiliDB/blob/550dc1e99224e386516877450320f694947332d4/src/lib.rs#L47-L74) for each document. This way it will be easy to sort a simple vector of documents using a bunch of functions.
With this big list of documents and associated matches, [we are able to sort only the part of the slice that we want](https://github.com/meilisearch/MeiliDB/blob/3db823de002243004612e36a19b4578d800dab97/meilidb-core/src/query_builder.rs#L160-L188) using bucket sorting. [Each criterion](https://github.com/meilisearch/MeiliDB/blob/3db823de002243004612e36a19b4578d800dab97/meilidb-core/src/criterion/mod.rs#L95-L101) is evaluated on each subslice without copying, thanks to [GroupByMut](https://docs.rs/slice-group-by/0.2.4/slice_group_by/) which, I hope, [will soon be merged](https://github.com/rust-lang/rfcs/pull/2477).
Note that it is possible to customize the criteria by using the `QueryBuilder::with_criteria` constructor. This way you can implement custom ranking based on the document attributes, using the appropriate structure and the [`document` method](https://github.com/meilisearch/MeiliDB/blob/3db823de002243004612e36a19b4578d800dab97/meilidb-data/src/database/index.rs#L86).
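The bucket sorting described in this section can be sketched with the standard library alone. In the illustration below, `RankedDocument` and the two criteria are hypothetical, groups are found with a manual loop instead of `GroupByMut`, and the whole slice is sorted rather than only the requested range.

```rust
use std::cmp::Ordering;

/// Hypothetical document with the aggregated information needed for ranking.
#[derive(Debug, Clone, PartialEq, Eq)]
struct RankedDocument {
    id: u64,
    typos: u32,
    words_matched: u32,
}

/// A criterion compares two documents; `Equal` means "same bucket".
type Criterion = fn(&RankedDocument, &RankedDocument) -> Ordering;

/// Hypothetical criterion: documents with fewer typos come first.
fn fewer_typos(a: &RankedDocument, b: &RankedDocument) -> Ordering {
    a.typos.cmp(&b.typos)
}

/// Hypothetical criterion: documents matching more query words come first.
fn more_words(a: &RankedDocument, b: &RankedDocument) -> Ordering {
    b.words_matched.cmp(&a.words_matched)
}

/// Sorts by the first criterion, then refines each bucket of equal
/// documents in place with the remaining criteria.
fn bucket_sort(documents: &mut [RankedDocument], criteria: &[Criterion]) {
    let (criterion, rest) = match criteria.split_first() {
        Some(split) => split,
        None => return,
    };
    documents.sort_by(|a, b| criterion(a, b));
    let mut start = 0;
    while start < documents.len() {
        // Extend the bucket while documents are equal under this criterion.
        let mut end = start + 1;
        while end < documents.len()
            && criterion(&documents[start], &documents[end]) == Ordering::Equal
        {
            end += 1;
        }
        bucket_sort(&mut documents[start..end], rest);
        start = end;
    }
}
```

Because each later criterion only reorders documents inside a bucket produced by the earlier ones, the order of the criteria list directly expresses their priority.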
"description": "En ce début de trentième millénaire, l'Empire n'a jamais été aussi puissant, aussi étendu à travers toute la galaxie. C'est dans sa capitale, Trantor, que l'éminent savant Hari Seldon invente la psychohistoire, une science toute nouvelle, à base de psychologie et de mathématiques, qui lui permet de prédire l'avenir... C'est-à-dire l'effondrement de l'Empire d'ici cinq siècles et au-delà, trente mille années de chaos et de ténèbres. Pour empêcher cette catastrophe et sauver la civilisation, Seldon crée la Fondation."
result_expected.insert("description".to_string(),Value::String("En ce début de trentième millénaire, l'Empire n'a jamais été aussi puissant, aussi étendu à travers toute la galaxie. C'est dans sa capitale, Trantor, que l'éminent savant Hari Seldon invente la psychohistoire, une science toute nouvelle, à base de psychologie et de mathématiques, qui lui permet de prédire l'avenir... C'est-à-dire l'effondrement de l'Empire d'ici cinq siècles et au-delà, trente mille années de chaos et de ténèbres. Pour empêcher cette catastrophe et sauver la civilisation, Seldon crée la <em>Fondation</em>.".to_string()));