Commit Graph

891 Commits

Author SHA1 Message Date
ad hoc
e8f06f6c06 extract exact_word_prefix_docids 2022-04-04 20:54:03 +02:00
ad hoc
6dd2e4ffbd introduce exact_word_prefix database in index 2022-04-04 20:54:03 +02:00
ad hoc
ba0bb29cd8 refactor WordPrefixDocids to take dbs instead of indexes 2022-04-04 20:54:02 +02:00
ad hoc
c4c6e35352 query exact_word_docids in resolve_query_tree 2022-04-04 20:54:02 +02:00
ad hoc
8d46a5b0b5 extract exact word docids 2022-04-04 20:54:02 +02:00
ad hoc
5451c64d5d increase criteria asc desc test map size 2022-04-04 20:54:02 +02:00
ad hoc
0a77be4ec0 introduce exact_word_docids db 2022-04-04 20:54:02 +02:00
ad hoc
5f9f82757d refactor spawn_extraction_task 2022-04-04 20:54:02 +02:00
ad hoc
f82d4b36eb introduce exact attribute setting 2022-04-04 20:54:02 +02:00
ad hoc
c882d8daf0 add test for exact words 2022-04-04 20:54:01 +02:00
ad hoc
7e9d56a9e7 disable typos on exact words 2022-04-04 20:54:01 +02:00
ad hoc
3e67d8818c fix typo in test comment 2022-04-04 20:34:23 +02:00
ad hoc
284d8a24e0 add intergration test for disabled typon on word 2022-04-04 20:15:51 +02:00
ad hoc
30a2711bac rename serde module to serde_impl module
needed because of issues with rustfmt
2022-04-04 20:10:55 +02:00
ad hoc
0fd55db21c fmt 2022-04-04 20:10:55 +02:00
ad hoc
559e46be5e fix bad rebase bug 2022-04-04 20:10:55 +02:00
ad hoc
8b1e5d9c6d add test for exact words 2022-04-04 20:10:55 +02:00
ad hoc
774fa8f065 disable typos on exact words 2022-04-04 20:10:55 +02:00
ad hoc
9bbffb8fee add exact words setting 2022-04-04 20:10:54 +02:00
ad hoc
853b4a520f fmt 2022-04-04 10:41:46 +02:00
ad hoc
2cb71dff4a add typo integration tests 2022-04-04 10:41:46 +02:00
ad hoc
1941072bb2 implement Copy on Setting 2022-04-04 10:41:46 +02:00
ad hoc
fdaf45aab2 replace hardcoded value with constant in TestContext 2022-04-04 10:41:46 +02:00
ad hoc
950a740bd4 refactor typos for readability 2022-04-04 10:41:46 +02:00
ad hoc
66020cd923 rename min_word_len* to use plain letter numbers 2022-04-04 10:41:46 +02:00
ad hoc
4c4b336ecb rename min word len for typo error 2022-04-01 11:17:03 +02:00
ad hoc
286dd7b2e4 rename min_word_len_2_typo 2022-04-01 11:17:03 +02:00
ad hoc
55af85db3c add tests for min_word_len_for_typo 2022-04-01 11:17:02 +02:00
ad hoc
9102de5500 fix error message 2022-04-01 11:17:02 +02:00
ad hoc
a1a3a49bc9 dynamic minimum word len for typos in query tree builder 2022-04-01 11:17:02 +02:00
ad hoc
5a24e60572 introduce word len for typo setting 2022-04-01 11:17:02 +02:00
ad hoc
9fe40df960 add word derivations tests 2022-04-01 11:05:18 +02:00
ad hoc
d5ddc6b080 fix 2 typos word derivation bug 2022-04-01 10:51:22 +02:00
ad hoc
3e34981d9b add test for authorize_typos in update 2022-03-31 14:12:00 +02:00
ad hoc
6ef3bb9d83 fmt 2022-03-31 14:06:23 +02:00
ad hoc
f782fe2062 add authorize_typo_test 2022-03-31 10:08:39 +02:00
ad hoc
c4653347fd add authorize typo setting 2022-03-31 10:05:44 +02:00
Clémentine Urquizar
ddf78a735b Update version (v0.24.1) 2022-03-24 16:39:45 +01:00
Irevoire
86dd88698d bump tokenizer 2022-03-23 14:25:58 +01:00
Irevoire
5dc464b9a7 rollback meilisearch-tokenizer version 2022-03-21 17:29:10 +01:00
bors[bot]
90276d9a2d Merge #472
472: Remove useless variables in proximity r=Kerollmops a=ManyTheFish

Was passing by plane sweep algorithm to find some inspiration, and I discover that we have useless variables that were not detected because of the recursive function.

Co-authored-by: ManyTheFish <many@meilisearch.com>
2022-03-16 15:33:11 +00:00
ManyTheFish
49d59d88c2 Remove useless variables in proximity 2022-03-16 16:12:52 +01:00
Bruno Casali
adc71742c8 Move string concat to the struct instead of in the calling 2022-03-16 10:26:12 -03:00
Bruno Casali
4822fe1beb Add a better error message when the filterable attrs are empty
Fixes https://github.com/meilisearch/meilisearch/issues/2140
2022-03-15 18:13:59 -03:00
bors[bot]
f04ab67083 Merge #466
466: Bump version to 0.23.1 r=curquiza a=Kerollmops

This PR bumps the crate versions to 0.23.1. Nothing seems to be breaking in the next release.

Co-authored-by: Kerollmops <clement@meilisearch.com>
2022-03-15 17:19:05 +00:00
bors[bot]
ad4c982c68 Merge #439
439: Optimize typo criterion r=Kerollmops a=MarinPostma

This pr implements a couple of optimization for the typo criterion:

- clamp max typo on concatenated query words to 1: By considering that a concatenated query word is a typo, we clamp the max number of typos allowed o it to 1. This is useful because we noticed that concatenated query words often introduced words with 2 typos in queries that otherwise didn't allow for 2 typo words.

- Make typos on the first letter count for 2. This change is a big performance gain: by considering the typos on the first letter to count as 2 typos, we drastically restrict the search space for 1 typo, and if we reach 2 typos, the search space is reduced as well, as we only consider: (2 typos ∩ correct first letter) ∪ (wrong first letter ∩ 1 typo) instead of 2 typos anywhere in the word.

## benches
```
group                                                                                                    main                                   typo
-----                                                                                                    ----                                   ----
smol-songs.csv: asc + default/Notstandskomitee                                                           2.51      5.8±0.01ms        ? ?/sec    1.00      2.3±0.01ms        ? ?/sec
smol-songs.csv: asc + default/charles                                                                    2.48      3.0±0.01ms        ? ?/sec    1.00   1190.9±1.29µs        ? ?/sec
smol-songs.csv: asc + default/charles mingus                                                             5.56     10.8±0.01ms        ? ?/sec    1.00   1935.3±1.00µs        ? ?/sec
smol-songs.csv: asc + default/david                                                                      1.65      3.9±0.00ms        ? ?/sec    1.00      2.4±0.01ms        ? ?/sec
smol-songs.csv: asc + default/david bowie                                                                3.34     12.5±0.02ms        ? ?/sec    1.00      3.7±0.00ms        ? ?/sec
smol-songs.csv: asc + default/john                                                                       1.00   1849.7±3.74µs        ? ?/sec    1.01   1875.1±4.65µs        ? ?/sec
smol-songs.csv: asc + default/marcus miller                                                              4.32     15.7±0.01ms        ? ?/sec    1.00      3.6±0.01ms        ? ?/sec
smol-songs.csv: asc + default/michael jackson                                                            3.31     12.5±0.01ms        ? ?/sec    1.00      3.8±0.00ms        ? ?/sec
smol-songs.csv: asc + default/tamo                                                                       1.05    565.4±0.86µs        ? ?/sec    1.00    539.3±1.22µs        ? ?/sec
smol-songs.csv: asc + default/thelonious monk                                                            3.49     11.5±0.01ms        ? ?/sec    1.00      3.3±0.00ms        ? ?/sec
smol-songs.csv: asc/Notstandskomitee                                                                     2.59      5.6±0.02ms        ? ?/sec    1.00      2.2±0.01ms        ? ?/sec
smol-songs.csv: asc/charles                                                                              6.05      2.1±0.00ms        ? ?/sec    1.00    347.8±0.60µs        ? ?/sec
smol-songs.csv: asc/charles mingus                                                                       14.46     9.4±0.01ms        ? ?/sec    1.00    649.2±0.97µs        ? ?/sec
smol-songs.csv: asc/david                                                                                3.87      2.4±0.00ms        ? ?/sec    1.00    618.2±0.69µs        ? ?/sec
smol-songs.csv: asc/david bowie                                                                          10.14     9.8±0.01ms        ? ?/sec    1.00    970.8±1.55µs        ? ?/sec
smol-songs.csv: asc/john                                                                                 1.00    546.5±1.10µs        ? ?/sec    1.00    547.1±2.11µs        ? ?/sec
smol-songs.csv: asc/marcus miller                                                                        11.45    10.4±0.06ms        ? ?/sec    1.00    907.9±1.37µs        ? ?/sec
smol-songs.csv: asc/michael jackson                                                                      10.56     9.7±0.01ms        ? ?/sec    1.00    919.6±1.03µs        ? ?/sec
smol-songs.csv: asc/tamo                                                                                 1.03     43.3±0.18µs        ? ?/sec    1.00     42.2±0.23µs        ? ?/sec
smol-songs.csv: asc/thelonious monk                                                                      4.16     10.7±0.02ms        ? ?/sec    1.00      2.6±0.00ms        ? ?/sec
smol-songs.csv: basic filter: <=/Notstandskomitee                                                        1.00     95.7±0.20µs        ? ?/sec    1.15   109.6±10.40µs        ? ?/sec
smol-songs.csv: basic filter: <=/charles                                                                 1.00     27.8±0.15µs        ? ?/sec    1.01     27.9±0.18µs        ? ?/sec
smol-songs.csv: basic filter: <=/charles mingus                                                          1.72    119.2±0.67µs        ? ?/sec    1.00     69.1±0.13µs        ? ?/sec
smol-songs.csv: basic filter: <=/david                                                                   1.00     22.3±0.33µs        ? ?/sec    1.05     23.4±0.19µs        ? ?/sec
smol-songs.csv: basic filter: <=/david bowie                                                             1.59     86.9±0.79µs        ? ?/sec    1.00     54.5±0.31µs        ? ?/sec
smol-songs.csv: basic filter: <=/john                                                                    1.00     17.9±0.06µs        ? ?/sec    1.06     18.9±0.15µs        ? ?/sec
smol-songs.csv: basic filter: <=/marcus miller                                                           1.65    102.7±1.63µs        ? ?/sec    1.00     62.3±0.18µs        ? ?/sec
smol-songs.csv: basic filter: <=/michael jackson                                                         1.76    128.2±1.85µs        ? ?/sec    1.00     72.9±0.19µs        ? ?/sec
smol-songs.csv: basic filter: <=/tamo                                                                    1.00     17.9±0.13µs        ? ?/sec    1.05     18.7±0.20µs        ? ?/sec
smol-songs.csv: basic filter: <=/thelonious monk                                                         1.53    157.5±2.38µs        ? ?/sec    1.00    102.8±0.88µs        ? ?/sec
smol-songs.csv: basic filter: TO/Notstandskomitee                                                        1.00    100.9±4.36µs        ? ?/sec    1.04    105.0±8.25µs        ? ?/sec
smol-songs.csv: basic filter: TO/charles                                                                 1.00     28.4±0.36µs        ? ?/sec    1.03     29.4±0.33µs        ? ?/sec
smol-songs.csv: basic filter: TO/charles mingus                                                          1.71    118.1±1.08µs        ? ?/sec    1.00     68.9±0.26µs        ? ?/sec
smol-songs.csv: basic filter: TO/david                                                                   1.00     24.0±0.26µs        ? ?/sec    1.03     24.6±0.43µs        ? ?/sec
smol-songs.csv: basic filter: TO/david bowie                                                             1.72     95.2±0.30µs        ? ?/sec    1.00     55.2±0.14µs        ? ?/sec
smol-songs.csv: basic filter: TO/john                                                                    1.00     18.8±0.09µs        ? ?/sec    1.06     19.8±0.17µs        ? ?/sec
smol-songs.csv: basic filter: TO/marcus miller                                                           1.61    102.4±1.65µs        ? ?/sec    1.00     63.4±0.24µs        ? ?/sec
smol-songs.csv: basic filter: TO/michael jackson                                                         1.77    132.1±1.41µs        ? ?/sec    1.00     74.5±0.59µs        ? ?/sec
smol-songs.csv: basic filter: TO/tamo                                                                    1.00     18.2±0.14µs        ? ?/sec    1.05     19.2±0.46µs        ? ?/sec
smol-songs.csv: basic filter: TO/thelonious monk                                                         1.49    150.8±1.92µs        ? ?/sec    1.00    101.3±0.44µs        ? ?/sec
smol-songs.csv: basic placeholder/                                                                       1.00     27.3±0.07µs        ? ?/sec    1.03     28.0±0.05µs        ? ?/sec
smol-songs.csv: basic with quote/"Notstandskomitee"                                                      1.00    122.4±0.17µs        ? ?/sec    1.03    125.6±0.16µs        ? ?/sec
smol-songs.csv: basic with quote/"charles"                                                               1.00     88.8±0.30µs        ? ?/sec    1.00     88.4±0.15µs        ? ?/sec
smol-songs.csv: basic with quote/"charles" "mingus"                                                      1.00    685.2±0.74µs        ? ?/sec    1.01    689.4±6.07µs        ? ?/sec
smol-songs.csv: basic with quote/"david"                                                                 1.00    161.6±0.42µs        ? ?/sec    1.01    162.6±0.17µs        ? ?/sec
smol-songs.csv: basic with quote/"david" "bowie"                                                         1.00    731.7±0.73µs        ? ?/sec    1.02    743.1±0.77µs        ? ?/sec
smol-songs.csv: basic with quote/"john"                                                                  1.00    267.1±0.33µs        ? ?/sec    1.01    270.9±0.33µs        ? ?/sec
smol-songs.csv: basic with quote/"marcus" "miller"                                                       1.00    138.7±0.31µs        ? ?/sec    1.02    140.9±0.13µs        ? ?/sec
smol-songs.csv: basic with quote/"michael" "jackson"                                                     1.01    841.4±0.72µs        ? ?/sec    1.00    833.8±0.92µs        ? ?/sec
smol-songs.csv: basic with quote/"tamo"                                                                  1.01    189.2±0.26µs        ? ?/sec    1.00    188.2±0.71µs        ? ?/sec
smol-songs.csv: basic with quote/"thelonious" "monk"                                                     1.00   1100.5±1.36µs        ? ?/sec    1.01   1111.7±2.17µs        ? ?/sec
smol-songs.csv: basic without quote/Notstandskomitee                                                     3.40      7.9±0.02ms        ? ?/sec    1.00      2.3±0.02ms        ? ?/sec
smol-songs.csv: basic without quote/charles                                                              2.57    494.4±0.89µs        ? ?/sec    1.00    192.5±0.18µs        ? ?/sec
smol-songs.csv: basic without quote/charles mingus                                                       1.29      2.8±0.02ms        ? ?/sec    1.00      2.1±0.01ms        ? ?/sec
smol-songs.csv: basic without quote/david                                                                1.95    623.8±0.90µs        ? ?/sec    1.00    319.2±1.22µs        ? ?/sec
smol-songs.csv: basic without quote/david bowie                                                          1.12      5.9±0.00ms        ? ?/sec    1.00      5.2±0.00ms        ? ?/sec
smol-songs.csv: basic without quote/john                                                                 1.24   1340.9±2.25µs        ? ?/sec    1.00   1084.7±7.76µs        ? ?/sec
smol-songs.csv: basic without quote/marcus miller                                                        7.97     14.6±0.01ms        ? ?/sec    1.00   1826.0±6.84µs        ? ?/sec
smol-songs.csv: basic without quote/michael jackson                                                      1.19      3.9±0.00ms        ? ?/sec    1.00      3.3±0.00ms        ? ?/sec
smol-songs.csv: basic without quote/tamo                                                                 1.65    737.7±3.58µs        ? ?/sec    1.00    446.7±0.51µs        ? ?/sec
smol-songs.csv: basic without quote/thelonious monk                                                      1.16      4.5±0.02ms        ? ?/sec    1.00      3.9±0.04ms        ? ?/sec
smol-songs.csv: big filter/Notstandskomitee                                                              3.27      7.6±0.02ms        ? ?/sec    1.00      2.3±0.01ms        ? ?/sec
smol-songs.csv: big filter/charles                                                                       8.26   1957.5±1.37µs        ? ?/sec    1.00    236.8±0.34µs        ? ?/sec
smol-songs.csv: big filter/charles mingus                                                                18.49    11.2±0.06ms        ? ?/sec    1.00    607.7±3.03µs        ? ?/sec
smol-songs.csv: big filter/david                                                                         3.78      2.4±0.00ms        ? ?/sec    1.00    622.8±0.80µs        ? ?/sec
smol-songs.csv: big filter/david bowie                                                                   9.00     12.0±0.01ms        ? ?/sec    1.00   1336.0±3.17µs        ? ?/sec
smol-songs.csv: big filter/john                                                                          1.00    554.2±0.95µs        ? ?/sec    1.01    560.4±0.79µs        ? ?/sec
smol-songs.csv: big filter/marcus miller                                                                 18.09    12.0±0.01ms        ? ?/sec    1.00    664.7±0.60µs        ? ?/sec
smol-songs.csv: big filter/michael jackson                                                               8.43     12.0±0.01ms        ? ?/sec    1.00   1421.6±1.37µs        ? ?/sec
smol-songs.csv: big filter/tamo                                                                          1.00     86.3±0.14µs        ? ?/sec    1.01     87.3±0.21µs        ? ?/sec
smol-songs.csv: big filter/thelonious monk                                                               5.55     14.3±0.02ms        ? ?/sec    1.00      2.6±0.01ms        ? ?/sec
smol-songs.csv: desc + default/Notstandskomitee                                                          2.52      5.8±0.01ms        ? ?/sec    1.00      2.3±0.01ms        ? ?/sec
smol-songs.csv: desc + default/charles                                                                   3.04      2.7±0.01ms        ? ?/sec    1.00    893.4±1.08µs        ? ?/sec
smol-songs.csv: desc + default/charles mingus                                                            6.77     10.3±0.01ms        ? ?/sec    1.00   1520.8±1.90µs        ? ?/sec
smol-songs.csv: desc + default/david                                                                     1.39      5.7±0.00ms        ? ?/sec    1.00      4.1±0.00ms        ? ?/sec
smol-songs.csv: desc + default/david bowie                                                               2.34     15.8±0.02ms        ? ?/sec    1.00      6.7±0.01ms        ? ?/sec
smol-songs.csv: desc + default/john                                                                      1.00      2.5±0.00ms        ? ?/sec    1.02      2.6±0.01ms        ? ?/sec
smol-songs.csv: desc + default/marcus miller                                                             5.06     14.5±0.02ms        ? ?/sec    1.00      2.9±0.01ms        ? ?/sec
smol-songs.csv: desc + default/michael jackson                                                           2.64     14.1±0.05ms        ? ?/sec    1.00      5.4±0.00ms        ? ?/sec
smol-songs.csv: desc + default/tamo                                                                      1.00    567.0±0.65µs        ? ?/sec    1.00    565.7±0.97µs        ? ?/sec
smol-songs.csv: desc + default/thelonious monk                                                           3.55     11.6±0.02ms        ? ?/sec    1.00      3.3±0.00ms        ? ?/sec
smol-songs.csv: desc/Notstandskomitee                                                                    2.58      5.6±0.02ms        ? ?/sec    1.00      2.2±0.02ms        ? ?/sec
smol-songs.csv: desc/charles                                                                             6.04      2.1±0.00ms        ? ?/sec    1.00    348.1±0.57µs        ? ?/sec
smol-songs.csv: desc/charles mingus                                                                      14.51     9.4±0.01ms        ? ?/sec    1.00    646.7±0.99µs        ? ?/sec
smol-songs.csv: desc/david                                                                               3.86      2.4±0.00ms        ? ?/sec    1.00    620.7±2.46µs        ? ?/sec
smol-songs.csv: desc/david bowie                                                                         10.10     9.8±0.01ms        ? ?/sec    1.00    973.9±3.31µs        ? ?/sec
smol-songs.csv: desc/john                                                                                1.00    545.5±0.78µs        ? ?/sec    1.00    547.2±0.48µs        ? ?/sec
smol-songs.csv: desc/marcus miller                                                                       11.39    10.3±0.01ms        ? ?/sec    1.00    903.7±0.95µs        ? ?/sec
smol-songs.csv: desc/michael jackson                                                                     10.51     9.7±0.01ms        ? ?/sec    1.00    924.7±2.02µs        ? ?/sec
smol-songs.csv: desc/tamo                                                                                1.01     43.2±0.33µs        ? ?/sec    1.00     42.6±0.35µs        ? ?/sec
smol-songs.csv: desc/thelonious monk                                                                     4.19     10.8±0.03ms        ? ?/sec    1.00      2.6±0.00ms        ? ?/sec
smol-songs.csv: prefix search/a                                                                          1.00   1008.7±1.00µs        ? ?/sec    1.00   1005.5±0.91µs        ? ?/sec
smol-songs.csv: prefix search/b                                                                          1.00    885.0±0.70µs        ? ?/sec    1.01    890.6±1.11µs        ? ?/sec
smol-songs.csv: prefix search/i                                                                          1.00   1051.8±1.25µs        ? ?/sec    1.00   1056.6±4.12µs        ? ?/sec
smol-songs.csv: prefix search/s                                                                          1.00    724.7±1.77µs        ? ?/sec    1.00    721.6±0.59µs        ? ?/sec
smol-songs.csv: prefix search/x                                                                          1.01    212.4±0.21µs        ? ?/sec    1.00    210.9±0.38µs        ? ?/sec
smol-songs.csv: proximity/7000 Danses Un Jour Dans Notre Vie                                             18.55    48.5±0.09ms        ? ?/sec    1.00      2.6±0.03ms        ? ?/sec
smol-songs.csv: proximity/The Disneyland Sing-Along Chorus                                               8.41     56.7±0.45ms        ? ?/sec    1.00      6.7±0.05ms        ? ?/sec
smol-songs.csv: proximity/Under Great Northern Lights                                                    15.74    38.9±0.14ms        ? ?/sec    1.00      2.5±0.00ms        ? ?/sec
smol-songs.csv: proximity/black saint sinner lady                                                        11.82    40.1±0.13ms        ? ?/sec    1.00      3.4±0.02ms        ? ?/sec
smol-songs.csv: proximity/les dangeureuses 1960                                                          6.90     26.1±0.13ms        ? ?/sec    1.00      3.8±0.04ms        ? ?/sec
smol-songs.csv: typo/Arethla Franklin                                                                    14.93     5.8±0.01ms        ? ?/sec    1.00    390.1±1.89µs        ? ?/sec
smol-songs.csv: typo/Disnaylande                                                                         3.18      7.3±0.01ms        ? ?/sec    1.00      2.3±0.00ms        ? ?/sec
smol-songs.csv: typo/dire straights                                                                      5.55     15.2±0.02ms        ? ?/sec    1.00      2.7±0.00ms        ? ?/sec
smol-songs.csv: typo/fear of the duck                                                                    28.03    20.0±0.03ms        ? ?/sec    1.00    713.3±1.54µs        ? ?/sec
smol-songs.csv: typo/indochie                                                                            19.25  1851.4±2.38µs        ? ?/sec    1.00     96.2±0.13µs        ? ?/sec
smol-songs.csv: typo/indochien                                                                           14.66  1887.7±3.18µs        ? ?/sec    1.00    128.8±0.18µs        ? ?/sec
smol-songs.csv: typo/klub des loopers                                                                    37.73    18.0±0.02ms        ? ?/sec    1.00    476.7±0.73µs        ? ?/sec
smol-songs.csv: typo/michel depech                                                                       10.17     5.8±0.01ms        ? ?/sec    1.00    565.8±1.16µs        ? ?/sec
smol-songs.csv: typo/mongus                                                                              15.33  1897.4±3.44µs        ? ?/sec    1.00    123.8±0.13µs        ? ?/sec
smol-songs.csv: typo/stromal                                                                             14.63  1859.3±2.40µs        ? ?/sec    1.00    127.1±0.29µs        ? ?/sec
smol-songs.csv: typo/the white striper                                                                   10.83     9.4±0.01ms        ? ?/sec    1.00    866.0±0.98µs        ? ?/sec
smol-songs.csv: typo/thelonius monk                                                                      14.40     3.8±0.00ms        ? ?/sec    1.00    261.5±1.30µs        ? ?/sec
smol-songs.csv: words/7000 Danses / Le Baiser / je me trompe de mots                                     5.54     70.8±0.09ms        ? ?/sec    1.00     12.8±0.03ms        ? ?/sec
smol-songs.csv: words/Bring Your Daughter To The Slaughter but now this is not part of the title         3.48    119.8±0.14ms        ? ?/sec    1.00     34.4±0.04ms        ? ?/sec
smol-songs.csv: words/The Disneyland Children's Sing-Alone song                                          8.98     71.9±0.12ms        ? ?/sec    1.00      8.0±0.01ms        ? ?/sec
smol-songs.csv: words/les liaisons dangeureuses 1793                                                     11.88    37.4±0.07ms        ? ?/sec    1.00      3.1±0.01ms        ? ?/sec
smol-songs.csv: words/seven nation mummy                                                                 22.86    23.4±0.04ms        ? ?/sec    1.00   1024.8±1.57µs        ? ?/sec
smol-songs.csv: words/the black saint and the sinner lady and the good doggo                             2.76    124.4±0.15ms        ? ?/sec    1.00     45.1±0.09ms        ? ?/sec
smol-songs.csv: words/whathavenotnsuchforth and a good amount of words to pop to match the first one     2.52    107.0±0.23ms        ? ?/sec    1.00     42.4±0.66ms        ? ?/sec

group                                                                                    main-wiki                              typo-wiki
-----                                                                                    ---------                              ---------
smol-wiki-articles.csv: basic placeholder/                                               1.02     13.7±0.02µs        ? ?/sec    1.00     13.4±0.03µs        ? ?/sec
smol-wiki-articles.csv: basic with quote/"film"                                          1.02    409.8±0.67µs        ? ?/sec    1.00    402.6±0.48µs        ? ?/sec
smol-wiki-articles.csv: basic with quote/"france"                                        1.00    325.9±0.91µs        ? ?/sec    1.00    326.4±0.49µs        ? ?/sec
smol-wiki-articles.csv: basic with quote/"japan"                                         1.00    218.4±0.26µs        ? ?/sec    1.01    220.5±0.20µs        ? ?/sec
smol-wiki-articles.csv: basic with quote/"machine"                                       1.00    143.0±0.12µs        ? ?/sec    1.04    148.8±0.21µs        ? ?/sec
smol-wiki-articles.csv: basic with quote/"miles" "davis"                                 1.00     11.7±0.06ms        ? ?/sec    1.00     11.8±0.01ms        ? ?/sec
smol-wiki-articles.csv: basic with quote/"mingus"                                        1.00      4.4±0.03ms        ? ?/sec    1.00      4.4±0.00ms        ? ?/sec
smol-wiki-articles.csv: basic with quote/"rock" "and" "roll"                             1.00     43.5±0.08ms        ? ?/sec    1.01     43.8±0.06ms        ? ?/sec
smol-wiki-articles.csv: basic with quote/"spain"                                         1.00    137.3±0.35µs        ? ?/sec    1.05    144.4±0.23µs        ? ?/sec
smol-wiki-articles.csv: basic without quote/film                                         1.00    125.3±0.30µs        ? ?/sec    1.06    133.1±0.37µs        ? ?/sec
smol-wiki-articles.csv: basic without quote/france                                       1.21   1782.6±1.65µs        ? ?/sec    1.00   1477.0±1.39µs        ? ?/sec
smol-wiki-articles.csv: basic without quote/japan                                        1.28   1363.9±0.80µs        ? ?/sec    1.00   1064.3±1.79µs        ? ?/sec
smol-wiki-articles.csv: basic without quote/machine                                      1.73    760.3±0.81µs        ? ?/sec    1.00    439.6±0.75µs        ? ?/sec
smol-wiki-articles.csv: basic without quote/miles davis                                  1.03     17.0±0.03ms        ? ?/sec    1.00     16.5±0.02ms        ? ?/sec
smol-wiki-articles.csv: basic without quote/mingus                                       1.07      5.3±0.01ms        ? ?/sec    1.00      5.0±0.00ms        ? ?/sec
smol-wiki-articles.csv: basic without quote/rock and roll                                1.01     63.9±0.18ms        ? ?/sec    1.00     63.0±0.07ms        ? ?/sec
smol-wiki-articles.csv: basic without quote/spain                                        2.07    667.4±0.93µs        ? ?/sec    1.00    322.8±0.29µs        ? ?/sec
smol-wiki-articles.csv: prefix search/c                                                  1.00    343.1±0.47µs        ? ?/sec    1.00    344.0±0.34µs        ? ?/sec
smol-wiki-articles.csv: prefix search/g                                                  1.00    374.4±3.42µs        ? ?/sec    1.00    374.1±0.44µs        ? ?/sec
smol-wiki-articles.csv: prefix search/j                                                  1.00    359.9±0.31µs        ? ?/sec    1.00    361.2±0.79µs        ? ?/sec
smol-wiki-articles.csv: prefix search/q                                                  1.01    102.0±0.12µs        ? ?/sec    1.00    101.4±0.32µs        ? ?/sec
smol-wiki-articles.csv: prefix search/t                                                  1.00    536.7±1.39µs        ? ?/sec    1.00    534.3±0.84µs        ? ?/sec
smol-wiki-articles.csv: prefix search/x                                                  1.00    400.9±1.00µs        ? ?/sec    1.00    399.5±0.45µs        ? ?/sec
smol-wiki-articles.csv: proximity/april paris                                            3.86     14.4±0.01ms        ? ?/sec    1.00      3.7±0.01ms        ? ?/sec
smol-wiki-articles.csv: proximity/diesel engine                                          12.98    10.4±0.01ms        ? ?/sec    1.00    803.5±1.13µs        ? ?/sec
smol-wiki-articles.csv: proximity/herald sings                                           1.00     12.7±0.06ms        ? ?/sec    5.29     67.1±0.09ms        ? ?/sec
smol-wiki-articles.csv: proximity/tea two                                                6.48   1452.1±2.78µs        ? ?/sec    1.00    224.1±0.38µs        ? ?/sec
smol-wiki-articles.csv: typo/Disnaylande                                                 3.89      8.5±0.01ms        ? ?/sec    1.00      2.2±0.01ms        ? ?/sec
smol-wiki-articles.csv: typo/aritmetric                                                  3.78     10.3±0.01ms        ? ?/sec    1.00      2.7±0.00ms        ? ?/sec
smol-wiki-articles.csv: typo/linax                                                       8.91   1426.7±0.97µs        ? ?/sec    1.00    160.1±0.18µs        ? ?/sec
smol-wiki-articles.csv: typo/migrosoft                                                   7.48   1417.3±5.84µs        ? ?/sec    1.00    189.5±0.88µs        ? ?/sec
smol-wiki-articles.csv: typo/nympalidea                                                  3.96      7.2±0.01ms        ? ?/sec    1.00   1810.1±2.03µs        ? ?/sec
smol-wiki-articles.csv: typo/phytogropher                                                3.71      7.2±0.01ms        ? ?/sec    1.00   1934.3±6.51µs        ? ?/sec
smol-wiki-articles.csv: typo/sisan                                                       6.44   1497.2±1.38µs        ? ?/sec    1.00    232.7±0.94µs        ? ?/sec
smol-wiki-articles.csv: typo/the fronce                                                  6.92      2.9±0.00ms        ? ?/sec    1.00    418.0±1.76µs        ? ?/sec
smol-wiki-articles.csv: words/Abraham machin                                             16.63    10.8±0.01ms        ? ?/sec    1.00    649.7±1.08µs        ? ?/sec
smol-wiki-articles.csv: words/Idaho Bellevue pizza                                       27.15    25.6±0.03ms        ? ?/sec    1.00    944.2±5.07µs        ? ?/sec
smol-wiki-articles.csv: words/Kameya Tokujirō mingus monk                                26.87    40.7±0.05ms        ? ?/sec    1.00   1515.3±2.73µs        ? ?/sec
smol-wiki-articles.csv: words/Ulrich Hensel meilisearch milli                            11.99    48.8±0.10ms        ? ?/sec    1.00      4.1±0.02ms        ? ?/sec
smol-wiki-articles.csv: words/the black saint and the sinner lady and the good doggo     4.90    110.0±0.15ms        ? ?/sec    1.00     22.4±0.03ms        ? ?/sec

```

Co-authored-by: mpostma <postma.marin@protonmail.com>
Co-authored-by: ad hoc <postma.marin@protonmail.com>
2022-03-15 16:43:36 +00:00
ad hoc
3f24555c3d custom fst automatons 2022-03-15 17:38:35 +01:00
ad hoc
628c835a22 fix tests 2022-03-15 17:38:34 +01:00
bors[bot]
8efac33b53 Merge #467
467: optimize prefix database r=Kerollmops a=MarinPostma

This pr introduces two optimizations that greatly improve the speed of computing prefix databases.

- The time that it takes to create the prefix FST has been divided by 5 by inverting the way we iterated over the words FST.
- We unconditionally and needlessly checked for documents to remove in  `word_prefix_pair`, which caused an iteration over the whole database.

Co-authored-by: ad hoc <postma.marin@protonmail.com>
2022-03-15 16:14:35 +00:00
ad hoc
d127c57f2d review edits 2022-03-15 17:12:48 +01:00