ad hoc 
							
						 
					 
					
						
						
							
						
						284d8a24e0 
					 
					
						
						
							
							add intergration test for disabled typon on word  
						
						
						
						
					 
					
						2022-04-04 20:15:51 +02:00 
						 
				 
			
				
					
						
							
							
								ad hoc 
							
						 
					 
					
						
						
							
						
						30a2711bac 
					 
					
						
						
							
							rename serde module to serde_impl module  
						
						... 
						
						
						
						needed because of issues with rustfmt 
						
						
					 
					
						2022-04-04 20:10:55 +02:00 
						 
				 
			
				
					
						
							
							
								ad hoc 
							
						 
					 
					
						
						
							
						
						0fd55db21c 
					 
					
						
						
							
							fmt  
						
						
						
						
					 
					
						2022-04-04 20:10:55 +02:00 
						 
				 
			
				
					
						
							
							
								ad hoc 
							
						 
					 
					
						
						
							
						
						559e46be5e 
					 
					
						
						
							
							fix bad rebase bug  
						
						
						
						
					 
					
						2022-04-04 20:10:55 +02:00 
						 
				 
			
				
					
						
							
							
								ad hoc 
							
						 
					 
					
						
						
							
						
						8b1e5d9c6d 
					 
					
						
						
							
							add test for exact words  
						
						
						
						
					 
					
						2022-04-04 20:10:55 +02:00 
						 
				 
			
				
					
						
							
							
								ad hoc 
							
						 
					 
					
						
						
							
						
						774fa8f065 
					 
					
						
						
							
							disable typos on exact words  
						
						
						
						
					 
					
						2022-04-04 20:10:55 +02:00 
						 
				 
			
				
					
						
							
							
								ad hoc 
							
						 
					 
					
						
						
							
						
						9bbffb8fee 
					 
					
						
						
							
							add exact words setting  
						
						
						
						
					 
					
						2022-04-04 20:10:54 +02:00 
						 
				 
			
				
					
						
							
							
								bors[bot] 
							
						 
					 
					
						
						
							
						
						48a5ce7434 
					 
					
						
						
							
							Merge  #473  
						
						... 
						
						
						
						473: set minimum word len for typos r=MarinPostma a=MarinPostma
this PR allows the configuration on the minimum word length for typos.
The default values are the same as previously.
## steps
- [x] introduce settings for the minimum word length for 1 and 2 typos
- [x] update the settings update flow to set this setting
- [x] create a structure `TypoConfig` to configure typo tolerance in the query builder
- [x] in `typo`, use the configuration to create the appropriate query tree node.
- [x] extend `Context` to return the setting for minimum word length for typos
- [x] return correct error message for wrong settings.
- [x] merge #469  
Co-authored-by: ad hoc <postma.marin@protonmail.com > 
						
						
					 
					
						2022-04-04 17:53:14 +00:00 
						 
				 
			
				
					
						
							
							
								bors[bot] 
							
						 
					 
					
						
						
							
						
						6bf9824fec 
					 
					
						
						
							
							Merge  #485  
						
						... 
						
						
						
						485: fix bug on 2 typos derivation r=Kerollmops a=MarinPostma
I found a bug while working on #473 . This pr fixes it and add the missing tests on word derivations.
Co-authored-by: ad hoc <postma.marin@protonmail.com > 
						
						
					 
					
						2022-04-04 17:17:53 +00:00 
						 
				 
			
				
					
						
							
							
								ad hoc 
							
						 
					 
					
						
						
							
						
						853b4a520f 
					 
					
						
						
							
							fmt  
						
						
						
						
					 
					
						2022-04-04 10:41:46 +02:00 
						 
				 
			
				
					
						
							
							
								ad hoc 
							
						 
					 
					
						
						
							
						
						2cb71dff4a 
					 
					
						
						
							
							add typo integration tests  
						
						
						
						
					 
					
						2022-04-04 10:41:46 +02:00 
						 
				 
			
				
					
						
							
							
								ad hoc 
							
						 
					 
					
						
						
							
						
						1941072bb2 
					 
					
						
						
							
							implement Copy on Setting  
						
						
						
						
					 
					
						2022-04-04 10:41:46 +02:00 
						 
				 
			
				
					
						
							
							
								ad hoc 
							
						 
					 
					
						
						
							
						
						fdaf45aab2 
					 
					
						
						
							
							replace hardcoded value with constant in TestContext  
						
						
						
						
					 
					
						2022-04-04 10:41:46 +02:00 
						 
				 
			
				
					
						
							
							
								ad hoc 
							
						 
					 
					
						
						
							
						
						950a740bd4 
					 
					
						
						
							
							refactor typos for readability  
						
						
						
						
					 
					
						2022-04-04 10:41:46 +02:00 
						 
				 
			
				
					
						
							
							
								ad hoc 
							
						 
					 
					
						
						
							
						
						66020cd923 
					 
					
						
						
							
							rename min_word_len* to use plain letter numbers  
						
						
						
						
					 
					
						2022-04-04 10:41:46 +02:00 
						 
				 
			
				
					
						
							
							
								ad hoc 
							
						 
					 
					
						
						
							
						
						4c4b336ecb 
					 
					
						
						
							
							rename min word len for typo error  
						
						
						
						
					 
					
						2022-04-01 11:17:03 +02:00 
						 
				 
			
				
					
						
							
							
								ad hoc 
							
						 
					 
					
						
						
							
						
						286dd7b2e4 
					 
					
						
						
							
							rename min_word_len_2_typo  
						
						
						
						
					 
					
						2022-04-01 11:17:03 +02:00 
						 
				 
			
				
					
						
							
							
								ad hoc 
							
						 
					 
					
						
						
							
						
						55af85db3c 
					 
					
						
						
							
							add tests for min_word_len_for_typo  
						
						
						
						
					 
					
						2022-04-01 11:17:02 +02:00 
						 
				 
			
				
					
						
							
							
								ad hoc 
							
						 
					 
					
						
						
							
						
						9102de5500 
					 
					
						
						
							
							fix error message  
						
						
						
						
					 
					
						2022-04-01 11:17:02 +02:00 
						 
				 
			
				
					
						
							
							
								ad hoc 
							
						 
					 
					
						
						
							
						
						a1a3a49bc9 
					 
					
						
						
							
							dynamic minimum word len for typos in query tree builder  
						
						
						
						
					 
					
						2022-04-01 11:17:02 +02:00 
						 
				 
			
				
					
						
							
							
								ad hoc 
							
						 
					 
					
						
						
							
						
						5a24e60572 
					 
					
						
						
							
							introduce word len for typo setting  
						
						
						
						
					 
					
						2022-04-01 11:17:02 +02:00 
						 
				 
			
				
					
						
							
							
								ad hoc 
							
						 
					 
					
						
						
							
						
						9fe40df960 
					 
					
						
						
							
							add word derivations tests  
						
						
						
						
					 
					
						2022-04-01 11:05:18 +02:00 
						 
				 
			
				
					
						
							
							
								ad hoc 
							
						 
					 
					
						
						
							
						
						d5ddc6b080 
					 
					
						
						
							
							fix 2 typos word derivation  bug  
						
						
						
						
					 
					
						2022-04-01 10:51:22 +02:00 
						 
				 
			
				
					
						
							
							
								bors[bot] 
							
						 
					 
					
						
						
							
						
						d2d930dd3f 
					 
					
						
						
							
							Merge  #469  
						
						... 
						
						
						
						469: add authorize typo setting r=Kerollmops a=MarinPostma
This PR adds support for an authorize typo settings. This makes is possible to disable typos for a whole index. Typos are enabled by default.
Co-authored-by: ad hoc <postma.marin@protonmail.com > 
						
						
					 
					
						2022-03-31 15:18:08 +00:00 
						 
				 
			
				
					
						
							
							
								ad hoc 
							
						 
					 
					
						
						
							
						
						3e34981d9b 
					 
					
						
						
							
							add test for authorize_typos in update  
						
						
						
						
					 
					
						2022-03-31 14:12:00 +02:00 
						 
				 
			
				
					
						
							
							
								ad hoc 
							
						 
					 
					
						
						
							
						
						6ef3bb9d83 
					 
					
						
						
							
							fmt  
						
						
						
						
					 
					
						2022-03-31 14:06:23 +02:00 
						 
				 
			
				
					
						
							
							
								ad hoc 
							
						 
					 
					
						
						
							
						
						f782fe2062 
					 
					
						
						
							
							add authorize_typo_test  
						
						
						
						
					 
					
						2022-03-31 10:08:39 +02:00 
						 
				 
			
				
					
						
							
							
								ad hoc 
							
						 
					 
					
						
						
							
						
						c4653347fd 
					 
					
						
						
							
							add authorize typo setting  
						
						
						
						
					 
					
						2022-03-31 10:05:44 +02:00 
						 
				 
			
				
					
						
							
							
								bors[bot] 
							
						 
					 
					
						
						
							
						
						d8dd357326 
					 
					
						
						
							
							Merge  #480  
						
						... 
						
						
						
						480: Increase benchmarks (push) CI timeout r=Kerollmops a=Kerollmops
This PR fixes the fact that the benchmarks CI on push were [canceled by GitHub](https://github.com/meilisearch/milli/actions/runs/2028844132 ) because they reached the default timeout of 6h. This PR changes the timeout to 72h, the same setting as the manually triggered benchmark one.
Co-authored-by: Kerollmops <clement@meilisearch.com > 
						
						
					 
					
						2022-03-29 18:13:31 +00:00 
						 
				 
			
				
					
						
							
							
								Kerollmops 
							
						 
					 
					
						
						
							
						
						6a77c81a28 
					 
					
						
						
							
							Increase benchmarks (push) CI timeout  
						
						
						
						
					 
					
						2022-03-29 09:45:36 -07:00 
						 
				 
			
				
					
						
							
							
								bors[bot] 
							
						 
					 
					
						
						
							
						
						e10c26e70d 
					 
					
						
						
							
							Merge  #479  
						
						... 
						
						
						
						479: Update version (v0.24.1) r=Kerollmops a=curquiza
From v0.23.1 to v0.24.1 since we had an issue with the versionning for the previous release
Co-authored-by: Clémentine Urquizar <clementine@meilisearch.com > 
						
						
					 
					
						2022-03-24 20:12:37 +00:00 
						 
				 
			
				
					
						
							
							
								Clémentine Urquizar 
							
						 
					 
					
						
						
							
						
						ddf78a735b 
					 
					
						
						
							
							Update version (v0.24.1)  
						
						
						
						
					 
					
						2022-03-24 16:39:45 +01:00 
						 
				 
			
				
					
						
							
							
								bors[bot] 
							
						 
					 
					
						
						
							
						
						2c7cafbf20 
					 
					
						
						
							
							Merge  #475  
						
						... 
						
						
						
						475: Bump tokenizer r=Kerollmops a=irevoire
This PR bump the tokenizer in v0.2.9 which fixes an issue we had with lindera where reqwest was used with openssl (which was breaking our benchmarks).
Co-authored-by: Irevoire <tamo@meilisearch.com > 
						
						
					 
					
						2022-03-23 13:26:44 +00:00 
						 
				 
			
				
					
						
							
							
								Irevoire 
							
						 
					 
					
						
						
							
						
						86dd88698d 
					 
					
						
						
							
							bump tokenizer  
						
						
						
						
					 
					
						2022-03-23 14:25:58 +01:00 
						 
				 
			
				
					
						
							
							
								bors[bot] 
							
						 
					 
					
						
						
							
						
						b82f46e862 
					 
					
						
						
							
							Merge  #476  
						
						... 
						
						
						
						476: Rollback meilisearch-tokenizer version r=Kerollmops a=irevoire
Lindera often fails to download some data from google drive we can’t compile consistently meilisearch / milli.
We can’t bump to the latest version (that moved out of google drive) either because lindera uses reqwest with openssl with no way of configuring it our benchmarks were not able to run. The latter issue should be fixed by https://github.com/lindera-morphology/lindera/pull/164 .
Co-authored-by: Irevoire <tamo@meilisearch.com > 
						
						
					 
					
						2022-03-22 14:02:00 +00:00 
						 
				 
			
				
					
						
							
							
								Irevoire 
							
						 
					 
					
						
						
							
						
						5dc464b9a7 
					 
					
						
						
							
							rollback meilisearch-tokenizer version  
						
						
						
						
					 
					
						2022-03-21 17:29:10 +01:00 
						 
				 
			
				
					
						
							
							
								bors[bot] 
							
						 
					 
					
						
						
							
						
						90276d9a2d 
					 
					
						
						
							
							Merge  #472  
						
						... 
						
						
						
						472: Remove useless variables in proximity r=Kerollmops a=ManyTheFish
Was passing by plane sweep algorithm to find some inspiration, and I discover that we have useless variables that were not detected because of the recursive function.
Co-authored-by: ManyTheFish <many@meilisearch.com > 
						
						
					 
					
						2022-03-16 15:33:11 +00:00 
						 
				 
			
				
					
						
							
							
								ManyTheFish 
							
						 
					 
					
						
						
							
						
						49d59d88c2 
					 
					
						
						
							
							Remove useless variables in proximity  
						
						
						
						
					 
					
						2022-03-16 16:12:52 +01:00 
						 
				 
			
				
					
						
							
							
								bors[bot] 
							
						 
					 
					
						
						
							
						
						5863afa1a5 
					 
					
						
						
							
							Merge  #468  
						
						... 
						
						
						
						468: Add a new error message when the filterableAttributes are empty r=Kerollmops a=brunoocasali
Fixes https://github.com/meilisearch/meilisearch/issues/2140 
Is there a good way to reduce de duplication here? Maybe adding a shared function? I don't know the best and idiomatic way to do that, I appreciate any tip!
Another doubt is related to the duplication of the calling:
```rs
// filter.rs:373
FilterError::AttributeNotFilterable {
    attribute,
    filterable: filterable_fields.into_iter().collect::<Vec<_>>().join(" "),
},
```
and
```rs
// filter.rs:424
return Err(point[0].as_external_error(FilterError::AttributeNotFilterable {
    attribute: "_geo",
    filterable: filterable_fields.into_iter().collect::<Vec<_>>().join(" "),
}))?;
```
I think we could make the `filterable_fields.into_iter().collect::<Vec<_>>().join(" ")` directly into the error handling like the sortable error. I made it into the last commit, if this is something to avoid, let me know and I can remove it :)
Co-authored-by: Bruno Casali <brunoocasali@gmail.com > 
						
						
					 
					
						2022-03-16 15:02:19 +00:00 
						 
				 
			
				
					
						
							
							
								Bruno Casali 
							
						 
					 
					
						
						
							
						
						adc71742c8 
					 
					
						
						
							
							Move string concat to the struct instead of in the calling  
						
						
						
						
					 
					
						2022-03-16 10:26:12 -03:00 
						 
				 
			
				
					
						
							
							
								bors[bot] 
							
						 
					 
					
						
						
							
						
						cb6b6915a4 
					 
					
						
						
							
							Merge  #470  
						
						... 
						
						
						
						470: Set the cargo crate resolver to v2 r=Kerollmops a=MarinPostma
This PR updates the workspace resolver to v2. This should fix [the benchmarks](https://github.com/meilisearch/milli/runs/5558347765?check_suite_focus=true#step:8:184 ).
Co-authored-by: ad hoc <postma.marin@protonmail.com > 
						
						
					 
					
						2022-03-16 10:55:22 +00:00 
						 
				 
			
				
					
						
							
							
								ad hoc 
							
						 
					 
					
						
						
							
						
						2a31cd13c9 
					 
					
						
						
							
							set resolver to v2  
						
						
						
						
					 
					
						2022-03-16 11:47:27 +01:00 
						 
				 
			
				
					
						
							
							
								Bruno Casali 
							
						 
					 
					
						
						
							
						
						4822fe1beb 
					 
					
						
						
							
							Add a better error message when the filterable attrs are empty  
						
						... 
						
						
						
						Fixes https://github.com/meilisearch/meilisearch/issues/2140  
						
						
					 
					
						2022-03-15 18:13:59 -03:00 
						 
				 
			
				
					
						
							
							
								bors[bot] 
							
						 
					 
					
						
						
							
						
						f04ab67083 
					 
					
						
						
							
							Merge  #466  
						
						... 
						
						
						
						466: Bump version to 0.23.1 r=curquiza a=Kerollmops
This PR bumps the crate versions to 0.23.1. Nothing seems to be breaking in the next release.
Co-authored-by: Kerollmops <clement@meilisearch.com > 
						
						
					 
					
						2022-03-15 17:19:05 +00:00 
						 
				 
			
				
					
						
							
							
								bors[bot] 
							
						 
					 
					
						
						
							
						
						ad4c982c68 
					 
					
						
						
							
							Merge  #439  
						
						... 
						
						
						
						439: Optimize typo criterion r=Kerollmops a=MarinPostma
This pr implements a couple of optimization for the typo criterion:
- clamp max typo on concatenated query words to 1: By considering that a concatenated query word is a typo, we clamp the max number of typos allowed o it to 1. This is useful because we noticed that concatenated query words often introduced words with 2 typos in queries that otherwise didn't allow for 2 typo words.
- Make typos on the first letter count for 2. This change is a big performance gain: by considering the typos on the first letter to count as 2 typos, we drastically restrict the search space for 1 typo, and if we reach 2 typos, the search space is reduced as well, as we only consider: (2 typos ∩ correct first letter) ∪ (wrong first letter ∩ 1 typo) instead of 2 typos anywhere in the word.
## benches
```
group                                                                                                    main                                   typo
-----                                                                                                    ----                                   ----
smol-songs.csv: asc + default/Notstandskomitee                                                           2.51      5.8±0.01ms        ? ?/sec    1.00      2.3±0.01ms        ? ?/sec
smol-songs.csv: asc + default/charles                                                                    2.48      3.0±0.01ms        ? ?/sec    1.00   1190.9±1.29µs        ? ?/sec
smol-songs.csv: asc + default/charles mingus                                                             5.56     10.8±0.01ms        ? ?/sec    1.00   1935.3±1.00µs        ? ?/sec
smol-songs.csv: asc + default/david                                                                      1.65      3.9±0.00ms        ? ?/sec    1.00      2.4±0.01ms        ? ?/sec
smol-songs.csv: asc + default/david bowie                                                                3.34     12.5±0.02ms        ? ?/sec    1.00      3.7±0.00ms        ? ?/sec
smol-songs.csv: asc + default/john                                                                       1.00   1849.7±3.74µs        ? ?/sec    1.01   1875.1±4.65µs        ? ?/sec
smol-songs.csv: asc + default/marcus miller                                                              4.32     15.7±0.01ms        ? ?/sec    1.00      3.6±0.01ms        ? ?/sec
smol-songs.csv: asc + default/michael jackson                                                            3.31     12.5±0.01ms        ? ?/sec    1.00      3.8±0.00ms        ? ?/sec
smol-songs.csv: asc + default/tamo                                                                       1.05    565.4±0.86µs        ? ?/sec    1.00    539.3±1.22µs        ? ?/sec
smol-songs.csv: asc + default/thelonious monk                                                            3.49     11.5±0.01ms        ? ?/sec    1.00      3.3±0.00ms        ? ?/sec
smol-songs.csv: asc/Notstandskomitee                                                                     2.59      5.6±0.02ms        ? ?/sec    1.00      2.2±0.01ms        ? ?/sec
smol-songs.csv: asc/charles                                                                              6.05      2.1±0.00ms        ? ?/sec    1.00    347.8±0.60µs        ? ?/sec
smol-songs.csv: asc/charles mingus                                                                       14.46     9.4±0.01ms        ? ?/sec    1.00    649.2±0.97µs        ? ?/sec
smol-songs.csv: asc/david                                                                                3.87      2.4±0.00ms        ? ?/sec    1.00    618.2±0.69µs        ? ?/sec
smol-songs.csv: asc/david bowie                                                                          10.14     9.8±0.01ms        ? ?/sec    1.00    970.8±1.55µs        ? ?/sec
smol-songs.csv: asc/john                                                                                 1.00    546.5±1.10µs        ? ?/sec    1.00    547.1±2.11µs        ? ?/sec
smol-songs.csv: asc/marcus miller                                                                        11.45    10.4±0.06ms        ? ?/sec    1.00    907.9±1.37µs        ? ?/sec
smol-songs.csv: asc/michael jackson                                                                      10.56     9.7±0.01ms        ? ?/sec    1.00    919.6±1.03µs        ? ?/sec
smol-songs.csv: asc/tamo                                                                                 1.03     43.3±0.18µs        ? ?/sec    1.00     42.2±0.23µs        ? ?/sec
smol-songs.csv: asc/thelonious monk                                                                      4.16     10.7±0.02ms        ? ?/sec    1.00      2.6±0.00ms        ? ?/sec
smol-songs.csv: basic filter: <=/Notstandskomitee                                                        1.00     95.7±0.20µs        ? ?/sec    1.15   109.6±10.40µs        ? ?/sec
smol-songs.csv: basic filter: <=/charles                                                                 1.00     27.8±0.15µs        ? ?/sec    1.01     27.9±0.18µs        ? ?/sec
smol-songs.csv: basic filter: <=/charles mingus                                                          1.72    119.2±0.67µs        ? ?/sec    1.00     69.1±0.13µs        ? ?/sec
smol-songs.csv: basic filter: <=/david                                                                   1.00     22.3±0.33µs        ? ?/sec    1.05     23.4±0.19µs        ? ?/sec
smol-songs.csv: basic filter: <=/david bowie                                                             1.59     86.9±0.79µs        ? ?/sec    1.00     54.5±0.31µs        ? ?/sec
smol-songs.csv: basic filter: <=/john                                                                    1.00     17.9±0.06µs        ? ?/sec    1.06     18.9±0.15µs        ? ?/sec
smol-songs.csv: basic filter: <=/marcus miller                                                           1.65    102.7±1.63µs        ? ?/sec    1.00     62.3±0.18µs        ? ?/sec
smol-songs.csv: basic filter: <=/michael jackson                                                         1.76    128.2±1.85µs        ? ?/sec    1.00     72.9±0.19µs        ? ?/sec
smol-songs.csv: basic filter: <=/tamo                                                                    1.00     17.9±0.13µs        ? ?/sec    1.05     18.7±0.20µs        ? ?/sec
smol-songs.csv: basic filter: <=/thelonious monk                                                         1.53    157.5±2.38µs        ? ?/sec    1.00    102.8±0.88µs        ? ?/sec
smol-songs.csv: basic filter: TO/Notstandskomitee                                                        1.00    100.9±4.36µs        ? ?/sec    1.04    105.0±8.25µs        ? ?/sec
smol-songs.csv: basic filter: TO/charles                                                                 1.00     28.4±0.36µs        ? ?/sec    1.03     29.4±0.33µs        ? ?/sec
smol-songs.csv: basic filter: TO/charles mingus                                                          1.71    118.1±1.08µs        ? ?/sec    1.00     68.9±0.26µs        ? ?/sec
smol-songs.csv: basic filter: TO/david                                                                   1.00     24.0±0.26µs        ? ?/sec    1.03     24.6±0.43µs        ? ?/sec
smol-songs.csv: basic filter: TO/david bowie                                                             1.72     95.2±0.30µs        ? ?/sec    1.00     55.2±0.14µs        ? ?/sec
smol-songs.csv: basic filter: TO/john                                                                    1.00     18.8±0.09µs        ? ?/sec    1.06     19.8±0.17µs        ? ?/sec
smol-songs.csv: basic filter: TO/marcus miller                                                           1.61    102.4±1.65µs        ? ?/sec    1.00     63.4±0.24µs        ? ?/sec
smol-songs.csv: basic filter: TO/michael jackson                                                         1.77    132.1±1.41µs        ? ?/sec    1.00     74.5±0.59µs        ? ?/sec
smol-songs.csv: basic filter: TO/tamo                                                                    1.00     18.2±0.14µs        ? ?/sec    1.05     19.2±0.46µs        ? ?/sec
smol-songs.csv: basic filter: TO/thelonious monk                                                         1.49    150.8±1.92µs        ? ?/sec    1.00    101.3±0.44µs        ? ?/sec
smol-songs.csv: basic placeholder/                                                                       1.00     27.3±0.07µs        ? ?/sec    1.03     28.0±0.05µs        ? ?/sec
smol-songs.csv: basic with quote/"Notstandskomitee"                                                      1.00    122.4±0.17µs        ? ?/sec    1.03    125.6±0.16µs        ? ?/sec
smol-songs.csv: basic with quote/"charles"                                                               1.00     88.8±0.30µs        ? ?/sec    1.00     88.4±0.15µs        ? ?/sec
smol-songs.csv: basic with quote/"charles" "mingus"                                                      1.00    685.2±0.74µs        ? ?/sec    1.01    689.4±6.07µs        ? ?/sec
smol-songs.csv: basic with quote/"david"                                                                 1.00    161.6±0.42µs        ? ?/sec    1.01    162.6±0.17µs        ? ?/sec
smol-songs.csv: basic with quote/"david" "bowie"                                                         1.00    731.7±0.73µs        ? ?/sec    1.02    743.1±0.77µs        ? ?/sec
smol-songs.csv: basic with quote/"john"                                                                  1.00    267.1±0.33µs        ? ?/sec    1.01    270.9±0.33µs        ? ?/sec
smol-songs.csv: basic with quote/"marcus" "miller"                                                       1.00    138.7±0.31µs        ? ?/sec    1.02    140.9±0.13µs        ? ?/sec
smol-songs.csv: basic with quote/"michael" "jackson"                                                     1.01    841.4±0.72µs        ? ?/sec    1.00    833.8±0.92µs        ? ?/sec
smol-songs.csv: basic with quote/"tamo"                                                                  1.01    189.2±0.26µs        ? ?/sec    1.00    188.2±0.71µs        ? ?/sec
smol-songs.csv: basic with quote/"thelonious" "monk"                                                     1.00   1100.5±1.36µs        ? ?/sec    1.01   1111.7±2.17µs        ? ?/sec
smol-songs.csv: basic without quote/Notstandskomitee                                                     3.40      7.9±0.02ms        ? ?/sec    1.00      2.3±0.02ms        ? ?/sec
smol-songs.csv: basic without quote/charles                                                              2.57    494.4±0.89µs        ? ?/sec    1.00    192.5±0.18µs        ? ?/sec
smol-songs.csv: basic without quote/charles mingus                                                       1.29      2.8±0.02ms        ? ?/sec    1.00      2.1±0.01ms        ? ?/sec
smol-songs.csv: basic without quote/david                                                                1.95    623.8±0.90µs        ? ?/sec    1.00    319.2±1.22µs        ? ?/sec
smol-songs.csv: basic without quote/david bowie                                                          1.12      5.9±0.00ms        ? ?/sec    1.00      5.2±0.00ms        ? ?/sec
smol-songs.csv: basic without quote/john                                                                 1.24   1340.9±2.25µs        ? ?/sec    1.00   1084.7±7.76µs        ? ?/sec
smol-songs.csv: basic without quote/marcus miller                                                        7.97     14.6±0.01ms        ? ?/sec    1.00   1826.0±6.84µs        ? ?/sec
smol-songs.csv: basic without quote/michael jackson                                                      1.19      3.9±0.00ms        ? ?/sec    1.00      3.3±0.00ms        ? ?/sec
smol-songs.csv: basic without quote/tamo                                                                 1.65    737.7±3.58µs        ? ?/sec    1.00    446.7±0.51µs        ? ?/sec
smol-songs.csv: basic without quote/thelonious monk                                                      1.16      4.5±0.02ms        ? ?/sec    1.00      3.9±0.04ms        ? ?/sec
smol-songs.csv: big filter/Notstandskomitee                                                              3.27      7.6±0.02ms        ? ?/sec    1.00      2.3±0.01ms        ? ?/sec
smol-songs.csv: big filter/charles                                                                       8.26   1957.5±1.37µs        ? ?/sec    1.00    236.8±0.34µs        ? ?/sec
smol-songs.csv: big filter/charles mingus                                                                18.49    11.2±0.06ms        ? ?/sec    1.00    607.7±3.03µs        ? ?/sec
smol-songs.csv: big filter/david                                                                         3.78      2.4±0.00ms        ? ?/sec    1.00    622.8±0.80µs        ? ?/sec
smol-songs.csv: big filter/david bowie                                                                   9.00     12.0±0.01ms        ? ?/sec    1.00   1336.0±3.17µs        ? ?/sec
smol-songs.csv: big filter/john                                                                          1.00    554.2±0.95µs        ? ?/sec    1.01    560.4±0.79µs        ? ?/sec
smol-songs.csv: big filter/marcus miller                                                                 18.09    12.0±0.01ms        ? ?/sec    1.00    664.7±0.60µs        ? ?/sec
smol-songs.csv: big filter/michael jackson                                                               8.43     12.0±0.01ms        ? ?/sec    1.00   1421.6±1.37µs        ? ?/sec
smol-songs.csv: big filter/tamo                                                                          1.00     86.3±0.14µs        ? ?/sec    1.01     87.3±0.21µs        ? ?/sec
smol-songs.csv: big filter/thelonious monk                                                               5.55     14.3±0.02ms        ? ?/sec    1.00      2.6±0.01ms        ? ?/sec
smol-songs.csv: desc + default/Notstandskomitee                                                          2.52      5.8±0.01ms        ? ?/sec    1.00      2.3±0.01ms        ? ?/sec
smol-songs.csv: desc + default/charles                                                                   3.04      2.7±0.01ms        ? ?/sec    1.00    893.4±1.08µs        ? ?/sec
smol-songs.csv: desc + default/charles mingus                                                            6.77     10.3±0.01ms        ? ?/sec    1.00   1520.8±1.90µs        ? ?/sec
smol-songs.csv: desc + default/david                                                                     1.39      5.7±0.00ms        ? ?/sec    1.00      4.1±0.00ms        ? ?/sec
smol-songs.csv: desc + default/david bowie                                                               2.34     15.8±0.02ms        ? ?/sec    1.00      6.7±0.01ms        ? ?/sec
smol-songs.csv: desc + default/john                                                                      1.00      2.5±0.00ms        ? ?/sec    1.02      2.6±0.01ms        ? ?/sec
smol-songs.csv: desc + default/marcus miller                                                             5.06     14.5±0.02ms        ? ?/sec    1.00      2.9±0.01ms        ? ?/sec
smol-songs.csv: desc + default/michael jackson                                                           2.64     14.1±0.05ms        ? ?/sec    1.00      5.4±0.00ms        ? ?/sec
smol-songs.csv: desc + default/tamo                                                                      1.00    567.0±0.65µs        ? ?/sec    1.00    565.7±0.97µs        ? ?/sec
smol-songs.csv: desc + default/thelonious monk                                                           3.55     11.6±0.02ms        ? ?/sec    1.00      3.3±0.00ms        ? ?/sec
smol-songs.csv: desc/Notstandskomitee                                                                    2.58      5.6±0.02ms        ? ?/sec    1.00      2.2±0.02ms        ? ?/sec
smol-songs.csv: desc/charles                                                                             6.04      2.1±0.00ms        ? ?/sec    1.00    348.1±0.57µs        ? ?/sec
smol-songs.csv: desc/charles mingus                                                                      14.51     9.4±0.01ms        ? ?/sec    1.00    646.7±0.99µs        ? ?/sec
smol-songs.csv: desc/david                                                                               3.86      2.4±0.00ms        ? ?/sec    1.00    620.7±2.46µs        ? ?/sec
smol-songs.csv: desc/david bowie                                                                         10.10     9.8±0.01ms        ? ?/sec    1.00    973.9±3.31µs        ? ?/sec
smol-songs.csv: desc/john                                                                                1.00    545.5±0.78µs        ? ?/sec    1.00    547.2±0.48µs        ? ?/sec
smol-songs.csv: desc/marcus miller                                                                       11.39    10.3±0.01ms        ? ?/sec    1.00    903.7±0.95µs        ? ?/sec
smol-songs.csv: desc/michael jackson                                                                     10.51     9.7±0.01ms        ? ?/sec    1.00    924.7±2.02µs        ? ?/sec
smol-songs.csv: desc/tamo                                                                                1.01     43.2±0.33µs        ? ?/sec    1.00     42.6±0.35µs        ? ?/sec
smol-songs.csv: desc/thelonious monk                                                                     4.19     10.8±0.03ms        ? ?/sec    1.00      2.6±0.00ms        ? ?/sec
smol-songs.csv: prefix search/a                                                                          1.00   1008.7±1.00µs        ? ?/sec    1.00   1005.5±0.91µs        ? ?/sec
smol-songs.csv: prefix search/b                                                                          1.00    885.0±0.70µs        ? ?/sec    1.01    890.6±1.11µs        ? ?/sec
smol-songs.csv: prefix search/i                                                                          1.00   1051.8±1.25µs        ? ?/sec    1.00   1056.6±4.12µs        ? ?/sec
smol-songs.csv: prefix search/s                                                                          1.00    724.7±1.77µs        ? ?/sec    1.00    721.6±0.59µs        ? ?/sec
smol-songs.csv: prefix search/x                                                                          1.01    212.4±0.21µs        ? ?/sec    1.00    210.9±0.38µs        ? ?/sec
smol-songs.csv: proximity/7000 Danses Un Jour Dans Notre Vie                                             18.55    48.5±0.09ms        ? ?/sec    1.00      2.6±0.03ms        ? ?/sec
smol-songs.csv: proximity/The Disneyland Sing-Along Chorus                                               8.41     56.7±0.45ms        ? ?/sec    1.00      6.7±0.05ms        ? ?/sec
smol-songs.csv: proximity/Under Great Northern Lights                                                    15.74    38.9±0.14ms        ? ?/sec    1.00      2.5±0.00ms        ? ?/sec
smol-songs.csv: proximity/black saint sinner lady                                                        11.82    40.1±0.13ms        ? ?/sec    1.00      3.4±0.02ms        ? ?/sec
smol-songs.csv: proximity/les dangeureuses 1960                                                          6.90     26.1±0.13ms        ? ?/sec    1.00      3.8±0.04ms        ? ?/sec
smol-songs.csv: typo/Arethla Franklin                                                                    14.93     5.8±0.01ms        ? ?/sec    1.00    390.1±1.89µs        ? ?/sec
smol-songs.csv: typo/Disnaylande                                                                         3.18      7.3±0.01ms        ? ?/sec    1.00      2.3±0.00ms        ? ?/sec
smol-songs.csv: typo/dire straights                                                                      5.55     15.2±0.02ms        ? ?/sec    1.00      2.7±0.00ms        ? ?/sec
smol-songs.csv: typo/fear of the duck                                                                    28.03    20.0±0.03ms        ? ?/sec    1.00    713.3±1.54µs        ? ?/sec
smol-songs.csv: typo/indochie                                                                            19.25  1851.4±2.38µs        ? ?/sec    1.00     96.2±0.13µs        ? ?/sec
smol-songs.csv: typo/indochien                                                                           14.66  1887.7±3.18µs        ? ?/sec    1.00    128.8±0.18µs        ? ?/sec
smol-songs.csv: typo/klub des loopers                                                                    37.73    18.0±0.02ms        ? ?/sec    1.00    476.7±0.73µs        ? ?/sec
smol-songs.csv: typo/michel depech                                                                       10.17     5.8±0.01ms        ? ?/sec    1.00    565.8±1.16µs        ? ?/sec
smol-songs.csv: typo/mongus                                                                              15.33  1897.4±3.44µs        ? ?/sec    1.00    123.8±0.13µs        ? ?/sec
smol-songs.csv: typo/stromal                                                                             14.63  1859.3±2.40µs        ? ?/sec    1.00    127.1±0.29µs        ? ?/sec
smol-songs.csv: typo/the white striper                                                                   10.83     9.4±0.01ms        ? ?/sec    1.00    866.0±0.98µs        ? ?/sec
smol-songs.csv: typo/thelonius monk                                                                      14.40     3.8±0.00ms        ? ?/sec    1.00    261.5±1.30µs        ? ?/sec
smol-songs.csv: words/7000 Danses / Le Baiser / je me trompe de mots                                     5.54     70.8±0.09ms        ? ?/sec    1.00     12.8±0.03ms        ? ?/sec
smol-songs.csv: words/Bring Your Daughter To The Slaughter but now this is not part of the title         3.48    119.8±0.14ms        ? ?/sec    1.00     34.4±0.04ms        ? ?/sec
smol-songs.csv: words/The Disneyland Children's Sing-Alone song                                          8.98     71.9±0.12ms        ? ?/sec    1.00      8.0±0.01ms        ? ?/sec
smol-songs.csv: words/les liaisons dangeureuses 1793                                                     11.88    37.4±0.07ms        ? ?/sec    1.00      3.1±0.01ms        ? ?/sec
smol-songs.csv: words/seven nation mummy                                                                 22.86    23.4±0.04ms        ? ?/sec    1.00   1024.8±1.57µs        ? ?/sec
smol-songs.csv: words/the black saint and the sinner lady and the good doggo                             2.76    124.4±0.15ms        ? ?/sec    1.00     45.1±0.09ms        ? ?/sec
smol-songs.csv: words/whathavenotnsuchforth and a good amount of words to pop to match the first one     2.52    107.0±0.23ms        ? ?/sec    1.00     42.4±0.66ms        ? ?/sec
group                                                                                    main-wiki                              typo-wiki
-----                                                                                    ---------                              ---------
smol-wiki-articles.csv: basic placeholder/                                               1.02     13.7±0.02µs        ? ?/sec    1.00     13.4±0.03µs        ? ?/sec
smol-wiki-articles.csv: basic with quote/"film"                                          1.02    409.8±0.67µs        ? ?/sec    1.00    402.6±0.48µs        ? ?/sec
smol-wiki-articles.csv: basic with quote/"france"                                        1.00    325.9±0.91µs        ? ?/sec    1.00    326.4±0.49µs        ? ?/sec
smol-wiki-articles.csv: basic with quote/"japan"                                         1.00    218.4±0.26µs        ? ?/sec    1.01    220.5±0.20µs        ? ?/sec
smol-wiki-articles.csv: basic with quote/"machine"                                       1.00    143.0±0.12µs        ? ?/sec    1.04    148.8±0.21µs        ? ?/sec
smol-wiki-articles.csv: basic with quote/"miles" "davis"                                 1.00     11.7±0.06ms        ? ?/sec    1.00     11.8±0.01ms        ? ?/sec
smol-wiki-articles.csv: basic with quote/"mingus"                                        1.00      4.4±0.03ms        ? ?/sec    1.00      4.4±0.00ms        ? ?/sec
smol-wiki-articles.csv: basic with quote/"rock" "and" "roll"                             1.00     43.5±0.08ms        ? ?/sec    1.01     43.8±0.06ms        ? ?/sec
smol-wiki-articles.csv: basic with quote/"spain"                                         1.00    137.3±0.35µs        ? ?/sec    1.05    144.4±0.23µs        ? ?/sec
smol-wiki-articles.csv: basic without quote/film                                         1.00    125.3±0.30µs        ? ?/sec    1.06    133.1±0.37µs        ? ?/sec
smol-wiki-articles.csv: basic without quote/france                                       1.21   1782.6±1.65µs        ? ?/sec    1.00   1477.0±1.39µs        ? ?/sec
smol-wiki-articles.csv: basic without quote/japan                                        1.28   1363.9±0.80µs        ? ?/sec    1.00   1064.3±1.79µs        ? ?/sec
smol-wiki-articles.csv: basic without quote/machine                                      1.73    760.3±0.81µs        ? ?/sec    1.00    439.6±0.75µs        ? ?/sec
smol-wiki-articles.csv: basic without quote/miles davis                                  1.03     17.0±0.03ms        ? ?/sec    1.00     16.5±0.02ms        ? ?/sec
smol-wiki-articles.csv: basic without quote/mingus                                       1.07      5.3±0.01ms        ? ?/sec    1.00      5.0±0.00ms        ? ?/sec
smol-wiki-articles.csv: basic without quote/rock and roll                                1.01     63.9±0.18ms        ? ?/sec    1.00     63.0±0.07ms        ? ?/sec
smol-wiki-articles.csv: basic without quote/spain                                        2.07    667.4±0.93µs        ? ?/sec    1.00    322.8±0.29µs        ? ?/sec
smol-wiki-articles.csv: prefix search/c                                                  1.00    343.1±0.47µs        ? ?/sec    1.00    344.0±0.34µs        ? ?/sec
smol-wiki-articles.csv: prefix search/g                                                  1.00    374.4±3.42µs        ? ?/sec    1.00    374.1±0.44µs        ? ?/sec
smol-wiki-articles.csv: prefix search/j                                                  1.00    359.9±0.31µs        ? ?/sec    1.00    361.2±0.79µs        ? ?/sec
smol-wiki-articles.csv: prefix search/q                                                  1.01    102.0±0.12µs        ? ?/sec    1.00    101.4±0.32µs        ? ?/sec
smol-wiki-articles.csv: prefix search/t                                                  1.00    536.7±1.39µs        ? ?/sec    1.00    534.3±0.84µs        ? ?/sec
smol-wiki-articles.csv: prefix search/x                                                  1.00    400.9±1.00µs        ? ?/sec    1.00    399.5±0.45µs        ? ?/sec
smol-wiki-articles.csv: proximity/april paris                                            3.86     14.4±0.01ms        ? ?/sec    1.00      3.7±0.01ms        ? ?/sec
smol-wiki-articles.csv: proximity/diesel engine                                          12.98    10.4±0.01ms        ? ?/sec    1.00    803.5±1.13µs        ? ?/sec
smol-wiki-articles.csv: proximity/herald sings                                           1.00     12.7±0.06ms        ? ?/sec    5.29     67.1±0.09ms        ? ?/sec
smol-wiki-articles.csv: proximity/tea two                                                6.48   1452.1±2.78µs        ? ?/sec    1.00    224.1±0.38µs        ? ?/sec
smol-wiki-articles.csv: typo/Disnaylande                                                 3.89      8.5±0.01ms        ? ?/sec    1.00      2.2±0.01ms        ? ?/sec
smol-wiki-articles.csv: typo/aritmetric                                                  3.78     10.3±0.01ms        ? ?/sec    1.00      2.7±0.00ms        ? ?/sec
smol-wiki-articles.csv: typo/linax                                                       8.91   1426.7±0.97µs        ? ?/sec    1.00    160.1±0.18µs        ? ?/sec
smol-wiki-articles.csv: typo/migrosoft                                                   7.48   1417.3±5.84µs        ? ?/sec    1.00    189.5±0.88µs        ? ?/sec
smol-wiki-articles.csv: typo/nympalidea                                                  3.96      7.2±0.01ms        ? ?/sec    1.00   1810.1±2.03µs        ? ?/sec
smol-wiki-articles.csv: typo/phytogropher                                                3.71      7.2±0.01ms        ? ?/sec    1.00   1934.3±6.51µs        ? ?/sec
smol-wiki-articles.csv: typo/sisan                                                       6.44   1497.2±1.38µs        ? ?/sec    1.00    232.7±0.94µs        ? ?/sec
smol-wiki-articles.csv: typo/the fronce                                                  6.92      2.9±0.00ms        ? ?/sec    1.00    418.0±1.76µs        ? ?/sec
smol-wiki-articles.csv: words/Abraham machin                                             16.63    10.8±0.01ms        ? ?/sec    1.00    649.7±1.08µs        ? ?/sec
smol-wiki-articles.csv: words/Idaho Bellevue pizza                                       27.15    25.6±0.03ms        ? ?/sec    1.00    944.2±5.07µs        ? ?/sec
smol-wiki-articles.csv: words/Kameya Tokujirō mingus monk                                26.87    40.7±0.05ms        ? ?/sec    1.00   1515.3±2.73µs        ? ?/sec
smol-wiki-articles.csv: words/Ulrich Hensel meilisearch milli                            11.99    48.8±0.10ms        ? ?/sec    1.00      4.1±0.02ms        ? ?/sec
smol-wiki-articles.csv: words/the black saint and the sinner lady and the good doggo     4.90    110.0±0.15ms        ? ?/sec    1.00     22.4±0.03ms        ? ?/sec
```
Co-authored-by: mpostma <postma.marin@protonmail.com >
Co-authored-by: ad hoc <postma.marin@protonmail.com > 
						
						
					 
					
						2022-03-15 16:43:36 +00:00 
						 
				 
			
				
					
						
							
							
								ad hoc 
							
						 
					 
					
						
						
							
						
						3f24555c3d 
					 
					
						
						
							
							custom fst automatons  
						
						
						
						
					 
					
						2022-03-15 17:38:35 +01:00 
						 
				 
			
				
					
						
							
							
								ad hoc 
							
						 
					 
					
						
						
							
						
						628c835a22 
					 
					
						
						
							
							fix tests  
						
						
						
						
					 
					
						2022-03-15 17:38:34 +01:00 
						 
				 
			
				
					
						
							
							
								bors[bot] 
							
						 
					 
					
						
						
							
						
						8efac33b53 
					 
					
						
						
							
							Merge  #467  
						
						... 
						
						
						
						467: optimize prefix database r=Kerollmops a=MarinPostma
This pr introduces two optimizations that greatly improve the speed of computing prefix databases.
- The time that it takes to create the prefix FST has been divided by 5 by inverting the way we iterated over the words FST.
- We unconditionally and needlessly checked for documents to remove in  `word_prefix_pair`, which caused an iteration over the whole database.
Co-authored-by: ad hoc <postma.marin@protonmail.com > 
						
						
					 
					
						2022-03-15 16:14:35 +00:00 
						 
				 
			
				
					
						
							
							
								ad hoc 
							
						 
					 
					
						
						
							
						
						d127c57f2d 
					 
					
						
						
							
							review edits  
						
						
						
						
					 
					
						2022-03-15 17:12:48 +01:00 
						 
				 
			
				
					
						
							
							
								ad hoc 
							
						 
					 
					
						
						
							
						
						d633ac5b9d 
					 
					
						
						
							
							optimize word prefix pair  
						
						
						
						
					 
					
						2022-03-15 16:37:22 +01:00