Commit Graph

358 Commits

Author SHA1 Message Date
Clément Renault
ecd2b2f217 Make the final merge done in parallel 2020-08-07 15:44:04 +02:00
Clément Renault
91282c8b6a Move the documents into another file 2020-08-07 13:11:31 +02:00
Clément Renault
fae694a102 Put the documents into an MTBL database 2020-08-07 12:14:40 +02:00
Clément Renault
405a71d3a4 Accept csv from stdin 2020-08-06 13:38:21 +02:00
Clément Renault
d3b1096510 Compute the word attribute postings lists on each thread 2020-08-06 11:50:27 +02:00
Clément Renault
8d734941af Clean up some lines 2020-08-06 10:20:26 +02:00
Clément Renault
6508d497ce Replace the regex highlighting by a simple algorithm 2020-08-05 13:52:27 +02:00
Clément Renault
4873abe145 Introduce option flags to toggle the indexing engine 2020-08-05 12:10:41 +02:00
Clément Renault
bd4b18541c Introduce a new indexer which uses an MTBL sorter 2020-08-04 15:44:37 +02:00
Kerollmops
ee305c9284 Replace the title by the milli logo 2020-07-15 23:55:28 +02:00
Kerollmops
9ade00e27b Highlight all the matching words 2020-07-14 11:53:21 +02:00
Kerollmops
085c376655 Use the regex crate to highlight "hello" 2020-07-14 11:28:40 +02:00
Kerollmops
aa92311d4e Add a dark theme to the dashboard 2020-07-13 23:51:41 +02:00
Kerollmops
3d144e62c4 Search for best proximities in multiple attributes 2020-07-13 19:06:56 +02:00
Kerollmops
576dd011a1 Compute the candidates but not by attribute 2020-07-13 18:16:05 +02:00
Kerollmops
6b14b20369 Introduce a method to retrieve the number of attributes of the documents 2020-07-13 17:50:16 +02:00
Kerollmops
92c2b1dd2d Refine the help message of the binaries 2020-07-12 11:06:45 +02:00
Kerollmops
f757df5dfd Introduce the stderr logger to the project 2020-07-12 11:04:35 +02:00
Kerollmops
12358476da Use the log crate instead of stderr 2020-07-12 10:55:09 +02:00
Kerollmops
2c62eeea3c Rename the project milli 2020-07-12 00:16:41 +02:00
Kerollmops
d31da26a51 Avoid cloning RoaringBitmaps when unnecessary 2020-07-11 23:51:32 +02:00
Kerollmops
b8a1fc0126 Clean up the CSS style custom bulma rules 2020-07-11 14:51:59 +02:00
Kerollmops
f6eae91c7d Pretty print the new dashboard numbers 2020-07-11 14:17:37 +02:00
Kerollmops
d44428fa90 Display more information on the dashboard 2020-07-11 11:51:56 +02:00
Kerollmops
11c7fef80a Implement a memory dumper
It moves the in memory HashMaps used when indexing to a disk based MTBL file
2020-07-07 16:48:49 +02:00
Kerollmops
b12bfcb03b Reduce the deepness of the word position document ids
This helps reduce the number of allocations.
2020-07-07 12:30:05 +02:00
Kerollmops
7178b6c2c4 First basic version using MTBL again 2020-07-07 11:32:33 +02:00
Kerollmops
adb1038b26 Add a jobs parameter to set the number of threads the indexer uses 2020-07-06 12:17:17 +02:00
Kerollmops
ec1023e790 Intersect document ids by inverse popularity of the words
This reduces our worst request ("the best of the do") from 56s to 3s.
2020-07-05 19:33:51 +02:00
Kerollmops
cd7e64b2b3 Allow users to set the arc cache size when indexing 2020-07-04 18:12:41 +02:00
Kerollmops
ac8353a64f Merge pre-computed word attribute documents ids 2020-07-04 17:02:27 +02:00
Kerollmops
fea7cac206 Display the time it took to compute the word attribute documents ids 2020-07-04 15:18:38 +02:00
Kerollmops
46ced5c828 Introduce the RwIter append heed API 2020-07-04 12:34:10 +02:00
Kerollmops
7e7440c431 Finalize the LMDB indexing design 2020-07-01 22:45:43 +02:00
Kerollmops
2ae3f40971 Make the indexer ignore certain words
This prepares for fully parallel indexing by making the indexer aware of only
certain words on each thread, which avoids postings list conflicts
on a given word.
2020-07-01 17:49:46 +02:00
Kerollmops
a3ac2623d5 Introduce multiple functions to clean up the code 2020-07-01 17:24:55 +02:00
Kerollmops
ac5cc7ddad Introduce an Iterator yielding owned entries for the LruCache 2020-07-01 17:21:52 +02:00
Kerollmops
014a25697d Use only one ARC cache based on the words 2020-07-01 12:03:18 +02:00
Kerollmops
fc4013a43f Fix the ARC cache 2020-07-01 10:35:07 +02:00
Kerollmops
2fcae719ad Use another LRU impl which uses hashbrown 2020-06-29 22:26:06 +02:00
Kerollmops
f98b615bf3 Replace the LRU by an Arc cache 2020-06-29 20:48:57 +02:00
Kerollmops
07abebfc46 Introduce a (too big) LRU cache 2020-06-29 18:15:03 +02:00
Kerollmops
5f0088594b Index by writing directly into LMDB 2020-06-29 13:54:47 +02:00
Kerollmops
63cbeca64e Skip all derived words when too short 2020-06-28 12:13:12 +02:00
Kerollmops
736f0f7560 Use the proximity instead of the attributes when searching for <= 7 proximities 2020-06-28 12:13:12 +02:00
Kerollmops
fe3be8f18a Replace the HashMap by a Vec for attributes documents ids 2020-06-28 12:13:12 +02:00
Kerollmops
6a2834f2b0 Add a jobs parameter to set the number of threads the indexer uses 2020-06-28 12:13:10 +02:00
Kerollmops
7e16afbdce Ignore documents which are not part of the candidates when exploring with A* 2020-06-24 15:06:45 +02:00
Kerollmops
1c7a9a4132 Remove the found documents from the candidates list 2020-06-24 15:00:26 +02:00
Kerollmops
50169b9798 Compute the full list of ids we are willing to find by attribute 2020-06-24 14:48:04 +02:00