Commit Graph

167 Commits

Author SHA1 Message Date
802e925fd7 Switch to a JSON protocol for the front page 2020-10-21 18:26:29 +02:00
2210818114 Introduce the obkv heed codec 2020-10-21 15:51:48 +02:00
f948a03be2 Optimise the merge functions to avoid allocations 2020-10-20 16:40:50 +02:00
cde8478388 Replace the panic in the merge function by actual errors 2020-10-20 16:19:07 +02:00
35c9a3c558 Broadcast the update info to every ws client 2020-10-20 11:19:34 +02:00
871222aebd Introduce some new routes to handle live indexing 2020-10-19 16:06:43 +02:00
65e32fecb1 Move the binaries into one with subcommands 2020-10-19 13:44:17 +02:00
eca49e3a03 Introduce a notification channel for the UpdateStore 2020-10-18 16:37:37 +02:00
83c1db8763 Introduce the UpdateStore 2020-10-18 15:26:57 +02:00
9021b2dba6 Introduce the enable-chunk-fusing flag 2020-10-14 18:44:59 +02:00
f980422c57 Move from oxidized-mtbl to grenad 2020-10-14 12:47:32 +02:00
4e9bd1fef5 Bump oxidized-mtbl 2020-10-07 14:23:22 +02:00
433d9bbc6e Use CompressionType::from_str rather than a custom function 2020-10-06 13:50:34 +02:00
4b819457c9 Enable the structopt/clap wrap help feature 2020-10-06 13:06:22 +02:00
770f29fd05 Bump the oxidized-mtbl dependency 2020-10-04 17:04:33 +02:00
acd2a63879 Introduce a simple FST based Chinese word segmenter 2020-10-04 17:04:33 +02:00
68f4af7d2e Improve the display of the number of processed documents 2020-09-29 16:08:58 +02:00
ed05999f63 Replace the arc cache by a simple linked hash map 2020-09-23 14:50:52 +02:00
d6fa9c0414 Index the intra-document word pair proximities 2020-09-22 14:04:33 +02:00
3ded98e5fa Bump roaring to the version that fixes a deserialization bug 2020-09-10 22:37:51 +02:00
d5e5baa20f Bump the oxidized-mtbl dependency 2020-09-10 13:29:12 +02:00
0fb086f241 Use the crates.io roaring library 2020-09-08 15:16:04 +02:00
bb1ab428db Use another function to define the proximity 2020-09-06 17:55:07 +02:00
f928b91e9d Specify the exact rev for the near-proximity dep 2020-09-06 17:21:38 +02:00
1c504471d3 Introduce the plane-sweep algorithm 2020-09-05 18:25:27 +02:00
dc88a86259 Store the word positions under the documents 2020-09-05 18:03:06 +02:00
580ed1119a Make the engine return CSV string records as documents and headers 2020-08-31 19:02:00 +02:00
bad0663138 Come back to the old tokenizer 2020-08-31 13:34:38 +02:00
3fe497e129 Improve the Mtbl heed codec to only encode MTBL databases 2020-08-29 11:20:39 +02:00
d19f394630 Make the indexer support gzipped CSV as input 2020-08-21 18:10:24 +02:00
ff479c865d Replace pipe by ringtail to improve stdin read performance 2020-08-21 17:45:52 +02:00
8806fcd545 Introduce a better query and document lexer 2020-08-16 14:36:54 +02:00
1e358e3ae8 Introduce the AstarBagIter that iterates through best paths 2020-08-15 16:24:06 +02:00
d5a356902a Update oxidized-mtbl 2020-08-07 12:14:03 +02:00
405a71d3a4 Accept csv from stdin 2020-08-06 13:38:21 +02:00
6508d497ce Replace the regex highlighting by a simple algorithm 2020-08-05 13:52:27 +02:00
bd4b18541c Introduce a new indexer which uses an MTBL sorter 2020-08-04 15:44:37 +02:00
085c376655 Use the regex crate to highlight "hello" 2020-07-14 11:28:40 +02:00
12358476da Use the log crate instead of stderr 2020-07-12 10:55:09 +02:00
2c62eeea3c Rename the project to milli 2020-07-12 00:16:41 +02:00
f6eae91c7d Pretty print the new dashboard numbers 2020-07-11 14:17:37 +02:00
11c7fef80a Implement a memory dumper
It moves the in-memory HashMaps used when indexing to a disk-based MTBL file
2020-07-07 16:48:49 +02:00
7178b6c2c4 First basic version using MTBL again 2020-07-07 11:32:33 +02:00
2a3b03138b Use heed 0.8.1 with the RwIter append method 2020-07-05 19:50:28 +02:00
46ced5c828 Introduce the RwIter append heed API 2020-07-04 12:34:10 +02:00
2ae3f40971 Make the indexer ignore certain words
This is a preparation for making the indexing fully parallel by making the
indexer only aware of certain words for each thread, to avoid postings list
conflicts for each word
2020-07-01 17:49:46 +02:00
f98b615bf3 Replace the LRU by an Arc cache 2020-06-29 20:48:57 +02:00
07abebfc46 Introduce a (too big) LRU cache 2020-06-29 18:15:03 +02:00
5f0088594b Index by writing directly into LMDB 2020-06-29 13:54:47 +02:00
1628a31efa Cache the unions of the derived words positions 2020-06-20 15:38:10 +02:00