lucene-dev mailing list archives

From eks dev <>
Subject Re: docid set compression and boolean docid set operations
Date Mon, 15 Sep 2008 20:24:08 GMT
>    Are you guys interested in helping out on kamikaze?

Sure, as much as my schedule permits (slow but steady :).
p4delta should be the right way to go. My proposal would be to first make it as fast as possible
(see Paul's comment about the 'if' in the decompression loop...) for the simplest case, a set of integers
(longs?), and make it available as one of the Filter implementations (a fast-track commit into Lucene
and bigger exposure to others who can make it better).
The code in kamikaze is probably almost there (I haven't looked into it yet); we just need to open
an issue in JIRA and provide a simple patch with basic p4delta functionality and one Filter
implementation with a few test cases. That way it is useful to the community from the very
start (with the possibility of mixing it with Paul's code on DocIdSetIterators...).
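To make "the simplest case" concrete, here is a minimal Java sketch of a PForDelta-style codec for a sorted docid set. It is not kamikaze's code and not a Lucene API; the class name PForDeltaSketch, the ~90% rule for choosing the bit width, and the explicit exception-position list (the original PForDelta scheme chains exceptions through the unused slots instead) are assumptions for illustration. Docids are delta-encoded, most deltas are packed into b bits, and the few deltas that do not fit are stored as exceptions and patched in after a tight unpack loop.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical, simplified PForDelta-style codec for a strictly increasing docid set. */
final class PForDeltaSketch {
    final int count;      // number of docids
    final int bits;       // bit width b of each packed slot
    final long[] packed;  // b-bit slots, packed into 64-bit words
    final int[] excPos;   // slot indices holding exceptions (assumed layout)
    final int[] excVal;   // full delta values for those slots

    private PForDeltaSketch(int count, int bits, long[] packed, int[] excPos, int[] excVal) {
        this.count = count; this.bits = bits; this.packed = packed;
        this.excPos = excPos; this.excVal = excVal;
    }

    static PForDeltaSketch encode(int[] docIds) {
        int n = docIds.length;
        int[] deltas = new int[n];
        int prev = 0;
        for (int i = 0; i < n; i++) {
            deltas[i] = docIds[i] - prev;  // first delta is the first docid itself
            prev = docIds[i];
        }
        // Assumed heuristic: pick b so that roughly 90% of the deltas fit.
        int[] sorted = deltas.clone();
        Arrays.sort(sorted);
        int ref = n == 0 ? 0 : sorted[(int) ((n - 1) * 0.9)];
        int bits = Math.max(1, 32 - Integer.numberOfLeadingZeros(ref));
        long mask = (1L << bits) - 1;
        long[] packed = new long[(int) (((long) n * bits + 63) / 64)];
        List<Integer> pos = new ArrayList<>();
        List<Integer> val = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            long v = deltas[i];
            if (v > mask) {           // too wide for b bits: record as exception
                pos.add(i); val.add(deltas[i]); v = 0;
            }
            long bitPos = (long) i * bits;
            int word = (int) (bitPos >>> 6);
            int shift = (int) (bitPos & 63);
            packed[word] |= v << shift;
            if (shift + bits > 64) packed[word + 1] |= v >>> (64 - shift); // slot spans two words
        }
        return new PForDeltaSketch(n, bits, packed, toArray(pos), toArray(val));
    }

    int[] decode() {
        int[] out = new int[count];
        long mask = (1L << bits) - 1;
        // Unpack every slot without checking for exceptions inside this loop.
        for (int i = 0; i < count; i++) {
            long bitPos = (long) i * bits;
            int word = (int) (bitPos >>> 6);
            int shift = (int) (bitPos & 63);
            long v = packed[word] >>> shift;
            if (shift + bits > 64) v |= packed[word + 1] << (64 - shift);
            out[i] = (int) (v & mask);
        }
        // Patch the exceptions in a separate pass.
        for (int j = 0; j < excPos.length; j++) out[excPos[j]] = excVal[j];
        // Prefix sum turns deltas back into docids.
        for (int i = 1; i < count; i++) out[i] += out[i - 1];
        return out;
    }

    private static int[] toArray(List<Integer> xs) {
        return xs.stream().mapToInt(Integer::intValue).toArray();
    }
}
```

For example, encoding {3, 7, 8, 20, 21, 22, 1000, 1001, 5000000} packs the deltas in 10 bits and keeps only the last delta (4998999) as an exception; decode() returns the original array. A Filter implementation would wrap something like this behind a DocIdSetIterator.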
This will not bring a huge benefit, but it is definitely a useful option for Filters. The real work
then is to use it in the on-disk format and partially replace VInt encoding in the more
involved cases I mentioned before, such as (docId delta, term frequency) pairs in Lucene with
multilevel skipping information.
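For comparison, a sketch of the VInt path that p4delta would partially replace. It assumes a simplified layout in which each (docId delta, term frequency) pair is written as two VInts with no skip data; Lucene's actual postings format differs in details, and the class name VIntPostingsSketch is hypothetical. VInt stores 7 data bits per byte with the high bit as a "more bytes follow" flag, so decoding takes a data-dependent branch on every byte, which is the cost a block codec tries to avoid.

```java
import java.io.ByteArrayOutputStream;

/** Hypothetical sketch: (docId delta, freq) pairs as consecutive VInts, no skip data. */
final class VIntPostingsSketch {
    /** Writes non-negative v, low-order 7 bits first; high bit set means more bytes follow. */
    static void writeVInt(ByteArrayOutputStream out, int v) {
        while ((v & ~0x7F) != 0) {
            out.write((v & 0x7F) | 0x80);
            v >>>= 7;
        }
        out.write(v);
    }

    static int readVInt(byte[] in, int[] pos) {
        int b = in[pos[0]++];
        int v = b & 0x7F;
        for (int shift = 7; (b & 0x80) != 0; shift += 7) { // branch on every byte
            b = in[pos[0]++];
            v |= (b & 0x7F) << shift;
        }
        return v;
    }

    /** Encodes increasing docids and their term frequencies. */
    static byte[] encode(int[] docIds, int[] freqs) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        int prev = 0;
        for (int i = 0; i < docIds.length; i++) {
            writeVInt(out, docIds[i] - prev); // docId delta
            writeVInt(out, freqs[i]);         // term frequency
            prev = docIds[i];
        }
        return out.toByteArray();
    }

    /** Returns {docIds, freqs} for the first count pairs. */
    static int[][] decode(byte[] in, int count) {
        int[] docIds = new int[count];
        int[] freqs = new int[count];
        int[] pos = {0};
        int prev = 0;
        for (int i = 0; i < count; i++) {
            prev += readVInt(in, pos);
            docIds[i] = prev;
            freqs[i] = readVInt(in, pos);
        }
        return new int[][] {docIds, freqs};
    }
}
```

With docids {5, 6, 300, 70000} and frequencies {1, 3, 1, 200}, the deltas are 5, 1, 294 and 69700, and the encoding takes 12 bytes.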

As Paul said, small steps :)   
