lucene-dev mailing list archives

From Paul Elschot
Subject Re: docid set compression and boolean docid set operations
Date Thu, 11 Sep 2008 10:28:58 GMT

I've taken a first look at the code, and I have a few questions.

Did I understand correctly that it is basically a two-way
conversion between an integer array and an (Open)BitSet
that holds a p4delta data structure?
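
For illustration only, here is a minimal Java sketch of what such a
two-way conversion could look like. It is not the contributed code. It
assumes sorted, non-negative doc ids, stores the gaps between them in
fixed-width slots inside a long[] (the kind of backing array an
OpenBitSet uses), and leaves out the exception patching that the
p4delta scheme applies to gaps too large for the chosen width. The
class and method names are hypothetical.

```java
import java.util.Arrays;

// Hypothetical sketch: delta-encode sorted doc ids and bit-pack the gaps.
// Simplification: gaps that do not fit in `bits` are rejected rather than
// patched as exceptions, which is what a real p4delta encoder would do.
class DeltaPackSketch {

    // Pack the gap before each doc id into a slot of `bits` bits (1..31).
    public static long[] encode(int[] docIds, int bits) {
        long[] words = new long[(docIds.length * bits + 63) / 64];
        int prev = 0;
        for (int i = 0; i < docIds.length; i++) {
            long delta = (long) docIds[i] - prev;
            if (delta < 0 || delta >= (1L << bits)) {
                throw new IllegalArgumentException("gap does not fit: " + delta);
            }
            prev = docIds[i];
            int bitPos = i * bits;
            int word = bitPos >>> 6;   // index of the 64-bit word
            int shift = bitPos & 63;   // bit offset inside that word
            words[word] |= delta << shift;
            if (shift + bits > 64) {   // slot straddles two words
                words[word + 1] |= delta >>> (64 - shift);
            }
        }
        return words;
    }

    // Unpack `count` gaps and rebuild the doc ids by running sum.
    public static int[] decode(long[] words, int count, int bits) {
        int[] docIds = new int[count];
        long mask = (1L << bits) - 1;
        int prev = 0;
        for (int i = 0; i < count; i++) {
            int bitPos = i * bits;
            int word = bitPos >>> 6;
            int shift = bitPos & 63;
            long delta = words[word] >>> shift;
            if (shift + bits > 64) {
                delta |= words[word + 1] << (64 - shift);
            }
            prev += (int) (delta & mask);
            docIds[i] = prev;
        }
        return docIds;
    }

    public static void main(String[] args) {
        int[] ids = {3, 7, 8, 20, 21};
        long[] packed = encode(ids, 5);
        System.out.println(Arrays.equals(ids, decode(packed, ids.length, 5)));
    }
}
```

The decoder needs the element count and the bit width alongside the
packed words, so an index format using this layout would have to record
both per block.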

In that case it would still be necessary to extend the Lucene
index structure so that it understands the p4delta data structure
at the appropriate places.
I can help get the code integrated into Lucene, but I've
never done an index structure extension, so I'd like to have
some support from this list for that.
The code would initially need some package restructuring and
layout changes, and then it could move forward to an index
structure extension.

Would you have some ideas on how to use the p4delta structure
to store docIds, term frequencies and term positions?
The references give some insights there, but it seems that there
is still quite a bit of work to do to get such "details" right.
Fortunately, the existing Lucene TermDocs and TermPositions
appear to be just right for this.

Paul Elschot

On Wednesday 10 September 2008 23:09:18, John Wang wrote:
> Sorry, I meant lucene 2.4
> -John
> On Wed, Sep 10, 2008 at 2:08 PM, John Wang wrote:
> > Hi guys:
> >
> >      We have built this on top of the lucene 1.4. api/refactoring
> > for docid sets and docIdIterater.
> >
> >      We've implemented the p4Delta compression algorithm presented
> > at www2008:
> >
> >      We've been using this in production here at LinkedIn and would
> > love to contribute it into lucene.
> >
> >      We have currently open-sourced it at:
> >
> >
> >      Please let us know if this is something you guys want to
> > proceed with, and if so, what steps we should take.
> >
> > Thanks
> >
> > -John
