lucene-java-user mailing list archives

From Paul Elschot
Subject Re: Using Lucene partly as DB and 'joining' search results.
Date Sat, 12 Apr 2008 08:51:54 GMT
On Saturday 12 April 2008 00:03:13, Antony Bowesman wrote:
> Paul Elschot wrote:
> > On Friday 11 April 2008 13:49:59, Mathieu Lecarme wrote:
> >> Use Filter and BitSet.
> >>  From the personal data, you build a Filter
> >> (
> >>Fil ter.html) which is used in the main index.
> >
> > With 1 billion mails, and possibly a Filter per user, you may want
> > to use more compact filters than BitSets, which is currently
> > possible in the development trunk of lucene.
> Thanks for the pointers.  I've already used Solr's DocSet interface
> in my implementation, which I think is where the ideas for the
> current Lucene enhancements came from.

The ideas came from quite a few sources. They can be traced
starting from changes.txt in the sources.

> They work well to reduce the 
> filter's footprint.  I'm also caching filters.
> The intention is that there is a user data index and the mail
> index(es).  The search against user data index will return a set of
> mail Ids, which is the common key between the two. Doc Ids are no 
> good between the indexes, so that means a potentially large boolean
> OR query to create the filter of labelled mails in the mail indexes. 
> I know it's a theoretical question, but will this perform?

The normal way to build a filter is to collect doc ids into a bitset
while iterating over the indexed ids (mail ids in your case). A bitset
has random access, so there is no need to do this in doc id order.
An OR query, by contrast, has to work in doc id order so that it can
compute a score per doc id, and maintaining that order costs some
performance.
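
As a rough illustration of collecting straight into a bitset, here is a
minimal Java sketch. It does not use Lucene itself: the Map named
postings is a hypothetical stand-in for the mail index's term lookup
(in Lucene 2.x that would presumably be something like
IndexReader.termDocs(new Term("mailId", id)), with "mailId" an assumed
field name), and the class and method names are made up for this
example.

```java
import java.util.Arrays;
import java.util.BitSet;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class MailIdFilterSketch {

    /**
     * Collects the doc ids of the given mail ids into a BitSet.
     * postings is a hypothetical stand-in for the index lookup
     * "mail id -> doc ids"; it is not a Lucene type.
     */
    static BitSet buildFilter(Map<String, int[]> postings,
                              List<String> mailIds,
                              int maxDoc) {
        BitSet bits = new BitSet(maxDoc);
        // The mail ids may arrive in any order: setting a bit is random
        // access, so nothing needs to be sorted or merged by doc id.
        for (String mailId : mailIds) {
            int[] docs = postings.get(mailId);
            if (docs == null) {
                continue; // mail id not present in this index
            }
            for (int doc : docs) {
                bits.set(doc);
            }
        }
        return bits;
    }

    public static void main(String[] args) {
        Map<String, int[]> postings = new HashMap<>();
        postings.put("m-17", new int[] {42});
        postings.put("m-03", new int[] {7});
        postings.put("m-99", new int[] {1, 90});

        // Ids as returned by the user data index, in no particular order.
        List<String> labelled = Arrays.asList("m-99", "m-17", "m-03");
        BitSet filter = buildFilter(postings, labelled, 100);
        System.out.println(filter); // prints {1, 7, 42, 90}
    }
}
```

The loop costs one lookup per mail id and no scoring, which is the
difference from the OR query described above.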

Paul Elschot
