lucene-java-user mailing list archives

From Doug Cutting <cutt...@apache.org>
Subject Re: Searching by bit masks
Date Fri, 10 Nov 2006 17:26:52 GMT
Erick Erickson wrote:
> Something like
> Document doc = new Document();
> doc.add("flag1", "Y");
> doc.add("flag2", "Y");
> IndexWriter.add(doc);

Each field carries some overhead.  It would be more efficient to implement
this as a single field with a different value for each boolean flag (as
others have suggested).
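
A minimal sketch of the single-field idea, assuming the flags start out as bits in an int mask (as the subject line suggests). The class name FlagTokens and the "flagN" token naming are hypothetical and not part of Lucene; the resulting string would be indexed as the value of one field, presumably with an analyzer that splits on whitespace, so that each flag becomes its own term:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper (not a Lucene class): converts a bit mask into
// the tokens that would be stored in a single "flags" field.
class FlagTokens {
    // Bit i set in the mask yields the token "flag" + (i + 1),
    // so mask 0b101 yields "flag1 flag3".
    static String tokensFor(int mask) {
        List<String> tokens = new ArrayList<>();
        for (int bit = 0; bit < Integer.SIZE; bit++) {
            if ((mask & (1 << bit)) != 0) {
                tokens.add("flag" + (bit + 1));
            }
        }
        // Whitespace-separated so a whitespace tokenizer can split it.
        return String.join(" ", tokens);
    }
}
```

Searching for documents with a given flag then becomes a plain term query against that one field, rather than a query against one of many fields.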

> Another approach: create a set of Lucene Filters (really, these are just
> Java bitsets), one for each flag. All this is a bitset with one bit for
> each document, or about 1M of memory per flag with 8M docs. So you'd
> populate flag1Filter, flag2Filter... and have these ready whenever you
> needed them.

Cached filters will be faster, especially when a large portion of the 
documents have the flag set.  If, for example, a flag is set in half the 
documents and specified in half the queries, then a cached filter will 
have a large impact not only on the performance of those queries but 
also on the performance of your service as a whole.
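
To illustrate the cached-filter idea, here is a rough sketch using only java.util.BitSet rather than the Lucene Filter API. The class FlagFilterCache, its method names, and the IntPredicate standing in for "does this document have the flag?" are all assumptions made for illustration. It builds one bit per document once per flag, reuses that bitset for every later query, and checks the "about 1M per flag with 8M docs" arithmetic from the quoted message:

```java
import java.util.BitSet;
import java.util.HashMap;
import java.util.Map;
import java.util.function.IntPredicate;

// Hypothetical cache of per-flag bitsets; a stand-in for cached filters.
class FlagFilterCache {
    private final int maxDoc;
    private final Map<String, BitSet> cache = new HashMap<>();

    FlagFilterCache(int maxDoc) {
        this.maxDoc = maxDoc;
    }

    // Builds the bitset for a flag on first use, then returns the
    // same cached instance on every later call.
    BitSet bitsFor(String flag, IntPredicate hasFlag) {
        return cache.computeIfAbsent(flag, f -> {
            BitSet bits = new BitSet(maxDoc);
            for (int doc = 0; doc < maxDoc; doc++) {
                if (hasFlag.test(doc)) {
                    bits.set(doc);
                }
            }
            return bits;
        });
    }

    // Keeps only the query hits whose flag bit is set. Works on a
    // copy so the cached bitset is never modified.
    static BitSet restrict(BitSet queryHits, BitSet cachedFlagBits) {
        BitSet result = (BitSet) queryHits.clone();
        result.and(cachedFlagBits);
        return result;
    }

    // One bit per document, rounded up to whole bytes.
    static long bytesPerFlag(long numDocs) {
        return (numDocs + 7) / 8;
    }
}
```

With 8,000,000 documents, bytesPerFlag gives 1,000,000 bytes, which matches the rough 1M figure above. The per-query cost is then a single bitwise AND over the cached bits, independent of how many documents carry the flag.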

Doug
