lucene-java-user mailing list archives

From "David Seltzer" <>
Subject RE: Faceting, Sort and DocIDSet
Date Mon, 20 Apr 2009 14:12:21 GMT
Hi Karsten,

My index contains about 100M documents, and I'm trying to count results
on around 300 facets. At the moment I'm keeping a set of cached facet
bitsets and then comparing the query result against those bitsets.
Performance is pretty lousy. It takes more than 2s to calculate the
cardinality of the main query against those 300 facets.
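
For reference, the cached-bitset approach described above can be sketched roughly as follows. This is a minimal illustration using java.util.BitSet rather than Lucene classes, and the class and method names are hypothetical. Assuming uncompressed bitsets, each facet needs a pass over 100M bits (about 12.5 MB), so 300 facets means several gigabytes of memory traffic per query, which may account for much of the 2s.

```java
import java.util.BitSet;

// Illustrative only: names are hypothetical, not part of the Lucene API.
class BitSetFacetCounter {

    /** For each cached facet bitset, counts how many query hits fall into that facet. */
    static int[] countFacets(BitSet queryHits, BitSet[] facetBits) {
        int[] counts = new int[facetBits.length];
        for (int i = 0; i < facetBits.length; i++) {
            // Copy first so that and() does not modify the cached facet bitset.
            BitSet overlap = (BitSet) facetBits[i].clone();
            overlap.and(queryHits);
            counts[i] = overlap.cardinality();
        }
        return counts;
    }

    public static void main(String[] args) {
        BitSet hits = new BitSet();
        hits.set(0); hits.set(2); hits.set(3);

        BitSet facetA = new BitSet();
        facetA.set(0); facetA.set(1); facetA.set(2);
        BitSet facetB = new BitSet();
        facetB.set(3);

        int[] counts = countFacets(hits, new BitSet[] { facetA, facetB });
        System.out.println(counts[0] + " " + counts[1]); // prints "2 1"
    }
}
```

The cost of this loop grows with (number of facets) x (index size), regardless of how few documents the query actually matched.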

I have two possible datasets to use for the facets. One is an integer
and the other is a short string (about 10 characters).

The taxonomy solution seems interesting but it might be overkill since
there is really no hierarchical relationship between these facets.

I could count the facets manually by implementing a HitCollector, but
the javadocs warn (pretty strenuously) against reading the content of a
document inside a HitCollector. Is this something I should be worried
about, or is it an inevitable part of the solution?
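
One way to count manually without reading documents inside the collector is sketched below. The javadoc warning appears to concern loading stored documents (for example via Searcher.doc(int)) for every hit; a lookup in an in-memory array that was filled once per index reader is much cheaper. The sketch assumes such an array already exists (in Lucene it could perhaps be obtained from FieldCache.DEFAULT.getInts(reader, field), which is an assumption about the setup), that every document has exactly one facet value, and that the values are dense integers in 0..numFacets-1. The class name is hypothetical; collect mirrors the shape of HitCollector.collect(int doc, float score) but the class does not extend the Lucene type, so that the example stays self-contained.

```java
// Illustrative only: a stand-in for a HitCollector subclass, not Lucene API.
class OrdinalFacetCollector {
    private final int[] facetOfDoc; // facetOfDoc[docId] = facet id of that document
    private final int[] counts;     // counts[facetId] = number of hits seen so far

    OrdinalFacetCollector(int[] facetOfDoc, int numFacets) {
        this.facetOfDoc = facetOfDoc;
        this.counts = new int[numFacets];
    }

    /** Called once per matching document; does one array lookup and one increment. */
    void collect(int doc, float score) {
        counts[facetOfDoc[doc]]++;
    }

    int[] counts() {
        return counts;
    }

    public static void main(String[] args) {
        // Five documents with facet ids 0, 1, 1, 2, 1 (made-up data).
        OrdinalFacetCollector collector =
                new OrdinalFacetCollector(new int[] { 0, 1, 1, 2, 1 }, 3);
        // Pretend the query matched documents 1, 2 and 3.
        collector.collect(1, 1.0f);
        collector.collect(2, 0.5f);
        collector.collect(3, 0.2f);
        int[] c = collector.counts();
        System.out.println(c[0] + " " + c[1] + " " + c[2]); // prints "0 2 1"
    }
}
```

With this layout the work per query is proportional to the number of hits rather than to 300 passes over the whole index. The trade-off is memory: an int[] for 100M documents takes about 400 MB.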



-----Original Message-----
From: Karsten F. [] 
Sent: Saturday, April 18, 2009 10:58 AM
Subject: Re: Faceting, Sort and DocIDSet

Hi Dave,

Searching and sorting in Lucene are two separate functions (if you do not
sort by relevance).
You will not lose performance if you first search with a BitSet as
HitCollector and then sort the result by DateField.
But it is easier to extend TopFieldDocCollector/TopFieldCollector into a
Collector with facet counts.
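
The two-step variant (collect into a BitSet, then sort by date) can be sketched as follows. This is plain Java with hypothetical names, not Lucene code: it assumes the date of each document is already available in an in-memory array indexed by document id, and it sorts in ascending order.

```java
import java.util.Arrays;
import java.util.BitSet;
import java.util.Comparator;

// Illustrative only: names are hypothetical, not part of the Lucene API.
class CollectThenSort {

    /** Returns the ids of all set bits in hits, ordered by ascending date. */
    static Integer[] sortHitsByDate(BitSet hits, long[] dateOfDoc) {
        Integer[] docs = new Integer[hits.cardinality()];
        int n = 0;
        // Step 1: walk the bitset that the collector filled during the search.
        for (int d = hits.nextSetBit(0); d >= 0; d = hits.nextSetBit(d + 1)) {
            docs[n++] = d;
        }
        // Step 2: sort the collected ids by their date value.
        Arrays.sort(docs, Comparator.comparingLong((Integer d) -> dateOfDoc[d]));
        return docs;
    }

    public static void main(String[] args) {
        BitSet hits = new BitSet();
        hits.set(0); hits.set(2); hits.set(3);
        long[] dateOfDoc = { 30L, 10L, 20L, 5L }; // made-up timestamps
        System.out.println(Arrays.toString(sortHitsByDate(hits, dateOfDoc))); // prints "[3, 2, 0]"
    }
}
```

The same BitSet can also be reused for facet counting, which is the reason to keep the search and sort steps separate.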

Sujit Pal's implementation of facet count is a good idea if you have a
small number of facets and a lot of documents for each facet.

I know half a dozen implementations of facet browsing.
To choose the best one you have to know:
 - How many different values does the facet have? Which kind of value
(small String, huge String)?
 - Is there more than one value of the facet per document? How many on average?
is also interesting for you.

Best regards

David Seltzer wrote:
> I have a set of indexes, each index contains a month's worth of
> Articles. I need to be able to search the index (sorting by date) and
> then apply access-filters based on the Article Source. I'm also trying
> to get result counts for each Article Source.
> So my questions: 
> 1) How do I use a HitCollector and sort by a field? 
> 2) Is using BitSets the wrong way to quickly generate facet counts?
> read about DocIDSets, but I'm not sure how to use them in the same
> (I'm basing my faceting technique on Sujit Pal's article
> ml)
> Thanks!
> -Dave
