lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Khawaja Shams" <>
Subject Re: distinct field values
Date Wed, 15 Oct 2008 04:07:46 GMT
Hi,  You may also want to take a look at Carrot2:

Lucene documentation references them, but I was disappointed to see that
they had an open source version (really old) and one that you can buy. It
may work for you.

Also, take a look at SOLR's implementation of faceted searching:

I have been dealing with a very similar problem.  Iterating over all the
hits may not be too slow if you do it right. One such brute-force way to
deal with this is to use the FieldCache for the tag field to quickly iterate
over all the document ids that came back with the search.  The constant
lookup time in an array makes it nice and easy, but if may not scale well
with the number of documents. If you know that the number of distinct tags
in your dataset is fairly small (5-10), you can always just run the query
with each tag added as a constraint to find out if there are any matches or
you can manage a filter for each value to deduce if it exists in the
resultset even faster.

 Please share your experience as you find more clues on this problem.

Khawaja Shams

On Tue, Oct 14, 2008 at 8:15 PM, Chris Hostetter

> : For example if I have 100 documents in my index and my set of tag = {A,
> B, C}.
> : Query Q on the text field returns 15 docs with tag A , 10 with tag B and
> none
> : with tag C (total of 25 hits). Is there a way to determine that the set
> of
> : tags for query Q = {A, B} without iterating through all 25 hits.
> what you are describing is is a subset of a rbaoder topic known as
> "faceted searching" ... if you search the archives for that or "category
> counts" you'll find quite a bit of discussion on the approaches that can
> be used to ahieve this.
> -Hoss
> ---------------------------------------------------------------------
> To unsubscribe, e-mail:
> For additional commands, e-mail:

  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message