lucene-java-user mailing list archives

From Paul Elschot
Subject Re: efficiently finding all terms used on a particular field within Documents matching a query
Date Thu, 10 Nov 2005 08:01:33 GMT
On Thursday 10 November 2005 08:12, Chris Hostetter wrote:
> : For example I would like to find the set of terms used within a particular
> : date range, where all Documents have a date field on them. What I've done
> : to date is simply perform a query to find all Documents that match the
> : date range query, then iterate over each one and construct a Set of all
> : terms used in the particular field I'm interested in.
> :
> : I'm wondering if there a more efficient way to accomplish this?
> I believe there is -- provided the terms are indexed.
> 1) Get yourself a BitSet representing the Documents you are interested in
> (you mentioned having a date range, you can either use a RangeFilter and
> call the bits method directly, or you can do a search using a
> HitCollector)
> 2) Look at the code that actually makes RangeFilter work.  It iterates
> over a TermEnum between a low and high value.  For each term it finds, it
> uses a TermDocs to record the docid.  You could do something very similar,
> looping over all terms in the field you want, but instead of recording
> the docid, add the term to your Set object -- if and only if one of the
> docids from the TermDocs is in your BitSet from step #1 above.
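
The two steps above can be sketched without any Lucene dependency. In the
sketch below, the map of term -> doc ids is a hypothetical stand-in for what a
TermEnum/TermDocs walk over one field would yield, and the BitSet is the one
from step 1. The class and method names are made up for illustration; this is
not Lucene API, only the shape of the loop.

```java
import java.util.BitSet;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical stand-in for one field's inverted index; not Lucene API.
final class FieldTermScan {

    /**
     * @param postings term -> doc ids containing that term in the field
     *                 (assumed to play the role of TermEnum + TermDocs)
     * @param matching bits set for the documents selected in step 1
     * @return every term that occurs in at least one matching document
     */
    static Set<String> termsInMatchingDocs(Map<String, int[]> postings, BitSet matching) {
        Set<String> terms = new TreeSet<>();
        for (Map.Entry<String, int[]> entry : postings.entrySet()) {
            for (int docId : entry.getValue()) {
                if (matching.get(docId)) {
                    terms.add(entry.getKey());
                    break; // one matching doc is enough, skip the rest of this term's docs
                }
            }
        }
        return terms;
    }
}
```

The early break matters: a term is added as soon as one of its docs is in the
BitSet, so frequent terms do not have to be walked to the end.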

Once the doc ids are available, an alternative way to get the terms used
in a particular field is to read them from the TermVectors of those docs.
The field of interest must be indexed with TermVectors for this to work.
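
A rough sketch of the term vector route, again without Lucene itself: the map
of doc id -> terms is a hypothetical stand-in for the per-document term
vectors of the field, and the names are invented for illustration. The loop
visits only the matching docs, so the work should scale with the size of the
result set rather than with the number of terms in the field.

```java
import java.util.BitSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical stand-in for stored term vectors; not Lucene API.
final class TermVectorUnion {

    /**
     * @param termVectors doc id -> terms of the field in that document
     *                    (assumed to be what a stored term vector provides)
     * @param matching    bits set for the documents that matched the query
     * @return the union of the terms of all matching documents
     */
    static Set<String> termsFromVectors(Map<Integer, List<String>> termVectors, BitSet matching) {
        Set<String> terms = new TreeSet<>();
        // Walk only the set bits, i.e. only the matching documents.
        for (int docId = matching.nextSetBit(0); docId >= 0; docId = matching.nextSetBit(docId + 1)) {
            List<String> vector = termVectors.get(docId);
            if (vector != null) { // a doc may have no vector for this field
                terms.addAll(vector);
            }
        }
        return terms;
    }
}
```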

Paul Elschot
