lucene-java-user mailing list archives

From Chris Hostetter <hossman_luc...@fucit.org>
Subject Re: efficiently finding all terms used on a particular field within Documents matching a query
Date Thu, 10 Nov 2005 07:12:10 GMT

: For example I would like to find the set of terms used within a particular
: date range, where all Documents have a date field on them. What I've done
: to date is simply perform a query to find all Documents that match the
: date range query, then iterate over each one and construct a Set of all
: terms used in the particular field I'm interested in.
:
: I'm wondering if there is a more efficient way to accomplish this?

I believe there is -- provided the terms are indexed.

1) Get yourself a BitSet representing the Documents you are interested in
(you mentioned having a date range: you can either use a RangeFilter and
call its bits method directly, or you can do a search using a
HitCollector)

2) Look at the code that actually makes RangeFilter work.  It iterates
over a TermEnum between a low and high value.  For each term it finds, it
uses a TermDocs to record the docids.  You could do something very similar,
looping over all terms in the field you want, but instead of recording
the docids, add the term to your Set object -- if and only if one of the
docids from the TermDocs is in your BitSet from step #1 above.
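
To make the loop in step #2 concrete, here is a rough sketch of just the
matching logic.  It deliberately does not call Lucene: the sorted map
stands in for walking a TermEnum over one field, each int[] stands in for
the docids a TermDocs would return for that term, and the class and method
names (TermsInMatchingDocs, collectTerms) are made up for illustration.

```java
import java.util.BitSet;
import java.util.Map;
import java.util.SortedMap;
import java.util.SortedSet;
import java.util.TreeMap;
import java.util.TreeSet;

public class TermsInMatchingDocs {

    /**
     * Returns every term that occurs in at least one matching document.
     *
     * postingsByTerm is an assumed stand-in for one field's terms:
     * term text -> docids containing it (what TermEnum + TermDocs would give).
     * matchingDocs is the BitSet from step #1.
     */
    static SortedSet<String> collectTerms(SortedMap<String, int[]> postingsByTerm,
                                          BitSet matchingDocs) {
        SortedSet<String> terms = new TreeSet<>();
        for (Map.Entry<String, int[]> entry : postingsByTerm.entrySet()) {
            for (int docId : entry.getValue()) {
                if (matchingDocs.get(docId)) {
                    terms.add(entry.getKey());
                    break; // one matching doc is enough, skip the rest
                }
            }
        }
        return terms;
    }

    public static void main(String[] args) {
        // Toy postings for a single field (hypothetical data).
        SortedMap<String, int[]> postings = new TreeMap<>();
        postings.put("apple", new int[] {0, 3});
        postings.put("banana", new int[] {1});
        postings.put("cherry", new int[] {2, 4});

        // Pretend the date range query matched docs 1 and 2.
        BitSet matching = new BitSet();
        matching.set(1);
        matching.set(2);

        System.out.println(collectTerms(postings, matching)); // prints [banana, cherry]
    }
}
```

The early break matters: once a term is known to appear in one matching
doc there is no reason to read the rest of its docids.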


...that should be faster than the approach you have now.


-Hoss

