lucene-java-user mailing list archives

From Chris Hostetter <hossman_luc...@fucit.org>
Subject Re: How to get all field values from a Hits object?
Date Tue, 18 Jan 2005 03:55:59 GMT

: is it possible to get all different values for a
: <Field> from a <Hits> object and how to do this?

The wording of your question suggests that the Field you are interested in
isn't a field which will have a fairly unique value for every doc (ie: not
a "title", more likely an "author" or "category" field).  Starting with
that assumption, there is a fairly efficient way to get the information
you want...

Assuming the total set of values for the Field you are interested in is
small (relative to your index size), you can pre-compute a BitSet for
each value indicating which docs match that value in the Field (using a
TermFilter).  Then store those BitSets in a Map (keyed by field value).
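A minimal sketch of the precomputation step in plain Java, written so it runs without Lucene on the classpath. The postings map (each field value mapped to the document numbers that contain it) is a hypothetical stand-in for what a TermFilter, or a loop over IndexReader.termDocs(Term), would give you for each value. The class and method names are made up for illustration.

```java
import java.util.BitSet;
import java.util.HashMap;
import java.util.Map;

public class ValueBitSets {

    // Builds one BitSet per field value, keyed by that value.
    // `postings` is a hypothetical stand-in for the index: value -> doc numbers.
    static Map<String, BitSet> buildValueBitSets(Map<String, int[]> postings, int maxDoc) {
        Map<String, BitSet> byValue = new HashMap<>();
        for (Map.Entry<String, int[]> e : postings.entrySet()) {
            BitSet bits = new BitSet(maxDoc);
            for (int doc : e.getValue()) {
                bits.set(doc); // this doc has this value in the field
            }
            byValue.put(e.getKey(), bits);
        }
        return byValue;
    }

    public static void main(String[] args) {
        Map<String, int[]> postings = new HashMap<>();
        postings.put("news", new int[] {0, 2, 5});
        postings.put("sports", new int[] {1, 4});
        postings.put("tech", new int[] {3});

        Map<String, BitSet> bits = buildValueBitSets(postings, 6);
        System.out.println(bits.get("news")); // prints {0, 2, 5}
    }
}
```

Since the Map is only rebuilt when the index changes, the cost of this loop is paid once per index version, not once per search.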

Every time a search is performed, use a HitCollector that generates a
BitSet containing the documents in your result; AND that BitSet against (a
copy of) each BitSet in your Map.  All of the resulting BitSets with a
non-zero cardinality represent values in your results.  (As an added bonus,
the cardinality() of each BitSet is the total number of docs in your
result that contain that value.)
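The per-search step might look like the following sketch. BitSetCollector is a hypothetical class that only mimics the shape of Lucene's HitCollector.collect(int doc, float score) so the example compiles on its own; in real code you would subclass HitCollector and pass it to Searcher.search(Query, HitCollector). Each cached BitSet is cloned before the AND so the Map stays reusable for the next search.

```java
import java.util.BitSet;
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

public class FacetCounts {

    // Hypothetical class mimicking HitCollector.collect(int doc, float score);
    // it records every matching document number in a BitSet.
    static class BitSetCollector {
        final BitSet results = new BitSet();

        void collect(int doc, float score) {
            results.set(doc);
        }
    }

    // Convenience helper: a BitSet with the given document numbers set.
    static BitSet bits(int... docs) {
        BitSet b = new BitSet();
        for (int d : docs) {
            b.set(d);
        }
        return b;
    }

    // ANDs the result BitSet against a copy of each per-value BitSet and keeps
    // every value with a non-zero cardinality, mapped to its document count.
    static Map<String, Integer> countValues(BitSet results, Map<String, BitSet> byValue) {
        Map<String, Integer> counts = new TreeMap<>();
        for (Map.Entry<String, BitSet> e : byValue.entrySet()) {
            BitSet copy = (BitSet) e.getValue().clone(); // leave the cached BitSet untouched
            copy.and(results);
            int n = copy.cardinality();
            if (n > 0) {
                counts.put(e.getKey(), n);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        Map<String, BitSet> byValue = new HashMap<>();
        byValue.put("news", bits(0, 2, 5));
        byValue.put("sports", bits(1, 4));
        byValue.put("tech", bits(3));

        BitSetCollector collector = new BitSetCollector();
        for (int doc : new int[] {0, 3, 5}) { // pretend these are the search hits
            collector.collect(doc, 1.0f);
        }
        System.out.println(countValues(collector.results, byValue)); // prints {news=2, tech=1}
    }
}
```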

Two caveats:
   1) Every time you modify your index, you have to regenerate the
      BitSets in your Map.
   2) You have to know the set of all values for the field you are
      interested in.  In many cases, this is easy to determine from the
      source data while building the index, but it's also possible to
      get it by enumerating the field's terms with IndexReader.terms(Term).
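For the second caveat: a term enumeration walks the index's terms sorted by field and then by text, so you can seek to the first term of the field and stop once the field name changes. The sketch below models that with a TreeSet of a hypothetical Term class instead of a real TermEnum, so it runs without Lucene; the names are illustrative only.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

public class FieldValues {

    // Hypothetical stand-in for a Lucene Term: ordered by field, then text,
    // which is the order a term enumeration walks the term dictionary in.
    static final class Term implements Comparable<Term> {
        final String field;
        final String text;

        Term(String field, String text) {
            this.field = field;
            this.text = text;
        }

        public int compareTo(Term o) {
            int c = field.compareTo(o.field);
            return c != 0 ? c : text.compareTo(o.text);
        }
    }

    // Seeks to the first term of `field` (empty text sorts first) and reads
    // terms until the enumeration moves on to a different field.
    static List<String> valuesOf(TreeSet<Term> dictionary, String field) {
        List<String> values = new ArrayList<>();
        for (Term t : dictionary.tailSet(new Term(field, ""), true)) {
            if (!t.field.equals(field)) {
                break; // past the last term of this field
            }
            values.add(t.text);
        }
        return values;
    }

    public static void main(String[] args) {
        TreeSet<Term> dict = new TreeSet<>();
        dict.add(new Term("author", "hoss"));
        dict.add(new Term("category", "tech"));
        dict.add(new Term("category", "news"));
        dict.add(new Term("title", "lucene"));
        System.out.println(valuesOf(dict, "category")); // prints [news, tech]
    }
}
```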


(I'm doing something like this to provide ancillary information about which
categories of documents are most common in the user's search results, and
what the exact number of documents in those categories is.)
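To list the most common values (e.g. categories) in a result, one option is to sort the per-value counts from the AND step by count. This sketch assumes those counts are already in a Map<String, Integer>; TopValues and top() are hypothetical names, and ties are broken alphabetically so the order is stable.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class TopValues {

    // Returns up to k (value, count) entries, largest count first,
    // breaking ties by value name.
    static List<Map.Entry<String, Integer>> top(Map<String, Integer> counts, int k) {
        List<Map.Entry<String, Integer>> entries = new ArrayList<>(counts.entrySet());
        entries.sort((a, b) -> {
            int byCount = Integer.compare(b.getValue(), a.getValue()); // larger counts first
            return byCount != 0 ? byCount : a.getKey().compareTo(b.getKey());
        });
        return new ArrayList<>(entries.subList(0, Math.min(k, entries.size())));
    }

    public static void main(String[] args) {
        Map<String, Integer> counts = new HashMap<>();
        counts.put("books", 3);
        counts.put("music", 7);
        counts.put("film", 3);
        System.out.println(top(counts, 2)); // prints [music=7, books=3]
    }
}
```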


-Hoss



