lucene-dev mailing list archives

From Christoph Goller <gol...@detego-software.de>
Subject Re: Count unique occurrences in hits
Date Thu, 09 Dec 2004 13:51:30 GMT


Daniel Herlitz wrote:
> I have done a Lucene search and want to know the number of unique terms 
> for a field (indexed) in the result. For example I have searched for a 
> book title and want to know the number of authors (single terms) 
> represented in the result.
> 
> Is there any way of doing this without having to scan through all the hits?

If the field you want to access (the author field in your example) is only
indexed (not stored, no term vectors), then the only way to achieve what you
want is to enumerate all terms of the author field with a TermEnum and compare
the ids of the documents each term occurs in (TermDocs) with the document ids
of the hit list.
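
As a rough illustration of that comparison, here is a minimal sketch in plain
Java. It does not call Lucene: the postings map is a hypothetical stand-in for
what TermEnum and TermDocs would deliver for the author field, and the BitSet
is assumed to hold the document ids of the hit list. The class and method
names are made up for this example.

```java
import java.util.BitSet;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in for the index: each author term maps to the ids of
// the documents it occurs in. In Lucene this would come from TermEnum/TermDocs.
class UniqueTermCounter {

    // Counts the terms whose document list contains at least one hit.
    static int countUniqueTerms(Map<String, int[]> postings, BitSet hitDocIds) {
        int count = 0;
        for (int[] docIds : postings.values()) {   // one iteration per term
            for (int docId : docIds) {             // walk the documents of this term
                if (hitDocIds.get(docId)) {
                    count++;
                    break;                         // one matching document is enough
                }
            }
        }
        return count;
    }

    public static void main(String[] args) {
        // Made-up sample data, not taken from any real index.
        Map<String, int[]> postings = new LinkedHashMap<>();
        postings.put("goller", new int[] {0, 3});
        postings.put("herlitz", new int[] {1});
        postings.put("smith", new int[] {2, 4});

        BitSet hits = new BitSet();
        hits.set(0);
        hits.set(1);
        hits.set(3);

        System.out.println(countUniqueTerms(postings, hits));
    }
}
```

Note that the cost of this approach grows with the number of terms in the
field and the length of their document lists, not with the number of hits.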

Alternatively, you could make the author field stored. If you do so, the
author field is one of the fields you get when you call hits.doc(int). If you
only want to access the author field, this might be a little inefficient. You
could also use term vectors for the author field. With term vectors you can
selectively retrieve the terms of a single field from a document.
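
Either way, the per-document approach amounts to walking the hits and
collecting the author terms into a set. A minimal sketch in plain Java,
again without calling Lucene: the docAuthors map is a hypothetical stand-in
for whatever returns the author terms of one document (a stored field value
or a term vector), and all names are made up for this example.

```java
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

class HitAuthorCollector {

    // docAuthors is a hypothetical lookup from document id to that document's
    // author terms; in Lucene it would be a stored field or a term vector.
    static Set<String> uniqueAuthors(int[] hitDocIds, Map<Integer, String[]> docAuthors) {
        Set<String> authors = new HashSet<>();
        for (int docId : hitDocIds) {
            String[] terms = docAuthors.get(docId);
            if (terms == null) {
                continue;                // this document has no author terms
            }
            for (String term : terms) {
                authors.add(term);       // the set drops duplicates
            }
        }
        return authors;
    }
}
```

The size of the returned set is the number of distinct authors in the result.
This still visits every hit, so the cost grows with the size of the hit list.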

Christoph



