lucene-java-user mailing list archives

From Michael McCandless <luc...@mikemccandless.com>
Subject Re: API that return the amount of terms indexed
Date Sat, 16 Oct 2010 09:16:36 GMT
4.0 will have an API to get the number of unique terms for a given
field, or across all fields, but only at the segment level.  (Getting
the count across segments requires a merge sort, because the same term
can appear in more than one segment.)
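
To illustrate why a merge is needed, here is a rough sketch in plain Java
(not the Lucene API). It assumes each segment's terms are already available
as a sorted list of strings; the class name UniqueTermMerge and the method
countUniqueTerms are made up for this example. A priority queue merges the
sorted lists, and a term is counted only when it differs from the previous
one pulled from the queue.

```java
import java.util.Iterator;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical helper, not part of Lucene: counts distinct terms across
// several segments whose term lists are each sorted.
final class UniqueTermMerge {

    /** One cursor per segment: the current term plus the rest of that segment. */
    private static final class Cursor implements Comparable<Cursor> {
        String current;
        final Iterator<String> rest;

        Cursor(Iterator<String> it) {
            this.rest = it;
            this.current = it.next();
        }

        /** Moves to the next term; returns false when the segment is exhausted. */
        boolean advance() {
            if (!rest.hasNext()) {
                return false;
            }
            current = rest.next();
            return true;
        }

        @Override
        public int compareTo(Cursor other) {
            return current.compareTo(other.current);
        }
    }

    /** Each inner list must be sorted in String natural order. */
    static long countUniqueTerms(List<List<String>> segments) {
        PriorityQueue<Cursor> queue = new PriorityQueue<>();
        for (List<String> segment : segments) {
            if (!segment.isEmpty()) {
                queue.add(new Cursor(segment.iterator()));
            }
        }
        long unique = 0;
        String last = null;
        while (!queue.isEmpty()) {
            Cursor top = queue.poll();
            // Equal terms from different segments come out back to back,
            // so comparing with the previous term is enough to deduplicate.
            if (!top.current.equals(last)) {
                unique++;
                last = top.current;
            }
            if (top.advance()) {
                queue.add(top);
            }
        }
        return unique;
    }
}
```

Simply adding up the per-segment counts would overcount any term that
occurs in more than one segment, which is what the merge avoids.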

3.x and earlier don't have such an API, though the information is
tracked under the hood.  If you open the _X.tis file, skip the first
int, and then call readLong(), that should be the number of unique terms
in that segment.
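
As a minimal sketch of that, using only java.io rather than Lucene's own
IndexInput: the class name TisTermCount and its methods are made up for this
example. It assumes the .tis file sits on disk as a standalone file (not
packed inside a compound .cfs file) and that the int and long are stored
big-endian, which is what DataInputStream reads.

```java
import java.io.DataInputStream;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical helper, not part of Lucene.
final class TisTermCount {

    /** Skips the leading int, then returns the long that follows it. */
    static long readTermCount(InputStream in) throws IOException {
        DataInputStream data = new DataInputStream(in);
        data.readInt();          // first int: skipped, as described above
        return data.readLong();  // assumed to be the segment's unique term count
    }

    /** Convenience overload; the path would point at a segment's _X.tis file. */
    static long readTermCount(String tisPath) throws IOException {
        try (InputStream in = new FileInputStream(tisPath)) {
            return readTermCount(in);
        }
    }
}
```

This only gives a per-segment number, so an index with several segments
would still need the counts reconciled across segments.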

You can always simply fall back to getting the term enum and stepping
through it, counting how many .next() calls there are until exhaustion...

Mike

On Fri, Oct 15, 2010 at 7:51 PM, APOLO_11 <barhen.dan@gmail.com> wrote:
>
> hey - is there an API that return the number of term indexed?
>
> I found  the API return the amount of document indexed
> (IndexWriter.docCount) but cant find an API
> for the amount of terms in the index.
>
> any idea ?
>
> thanks,d.
> --
> View this message in context: http://lucene.472066.n3.nabble.com/API-that-return-the-amount-of-terms-indexed-tp1712290p1712290.html
> Sent from the Lucene - Java Users mailing list archive at Nabble.com.
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
>
>
