lucene-java-user mailing list archives

From "Uwe Schindler" <...@thetaphi.de>
Subject RE: API that return the amount of terms indexed
Date Sat, 16 Oct 2010 09:52:01 GMT
Hi Mike,

As far as I know, 3.0 also has this method:
http://lucene.apache.org/java/3_0_2/api/core/org/apache/lucene/index/IndexReader.html#getUniqueTermCount()

But it only works at the segment level, too! So you have to use
getSequentialSubReaders()/ReaderUtil.gatherSubReaders() and call it per
segment. To get the unique count for the whole index, though, there is no way
around iterating every term, as terms that occur in more than one segment
must be counted only once (TermEnum removes these duplicates).
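
To illustrate why the per-segment counts cannot simply be added up, here is a
minimal sketch of the merge step in plain Java. It does not use any Lucene
API: the sorted string lists are hypothetical stand-ins for each segment's
term dictionary, and the names UniqueTermCountSketch and countUniqueTerms are
made up for this example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.PriorityQueue;

public class UniqueTermCountSketch {

    /** One cursor per segment: its current term plus the rest of its sorted terms. */
    private static final class Cursor implements Comparable<Cursor> {
        String current;
        final Iterator<String> rest;

        Cursor(Iterator<String> it) {
            this.rest = it;
            this.current = it.next();
        }

        /** Moves to the next term; returns false when the segment is exhausted. */
        boolean advance() {
            if (!rest.hasNext()) {
                return false;
            }
            current = rest.next();
            return true;
        }

        @Override
        public int compareTo(Cursor other) {
            return current.compareTo(other.current);
        }
    }

    /**
     * Counts distinct terms across segments. Each inner list is assumed to be
     * sorted and free of duplicates, as a segment's term dictionary would be.
     */
    public static long countUniqueTerms(List<List<String>> segments) {
        PriorityQueue<Cursor> queue = new PriorityQueue<>();
        for (List<String> segment : segments) {
            if (!segment.isEmpty()) {
                queue.add(new Cursor(segment.iterator()));
            }
        }
        long unique = 0;
        String previous = null;
        while (!queue.isEmpty()) {
            // The smallest pending term over all segments comes out first,
            // so equal terms from different segments arrive back to back.
            Cursor smallest = queue.poll();
            if (!smallest.current.equals(previous)) {
                unique++;
                previous = smallest.current;
            }
            if (smallest.advance()) {
                queue.add(smallest);
            }
        }
        return unique;
    }

    public static void main(String[] args) {
        List<List<String>> segments = new ArrayList<>();
        segments.add(Arrays.asList("apple", "lucene", "term"));
        segments.add(Arrays.asList("index", "lucene", "term", "zebra"));
        long summed = 0;
        for (List<String> segment : segments) {
            summed += segment.size();
        }
        System.out.println("sum of per-segment counts: " + summed);
        System.out.println("unique across segments: " + countUniqueTerms(segments));
    }
}
```

With the two sample segments in main, adding the per-segment counts gives 7,
while the merge reports 5, because "lucene" and "term" occur in both segments.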

Uwe

-----
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: uwe@thetaphi.de


> -----Original Message-----
> From: Michael McCandless [mailto:lucene@mikemccandless.com]
> Sent: Saturday, October 16, 2010 11:17 AM
> To: java-user@lucene.apache.org
> Subject: Re: API that return the amount of terms indexed
> 
> 4.0 will have an API to get the number of unique terms for a given field,
> or across all fields, but only at the segment level.  (Getting the count
> across segments requires a merge sort).
> 
> 3.x and before don't have such an API, though the information is tracked
> under the hood.  If you open the _X.tis file, skip the first int, then
> call readLong(), that should be the number of unique terms in that segment.
> 
> You can always simply fall back to getting the term enum and stepping
> through it, counting how many .next()'s there are until exhaustion...
> 
> Mike
> 
> On Fri, Oct 15, 2010 at 7:51 PM, APOLO_11 <barhen.dan@gmail.com> wrote:
> >
> > hey - is there an API that returns the number of terms indexed?
> >
> > I found the API that returns the number of documents indexed
> > (IndexWriter.docCount) but can't find an API for the number of terms in
> > the index.
> >
> > any ideas?
> >
> > thanks,d.
> > --
> > View this message in context:
> > http://lucene.472066.n3.nabble.com/API-that-return-the-amount-of-terms-indexed-tp1712290p1712290.html
> > Sent from the Lucene - Java Users mailing list archive at Nabble.com.
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> > For additional commands, e-mail: java-user-help@lucene.apache.org
> >
> >
> 


