lucene-java-user mailing list archives

From Michael McCandless <luc...@mikemccandless.com>
Subject Re: API that return the amount of terms indexed
Date Sat, 16 Oct 2010 10:16:23 GMT
Ahh!  You are right, we did expose this before 4.0.

But yes it has the same requirement -- it only works on a SegmentReader.

Mike
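
Because the count only works on a SegmentReader, a whole-index figure has to be assembled from the segments, and a term that occurs in several segments must be counted only once. Here is a minimal sketch of that merge step in plain Java. It does not use any Lucene classes: each segment is modelled as a sorted, duplicate-free list of strings, and the names UniqueTermMerge, Cursor and countUniqueTerms are made up for this illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.PriorityQueue;

public class UniqueTermMerge {

    /** One cursor per segment: the current term plus the rest of that segment's sorted terms. */
    private static final class Cursor implements Comparable<Cursor> {
        String current;
        final Iterator<String> rest;

        Cursor(Iterator<String> it) {
            this.rest = it;
            this.current = it.next();
        }

        @Override
        public int compareTo(Cursor other) {
            return current.compareTo(other.current);
        }
    }

    /** Counts distinct terms across segments; each segment must be sorted and duplicate-free. */
    public static long countUniqueTerms(List<List<String>> segments) {
        PriorityQueue<Cursor> queue = new PriorityQueue<>();
        for (List<String> segment : segments) {
            if (!segment.isEmpty()) {
                queue.add(new Cursor(segment.iterator()));
            }
        }
        long unique = 0;
        String last = null;
        while (!queue.isEmpty()) {
            // The queue always yields the smallest pending term over all segments.
            Cursor smallest = queue.poll();
            if (!smallest.current.equals(last)) {
                unique++;                      // first time this term is seen
                last = smallest.current;
            }
            if (smallest.rest.hasNext()) {
                smallest.current = smallest.rest.next();
                queue.add(smallest);           // re-queue the cursor at its next term
            }
        }
        return unique;
    }

    public static void main(String[] args) {
        List<List<String>> segments = new ArrayList<>();
        segments.add(Arrays.asList("apple", "lucene", "term"));
        segments.add(Arrays.asList("index", "lucene", "term", "zebra"));
        // The per-segment counts are 3 and 4, but "lucene" and "term" are shared.
        System.out.println(countUniqueTerms(segments));
    }
}
```

Adding up the per-segment counts would give 7 for this input, whereas the merge reports 5 distinct terms. That difference is why the per-segment numbers alone are not enough for a whole-index count.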

On Sat, Oct 16, 2010 at 5:52 AM, Uwe Schindler <uwe@thetaphi.de> wrote:
> Hi Mike,
>
> As far as I know, 3.0 also has this method:
> http://lucene.apache.org/java/3_0_2/api/core/org/apache/lucene/index/IndexReader.html#getUniqueTermCount()
>
> But it only works at the segment level, too! So you have to use
> getSequentialSubReaders/ReaderUtil.gatherSubReaders() and call it per segment.
> But to get the unique count for the whole index, there is no way around
> iterating over every term, as duplicates must be removed (which TermEnum does).
>
> Uwe
>
> -----
> Uwe Schindler
> H.-H.-Meier-Allee 63, D-28213 Bremen
> http://www.thetaphi.de
> eMail: uwe@thetaphi.de
>
>
>> -----Original Message-----
>> From: Michael McCandless [mailto:lucene@mikemccandless.com]
>> Sent: Saturday, October 16, 2010 11:17 AM
>> To: java-user@lucene.apache.org
>> Subject: Re: API that return the amount of terms indexed
>>
>> 4.0 will have an API to get the number of unique terms for a given field,
>> or across all fields, but only at the segment level.  (Getting the count
>> across segments requires a merge sort.)
>>
>> 3.x and before doesn't have such an API, though the information is tracked
>> under the hood.  If you open the _X.tis file, skip the first int, then
>> call readLong(), that should be the number of unique terms in that segment.
>>
>> You can always simply fall back to getting the term enum and stepping
>> through it, counting how many .next()'s there are until exhaustion...
>>
>> Mike
>>
>> On Fri, Oct 15, 2010 at 7:51 PM, APOLO_11 <barhen.dan@gmail.com> wrote:
>> >
>> > Hey - is there an API that returns the number of terms indexed?
>> >
>> > I found the API that returns the number of documents indexed
>> > (IndexWriter.docCount) but can't find an API for the number of terms in
>> > the index.
>> >
>> > Any ideas?
>> >
>> > Thanks, d.
>> > --
>> > View this message in context:
>> > http://lucene.472066.n3.nabble.com/API-that-return-the-amount-of-terms-indexed-tp1712290p1712290.html
>> > Sent from the Lucene - Java Users mailing list archive at Nabble.com.
>> >
>> > ---------------------------------------------------------------------
>> > To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>> > For additional commands, e-mail: java-user-help@lucene.apache.org
>> >
>> >
>
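
The _X.tis suggestion from the thread (skip the first int, then read a long) could be tried along the following lines. This is only a sketch under stated assumptions: the segment files are stored individually rather than inside a compound file, and the header's int and long are written high-order byte first, so java.io.DataInputStream can decode them without any Lucene classes. The class name TisHeaderPeek and the method readTermCount are invented for this example, and the value is only what the thread says it "should be".

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;

public class TisHeaderPeek {

    /** Skips the leading int, then returns the long that follows it. */
    public static long readTermCount(InputStream in) throws IOException {
        DataInputStream data = new DataInputStream(in);
        data.readInt();          // skip the first int, as described in the thread
        return data.readLong();  // assumed to be the segment's unique term count
    }

    public static void main(String[] args) throws IOException {
        if (args.length == 1) {
            // Usage: java TisHeaderPeek /path/to/index/_X.tis
            try (InputStream in = new FileInputStream(args[0])) {
                System.out.println(readTermCount(in));
            }
            return;
        }
        // Without a file argument, demonstrate on a synthetic header
        // (placeholder values, not taken from a real index).
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(bytes);
        out.writeInt(-1);        // placeholder for the leading int
        out.writeLong(12345L);   // placeholder term count
        out.flush();
        System.out.println(readTermCount(new ByteArrayInputStream(bytes.toByteArray())));
    }
}
```

Like getUniqueTermCount(), this only gives a per-segment number, so an index with more than one segment still needs the terms merged to get a whole-index count.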
