lucene-java-user mailing list archives

From "Uwe Schindler" <...@thetaphi.de>
Subject RE: How to get the number of unique terms in the inverted index
Date Thu, 27 May 2010 20:34:33 GMT
Also in 2.9.2 and 3.0.1:
http://lucene.apache.org/java/2_9_2/api/all/org/apache/lucene/index/IndexReader.html#getUniqueTermCount()
http://lucene.apache.org/java/3_0_1/api/all/org/apache/lucene/index/IndexReader.html#getUniqueTermCount()

Please note that this works only with SegmentReaders, so you first have to
call getSequentialSubReaders(), and you *may* sum up the counts of the
sub-readers. But that sum would not be the correct number, because segments
may share terms (and in most cases they share lots of them), so every shared
term is counted once per segment. For an optimized index,
getSequentialSubReaders() returns a single reader, and its unique term count
is correct.
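
To illustrate why the sum overcounts, here is a minimal sketch that models
each segment as a plain set of term strings. It does not use the Lucene API;
the class and method names (UniqueTermCountSketch, sumOfSegmentCounts,
exactUniqueCount) are made up for this example. The sum of the per-segment
counts is only an upper bound, while the exact number requires merging the
terms of all segments:

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical model: a "segment" is just the set of its unique terms.
class UniqueTermCountSketch {

    // Adds up the per-segment unique term counts. A term that occurs in
    // several segments is counted once for each of those segments.
    static long sumOfSegmentCounts(List<Set<String>> segments) {
        long sum = 0;
        for (Set<String> segment : segments) {
            sum += segment.size();
        }
        return sum;
    }

    // Merges the terms of all segments first, so each term is counted once.
    static long exactUniqueCount(List<Set<String>> segments) {
        Set<String> allTerms = new HashSet<String>();
        for (Set<String> segment : segments) {
            allTerms.addAll(segment);
        }
        return allTerms.size();
    }

    public static void main(String[] args) {
        Set<String> seg1 = new HashSet<String>(Arrays.asList("apache", "index", "lucene"));
        Set<String> seg2 = new HashSet<String>(Arrays.asList("index", "lucene", "term"));
        List<Set<String>> segments = Arrays.asList(seg1, seg2);

        // "index" and "lucene" occur in both segments.
        System.out.println("sum of segment counts: " + sumOfSegmentCounts(segments)); // 6
        System.out.println("exact unique count:    " + exactUniqueCount(segments));   // 4
    }
}
```

With a single segment (the optimized case) both methods return the same
number, which matches the remark above about optimized indexes.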

-----
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: uwe@thetaphi.de


> -----Original Message-----
> From: yseeley@gmail.com [mailto:yseeley@gmail.com] On Behalf Of Yonik
> Seeley
> Sent: Thursday, May 27, 2010 9:44 PM
> To: java-user@lucene.apache.org
> Subject: Re: How to get the number of unique terms in the inverted index
> 
> On Thu, May 27, 2010 at 2:32 PM, kannan chandrasekaran
> <ckannanck@yahoo.com> wrote:
> > I was wondering if there is a way to retrieve the number of unique terms
> > in Lucene (version 2.4.0) ... I am aware of the terms() and terms(Term)
> > methods that return an enumeration (TermEnum), but that involves iterating
> > through the terms and counting them. I am looking for something similar to
> > numDocs() in the IndexReader class.
> 
> No there is not.
> In 4.0-dev, with the new "flex" APIs, you can retrieve the number of unique
> terms in a single segment (Terms.getUniqueTermCount()), but not a whole
> index.
> 
> -Yonik
> http://www.lucidimagination.com
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org




