lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Andrzej Bialecki>
Subject Re: Determining index term count
Date Wed, 07 Jan 2009 20:43:20 GMT
Greg Shackles wrote:
> I'm not sure offhand how to write the code to do it, but I know when you
> open an index in Luke, that is one of the numbers it gives you.  If you want
> to just get the number once that would be an easy way to do it.  If you want
> the code for it, Luke is open source so you could see how they do it.  (I
> used Luke as a starting point at one point for seeing how to get a list of
> high frequency terms).

Luke currently uses the same method as you used, i.e. creates a TermEnum 
and traverses all terms. This is fast enough and doesn't require access 
to implementation details.

There is a faster way to do it, but it's not exposed through API. 
SegmentReader (a concrete impl. of IndexReader) opens a TermInfosReader, 
which has a field SegmentTermEnum:indexEnum, which in turn has a field 
"size", and this is the number of terms. Accessing this information this 
way would be messy - it's better to propose that this information should 
be added to API.

Best regards,
Andrzej Bialecki     <><
  ___. ___ ___ ___ _ _   __________________________________
[__ || __|__/|__||\/|  Information Retrieval, Semantic Web
___|||__||  \|  ||  |  Embedded Unix, System Integration  Contact: info at sigram dot com

To unsubscribe, e-mail:
For additional commands, e-mail:

View raw message