lucene-java-user mailing list archives

From Erick Erickson <erickerick...@gmail.com>
Subject Re: How to get the number of unique terms in the inverted index
Date Thu, 27 May 2010 22:58:27 GMT
I suspect it's not supported because it hasn't been seen
as valuable enough to put the effort into. You simply asked
if it was supported without any use-case, and I'm having a
hard time coming up with one on my own.....

If it's important to your particular situation, you could
have a special document in your index that contained these
values, in fields orthogonal to all fields in all other docs, and
compute them at index time. Lookup would then be easy.

Best
Erick
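A rough sketch of Erick's index-time idea, written in plain Java so it runs without Lucene on the classpath. Assumptions: documents are modeled as field-to-text maps, whitespace splitting stands in for the real Analyzer, and the field name __stats_uniqueTermCount is hypothetical. Terms are keyed as field:text because a Lucene Term is a (field, text) pair. Turning the resulting values into an actual special Document and adding it with IndexWriter is not shown.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

public class IndexTimeTermStats {
    // Hypothetical field name for the special stats document; chosen so that
    // no regular document ever uses it (the "orthogonal fields" idea).
    static final String STATS_FIELD = "__stats_uniqueTermCount";

    // Every field:text pair seen so far while indexing.
    private final Set<String> seen = new HashSet<>();

    // Record the terms a document would contribute. Whitespace splitting is a
    // stand-in for whatever Analyzer the real index uses.
    void addDocument(Map<String, String> doc) {
        for (Map.Entry<String, String> e : doc.entrySet()) {
            for (String tok : e.getValue().toLowerCase().split("\\s+")) {
                if (!tok.isEmpty()) {
                    seen.add(e.getKey() + ":" + tok);
                }
            }
        }
    }

    // Field values for the special document, written once indexing is done.
    Map<String, String> statsDocument() {
        Map<String, String> stats = new HashMap<>();
        stats.put(STATS_FIELD, Integer.toString(seen.size()));
        return stats;
    }

    public static void main(String[] args) {
        IndexTimeTermStats s = new IndexTimeTermStats();
        s.addDocument(Map.of("body", "the quick brown fox"));
        s.addDocument(Map.of("body", "the lazy dog", "title", "fox"));
        // body:the is counted once; body:fox and title:fox are distinct terms.
        System.out.println(s.statsDocument().get(STATS_FIELD)); // 7
    }
}
```

Because the set only grows, the stored count reflects what was added and will not drop when documents are deleted. Holding every term in memory is the simplest approach, but it may be costly for a large vocabulary.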

On Thu, May 27, 2010 at 5:01 PM, kannan chandrasekaran
<ckannanck@yahoo.com> wrote:

> Hi Yonik,
>
> Thanks for the quick response. I am curious as to why this is not supported
> when numDocs() is. Even in the upcoming version it's only supported per
> segment and not across the index. Why? Is it difficult to implement
> efficiently?
>
> Pardon my ignorance if I am missing something that's very obvious...
>
> Thanks
> Kannan
>
> On Thu, May 27, 2010 at 2:32 PM, kannan chandrasekaran
> <ckannanck@yahoo.com> wrote:
> > I was wondering if there is a way to retrieve the number of unique terms
> > in Lucene (version 2.4.0)... I am aware of the terms() and terms(Term)
> > methods that return an enumeration (TermEnum), but that involves
> > iterating through the terms and counting them. I'm looking for something
> > similar to numDocs() in the IndexReader class.
>
> No, there is not.
> In 4.0-dev, with the new "flex" APIs, you can retrieve the number of
> unique terms in a single segment (Terms.getUniqueTermCount()), but not
> a whole index.
>
> -Yonik
> http://www.lucidimagination.com
>
>
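One plausible reason for the per-segment-only answer (the thread itself doesn't say): each segment keeps its own sorted term dictionary, and the same term can occur in several segments, so per-segment counts can't simply be added up. An exact index-wide count means merging the sorted term streams and skipping duplicates, which is essentially the iterate-and-count loop Kannan wanted to avoid. This sketch models each segment as a sorted list of strings (a stand-in for field:text terms, assumed unique within a segment); it illustrates the idea and is not Lucene's own code.

```java
import java.util.List;
import java.util.PriorityQueue;

public class UniqueTermCount {
    // Cursor over one segment's sorted term list (a hypothetical stand-in
    // for a per-segment term enumeration).
    private static final class Cursor {
        final List<String> terms;
        int pos = 0;
        Cursor(List<String> terms) { this.terms = terms; }
        String current() { return terms.get(pos); }
    }

    // Naive answer: add up per-segment counts. Overcounts every term that
    // occurs in more than one segment.
    static long sumOfSegmentCounts(List<List<String>> segments) {
        long total = 0;
        for (List<String> seg : segments) {
            total += seg.size();
        }
        return total;
    }

    // Index-wide answer: k-way merge of the sorted per-segment lists,
    // counting a term only when it differs from the previously seen one.
    static long indexWideCount(List<List<String>> segments) {
        PriorityQueue<Cursor> pq =
            new PriorityQueue<>((a, b) -> a.current().compareTo(b.current()));
        for (List<String> seg : segments) {
            if (!seg.isEmpty()) {
                pq.add(new Cursor(seg));
            }
        }
        long count = 0;
        String prev = null;
        while (!pq.isEmpty()) {
            Cursor c = pq.poll();           // segment holding the smallest term
            String t = c.current();
            if (!t.equals(prev)) {          // first time this term is seen
                count++;
                prev = t;
            }
            c.pos++;                        // advance, then requeue if not done
            if (c.pos < c.terms.size()) {
                pq.add(c);
            }
        }
        return count;
    }

    public static void main(String[] args) {
        List<List<String>> segments = List.of(
            List.of("apple", "banana", "cherry"),
            List.of("banana", "date"),
            List.of("apple", "elderberry"));
        System.out.println(sumOfSegmentCounts(segments)); // 7
        System.out.println(indexWideCount(segments));     // 5
    }
}
```

The merge touches every term in every segment, so its cost grows with the size of the term dictionaries rather than being a stored constant like the per-segment count.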
