lucene-dev mailing list archives

From Michael McCandless <luc...@mikemccandless.com>
Subject Re: solr getUniqueTermCount() when multiple segments?
Date Tue, 07 Sep 2010 08:45:28 GMT
This is expected/intentional, because computing the "true" unique term
count across multiple segments is exceptionally costly (you have to
merge-sort the terms of all segments to de-dup them).
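To see why the multi-segment count is expensive, here is a minimal, self-contained Java model (not Lucene code) of the merge sort that de-duplicates terms across segments. It assumes each segment's terms are a sorted, duplicate-free String[], with String order standing in for Lucene's own term order; the class, the Cursor helper and trueUniqueCount are hypothetical names. Every term of every segment is visited once, so the cost grows with the total number of per-segment terms, not with the size of the answer.

```java
import java.util.Arrays;
import java.util.List;
import java.util.PriorityQueue;

final class MergedUniqueCount {

    /** Hypothetical per-segment cursor: a sorted term array plus a position. */
    private static final class Cursor {
        final String[] terms;
        int pos;
        Cursor(String[] terms) { this.terms = terms; }
        String current() { return terms[pos]; }
    }

    /**
     * Counts distinct terms across segments whose term lists are each sorted
     * and duplicate-free, via a k-way merge that skips repeated terms.
     */
    static long trueUniqueCount(List<String[]> segments) {
        PriorityQueue<Cursor> pq =
            new PriorityQueue<>((a, b) -> a.current().compareTo(b.current()));
        for (String[] seg : segments) {
            if (seg.length > 0) pq.add(new Cursor(seg));
        }
        long count = 0;
        String last = null;
        while (!pq.isEmpty()) {
            Cursor top = pq.poll();          // smallest current term over all segments
            String term = top.current();
            if (!term.equals(last)) {        // first occurrence of this term: count it
                count++;
                last = term;
            }
            top.pos++;
            if (top.pos < top.terms.length) {
                pq.add(top);                 // re-queue the segment at its next term
            }
        }
        return count;
    }

    public static void main(String[] args) {
        List<String[]> segments = Arrays.asList(
            new String[] {"apple", "banana", "cherry"},
            new String[] {"banana", "date"},
            new String[] {"apple", "date", "fig"});
        System.out.println(trueUniqueCount(segments)); // 5
    }
}
```

Running main prints 5 (apple, banana, cherry, date, fig), even though the three segments hold 8 terms between them.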

If you really want the true count, you can pull the TermsEnum and call
.next() until it is exhausted.
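For a case like Ryan's LukeRequestHandler, the two paths could be combined: use the stored count when the reader supports it, and otherwise walk the enum until next() returns null. The sketch below models this with a hypothetical TermSource interface standing in for Lucene's Terms/TermsEnum (uniqueTermCount() may throw UnsupportedOperationException, next() returns null once exhausted). It is not the Lucene API, whose signatures differ between versions. On a multi-segment reader the walk goes through the merged enum, so it still pays the full de-dup cost described above.

```java
import java.util.Arrays;
import java.util.Iterator;

final class UniqueTermCountFallback {

    /**
     * Hypothetical stand-in for a field's terms: uniqueTermCount() may throw
     * UnsupportedOperationException (as a multi-segment view does), and
     * next() returns the next term, or null once exhausted.
     */
    interface TermSource {
        long uniqueTermCount();
        String next();
    }

    /** Uses the cheap stored count when available, else walks every term. */
    static long countUniqueTerms(TermSource source) {
        if (source == null) {
            return 0;                        // field has no terms at all
        }
        try {
            return source.uniqueTermCount();
        } catch (UnsupportedOperationException e) {
            long count = 0;
            while (source.next() != null) {  // call next() until exhaustion
                count++;
            }
            return count;
        }
    }

    /** Demo source over an already sorted, de-duplicated term array. */
    static TermSource fromSorted(boolean supportsCount, String... terms) {
        Iterator<String> it = Arrays.asList(terms).iterator();
        return new TermSource() {
            public long uniqueTermCount() {
                if (!supportsCount) {
                    throw new UnsupportedOperationException(
                        "this reader does not implement getUniqueTermCount()");
                }
                return terms.length;
            }
            public String next() {
                return it.hasNext() ? it.next() : null;
            }
        };
    }

    public static void main(String[] args) {
        System.out.println(countUniqueTerms(fromSorted(true, "a", "b", "c")));  // 3 (stored count)
        System.out.println(countUniqueTerms(fromSorted(false, "a", "b", "c"))); // 3 (by walking)
    }
}
```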

Alternatively, you can use IndexReader.getSequentialSubReaders(), then
step through each SegmentReader calling its .getUniqueTermCount(), and
then somehow "approximate" (e.g. the sum will be an upper bound on the
total unique count).
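Here is a sketch of that approximation, assuming the per-segment counts have already been collected from each sub-reader's .getUniqueTermCount(); the class and method names are hypothetical. The sum is the upper bound mentioned above (reached only when no term is shared between segments). As an additional observation not made in the message, the largest per-segment count is a valid lower bound, since the index contains at least every term of its biggest segment.

```java
import java.util.Arrays;

final class UniqueTermBounds {

    /**
     * Returns {lower, upper} bounds on the index-wide unique term count,
     * given each segment's own unique term count.
     */
    static long[] bounds(long[] perSegmentCounts) {
        long lower = 0;
        long upper = 0;
        for (long c : perSegmentCounts) {
            lower = Math.max(lower, c);  // at least as many as the largest segment
            upper += c;                  // at most the total if nothing is shared
        }
        return new long[] {lower, upper};
    }

    public static void main(String[] args) {
        // Segments {apple, banana, cherry}, {banana, date}, {apple, date, fig}
        long[] b = bounds(new long[] {3, 2, 3});
        System.out.println(Arrays.toString(b)); // [3, 8]; the true count is 5
    }
}
```

For segments {apple, banana, cherry}, {banana, date} and {apple, date, fig}, this prints [3, 8], which brackets the true count of 5.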

Mike

On Tue, Sep 7, 2010 at 2:34 AM, Ryan McKinley <ryantxu@gmail.com> wrote:
> Hello-
>
> I'm looking at using the new terms.getUniqueTermCount() to give a
> quick count for the LukeRequestHandler rather than needing to walk all
> the terms.
>
> When the Solr index reader has just one segment, it works great. However,
> with more segments I get:
>
> java.lang.UnsupportedOperationException: this reader does not
> implement getUniqueTermCount()
>        at org.apache.lucene.index.Terms.getUniqueTermCount(Terms.java:84)
>
> Is this expected?  Is there any way around that?
>
> I am getting the terms using:
>
>          Terms terms = MultiFields.getTerms(reader, fieldName);
>          long cnt = (terms==null) ? 0 : terms.getUniqueTermCount();
>
> Thanks
> ryan
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
> For additional commands, e-mail: dev-help@lucene.apache.org
