lucene-dev mailing list archives

From Ryan McKinley <ryan...@gmail.com>
Subject Re: solr getUniqueTermCount() when multiple segments?
Date Tue, 07 Sep 2010 09:48:41 GMT
Ahh -- this makes sense.  I thought it was too good to be true!


On Tue, Sep 7, 2010 at 4:45 AM, Michael McCandless
<lucene@mikemccandless.com> wrote:
> This is expected/intentional, because computing the "true" unique term
> count across multiple segments is exceptionally costly (you have to do
> the merge sort to de-dup).
>
> If you really want the true count, you can pull the TermsEnum and
> .next() until exhaustion.
>
> Alternatively, you can use IndexReader.getSequentialSubReaders(), then
> step through each SegmentReader calling its .getUniqueTermCount() and then
> somehow "approximate" (e.g., the sum will be an upper bound of the total
> unique count).
>
> Mike
>
> On Tue, Sep 7, 2010 at 2:34 AM, Ryan McKinley <ryantxu@gmail.com> wrote:
>> Hello-
>>
>> I'm looking at using the new terms.getUniqueTermCount() to give a
>> quick count for the LukeRequestHandler rather than needing to walk all
>> the terms.
>>
>> When the Solr index reader has just one segment, it works great.  However,
>> with more segments I get:
>>
>> java.lang.UnsupportedOperationException: this reader does not
>> implement getUniqueTermCount()
>>        at org.apache.lucene.index.Terms.getUniqueTermCount(Terms.java:84)
>>
>> Is this expected?  Is there any way around that?
>>
>> I am getting the terms using:
>>
>>          Terms terms = MultiFields.getTerms(reader, fieldName);
>>          long cnt = (terms==null) ? 0 : terms.getUniqueTermCount();
>>
>> Thanks
>> ryan
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
>> For additional commands, e-mail: dev-help@lucene.apache.org
>>
>>
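
For readers following the thread, here is a minimal sketch of the trade-off Mike describes. It does not use the Lucene API: each segment is modeled as a sorted list of strings (an assumption made for illustration), and the class and method names are hypothetical. upperBound() sums the per-segment counts, which is cheap but counts a term once for every segment it appears in; exactCount() does the merge sort with de-dup, which has to visit every term in every segment.

```java
import java.util.Arrays;
import java.util.List;
import java.util.PriorityQueue;

/** Toy model only: each "segment" is a sorted, duplicate-free list of terms. Not the Lucene API. */
public class UniqueTermCountSketch {

    /** Cheap estimate: the sum of per-segment counts. It is never less than the true count. */
    public static long upperBound(List<List<String>> segments) {
        long sum = 0;
        for (List<String> segment : segments) {
            sum += segment.size();
        }
        return sum;
    }

    /** Exact count: k-way merge of the sorted term lists, counting each distinct term once. */
    public static long exactCount(List<List<String>> segments) {
        // Queue entries are {segmentIndex, positionInSegment}, ordered by the term they point at.
        PriorityQueue<int[]> queue = new PriorityQueue<>(
            (a, b) -> segments.get(a[0]).get(a[1]).compareTo(segments.get(b[0]).get(b[1])));
        for (int i = 0; i < segments.size(); i++) {
            if (!segments.get(i).isEmpty()) {
                queue.add(new int[] {i, 0});
            }
        }
        long count = 0;
        String previous = null;
        while (!queue.isEmpty()) {
            int[] head = queue.poll();
            String term = segments.get(head[0]).get(head[1]);
            // Terms come out in sorted order, so duplicates are always adjacent.
            if (!term.equals(previous)) {
                count++;
                previous = term;
            }
            // Advance this segment's cursor if it has more terms.
            if (head[1] + 1 < segments.get(head[0]).size()) {
                queue.add(new int[] {head[0], head[1] + 1});
            }
        }
        return count;
    }

    public static void main(String[] args) {
        List<List<String>> segments = Arrays.asList(
            Arrays.asList("apple", "banana", "cherry"),
            Arrays.asList("banana", "date"),
            Arrays.asList("apple", "date", "elderberry"));
        System.out.println("upper bound = " + upperBound(segments));
        System.out.println("exact count = " + exactCount(segments));
    }
}
```

With the three toy segments in main(), the sum is 8 while the true number of distinct terms is 5, so the gap grows with the amount of term overlap between segments.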