lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Ivan Vasilev <ivasi...@sirma.bg>
Subject Re: Faster way for faceting?
Date Tue, 25 Aug 2009 08:31:51 GMT
Hi Simon,

10x for your answer.
Unfortunately the code that you suggest is compatible in speed with the 
code that we use in our app (it was even a bit slower).

10x,
Ivan

Simon Willnauer wrote:
> Hi there,
>
> I'm not sure if the performance is considerable for you but you could try:
>  TermDocs termDocs = this.reader.termDocs(term);
>  int count = 0;
>  while(termDocs.next()){
>    count += termDocs.freq();
>  }
>
>
> simon
>
> On Mon, Aug 24, 2009 at 6:14 PM, Ivan Vasilev<ivasilev@sirma.bg> wrote:
>   
>> Hi All,
>>
>> We use faceting in our app but it is very slow for the indexes that use our
>> clients.
>> First I will say what I understand under faceting - this is for each term
>> for certain field to obtain 1. number of docs that contain it, 2. the total
>> number of occurrences of the term in the index.
>> Now what we use to obtain the information:
>>
>>      ...
>>      some code for obtained terms on which we will make faceting
>>      ...
>>
>>       Term[] retTerms = new Term[terms.size()];
>>       int[] retFreqs = new int[retTerms.length];
>>       int[] retDocs = new int[retTerms.length];
>>       TermPositions tp = mSearcher.getIndexReader().termPositions();
>>       int i = 0;
>>       for(Iterator<Term> iter = terms.iterator(); iter.hasNext(); i++) {
>>           try {
>>               retTerms[i] = iter.next();
>>               tp.seek(retTerms[i]);
>>               while(tp.next()) {
>>   //                tp.read(new int[]{}, new int[]{});
>> //                    tp.doc();
>>                   retFreqs[i] += tp.freq();
>>                   retDocs[i]++;
>>               }
>>           } finally {
>>               if(tp != null) {
>>                   tp.close();
>>               }
>>           }
>>       }
>>
>> Now what I discovered that is extremely faster for obtaining number of docs
>> that contain each term.
>>
>>       ...
>>      the same code for obtained terms on which we will make faceting
>>      ...
>>
>>       Term[] retTerms = new Term[terms.size()];
>>       int[] retFreqs = new int[retTerms.length];
>>       int i = 0;
>>       long t1 = System.currentTimeMillis();
>>       for (Term currTerm : terms) {
>>           retTerms[i] = currTerm;
>>           retFreqs[i] = mSearcher.docFreq(currTerm);
>>           i++;
>>       }
>>
>> I tested two code versions for obtaining 1 237 390 term facets. The
>> difference in time was 10 times (second version wins). I know that this is
>> because Lucene index keeps for each term the number of docs that contain it.
>>
>> My question - is there some way to obtain the total number of occurrences of
>> the term in the index in some similar fast way?
>>
>> Best Regards,
>> Ivan
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>> For additional commands, e-mail: java-user-help@lucene.apache.org
>>
>>
>>     
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
>
>
>   


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Mime
View raw message