lucene-java-user mailing list archives

From Simon Willnauer <simon.willna...@googlemail.com>
Subject Re: Faster way for faceting?
Date Mon, 24 Aug 2009 16:33:31 GMT
Hi there,

I'm not sure whether the performance will be acceptable for you, but you could try:
 TermDocs termDocs = this.reader.termDocs(term);
 int count = 0;
 try {
   while (termDocs.next()) {
     count += termDocs.freq(); // occurrences of the term in the current doc
   }
 } finally {
   termDocs.close();
 }
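
To illustrate why docFreq is cheap while the total count needs a loop like
the one above, here is a minimal sketch in plain Java. It does not use Lucene
at all: the class TermStatsSketch and its methods are hypothetical, and it
only assumes that a term's postings can be modelled as an array of
per-document frequencies.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class TermStatsSketch {
    // Toy postings (an assumption, not Lucene's format):
    // term -> one frequency per document that contains the term.
    private final Map<String, int[]> postings = new LinkedHashMap<>();

    public void addTerm(String term, int... freqsPerDoc) {
        postings.put(term, freqsPerDoc);
    }

    // Plays the role of docFreq(term): the number of documents is known
    // without walking the postings.
    public int docFreq(String term) {
        int[] p = postings.get(term);
        return p == null ? 0 : p.length;
    }

    // Plays the role of the termDocs loop: every posting has to be visited
    // to sum the per-document frequencies.
    public int totalFreq(String term) {
        int[] p = postings.get(term);
        if (p == null) {
            return 0;
        }
        int count = 0;
        for (int f : p) {
            count += f;
        }
        return count;
    }

    public static void main(String[] args) {
        TermStatsSketch idx = new TermStatsSketch();
        idx.addTerm("lucene", 3, 1, 2); // three docs with 3, 1 and 2 occurrences
        // prints "3 docs, 6 occurrences"
        System.out.println(idx.docFreq("lucene") + " docs, "
                + idx.totalFreq("lucene") + " occurrences");
    }
}
```

The document count comes for free, while the total has to be summed over the
postings, so its cost grows with the number of documents that contain the term.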


simon

On Mon, Aug 24, 2009 at 6:14 PM, Ivan Vasilev<ivasilev@sirma.bg> wrote:
> Hi All,
>
> We use faceting in our app, but it is very slow on the indexes that our
> clients use.
> First, what I mean by faceting: for each term of a certain field, obtaining
> 1. the number of docs that contain it, and 2. the total number of
> occurrences of the term in the index.
> This is the code we currently use to obtain that information:
>
>      ...
>      some code that obtains the terms we will facet on
>      ...
>
>       Term[] retTerms = new Term[terms.size()];
>       int[] retFreqs = new int[retTerms.length];
>       int[] retDocs = new int[retTerms.length];
>       TermPositions tp = mSearcher.getIndexReader().termPositions();
>       try {
>           int i = 0;
>           for (Iterator<Term> iter = terms.iterator(); iter.hasNext(); i++) {
>               retTerms[i] = iter.next();
>               tp.seek(retTerms[i]);
>               while (tp.next()) {
>                   // tp.read(new int[]{}, new int[]{});
>                   // tp.doc();
>                   retFreqs[i] += tp.freq(); // occurrences in the current doc
>                   retDocs[i]++;             // one more doc contains the term
>               }
>           }
>       } finally {
>           // close once, after all terms have been processed; closing inside
>           // the loop would leave tp closed before the next seek
>           tp.close();
>       }
>
> I have since found a much faster way to obtain the number of docs that
> contain each term:
>
>       ...
>      the same code that obtains the terms we will facet on
>      ...
>
>       Term[] retTerms = new Term[terms.size()];
>       int[] retFreqs = new int[retTerms.length];
>       int i = 0;
>       long t1 = System.currentTimeMillis();
>       for (Term currTerm : terms) {
>           retTerms[i] = currTerm;
>           retFreqs[i] = mSearcher.docFreq(currTerm);
>           i++;
>       }
>
> I tested the two versions on 1 237 390 term facets. The second version was
> 10 times faster. I know that this is because the Lucene index stores, for
> each term, the number of docs that contain it.
>
> My question: is there a similarly fast way to obtain the total number of
> occurrences of the term in the index?
>
> Best Regards,
> Ivan
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
>
>
