lucene-java-user mailing list archives

From Ivan Vasilev <ivasi...@sirma.bg>
Subject Faster way for faceting?
Date Mon, 24 Aug 2009 16:14:05 GMT
Hi All,

We use faceting in our app, but it is very slow on the indexes our 
clients use.
First, what I mean by faceting: for each term of a given field, 
obtaining (1) the number of docs that contain it and (2) the total 
number of occurrences of the term in the index.
Here is what we currently use to obtain this information:

       ...
       some code that obtains the terms we will facet on
       ...

        Term[] retTerms = new Term[terms.size()];
        int[] retFreqs = new int[retTerms.length]; // total occurrences per term
        int[] retDocs = new int[retTerms.length];  // number of docs per term
        TermPositions tp = mSearcher.getIndexReader().termPositions();
        try {
            int i = 0;
            for (Iterator<Term> iter = terms.iterator(); iter.hasNext(); i++) {
                retTerms[i] = iter.next();
                tp.seek(retTerms[i]);
                // Walk every posting of the term: one step per matching doc.
                while (tp.next()) {
                    retFreqs[i] += tp.freq();
                    retDocs[i]++;
                }
            }
        } finally {
            // Close once, after all terms have been processed.
            tp.close();
        }

I then discovered a much faster way to obtain the number of docs that 
contain each term:

        ...
       the same code that obtains the terms we will facet on
       ...

        Term[] retTerms = new Term[terms.size()];
        int[] retFreqs = new int[retTerms.length]; // here: number of docs per term
        int i = 0;
        long t1 = System.currentTimeMillis();
        for (Term currTerm : terms) {
            retTerms[i] = currTerm;
            retFreqs[i] = mSearcher.docFreq(currTerm);
            i++;
        }

I tested both versions on 1 237 390 term facets. The second version was 
10 times faster. I know this is because the Lucene index stores, for 
each term, the number of docs that contain it.
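
To make the difference between the two statistics concrete, here is a 
minimal toy sketch. It is not Lucene code and does not reflect Lucene's 
real data structures; the class ToyIndex and its methods are 
hypothetical names invented only for this illustration. It assumes the 
doc count per term is stored at indexing time, while per-doc 
frequencies live only in the postings.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

/** Toy inverted index for illustration only; not how Lucene is implemented. */
final class ToyIndex {
    // term -> (docId -> occurrences of the term in that doc): the "postings"
    private final Map<String, Map<Integer, Integer>> postings = new HashMap<>();
    // term -> number of docs containing it, maintained at indexing time
    private final Map<String, Integer> docFreq = new HashMap<>();

    void addDocument(int docId, String... tokens) {
        for (String token : tokens) {
            Map<Integer, Integer> perDoc =
                postings.computeIfAbsent(token, k -> new HashMap<>());
            if (!perDoc.containsKey(docId)) {
                // First time this term is seen in this doc: bump the doc count.
                docFreq.merge(token, 1, Integer::sum);
            }
            perDoc.merge(docId, 1, Integer::sum);
        }
    }

    /** A single lookup; no postings are read. */
    int docFreq(String term) {
        return docFreq.getOrDefault(term, 0);
    }

    /** Has to visit every posting of the term and sum the frequencies. */
    long totalOccurrences(String term) {
        long total = 0;
        Map<Integer, Integer> perDoc =
            postings.getOrDefault(term, Collections.<Integer, Integer>emptyMap());
        for (int freq : perDoc.values()) {
            total += freq;
        }
        return total;
    }

    public static void main(String[] args) {
        ToyIndex index = new ToyIndex();
        index.addDocument(0, "red", "red", "blue");
        index.addDocument(1, "red", "green");
        System.out.println(index.docFreq("red"));          // prints 2
        System.out.println(index.totalOccurrences("red")); // prints 3
    }
}
```

In this toy model the doc count costs one lookup per term, while the 
total occurrence count costs one step per matching doc, which matches 
the shape of the two loops above.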

My question: is there a similarly fast way to obtain the total number 
of occurrences of a term in the index?

Best Regards,
Ivan
