lucene-java-user mailing list archives

From tierecke
Subject docFreq takes long time to execute in a multiple index environment
Date Sun, 05 Aug 2007 23:40:37 GMT

Hi there,

I have 25 indexes of 1.8 GB each, which I read through a MultiReader.
I am trying to get the document frequency of every term in specific
documents, and it takes quite a long time: a document with 1000 terms takes
about 4 minutes 30 seconds to calculate the document frequencies of all its
terms, and there are longer documents than that.

Since I have quite a lot of documents to process (around 12000) - it'll take
My function for getting the document frequency is listed below (it handles a
single term, but it is called for every term in the document's term vector).

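    public int getdocumentfrequency(String termstr) throws Exception {
        Term term = new Term("contents", termstr);
        // terms(term) positions the enumeration at the first term >= the given term
        TermEnum termenum = multireader.terms(term);
        int freq = termenum.docFreq();
        termenum.close();
        return freq;
    }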

Is there a better (i.e. faster) way to get the document frequencies of all
the terms in a specific document?
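
One option that might help, given that many terms are likely to recur across
the roughly 12000 documents, is to remember each term's document frequency
after the first lookup. The following is only a minimal sketch under stated
assumptions: DocFreqSource and CachedDocFreq are hypothetical names, and
DocFreqSource stands in for whatever performs the real lookup (for example a
wrapper around multireader.docFreq(new Term("contents", termstr))). It does
not make the first lookup of a term any faster, and it keeps one map entry
per distinct term in memory.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for the index lookup. A real implementation would
// wrap the reader and would have to handle the IOException that index
// lookups can throw.
interface DocFreqSource {
    int docFreq(String termstr);
}

// Remembers each term's document frequency so that the underlying lookup
// runs at most once per distinct term.
class CachedDocFreq {
    private final DocFreqSource source;
    private final Map<String, Integer> cache = new HashMap<String, Integer>();
    private int lookups = 0; // number of calls that actually reached the source

    CachedDocFreq(DocFreqSource source) {
        this.source = source;
    }

    int docFreq(String termstr) {
        Integer cached = cache.get(termstr);
        if (cached != null) {
            return cached; // seen before: no index access needed
        }
        int freq = source.docFreq(termstr);
        lookups++;
        cache.put(termstr, freq);
        return freq;
    }

    int lookups() {
        return lookups;
    }
}
```

With a cache like this shared across all documents, the number of index
lookups would be bounded by the number of distinct terms rather than by the
total number of term occurrences. Whether that is enough depends on how much
the vocabularies of the documents overlap.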

Thanks a lot,
