lucene-java-user mailing list archives

From tierecke <nir.nussb...@gmail.com>
Subject docFreq takes long time to execute in a multiple index environment
Date Sun, 05 Aug 2007 23:40:37 GMT

Hi there,

I have 25 indexes of 1.8GB each, which I read through a MultiReader.
I am trying to get the document frequency of every term in specific documents,
and it takes quite a long time: a document with 1000 terms takes around
4 minutes 30 seconds to calculate the document frequencies of all its terms,
and there are longer documents than that.

Since I have quite a lot of documents to process (around 12000), it will take
forever.
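
To put "forever" into numbers, here is a rough estimate built only from the figures quoted above. It assumes every one of the 12000 documents costs about as much as the 1000-term example, which is a simplification since some documents are longer. The class name `DocFreqEstimate` is made up for illustration.

```java
// Back-of-the-envelope estimate from the figures quoted in the message.
// Assumption: every document costs about as much as the 1000-term example.
class DocFreqEstimate {
    static final int SECONDS_PER_DOC = 4 * 60 + 30; // 4 minutes 30 seconds
    static final int TERMS_PER_DOC = 1000;
    static final int DOCS = 12000;

    // Average time spent on a single docFreq lookup, in seconds.
    static double secondsPerTerm() {
        return (double) SECONDS_PER_DOC / TERMS_PER_DOC;
    }

    // Total time for all documents, in days.
    static double totalDays() {
        return (double) SECONDS_PER_DOC * DOCS / (60.0 * 60 * 24);
    }

    public static void main(String[] args) {
        System.out.println(secondsPerTerm() + " s per term");
        System.out.println(totalDays() + " days in total");
    }
}
```

Under that assumption each lookup costs about 0.27 seconds, and the whole collection would need about 37.5 days.
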
My function for getting the document frequency is listed below (it handles one
single term, but it is called for every term in the document's term vector).

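    // Uses org.apache.lucene.index.Term and org.apache.lucene.index.TermEnum.
    public int getdocumentfrequency (String termstr) throws Exception
    {
        Term term=new Term("contents", termstr);
        // terms(term) positions the enumeration at the given term, or at the
        // next term in order if it does not exist (docFreq is then that term's).
        TermEnum termenum=multireader.terms(term);
        try {
            return termenum.docFreq();
        } finally {
            termenum.close(); // release the enumeration once the value is read
        }
    }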
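
One possible mitigation, sketched here under assumptions and not taken from the original code: if many terms repeat across the documents, each distinct term only needs one slow lookup, and the result can be remembered. The sketch below is independent of Lucene. `TermLookup` and `DocFreqCache` are hypothetical names, and `TermLookup` merely stands in for a per-term function such as `getdocumentfrequency` above. How much this helps depends entirely on how much the documents' vocabularies overlap.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for the slow per-term lookup (not a Lucene type).
interface TermLookup {
    int docFreq(String term) throws Exception;
}

// Hypothetical helper: remembers the document frequency of every term
// it has already looked up, so repeated terms cost one map access.
class DocFreqCache {
    private final Map<String, Integer> cache = new HashMap<>();
    private final TermLookup lookup;

    DocFreqCache(TermLookup lookup) {
        this.lookup = lookup;
    }

    int docFreq(String term) throws Exception {
        Integer cached = cache.get(term);
        if (cached == null) {
            // First time this term is seen: pay for the slow lookup once.
            cached = lookup.docFreq(term);
            cache.put(term, cached);
        }
        return cached;
    }

    // Number of distinct terms looked up so far.
    int size() {
        return cache.size();
    }
}
```

With the function above, the cache could be created as `new DocFreqCache(this::getdocumentfrequency)` and shared across all documents. Separately, Lucene's `IndexReader` also offers `docFreq(Term)`, which could serve as the underlying lookup; whether it is faster in this setup would need to be measured.
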

Is there a better (i.e. faster) way to get the document frequencies of all the
terms in a specific document?

thanks a lot,
Nir.
