lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Paul Elschot <>
Subject Re: You are right but it doesn't make it faster.
Date Mon, 06 Aug 2007 16:02:18 GMT

You can speed this up (maybe a lot) by moving the disk head(s)
as little as possible.

Have a look at the file formats of Lucene to get the idea.

In your outer loop iterate over the readers of the multireader.
For each reader iterate over the terms in sorted order.
And don't access the index in any other way while doing this,
that is, do no query searches and no updates.

A bit of bookkeeping per term it will make it straightforward
to compute the total document frequencies.

Paul Elschot

On Monday 06 August 2007 13:12, tierecke wrote:
> Thanks Daniel, you are completely right.
> I changed the code - but it doesn't make it [noticeably faster] - probably
> behind the scene it does run on the enum.
> I added some kind of hash table that keeps the docfreq already read so if I
> meet it again in another document I can retrieve it quickly - is there
> another solution? Maybe have a separate Lucene index for this? (In this case
> - can I read and write to the same index without closing it and reopening
> it? I want to read from it and if I don't find the docfreq there, calculate
> it and put it in the index).
> 10x Nir.
> On Monday 06 August 2007 01:40, tierecke wrote:
> >         Term term=new Term("contents", termstr);
> >         TermEnum termenum=multireader.terms(term);
> >         int freq=termenum.docFreq();
> IndexReader has a docFreq() method, no need to get a Term enumeration.
> regards
>  Daniel
> -- 
> View this message in context:
> Sent from the Lucene - Java Users mailing list archive at
> ---------------------------------------------------------------------
> To unsubscribe, e-mail:
> For additional commands, e-mail:

To unsubscribe, e-mail:
For additional commands, e-mail:

View raw message