lucene-dev mailing list archives

From Doug Cutting
Subject Re: caching term information?
Date Mon, 22 May 2006 22:48:08 GMT
Robert Engels wrote:
> I was amazed at how much time is spent in both readVint and readByte().
> Seems high, but I think it is mainly due to the number of invocations.

Profilers have been known to exaggerate this sort of thing.  These are 
central routines of Lucene, but they're also pretty simple and hard to 
make a lot faster.

> 1) What if BufferedIndexInput had an optimized version of readVint that used
> the buffer and manipulated the position directly?

Give it a try and see if it's much faster.  Sun's JVMs are pretty smart 
these days, and such micro-optimizations are proving less likely to 
improve things than they used to be.  Also, we don't want to tune things 
too highly for any given JVM, so it would have to be substantially 
faster to warrant committing something like this.
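As a rough sketch of what suggestion (1) might look like, the class below compares a VInt decoder that calls readByte() once per byte with one that indexes the buffer directly and writes the position back once. BufferedInput, readVIntDirect and encodeVInt are hypothetical names for illustration, not Lucene's BufferedIndexInput; the sketch also assumes the whole value is already in the buffer, whereas a real implementation would need to fall back to the generic path near the end of the buffer so a refill can happen. Whether the direct version is measurably faster on a given JVM is exactly what would have to be benchmarked.

```java
import java.io.ByteArrayOutputStream;

public class VIntSketch {

    /** Hypothetical stand-in for a buffered input; not Lucene's actual class. */
    static final class BufferedInput {
        final byte[] buffer;   // assumed to already hold all the bytes we need
        int position;          // index of the next unread byte

        BufferedInput(byte[] data) {
            this.buffer = data;
        }

        byte readByte() {
            return buffer[position++];
        }

        /** Generic path: one readByte() call per encoded byte. */
        int readVInt() {
            byte b = readByte();
            int i = b & 0x7F;
            for (int shift = 7; (b & 0x80) != 0; shift += 7) {
                b = readByte();
                i |= (b & 0x7F) << shift;
            }
            return i;
        }

        /** Direct path: index the buffer locally, store the position once at the end. */
        int readVIntDirect() {
            final byte[] buf = buffer;
            int pos = position;
            byte b = buf[pos++];
            int i = b & 0x7F;
            for (int shift = 7; (b & 0x80) != 0; shift += 7) {
                b = buf[pos++];
                i |= (b & 0x7F) << shift;
            }
            position = pos;
            return i;
        }
    }

    /** Encodes an int as a VInt: 7 bits per byte, low-order bits first, high bit set on all but the last byte. */
    static byte[] encodeVInt(int value) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        while ((value & ~0x7F) != 0) {
            out.write((value & 0x7F) | 0x80);
            value >>>= 7;
        }
        out.write(value);
        return out.toByteArray();
    }
}
```

Both decoders should return the same value for the same bytes; only the number of method calls and field writes differs, which a modern JIT may well inline away.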

> 2) Instead of caching the TermInfo, what if the TermDocs were cached (again
> for the top 20% terms). The memory requirement would be much greater, but
> you could also say "do not cache the TermDocs that had more than X
> documents". The optimized searcher already converts TermQueries similar to
> this to a Filter anyway.

The majority of query time is typically spent processing terms that 
occur in lots of documents.  Terms that occur in only a few documents are 
faster to process, so speeding them up doesn't affect overall performance 
as much as one might hope.
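To make that argument concrete, here is a back-of-the-envelope sketch. It assumes, purely for illustration, that the time to process a term is proportional to its document frequency and that caching speeds up only terms at or below a threshold. The class, method and parameter names, and any numbers fed to it, are hypothetical and not measurements from Lucene.

```java
public class TermCostSketch {

    /**
     * Estimated overall speedup of a query when only the terms with
     * docFreq <= cacheThreshold are made cachedSpeedup times faster.
     * Assumed cost model: per-term time is proportional to docFreq.
     */
    static double overallSpeedup(long[] docFreqs, long cacheThreshold, double cachedSpeedup) {
        double before = 0.0;
        double after = 0.0;
        for (long df : docFreqs) {
            before += df;
            after += (df <= cacheThreshold) ? df / cachedSpeedup : df;
        }
        return before / after;
    }

    public static void main(String[] args) {
        // Hypothetical query: one very common term and two rare ones.
        long[] docFreqs = {1_000_000L, 500L, 20L};
        System.out.println(overallSpeedup(docFreqs, 1_000L, 10.0));
    }
}
```

Under these assumptions the common term dominates the total, so even a tenfold speedup on the two rare terms leaves the overall ratio barely above 1.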
