lucene-dev mailing list archives

From "Robert Engels" <>
Subject RE: caching term information?
Date Tue, 23 May 2006 05:09:08 GMT
The WorkspaceInfo class is unnecessary. The WorkspaceDetails can be
persisted directly if reworked.

-----Original Message-----
From: Doug Cutting []
Sent: Monday, May 22, 2006 5:48 PM
Subject: Re: caching term information?

Robert Engels wrote:
> I was amazed at how much time is spent in both readVint and readByte().
> Seems high, but I think it is mainly due to the number of invocations.

Profilers have been known to exaggerate this sort of thing.  These are
central routines of Lucene, but they're also pretty simple and hard to
make a lot faster.

> 1) What if BufferedIndexInput had an optimized version of readVint that
> accessed the buffer and manipulated the position directly?

Give it a try and see if it's much faster.  Sun's JVMs are pretty smart
these days, and such micro-optimizations are proving less likely to
improve things than they used to be.  Also, we don't want to tune things
too highly for any given JVM, so it would have to be substantially
faster to warrant committing something like this.
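
To make the comparison concrete, here is a minimal sketch of the two decoding paths being discussed. It is not Lucene's BufferedIndexInput: the class VIntSketch and its method names are hypothetical, and it assumes the whole value is already in the buffer, so the refill logic a real buffered input needs is left out. It relies only on the VInt encoding itself (seven data bits per byte, low-order group first, high bit set when more bytes follow).

```java
// Hypothetical illustration only; not the Lucene implementation.
public final class VIntSketch {
    private final byte[] buffer;
    private int position;

    public VIntSketch(byte[] buffer) {
        this.buffer = buffer;
    }

    // Generic path: every byte goes through a method call.
    public byte readByte() {
        return buffer[position++];
    }

    public int readVIntViaReadByte() {
        byte b = readByte();
        int value = b & 0x7F;
        for (int shift = 7; (b & 0x80) != 0; shift += 7) {
            b = readByte();
            value |= (b & 0x7F) << shift;
        }
        return value;
    }

    // Proposed path: read the buffer and advance the position directly.
    // Assumes all bytes of the value are already buffered (no refill).
    public int readVIntDirect() {
        final byte[] buf = buffer;
        int pos = position;
        byte b = buf[pos++];
        int value = b & 0x7F;
        for (int shift = 7; (b & 0x80) != 0; shift += 7) {
            b = buf[pos++];
            value |= (b & 0x7F) << shift;
        }
        position = pos; // write the position back once
        return value;
    }

    public void seek(int newPosition) {
        position = newPosition;
    }

    public int position() {
        return position;
    }
}
```

Both methods decode the same bytes to the same value; whether the direct version is measurably faster depends on how well the JVM already inlines readByte(), which is the point made above, so it would need to be benchmarked rather than assumed.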

> 2) Instead of caching the TermInfo, what if the TermDocs were cached
> (for the top 20% of terms)? The memory requirement would be much greater, but
> you could also say "do not cache the TermDocs that had more than X
> documents". The optimized searcher already converts TermQueries similar to
> this to a Filter anyway.

The majority of query time is typically spent processing terms that
occur in lots of documents.  Terms that occur in only a few documents are
faster to process, so speeding them doesn't affect overall performance
as much as one might hope.
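
For reference, the cutoff rule from suggestion 2 ("do not cache the TermDocs that had more than X documents") could be sketched roughly as follows. Everything here is hypothetical: TermDocsCacheSketch, offer, and maxDocsPerTerm are illustrative names, document ids are held as a plain int array, and no Lucene classes are used.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration only; not part of Lucene.
public final class TermDocsCacheSketch {
    private final int maxDocsPerTerm; // the "X" cutoff from the suggestion
    private final Map<String, int[]> cache = new HashMap<>();

    public TermDocsCacheSketch(int maxDocsPerTerm) {
        this.maxDocsPerTerm = maxDocsPerTerm;
    }

    // Caches the doc ids for a term unless the term matches too many documents.
    // Returns true if the entry was stored.
    public boolean offer(String term, int[] docIds) {
        if (docIds.length > maxDocsPerTerm) {
            return false; // too large to cache under the cutoff
        }
        cache.put(term, docIds.clone());
        return true;
    }

    // Returns the cached doc ids, or null if the term is not cached.
    public int[] get(String term) {
        return cache.get(term);
    }
}
```

Note that the cutoff keeps exactly the terms with few documents, which are the ones described above as already fast to process.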


To unsubscribe, e-mail:
For additional commands, e-mail:
