lucene-java-user mailing list archives

From Erik Hatcher
Subject Re: cache check?
Date Sun, 21 Nov 2004 12:20:34 GMT
On Nov 20, 2004, at 5:49 PM, Vic wrote:
> Erik Hatcher wrote:
>> The Hits object already does some most-recently-used caching.
> Are there any docs on this, or should I look in the source?

The caching is there to avoid disk access of the Lucene index for the 
documents most likely to be accessed next.

> I plan on terabytes search

That's quite a lot of data.  You'll have to do more than just use plain 
Lucene to handle this much data, of course.

> I have no idea how fast Lucene will be until I am done and loaded and 
> have queries coming in, but I know I will need to manage the cache.

My advice would be to not worry about caching unless and until you need 
it.  You're searching terabytes, you say, but that does not mean you 
are accessing every single document that comes back from searches.  One 
big issue is how you access the documents you get back from Hits: 
accessing a document is when Lucene goes to the index and retrieves 
(currently) the entire document, including all the stored fields.  
Minimizing the number of documents you access in this way (say, 
displaying 10 or 20 at a time, which is typical) is wise.
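
To make the "10 or 20 at a time" point concrete, here is a minimal 
sketch of fetching stored documents for only the page being displayed.  
ResultPager and its loader parameter are hypothetical names, not part 
of Lucene; the loader stands in for whatever expensive per-document 
fetch you use (for example, a call such as hits.doc(i)).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntFunction;

// Hypothetical helper, not part of Lucene: it loads stored documents
// for a single page of results and leaves every other hit untouched.
final class ResultPager {
    private ResultPager() {}

    /**
     * Returns the documents for the given zero-based page.
     *
     * @param totalHits number of matching documents (e.g. hits.length())
     * @param pageIndex zero-based page number
     * @param pageSize  documents per page, typically 10 or 20
     * @param loader    assumed stand-in for the expensive fetch of one document
     */
    static <D> List<D> page(int totalHits, int pageIndex, int pageSize,
                            IntFunction<D> loader) {
        if (totalHits < 0 || pageIndex < 0 || pageSize <= 0) {
            throw new IllegalArgumentException("invalid paging arguments");
        }
        // Use long arithmetic so a large page index cannot overflow int.
        long first = (long) pageIndex * pageSize;
        int start = (int) Math.min(first, totalHits);
        int end = (int) Math.min(first + pageSize, totalHits);

        List<D> docs = new ArrayList<>(end - start);
        for (int i = start; i < end; i++) {
            docs.add(loader.apply(i)); // the only place documents are fetched
        }
        return docs;
    }
}
```

With a million hits and a page size of 10, this calls the loader 
exactly 10 times per page, however large the result set is.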

I really don't see a need for any custom caching on top of Lucene.  
Remember the rule of optimization: don't.  And for experts only: don't 
do it yet.  :)

> It depends on how good and tunable the "some LRU caching" in Hits is. 
> Is it soft? Can it take up 2 gigs of RAM?

Hits is not tunable.  It caches up to 200 documents.  You can, though, 
use Lucene's lower-level search() API methods to do some of your own 
magic if you like.  Look at how Hits does its thing with the basic 
search(Query) method.
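
If you do end up needing your own bounded cache, a minimal sketch of 
the general idea looks something like the following.  This is not how 
Hits is implemented; BoundedDocCache is a hypothetical class built on 
java.util.LinkedHashMap, and the limit is whatever you choose (200 
would mirror the Hits limit mentioned above).

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration, not Lucene's Hits code: a fixed-size cache
// that evicts the least recently used entry once the limit is exceeded.
final class BoundedDocCache<K, V> extends LinkedHashMap<K, V> {
    private static final long serialVersionUID = 1L;

    private final int maxEntries;

    BoundedDocCache(int maxEntries) {
        // accessOrder = true: iteration order runs from least recently
        // accessed to most recently accessed.
        super(16, 0.75f, true);
        if (maxEntries <= 0) {
            throw new IllegalArgumentException("maxEntries must be positive");
        }
        this.maxEntries = maxEntries;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        // Called by LinkedHashMap after each insertion; returning true
        // drops the least recently used entry.
        return size() > maxEntries;
    }
}
```

A cache created with new BoundedDocCache<Integer, Object>(200) never 
holds more than 200 entries.  It is not thread-safe and holds strong 
references, so it is not a "soft" cache.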

