On Nov 20, 2004, at 5:49 PM, Vic wrote:
> Erik Hatcher wrote:
>> The Hits object already does some most-recently-used caching.
>
> Is there any docs on this or should I look in source?
The caching is there to avoid disk access of the Lucene index for the
documents most likely to be accessed next.
> I plan on terabytes search
That's quite a lot of data. You'll have to do more than just use plain
Lucene out of the box to handle that much, of course.
> I have no idea how fast Lucene will be until I am done and loaded and
> have queries coming in, but I know I will need to manage the cache.
My advice would be to not worry about caching unless and until you need
it. You're searching terabytes, you say, but that does not mean you
are accessing every single document that comes back from searches. One
big issue is how you access the documents you get back from Hits -
accessing a document is when Lucene goes to the index and retrieves
(currently) the entire document, including all of its stored fields.
Minimizing the number of documents you access this way (say, by
displaying only 10 or 20 at a time, which is typical) is wise.
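To make that concrete, here is a minimal sketch of that sort of paging,
written against the current 1.4-style API; the index path and the
"contents"/"title" field names are placeholders, not anything specific
to your setup:

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.queryParser.QueryParser;
import org.apache.lucene.search.Hits;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;

public class PagedSearch {
    public static void main(String[] args) throws Exception {
        IndexSearcher searcher = new IndexSearcher("/path/to/index");
        Query query =
            QueryParser.parse("some terms", "contents", new StandardAnalyzer());
        Hits hits = searcher.search(query);

        int page = 0;       // zero-based page requested by the user
        int pageSize = 10;  // documents actually shown per page
        int start = page * pageSize;
        int end = Math.min(start + pageSize, hits.length());

        // Only these hits.doc(i) calls go back to the index and pull
        // stored fields; the rest of the result set is never loaded.
        for (int i = start; i < end; i++) {
            Document doc = hits.doc(i);
            System.out.println(doc.get("title") + " (" + hits.score(i) + ")");
        }
        searcher.close();
    }
}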
I really don't see a need for any custom caching on top of Lucene.
Remember the rule of optimization: don't. And for experts only: don't
do it yet. :)
> It depends on how good and tunable that "some LRU caching" in Hits
> is. Is it soft? Can it take up 2 gigs of RAM?
Hits is not tunable. It caches up to 200 documents. You can, however,
use Lucene's lower-level search() API methods to do some of your own
magic if you like - look at how Hits does its thing with the basic
search(Query) method.
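If you go that route, a rough sketch of the HitCollector-based search()
(again, the index path and field names are just placeholders) looks
like this:

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.queryParser.QueryParser;
import org.apache.lucene.search.HitCollector;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;

public class RawSearch {
    public static void main(String[] args) throws Exception {
        IndexSearcher searcher = new IndexSearcher("/path/to/index");
        Query query =
            QueryParser.parse("some terms", "contents", new StandardAnalyzer());

        // search(Query, HitCollector) hands you every matching doc id
        // and score directly: no Hits object, no caching, and no stored
        // fields read unless you call searcher.doc(id) yourself.
        searcher.search(query, new HitCollector() {
            public void collect(int doc, float score) {
                // keep the id/score in whatever structure suits you
                System.out.println("doc " + doc + " scored " + score);
            }
        });
        searcher.close();
    }
}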
Erik