hadoop-common-user mailing list archives

From Doug Cutting <cutt...@apache.org>
Subject Re: Creating Lucene index in Hadoop
Date Tue, 17 Mar 2009 18:31:12 GMT
Ning Li wrote:
> 1 is good. But for 2:
>   - Won't it have a security concern as well? Or is this not a general
> local cache?

A client-side RAM cache would be filled through the same security 
mechanisms as all other filesystem accesses.

>   - You are referring to caching in RAM, not caching in local FS,
> right? In general, a Lucene index size could be quite large. We may
> have to cache a lot of data to reach a reasonable hit ratio...

Lucene on a local disk benefits significantly from the local
filesystem's RAM cache (aka the kernel's buffer cache).  HDFS has no
such local RAM cache beyond the stream's own buffer.  A client-side
cache would need to be no larger than the kernel's buffer cache to get
an equivalent hit ratio.  And if you're accessing a remote index, then
you shouldn't also need a large kernel buffer cache on the client.
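
To make the idea concrete, here is a minimal sketch of such a client-side RAM cache, written in Python for brevity rather than against the Hadoop API.  Everything in it is hypothetical: BlockCache, read_block, fake_remote_read, and the block sizes are illustrative names and values, not anything in HDFS or Lucene.  It assumes reads can be served in fixed-size blocks with LRU eviction.  Every miss goes through the caller-supplied reader, so the cache is filled by the same access path (and the same permission checks) as any other read.

```python
from collections import OrderedDict


class BlockCache:
    """Hypothetical LRU cache of fixed-size blocks held in client RAM."""

    def __init__(self, read_block, block_size=64 * 1024, max_blocks=1024):
        # read_block(path, offset, length) -> bytes is the only way data
        # enters the cache, so normal filesystem checks still apply.
        self.read_block = read_block
        self.block_size = block_size
        self.max_blocks = max_blocks
        self.blocks = OrderedDict()  # (path, block index) -> bytes
        self.hits = 0
        self.misses = 0

    def _get_block(self, path, index):
        key = (path, index)
        if key in self.blocks:
            self.blocks.move_to_end(key)  # mark as most recently used
            self.hits += 1
            return self.blocks[key]
        self.misses += 1
        data = self.read_block(path, index * self.block_size, self.block_size)
        self.blocks[key] = data
        if len(self.blocks) > self.max_blocks:
            self.blocks.popitem(last=False)  # evict least recently used
        return data

    def read(self, path, offset, length):
        """Read `length` bytes at `offset`, block by block through the cache."""
        out = bytearray()
        end = offset + length
        while offset < end:
            index, start = divmod(offset, self.block_size)
            block = self._get_block(path, index)
            chunk = block[start:start + (end - offset)]
            if not chunk:
                break  # reached end of file
            out += chunk
            offset += len(chunk)
        return bytes(out)


# Stand-in for a remote file: 1024 bytes held in memory (illustrative only).
DATA = {"index/segment_1": bytes(range(256)) * 4}


def fake_remote_read(path, offset, length):
    return DATA[path][offset:offset + length]


cache = BlockCache(fake_remote_read, block_size=128, max_blocks=4)
first = cache.read("index/segment_1", 100, 60)   # spans two blocks: two misses
second = cache.read("index/segment_1", 100, 60)  # same blocks: two hits
```

The max_blocks bound is the knob discussed above: sized like the kernel's buffer cache, it should see a comparable hit ratio for the same access pattern.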

Doug
