hadoop-common-user mailing list archives

From Todd Lipcon <t...@cloudera.com>
Subject Re: FileSystem Caching in Hadoop
Date Wed, 07 Oct 2009 14:48:14 GMT
On Wed, Oct 7, 2009 at 7:45 AM, Edward Capriolo <edlinuxguru@gmail.com> wrote:

>
> Todd,
>
> I do think it could be an inherent problem. With all the reading and
> writing of intermediate data Hadoop does, the file system cache would
> likely never contain the initial raw data you want to work with.
> The HBase RegionServer seems to be successful, so there must be some
> place for caching.
>
> Once I get something in HDFS, like the last hour's log data, about 40
> different processes are going to repeatedly re-read it from disk. I
> think if I can force that data into a cache I can get much faster
> processing.
>

In cases like this, we should expose access-type hints such as posix_fadvise
with POSIX_FADV_DONTNEED for the data we don't want to end up in the cache.
There's already a JIRA out there for a JNI library for platform-specific
optimization, and I think this is one that will be worth doing.
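
As a rough illustration of the kind of hint meant here (this is not Hadoop
code and not the JNI library from the JIRA), the sketch below reads a file
and then asks the kernel to drop its pages from the page cache. It assumes
a Linux-like platform where Python exposes os.posix_fadvise; the function
name read_without_caching is made up for the example. The call is only
advisory, so the kernel may ignore it.

```python
import os


def read_without_caching(path, chunk_size=1 << 20):
    """Read a file sequentially, then hint that its pages need not stay cached.

    Hypothetical helper for illustration only. Returns the number of bytes read.
    """
    total = 0
    fd = os.open(path, os.O_RDONLY)
    try:
        while True:
            chunk = os.read(fd, chunk_size)
            if not chunk:
                break
            total += len(chunk)
        # Not every platform provides posix_fadvise, so check before calling.
        if hasattr(os, "posix_fadvise"):
            try:
                # offset=0 and length=0 cover the whole file.
                os.posix_fadvise(fd, 0, 0, os.POSIX_FADV_DONTNEED)
            except OSError:
                # The hint is advisory; a failure does not affect the read.
                pass
    finally:
        os.close(fd)
    return total
```

Intermediate data that is read once would be a candidate for a hint like
this, which leaves more of the page cache for the raw input that many
processes re-read.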

-Todd
