hadoop-common-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Edward Capriolo <edlinuxg...@gmail.com>
Subject Re: FileSystem Caching in Hadoop
Date Wed, 07 Oct 2009 14:45:29 GMT
On Wed, Oct 7, 2009 at 2:33 AM, Todd Lipcon <todd@cloudera.com> wrote:
> I think this is the wrong angle to go about it - like you mentioned in your
> first post, the Linux file system cache *should* be taking care of this for
> us. That it is not is a fault of the current implementation and not an
> inherent problem.
>
> I think one solution is HDFS-347 - I'm putting the finishing touches on a
> design doc for that JIRA and should have it up in the next day or two.
>
> -Todd
>
> On Tue, Oct 6, 2009 at 5:25 PM, Edward Capriolo <edlinuxguru@gmail.com>wrote:
>
>> On Tue, Oct 6, 2009 at 6:12 PM, Aaron Kimball <aaron@cloudera.com> wrote:
>> > Edward,
>> >
>> > Interesting concept. I imagine that implementing "CachedInputFormat" over
>> > something like memcached would make for the most straightforward
>> > implementation. You could store 64MB chunks in memcached and try to
>> retrieve
>> > them from there, falling back to the filesystem on failure. One obvious
>> > potential drawback of this is that a memcached cluster might store those
>> > blocks on different servers than the file chunks themselves, leading to
>> an
>> > increased number of network transfers during the mapping phase. I don't
>> know
>> > if it's possible to "pin" the objects in memcached to particular nodes;
>> > you'd want to do this for mapper locality reasons.
>> >
>> > I would say, though, that 1 GB out of 8 GB on a datanode is somewhat
>> > ambitious. It's been my observation that people tend to write
>> memory-hungry
>> > mappers. If you've got 8 cores in a node, and 1 GB each have already gone
>> to
>> > the OS, the datanode, and the tasktracker, that leaves only 5 GB for task
>> > processes. Running 6 or 8 map tasks concurrently can easily gobble that
>> up.
>> > On a 16 GB datanode with 8 cores, you might get that much wiggle room
>> > though.
>> >
>> > - Aaron
>> >
>> >
>> > On Tue, Oct 6, 2009 at 8:16 AM, Edward Capriolo <edlinuxguru@gmail.com
>> >wrote:
>> >
>> >> After looking at the HBaseRegionServer and its functionality, I began
>> >> wondering if there is a more general use case for memory caching of
>> >> HDFS blocks/files. In many use cases people wish to store data on
>> >> Hadoop indefinitely, however the last day,last week, last month, data
>> >> is probably the most actively used. For some Hadoop clusters the
>> >> amount of raw new data could be less then the RAM memory in the
>> >> cluster.
>> >>
>> >> Also some data will be used repeatedly, the same source data may be
>> >> used to generate multiple result sets, and those results may be used
>> >> as the input to other processes.
>> >>
>> >> I am thinking an answer could be to dedicate an amount of physical
>> >> memory on each DataNode, or on several dedicated node to a distributed
>> >> memcache like layer. Managing this cache should be straight forward
>> >> since hadoop blocks are pretty much static. (So say for a DataNode
>> >> with 8 GB of memory dedicate 1GB to HadoopCacheServer.) If you had
>> >> 1000 Nodes that cache would be quite large.
>> >>
>> >> Additionally we could create a new file system type cachedhdfs
>> >> implemented as a facade, or possibly implement CachedInputFormat or
>> >> CachedOutputFormat.
>> >>
>> >> I know that the underlying filesystems have cache, but I think Hadoop
>> >> writing intermediate data is going to evict some of the data which
>> >> "should be" semi-permanent.
>> >>
>> >> So has anyone looked into something like this? This was the closest
>> >> thing I found.
>> >>
>> >> http://issues.apache.org/jira/browse/HADOOP-288
>> >>
>> >> My goal here is to keep recent data in memory so that tools like Hive
>> >> can get a big boost on queries for new data.
>> >>
>> >> Does anyone have any ideas?
>> >>
>> >
>>
>> Aaron,
>>
>> Yes 1GB out of 8GB was just an arbitrary value I decided. Remember
>> that 16K of ram did get a man to the moon. :) I am thinking the value
>> would be configurable, say dfs.cache.mb.
>>
>> Also there is the details of cache eviction, or possibly including and
>> excluding paths and files.
>>
>> Other then the InputFormat concept we could plug the cache in directly
>> into the DFSclient. In this way the cache would always end up on the
>> node where the data was. Otherwise the InputFormat will have to manage
>> that which would be a lot of work. I think if we prove the concept we
>> can then follow up and get it more optimized.
>>
>> I am poking around the Hadoop internals to see what options we have.
>> My first implementation I will probably patch some code, run some
>> tests, profile performance.
>>
>

Todd,

I do think it could be an inherent problem. With all the reading and
writing of intermediate data hadoop does, the file system cache would
would likely never contain the initial raw data you want to work with.
The HBase RegionServer seems to be successful, so there must be some
place for caching.

Once I get something in HDFS, like lasts hours log data, about 40
different processes are going to repeatedly re/read it from disk. I
think if i can force that data into a cache I can get much faster
processing.

HDFS-347 sounds great though.

Edward

Mime
View raw message