hadoop-mapreduce-user mailing list archives

From	Allen Wittenauer <...@apache.org>
Subject Re: distributed cache exceeding local.cache.size
Date Fri, 01 Apr 2011 21:12:19 GMT

On Apr 1, 2011, at 12:05 PM, Travis Crawford wrote:

> On Thu, Mar 31, 2011 at 3:25 PM, Allen Wittenauer <aw@apache.org> wrote:
>> On Mar 31, 2011, at 11:45 AM, Travis Crawford wrote:
>>> Is anyone familiar with how the distributed cache behaves when datasets
>>> larger than the total cache size are referenced? I've disabled the job
>>> that caused this situation, but am wondering if I can configure things
>>> more defensively.
>>        I've started building dedicated file systems on drives to store the MapReduce
>> spill space.  It seems to be the only reliable way to prevent MR from going nuts.  Sure,
>> some jobs may fail, but that seems to be a better strategy than the alternative.
> Interesting. So for example, say you have 2 disks in a
> DataNode+TaskTracker machine. You'd make two partitions on each disk,
> and expose 4 partitions to the system, then give two partitions (one
> from each disk) to each app?

	Yes, exactly.  
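
	A minimal sketch of that layout (device names, mount points, and filesystem type here are assumptions for illustration, not from this thread):

```shell
# Hypothetical 2-disk node: each disk split into one HDFS partition and
# one MapReduce-spill partition, so a runaway job can only fill its own fs.
#
# /etc/fstab:
#   /dev/sda1   /data/0/dfs      ext3   defaults,noatime   0 0
#   /dev/sda2   /data/0/mapred   ext3   defaults,noatime   0 0
#   /dev/sdb1   /data/1/dfs      ext3   defaults,noatime   0 0
#   /dev/sdb2   /data/1/mapred   ext3   defaults,noatime   0 0
#
# Then point each daemon at its own partitions (0.20-era property names):
#   hdfs-site.xml:    dfs.data.dir     = /data/0/dfs,/data/1/dfs
#   mapred-site.xml:  mapred.local.dir = /data/0/mapred,/data/1/mapred
```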

> Is the idea here to prevent runaway jobs from filling up DataNode disks,
> which causes write failures?

	You got it. :)  Plus some side benefits:

		- HDFS now has a fixed amount of space, making capacity planning less of a guessing game.
		- Any per-fs kernel buffer caches, structures, etc. are now doubled and specific to each
use case.  This opens the door for lots of interesting things...

	The big con is that you'll need to spend some time figuring out what size you really need
for your MR space.  We had the benefit of ZFS allowing us to adjust the value.  On non-pooled
storage systems, this is obviously harder.  FWIW, experience has shown that you'll likely
end up somewhere between 100G and 200G of space per fs, but YMMV.
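
	As a hedged back-of-envelope sketch of how you might arrive at a number in that range (all inputs below are made-up assumptions, not figures from this thread):

```shell
# Rough per-node MR spill sizing: worst-case spill per task slot times
# slots per node, plus headroom for skewed jobs.  Numbers are assumptions.
SLOTS_PER_NODE=8        # map+reduce task slots on one TaskTracker
SPILL_PER_SLOT_GB=15    # worst-case intermediate data per slot
HEADROOM_PCT=25         # safety margin for skew and retries

RAW=$((SLOTS_PER_NODE * SPILL_PER_SLOT_GB))
TOTAL=$((RAW + RAW * HEADROOM_PCT / 100))
echo "MR spill space per node: ${TOTAL}G"   # prints "MR spill space per node: 150G"
```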

	I've thought about trying fs quotas, but hard partitions just seem like they would be better
in the long run.
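
	For the distributed cache itself, the knob from the subject line can also be set explicitly; a hedged example (the value shown is an assumption, pick one that fits the partition you gave MR):

```xml
<!-- mapred-site.xml: local.cache.size caps the per-node distributed
     cache, in bytes.  10737418240 is roughly 10G. -->
<property>
  <name>local.cache.size</name>
  <value>10737418240</value>
</property>
```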
