hadoop-mapreduce-user mailing list archives

From Allen Wittenauer <awittena...@linkedin.com>
Subject Re: large intermediate outputs
Date Tue, 04 Jan 2011 04:32:36 GMT

On Jan 3, 2011, at 5:11 AM, Debbie Fu wrote:

> I think it will cause a disk fill-up, too. Is there any mechanism in Hadoop
> that handles this situation?

	Not in a way that saves the job.

> If my local disks hold so much chunk data that little space is left for
> intermediate output, and every node is in the same situation, so the task
> can't be scheduled on another node with enough space for intermediate
> output, what does Hadoop do? Does the job simply fail?


> Can I set a remote disk in mapred.local.dir?

	You can point it to an NFS mount, but that'd be suicide.

	If using compression, as Harsh mentioned, is not acceptable, the best bet is to break the job
up into multiple jobs or reduce the input per task, depending upon the situation.
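
For reference, a minimal sketch of what the compression option could look like in mapred-site.xml, assuming a Hadoop release of this era that uses the mapred.* property names. The codec shown is only an illustrative choice and was not named in this thread; the same properties can also be set per job instead of cluster-wide.

```xml
<!-- Sketch only: compress intermediate (map) output to reduce local disk use. -->
<configuration>
  <property>
    <name>mapred.compress.map.output</name>
    <value>true</value>
  </property>
  <property>
    <!-- Assumption: DefaultCodec is used here purely as an example. -->
    <name>mapred.map.output.compression.codec</name>
    <value>org.apache.hadoop.io.compress.DefaultCodec</value>
  </property>
</configuration>
```

Compression trades CPU time for disk space, so how much it helps depends on how compressible the intermediate data is.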