hadoop-common-user mailing list archives

From Edward Capriolo <edlinuxg...@gmail.com>
Subject Re: mapred.local.dir options
Date Mon, 16 Nov 2009 17:00:10 GMT
Steve,

I ran into something with this parameter that is troublesome.

If I remember correctly, mapred.local.dir is used by both the TaskTracker
and the JobTracker.

This was a subtle problem for me because I am not able to share my
hadoop-site.xml between these nodes.

For example, right now I am sharing my configuration so that I do not
have to branch it. This makes it impossible for me to do:

-Dmapred.job.tracker=local

because each datanode does not have the same directory structure as my
JobTracker. Should mapred.local.dir be split into two separate
variables?

Sorry to be off topic. As to your question: right now I am crossing my
fingers. I figure that if my datanodes get up to 90%+ full, I have
dropped the ball.

>>* use separate partition(s) for mapred.local.dir
Makes a lot of sense

>>* Set really high mapred.local.dir.minspacestart and
Makes great sense as well.
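
To make the threshold idea concrete, here is a rough Python sketch of
how the two limits could gate task scheduling. It is not the
TaskTracker's actual code. The function names, the "accept/hold/kill"
labels, and the choice to look at the directory with the most free space
are all assumptions made for illustration, and both limits are assumed
to be given in bytes.

```python
import shutil
from typing import Callable, Iterable

GB = 1024 ** 3


def free_bytes(path: str) -> int:
    """Free space, in bytes, on the filesystem that holds `path`."""
    return shutil.disk_usage(path).free


def space_decision(
    local_dirs: Iterable[str],
    min_space_start: int,
    min_space_kill: int,
    free_fn: Callable[[str], int] = free_bytes,
) -> str:
    """Hypothetical helper: classify the node by its local free space.

    Assumption: the decision is based on the directory with the most
    free space; the real TaskTracker may weigh directories differently.
    """
    best = max(free_fn(d) for d in local_dirs)
    if best < min_space_kill:
        return "kill"    # below the kill limit: free space by killing a task
    if best < min_space_start:
        return "hold"    # below the start limit: accept no new tasks
    return "accept"      # enough room: keep taking tasks


# Usage with made-up numbers instead of real disks.
fake_free = {"/disk1/mapred": 5 * GB, "/disk2/mapred": 50 * GB}
print(space_decision(fake_free, 20 * GB, 10 * GB, fake_free.get))
```

With limits in the tens of GB, as suggested, a node stops taking work
long before a shared partition actually fills up.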

Edward
On Mon, Nov 16, 2009 at 11:48 AM, Steve Loughran <stevel@apache.org> wrote:
>
> I see that the mapred.local.dir is served up round robin, as with the
> dfs.data.dir values. But there's no awareness of the possibility that the
> same disk partition is used for mapred local data and for datanode blocks.
>
> What do people do here?
>
> * keep their fingers crossed that if the MR job creates too much data, it
> doesn't interfere with the datanodes
> * use separate partition(s) for mapred.local.dir
> * Set really high mapred.local.dir.minspacestart and
> mapred.local.dir.minspacekill values, 10s of GB, so MR jobs can't normally
> come close to causing the partitions to run out of space
>
> I'll put whatever comes down as best practise up on the
> http://wiki.apache.org/hadoop/DiskSetup page
>
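
Since nothing in Hadoop itself notices when mapred.local.dir and
dfs.data.dir land on the same partition, a small check can at least
report the overlap. The sketch below is a hypothetical helper, not part
of Hadoop. It assumes both settings are comma-separated directory lists
and that two paths share a partition when os.stat reports the same
st_dev for them.

```python
import os
from collections import defaultdict
from typing import Callable, Dict, Iterable, List, Tuple


def split_dirs(value: str) -> List[str]:
    """Split a comma-separated directory setting into clean paths."""
    return [d.strip() for d in value.split(",") if d.strip()]


def shared_devices(
    mapred_dirs: Iterable[str],
    data_dirs: Iterable[str],
    dev_fn: Callable[[str], int] = lambda p: os.stat(p).st_dev,
) -> Dict[int, Tuple[List[str], List[str]]]:
    """Map each device holding BOTH kinds of directory to those paths.

    Assumption: an equal st_dev means "same partition". Devices used by
    only one kind of directory are left out of the result.
    """
    by_dev = defaultdict(lambda: (set(), set()))
    for path in mapred_dirs:
        by_dev[dev_fn(path)][0].add(path)
    for path in data_dirs:
        by_dev[dev_fn(path)][1].add(path)
    return {
        dev: (sorted(mapred), sorted(data))
        for dev, (mapred, data) in by_dev.items()
        if mapred and data
    }


# Usage with a made-up device table instead of real mount points.
fake_dev = {"/disk1/mapred": 1, "/disk2/mapred": 2, "/disk1/dfs": 1}
overlap = shared_devices(
    split_dirs("/disk1/mapred, /disk2/mapred"),
    split_dirs("/disk1/dfs"),
    fake_dev.get,
)
print(overlap)
```

On a real node, leave dev_fn at its default and pass in the values read
from the site configuration. An empty result means the two settings do
not share a partition.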
