hadoop-common-user mailing list archives

From Kevin Peterson <kpeter...@biz360.com>
Subject Confusion on directory config (hadoop.tmp.dir)
Date Thu, 24 Sep 2009 01:31:42 GMT
We've had some problems with jobs failing, or the cluster not being able to
run jobs, when a tasktracker has a full disk, even though dfs.data.dir and
mapred.local.dir point to a list of directories of which at least one has
space. Or it's possible that I incorrectly configured a replacement machine.
I'm trying to go through our config and make sure I have everything set up
reasonably. I'm currently using 0.19.1 and am cleaning up the config in
preparation for moving to 0.20.

I see the following properties that are supposed to point to a list of
directories on the local machine, used JBOD-style:
dfs.data.dir - stores the blocks on each datanode
mapred.local.dir - stores intermediate data during a job (as well as the
job's working directory?)
fs.s3.buffer.dir - buffers partial files before uploading to S3 when using
the s3 or s3n file systems
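For reference, these all take comma-separated lists in hadoop-site.xml, one entry per disk; a minimal sketch (the /disk1, /disk2 paths are hypothetical):

```xml
<!-- hadoop-site.xml: comma-separated lists, one directory per physical disk
     (example paths, adjust to your mount points) -->
<property>
  <name>dfs.data.dir</name>
  <value>/disk1/hdfs/data,/disk2/hdfs/data</value>
</property>
<property>
  <name>mapred.local.dir</name>
  <value>/disk1/mapred/local,/disk2/mapred/local</value>
</property>
```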

And these take a list of directories on the local machine which are all
written to for redundancy:
dfs.name.dir - stores the namenode metadata (fsimage and edits) in every
listed directory
fs.checkpoint.dir - stores the secondary namenode's checkpoint images

And these are directories within hdfs:
mapred.system.dir - shared directory where the jobtracker stores job files
What I cannot figure out is what hadoop.tmp.dir itself is used for. All I
find online are references to it being the default base for most of the
above. Is this directory used for anything in its own right? How much space
do I need to ensure is available on its file system?
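For anyone following along: as far as I can tell, the stock hadoop-default.xml in 0.19/0.20 derives most of these paths from hadoop.tmp.dir, roughly as below (approximate, from memory of the shipped defaults):

```xml
<!-- Shipped defaults (approximate): everything hangs off hadoop.tmp.dir -->
<property>
  <name>hadoop.tmp.dir</name>
  <value>/tmp/hadoop-${user.name}</value>
</property>
<property>
  <name>dfs.data.dir</name>
  <value>${hadoop.tmp.dir}/dfs/data</value>
</property>
<property>
  <name>mapred.local.dir</name>
  <value>${hadoop.tmp.dir}/mapred/local</value>
</property>
<property>
  <name>fs.s3.buffer.dir</name>
  <value>${hadoop.tmp.dir}/s3</value>
</property>
```

So if every derived property is set explicitly in hadoop-site.xml, hadoop.tmp.dir itself would presumably need very little space, but I'd like confirmation of that.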
