hadoop-user mailing list archives

From Vinayakumar B <vinayakuma...@huawei.com>
Subject RE: How to set "hadoop.tmp.dir" if I have multiple disks per node?
Date Mon, 16 Dec 2013 09:27:51 GMT

hadoop.tmp.dir is not the configuration you are looking for to spread disk I/O.

It is the default base directory (a single directory, not a list) that is used when you haven’t
configured dedicated directories for processes such as the NameNode, DataNode and NodeManager.

The properties that accept comma-separated lists of directories are as follows.

1. dfs.namenode.name.dir for the NameNode, in hdfs-site.xml

2. dfs.datanode.data.dir for the DataNode, in hdfs-site.xml

3. yarn.nodemanager.local-dirs for the NodeManager, in yarn-site.xml

Please note that all of the above properties are for Hadoop 2.x.

Configure different subdirectories if you are using the same disk for multiple processes.
                Ex: /hadoop/data1/dfs/data
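
As a rough sketch of the above, an hdfs-site.xml fragment for Hadoop 2.x might look like the following. Only /hadoop/data1/dfs/data comes from this message; the other mount points (/hadoop/data2, /hadoop/data3) and the dfs/name subdirectories are assumptions for illustration, with each mount point assumed to sit on a separate physical disk.

```xml
<!-- hdfs-site.xml (Hadoop 2.x): illustrative sketch only.
     /hadoop/data2, /hadoop/data3 and the dfs/name subdirectories are
     assumed names; substitute the mount points of your own disks. -->
<configuration>
  <property>
    <!-- NameNode metadata directories, comma-separated -->
    <name>dfs.namenode.name.dir</name>
    <value>/hadoop/data1/dfs/name,/hadoop/data2/dfs/name</value>
  </property>
  <property>
    <!-- DataNode block storage directories, one per disk, comma-separated -->
    <name>dfs.datanode.data.dir</name>
    <value>/hadoop/data1/dfs/data,/hadoop/data2/dfs/data,/hadoop/data3/dfs/data</value>
  </property>
</configuration>
```

yarn.nodemanager.local-dirs in yarn-site.xml takes a comma-separated list in the same way. Note that, as far as I know, the NameNode writes a full copy of its metadata to every directory listed in dfs.namenode.name.dir (for redundancy), while the DataNode spreads blocks across the directories listed in dfs.datanode.data.dir.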

Vinayakumar B
From: Tao Xiao [mailto:xiaotao.cs.nju@gmail.com]
Sent: 16 December 2013 14:42
To: user@hadoop.apache.org
Subject: Re: How to set "hadoop.tmp.dir" if I have multiple disks per node?


In order to spread I/O among multiple disks, should I assign a comma-separated list of directories
which are located on different disks to "hadoop.tmp.dir"?
for example,

2013/12/16 Shekhar Sharma <shekhar2581@gmail.com>
hadoop.tmp.dir is a directory created on the local file system.
For example, suppose you have set the hadoop.tmp.dir property to /home/training/hadoop.

This directory will be created when you format the NameNode by running
the command
hadoop namenode -format

When you open this folder, you will see two subfolders, dfs and mapred.

The /home/training/hadoop/mapred folder will also exist on HDFS.

Hope this clears it up.
Som Shekhar Sharma

On Mon, Dec 16, 2013 at 1:42 PM, Dieter De Witte <drdwitte@gmail.com>
> Hi,
> Make sure to also set mapred.local.dir to the same set of output
> directories; this is where the intermediate key-value pairs are stored!
> Regards, Dieter
> 2013/12/16 Tao Xiao <xiaotao.cs.nju@gmail.com>
>> I have ten disks per node, and I don't know what value I should set for
>> "hadoop.tmp.dir". Some said this property refers to a location on local disk
>> while others said it refers to a directory in HDFS. I'm confused; who
>> can explain it?
>> I want to spread I/O since I have ten disks per node, so should I set
>> "hadoop.tmp.dir" to a comma-separated list of directories (which are on
>> different disks)?
