hadoop-common-user mailing list archives

From Harsh J <ha...@cloudera.com>
Subject Re: Configuring Hadoop clusters with multiple PCs, each of which has 2 hard disks (Sata+SSD)
Date Thu, 12 Jul 2012 11:56:33 GMT
Hi Ivangelion,

Replied inline.

On Thu, Jul 12, 2012 at 2:02 PM, Ivangelion <admin@ivangelion.tw> wrote:
> Hi,

Install all Hadoop libraries on the SATA disk.

> - 1 PC: pure namenode

Configure dfs.name.dir to write to both places, one under the SATA disk
and the other under the SSD, for redundancy (failure tolerance). This is
in hdfs-site.xml.

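As a sketch, the dfs.name.dir entry on the NameNode could look like the
following (the mount points /sata and /ssd are illustrative; substitute
your own paths):

```xml
<!-- hdfs-site.xml on the NameNode host.
     Two comma-separated directories, one per physical disk, so the
     namespace metadata is written redundantly to both. -->
<property>
  <name>dfs.name.dir</name>
  <value>/sata/dfs/name,/ssd/dfs/name</value>
</property>
```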
> - Other 5 PCs: datanodes (1 of which also serves as secondary namenode)

Configure dfs.data.dir to write to a location on to the SATA disk
(SATA/dfs/data). This is in hdfs-site.xml.
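For example (path is illustrative):

```xml
<!-- hdfs-site.xml on each DataNode host.
     HDFS block storage goes on the larger SATA disk. -->
<property>
  <name>dfs.data.dir</name>
  <value>/sata/dfs/data</value>
</property>
```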
Configure mapred.local.dir to write to a location on the SSD disk
(SSD/mapred/local). This is in mapred-site.xml.
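For example (path is illustrative):

```xml
<!-- mapred-site.xml on each TaskTracker host.
     MapReduce intermediate/spill data goes on the faster SSD. -->
<property>
  <name>mapred.local.dir</name>
  <value>/ssd/mapred/local</value>
</property>
```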

> - Sata disk with bigger size: common HDFS data storage
> - SSD disk with smaller size but faster: temporary data storage when
> processing map reduce jobs or doing data analyzing.

If you limit your MR jobs to use only SSD space, that is all the space
each mapper gets to write into. So if a mapper tries to write, or a
reducer tries to read, over 200 GB of data, it may run into space
unavailability issues. To avoid this, configure mapred.local.dir to
use SATA/mapred/local as well, if this becomes a problem.
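Since mapred.local.dir accepts a comma-separated list, both disks can be
listed and Hadoop will spread intermediate files across them (paths are
illustrative):

```xml
<!-- mapred-site.xml: intermediate data can spill to both disks. -->
<property>
  <name>mapred.local.dir</name>
  <value>/ssd/mapred/local,/sata/mapred/local</value>
</property>
```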

> Is there anything that needs to be modified?

Yes, configure fs.checkpoint.dir to SSD/dfs/namesecondary, so the SNN
checkpoints there. This too goes in hdfs-site.xml.
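A sketch of that entry (path is illustrative):

```xml
<!-- hdfs-site.xml on the node running the SecondaryNameNode.
     Checkpoint images are written under the SSD. -->
<property>
  <name>fs.checkpoint.dir</name>
  <value>/ssd/dfs/namesecondary</value>
</property>
```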

After configuring these, you may ignore hadoop.tmp.dir, as it
shouldn't be used for anything else.

Harsh J
