hadoop-common-user mailing list archives

From Todd Lipcon <t...@cloudera.com>
Subject Re: using multiple disks for HDFS
Date Tue, 09 Feb 2010 19:40:34 GMT
Hi Vasilis,

Two things:

1) You're missing a matching } in your hadoop.tmp.dir setting.
2) When you use ${hadoop.tmp.dir}/dfs/data, Hadoop does a literal string
interpolation. It's not appending dfs/data to each of the
hadoop.tmp.dir directories, only to the last one.
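
Concretely (using short hypothetical paths /disk1 and /disk2 for
illustration), the substitution behaves like this:

```
hadoop.tmp.dir = /disk1/tmp,/disk2/tmp
dfs.data.dir   = ${hadoop.tmp.dir}/dfs/data
               = /disk1/tmp,/disk2/tmp/dfs/data
```

The whole comma-separated string is substituted as one value, so the
/dfs/data suffix ends up on the last path only.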

I'd recommend setting dfs.data.dir explicitly to the full
comma-separated list and ignoring hadoop.tmp.dir.
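
For example, a sketch of an explicit dfs.data.dir (directory names here
are illustrative; point them at whatever subdirectories you want on your
four mounts):

```xml
<property>
  <name>dfs.data.dir</name>
  <value>/local/user2/hdfs/data,/local2/user2/hdfs/data,/local3/user2/hdfs/data,/local4/user2/hdfs/data</value>
</property>
```

With dfs.data.dir spelled out like this, hadoop.tmp.dir can stay a
single local directory and no variable expansion is involved.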


On Tue, Feb 9, 2010 at 8:49 AM, Vasilis Liaskovitis <vliaskov@gmail.com> wrote:
> Hi,
> I am trying to use 4 SATA disks per node in my hadoop cluster. This is
> a JBOD configuration, no RAID is involved. There is one single xfs
> partition per disk, each one mounted as /local, /local2, /local3,
> /local4 - with sufficient privileges for running hadoop jobs. HDFS is
> set up across the 4 disks for a single user (user2) with the
> following comma-separated list in hadoop.tmp.dir:
> <property>
>  <name>dfs.data.dir</name>
>  <value>${hadoop.tmp.dir}/dfs/data</value>
> </property>
>  <property>
>    <name>hadoop.tmp.dir</name>
>    <value>/local/user2/hdfs/hadoop-${user.name},/local2/user2/hdfs/hadoop-${user.name},/local3/user2/hdfs/hadoop-${user.name,/local4/user2/hdfs/hadoop-${user.name}</value>
>    <description>A base for other temporary directories.</description>
>  </property>
> What I see is that most or all data is stored on disks /local and
> /local4 across nodes. Directories local2 and local3 from the other
> disks are not used. I have verified that these disks can be written to
> and have free space.
> Isn't HDFS supposed to use all disks in a round-robin way (provided
> there is free space on all)? Do I need to change another config
> parameter for HDFS to spread I/O across all provided mount points?
> - Vasilis
