hadoop-common-user mailing list archives

From Todd Lipcon <t...@cloudera.com>
Subject Re: using multiple disks for HDFS
Date Tue, 09 Feb 2010 19:40:34 GMT
Hi Vasilis,

Two things:

1) You're missing a matching } in your hadoop.tmp.dir setting
2) When you use ${hadoop.tmp.dir}/dfs/data, the variable is expanded by
literal string substitution: the entire comma-separated value is pasted in,
and /dfs/data is appended only once, at the end. So dfs/data is not added to
each of the hadoop.tmp.dir directories, only to the last one.
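
To illustrate (a sketch, assuming ${user.name} resolves to user2 and
setting aside the missing brace in the third path), the substituted value
of dfs.data.dir comes out as one long string:

```
/local/user2/hdfs/hadoop-user2,/local2/user2/hdfs/hadoop-user2,/local3/user2/hdfs/hadoop-user2,/local4/user2/hdfs/hadoop-user2/dfs/data
```

i.e. only the final directory gets the /dfs/data suffix.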

I'd recommend setting dfs.data.dir explicitly to the full comma-separated
list of data directories and ignoring hadoop.tmp.dir.
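
A minimal sketch of that explicit setting (paths copied from the quoted
config below; adjust to taste):

```xml
<property>
  <name>dfs.data.dir</name>
  <!-- full comma-separated list; the DataNode round-robins new blocks across these -->
  <value>/local/user2/hdfs/hadoop-${user.name}/dfs/data,/local2/user2/hdfs/hadoop-${user.name}/dfs/data,/local3/user2/hdfs/hadoop-${user.name}/dfs/data,/local4/user2/hdfs/hadoop-${user.name}/dfs/data</value>
</property>
```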

Thanks
-Todd

On Tue, Feb 9, 2010 at 8:49 AM, Vasilis Liaskovitis <vliaskov@gmail.com> wrote:
> Hi,
>
> I am trying to use 4 SATA disks per node in my hadoop cluster. This is
> a JBOD configuration, no RAID is involved. There is a single xfs
> partition per disk, mounted as /local, /local2, /local3, and /local4,
> with sufficient privileges for running hadoop jobs. HDFS is set up
> across the 4 disks for a single user (user2) with the following
> comma-separated list in hadoop.tmp.dir:
>
> <property>
>  <name>dfs.data.dir</name>
>  <value>${hadoop.tmp.dir}/dfs/data</value>
> </property>
>
>  <property>
>    <name>hadoop.tmp.dir</name>
>    <value>/local/user2/hdfs/hadoop-${user.name},/local2/user2/hdfs/hadoop-${user.name},/local3/user2/hdfs/hadoop-${user.name,/local4/user2/hdfs/hadoop-${user.name}</value>
>    <description>A base for other temporary directories.</description>
>  </property>
>
> What I see is that most or all data is stored on disks /local and
> /local4 across nodes. Directories local2 and local3 from the other
> disks are not used. I have verified that these disks can be written to
> and have free space.
>
> Isn't HDFS supposed to use all disks in a round-robin way (provided
> there is free space on all)? Do I need to change another config
> parameter for HDFS to spread I/O across all provided mount points?
>
> - Vasilis
>
