hadoop-common-user mailing list archives

From Yu Li <car...@gmail.com>
Subject Re: Question about disk space allocation in hadoop
Date Thu, 01 Jul 2010 07:51:53 GMT
Hi Chris,

Thanks a lot for sharing your knowledge. I'll investigate further and
give these a try on my cluster; I hope one of them turns out to be a
good solution :)

Best Regards,

2010/6/30 Chris Smith <csmithx+hadoop@gmail.com>:
> Here are some thoughts on how to restrict the temporary data, although
> I have only tried (a) in anger:
> a) Partition your disks into HDFS and intermediate temp partitions
> of the relevant size. This gives a fixed separation but is
> difficult or impossible to modify on a busy cluster, especially as there
> may be no way of unloading or recovering the data stored in HDFS if you
> make a mistake resizing partitions.
> b) Implement disk quotas and set appropriate hard and soft limits on
> the root directories used for intermediate space. This gives you
> the flexibility to change the limits when required, but as the limits
> are per user/group, some thought may be required as to which user/group
> the limits apply to. There may also be a performance impact.
> You could combine this with setting the “dfs.datanode.du.reserved” value
> in $HADOOP_HOME/conf/hdfs-site.xml to limit HDFS disk usage.
> c) Implement the intermediate data space as a loopback file, see:
> http://wiki.cita.utoronto.ca/mediawiki/index.php/Fake_Fast_Local_Disk
> This example implements a temporary loopback filesystem on an
> iSCSI-mounted Lustre filesystem, but the principles are the same. There are
> some performance benchmarks linked to in section 3. The intermediate
> temp data space is limited by the size of the loopback file created.
> Chris
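A minimal sketch of the arithmetic behind combining option (b) with dfs.datanode.du.reserved, assuming that property is a single byte count per volume that HDFS always leaves free for non-DFS use. The four 1 TiB disks and the 25% target are illustrative values borrowed from the original question, not measurements, and the helper names are hypothetical. Because one value applies to every volume, the sketch sizes it from the largest disk so that each disk keeps at least the target share. The reserve only stops HDFS from growing into that space; it does not stop intermediate data from exceeding it, which is why it is paired with quotas or a loopback file.

```python
"""Sketch: derive a dfs.datanode.du.reserved value for a target non-HDFS share.

Assumptions (illustrative, not from the original thread):
- the disk sizes below are hypothetical;
- dfs.datanode.du.reserved is one per-volume byte count that HDFS leaves free,
  so the largest disk determines the value needed to keep the share on every disk.
"""

GIB = 1024 ** 3


def reserved_bytes(volume_bytes, non_hdfs_fraction):
    """Return the per-volume reserve that leaves at least `non_hdfs_fraction`
    of every listed disk free for non-HDFS use (intermediate data, logs, OS)."""
    if not 0.0 <= non_hdfs_fraction < 1.0:
        raise ValueError("non_hdfs_fraction must be in [0, 1)")
    if not volume_bytes:
        raise ValueError("need at least one volume")
    # A single value applies to all volumes, so size it for the largest disk;
    # smaller disks then keep a share at least as large as requested.
    return int(max(volume_bytes) * non_hdfs_fraction)


def hdfs_site_property(reserved):
    """Render the <property> element to paste into hdfs-site.xml."""
    return (
        "<property>\n"
        "  <name>dfs.datanode.du.reserved</name>\n"
        f"  <value>{reserved}</value>\n"
        "</property>"
    )


if __name__ == "__main__":
    # Hypothetical node: four 1 TiB data disks, keeping 25% of each for non-HDFS data.
    volumes = [1024 * GIB] * 4
    print(hdfs_site_property(reserved_bytes(volumes, 0.25)))
```

With mixed disk sizes, taking the maximum over-reserves on the smaller disks; the alternative would be to accept less than the target share on the largest disk.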
> -----Original Message-----
> From: Yu Li [mailto:carp84@gmail.com]
> Sent: 30 June 2010 04:11
> To: common-user@hadoop.apache.org
> Subject: Re: Question about disk space allocation in hadoop
> Hi all,
> Does anybody have experience with this? Any comments or suggestions
> would be highly appreciated. Thanks.
> Best Regards,
> Carp
> 2010/6/29 Yu Li <carp84@gmail.com>:
>> Hi all,
>> As we all know, machines in a Hadoop cluster may act as both datanode
>> and tasktracker, so one machine may store both MR job intermediate data
>> and HDFS data. My question is: if we have more than one disk per node,
>> say 4 disks, and would like both job intermediate data and HDFS data
>> stored on all disks to reduce the I/O load on each individual disk, can
>> we draw a line between the local FS space and the HDFS space? For
>> example, can we restrict the intermediate temp data to occupy no more
>> than 25% of the space on each disk?
>> Thanks in advance.
>> Best Regards,
>> Carp
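A rough way to check how close a node is to the 25% target from the question is to compare the size of each intermediate-data directory with the capacity of the disk it lives on. This sketch assumes those directories are the ones listed in mapred.local.dir and takes them as command-line arguments (the paths in the comment are hypothetical; with no arguments it falls back to the system temp directory only so that it runs anywhere). It measures usage; it does not enforce a limit.

```python
"""Sketch: report what share of its disk each intermediate-data directory uses.

Assumption (hypothetical): the directories passed on the command line mirror
mapred.local.dir, one per disk.
"""
import os
import shutil
import sys
import tempfile


def dir_bytes(path):
    """Sum the sizes of regular files under `path` (symlinks are not followed)."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            full = os.path.join(root, name)
            try:
                if not os.path.islink(full):
                    total += os.path.getsize(full)
            except OSError:
                # Task output can disappear while we scan; skip such files.
                pass
    return total


def check_dir(path, cap_fraction):
    """Return (used_bytes, disk_total_bytes, over_cap) for one directory."""
    used = dir_bytes(path)
    total = shutil.disk_usage(path).total
    return used, total, used > cap_fraction * total


if __name__ == "__main__":
    CAP = 0.25  # target from the question: at most 25% of each disk
    # e.g. python check_local.py /disk1/mapred/local /disk2/mapred/local
    for d in sys.argv[1:] or [tempfile.gettempdir()]:
        if not os.path.isdir(d):
            print(f"{d}: not a directory, skipped")
            continue
        used, total, over = check_dir(d, CAP)
        status = "OVER CAP" if over else "within cap"
        print(f"{d}: {used / total:.2%} of its disk ({status})")
```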
