hadoop-common-user mailing list archives

From Scott Carey <sc...@richrelevance.com>
Subject Re: Problem : data distribution is non uniform between two different disks on datanode.
Date Tue, 17 Mar 2009 17:02:46 GMT

Are you stopping and starting data nodes often? Are your files small on
average?  What Hadoop version?

It looks like, on startup, the datanode chooses the first volume for the
first block it writes and proceeds round-robin from there.
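
To illustrate why restarts and small files matter here, below is a toy simulation. It is not the actual datanode code; it only assumes, as described above, that the volume index resets to the first volume at every startup. The class and method names are made up for this example.

```java
import java.util.Arrays;

/** Toy model of the behaviour described above; not the real FSVolumeSet code. */
class RoundRobinRestartDemo {
    /**
     * Simulates `restarts` datanode lifetimes. In each lifetime the chooser
     * starts at volume 0 and places `blocksPerRun` blocks round-robin.
     * Returns how many blocks landed on each volume.
     */
    static long[] simulate(int volumes, int restarts, int blocksPerRun) {
        long[] blocks = new long[volumes];
        for (int r = 0; r < restarts; r++) {
            int cur = 0; // assumption: index resets to the first volume on startup
            for (int b = 0; b < blocksPerRun; b++) {
                blocks[cur]++;
                cur = (cur + 1) % volumes;
            }
        }
        return blocks;
    }

    public static void main(String[] args) {
        // Two volumes, 100 restarts, 3 blocks written per run.
        System.out.println(Arrays.toString(simulate(2, 100, 3))); // prints [200, 100]
    }
}
```

If the number of blocks written per run is large compared to the number of volumes, the skew from restarts becomes negligible, which is why the questions about restart frequency and file size matter.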

Are you simply adding the extra disk and changing the config?  Or were both
mounts there from the start?  It should not fail until both are full either
way.


The only improvements I see in the trunk (inner class FSVolumeSet in
FSDataset.java) are:
* Initialize the current volume to a random index in the constructor rather
than the first one.
* Rather than choosing by round-robin, weight the choice by free space
available.  This does not have to check every disk's free space each time;
it can remember the values for all volumes and update only the free space
of the volume under consideration, during the check it currently does.
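
A rough sketch of the second idea follows. This is not the FSVolumeSet code: the class name, the `probe` callback (standing in for whatever the datanode uses to ask a volume for its free space), and the retry loop are all assumptions made for illustration. The first idea amounts to starting the round-robin index at something like `rng.nextInt(numVolumes)` instead of 0.

```java
import java.util.Random;
import java.util.function.IntToLongFunction;

/** Hypothetical free-space-weighted volume chooser; names are illustrative only. */
class WeightedVolumeChooser {
    private final long[] cachedFree;       // remembered free bytes per volume
    private final IntToLongFunction probe; // assumed callback: real free bytes of volume i
    private final Random rng;

    WeightedVolumeChooser(int volumes, IntToLongFunction probe, Random rng) {
        this.probe = probe;
        this.rng = rng;
        this.cachedFree = new long[volumes];
        for (int i = 0; i < volumes; i++) {
            cachedFree[i] = probe.applyAsLong(i); // all volumes are checked once, at startup
        }
    }

    /**
     * Picks a volume for a block of blockSize bytes (assumed > 0), with
     * probability proportional to its cached free space. Only the picked
     * volume is re-probed. Returns -1 if no volume can hold the block.
     */
    int choose(long blockSize) {
        while (true) {
            long total = 0;
            for (long f : cachedFree) {
                if (f >= blockSize) total += f;
            }
            if (total == 0) return -1;

            // Weighted pick among volumes whose cached free space fits the block.
            long r = (long) (rng.nextDouble() * total);
            int picked = -1;
            for (int i = 0; i < cachedFree.length; i++) {
                if (cachedFree[i] < blockSize) continue;
                picked = i;
                if (r < cachedFree[i]) break;
                r -= cachedFree[i];
            }

            // Refresh only the candidate; a stale, now-full volume drops out
            // of the next round, so the loop ends after at most one pass per volume.
            cachedFree[picked] = probe.applyAsLong(picked);
            if (cachedFree[picked] >= blockSize) return picked;
        }
    }
}
```

With two volumes reporting 1000 and 3000 free bytes, this would send roughly three quarters of the blocks to the second volume, so a nearly full disk receives proportionally fewer writes instead of an equal share.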




On 3/16/09 5:19 AM, "Vaibhav J" <vaibhavj@rediff.co.in> wrote:

> From: Vaibhav J [mailto:vaibhavj@rediff.co.in]
> Sent: Monday, March 16, 2009 5:46 PM
> To: 'nutch-dev@lucene.apache.org'; 'nutch-user@lucene.apache.org'
> Subject: Problem : data distribution is non uniform between two different
> disks on datanode.
> 
> We have 27 datanodes and the replication factor is 1 (data size is
> ~6.75 TB).
> 
> We have specified two different disks for the dfs data directory on each
> datanode by using the property dfs.data.dir in the hadoop-site.xml file
> of the conf directory.
> 
> (value of property dfs.data.dir : /mnt/hadoop-dfs/data,
> /mnt2/hadoop-dfs/data)
> 
> 
> 
> When we set the replication factor to 2, data distribution is biased
> toward the first disk: more data is copied to /mnt/hadoop-dfs/data, and
> after some data has been copied the first disk becomes full and shows no
> available space, while we still have enough space on the second disk
> (/mnt2/hadoop-dfs/data).
> 
> So it is difficult to achieve replication factor 2.
> 
> 
> 
> Data traffic is reaching the second disk as well (/mnt2/hadoop-dfs/data),
> but it looks like more data is copied to the first disk
> (/mnt/hadoop-dfs/data).
> 
> 
> 
> 
> 
> What should we do to get uniform data distribution between the two disks
> on each datanode, so that we can achieve replication factor 2?
> 
> 
> 
> 
> 
> Regards
> 
> Vaibhav J.
> 
> 

