hadoop-hdfs-dev mailing list archives

From Harsh J <ha...@cloudera.com>
Subject Re: What does hdfs balancer do after adding more disks to existing datanode.
Date Mon, 05 Dec 2011 10:00:53 GMT
Ajit,

Just move/merge the subdirectories. It's the easiest way to go about it and does no harm. For
confidence, you can also fire up a test cluster and try these things out :)
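
A minimal sketch of such a manual move, for illustration only. It assumes the datanode is stopped while files are moved, that block data files are named "blk_<id>" with a matching "blk_<id>_<genstamp>.meta" file next to them, and that both arguments point at the <dfs.data.dir>/current/ directories of the old and new disk. The function name move_blocks and the byte budget are hypothetical, not part of Hadoop.

```python
import os
import shutil


def move_blocks(src_current, dst_current, max_bytes):
    """Move block/meta pairs from one current/ dir to another, up to max_bytes.

    Keeps each block's relative subdir path, so subdirectories are merged
    into the destination. Run only while the datanode is stopped (assumption).
    """
    moved = 0
    for dirpath, _dirnames, filenames in os.walk(src_current):
        rel = os.path.relpath(dirpath, src_current)
        names = set(filenames)
        for name in sorted(names):
            # Assumed naming: data "blk_<id>", metadata "blk_<id>_<genstamp>.meta".
            if not name.startswith("blk_") or name.endswith(".meta"):
                continue
            metas = [m for m in names
                     if m.startswith(name + "_") and m.endswith(".meta")]
            pair = [name] + metas
            size = sum(os.path.getsize(os.path.join(dirpath, f)) for f in pair)
            if moved + size > max_bytes:
                return moved  # stop at the first pair that would exceed the budget
            target_dir = os.path.normpath(os.path.join(dst_current, rel))
            os.makedirs(target_dir, exist_ok=True)
            for f in pair:
                # A block and its metadata file always travel together.
                shutil.move(os.path.join(dirpath, f), os.path.join(target_dir, f))
            moved += size
    return moved
```

Trying this on a test cluster first, as suggested above, is the safer route; the sketch does not verify checksums or handle name collisions in the destination.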

On 05-Dec-2011, at 2:59 PM, Ajit Ratnaparkhi wrote:

> Hi,
> 
> The dfs data directory on a datanode stores blocks in the following directory structure.
> All blocks are stored at this location:
> <dfs.data.dir>/current/
> 
> This directory contains some blocks and some subdirectories named like 'subdir*' (e.g.
> subdir0, subdir1, ..., subdir33, ..., subdir63).
> 
> To be precise, each directory in the hierarchy rooted at <dfs.data.dir>/current/
> contains at most 64 blocks (data + metadata) plus at most 64 subdirectories (named subdir0 to subdir63).
> 
> So my question is: whenever I manually transfer blocks across disks to balance load
> onto newly added disks, do I need to maintain this constraint on the directory
> hierarchy, or will just putting the blocks in <dfs.data.dir>/current/ work?
> 
> thanks,
> Ajit.
> 
> On Tue, Nov 22, 2011 at 11:04 PM, Ajit Ratnaparkhi <ajit.ratnaparkhi@gmail.com> wrote:
> Thanks Harsh!
> 
> 
> On Tue, Nov 22, 2011 at 10:05 PM, Harsh J <harsh@cloudera.com> wrote:
> Ajit / Inder,
> 
> Please see http://wiki.apache.org/hadoop/FAQ#On_an_individual_data_node.2C_how_do_you_balance_the_blocks_on_the_disk.3F
> 
> On Tue, Nov 22, 2011 at 9:44 PM, Ajit Ratnaparkhi
> <ajit.ratnaparkhi@gmail.com> wrote:
> > Thanks for the help, Joey!
> > Does just copying block files from one drive to another work?
> > Isn't there metadata maintained at datanode about block locations on that
> > datanode? If not, then how does datanode know about blocks stored on it?
> >
> > -Ajit.
> > On Tue, Nov 22, 2011 at 5:25 PM, Joey Echeverria <joey@cloudera.com> wrote:
> >>
> >> The balancer only balances between datanodes. This means the new
> >> drives won't get used until you start writing new data to them. If you
> >> want to balance the drives on a node, you need to
> >>
> >> 1) copy a bunch of block files from the old drives to the new drives
> >> 2) shut down the datanode
> >> 3) delete the old block files
> >> 4) configure the datanode to see the new drives
> >> 5) start the datanode
> >>
> >> -Joey
> >>
> >> On Tue, Nov 22, 2011 at 6:43 AM, Ajit Ratnaparkhi
> >> <ajit.ratnaparkhi@gmail.com> wrote:
> >> > Hi,
> >> > If I add additional disks to an existing datanode (assume the existing datanode
> >> > has 7 1TB disks which are already 80% full, and then I add two new 2TB disks
> >> > that are 0% full) and then run the balancer, does the balancer balance data within
> >> > a datanode? i.e., will it move data from the existing disks to the newly added
> >> > disks such that all disks are approximately equally full?
> >> > thanks,
> >> > Ajit.
> >>
> >>
> >>
> >> --
> >> Joseph Echeverria
> >> Cloudera, Inc.
> >> 443.305.9434
> >
> >
> 
> 
> 
> --
> Harsh J
> 
> 
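
For the scenario in the original question at the bottom of the thread (seven 1TB disks at 80% full plus two new, empty 2TB disks), a rough sketch of the arithmetic behind "all disks approximately equally full" might look like the following. The function name balanced_fill is hypothetical, and the calculation ignores any reserved or non-DFS space on the disks.

```python
def balanced_fill(disks):
    """Compute the common fill level and how much data each disk gains or loses.

    disks: list of (capacity_tb, used_fraction) tuples.
    Returns (target_fraction, deltas_tb), where a negative delta means
    data should move off that disk.
    """
    total_capacity = sum(cap for cap, _ in disks)
    total_used = sum(cap * frac for cap, frac in disks)
    target = total_used / total_capacity
    deltas = [cap * (target - frac) for cap, frac in disks]
    return target, deltas


# Seven 1TB disks at 80% full, two new 2TB disks at 0% full.
disks = [(1.0, 0.8)] * 7 + [(2.0, 0.0)] * 2
target, deltas = balanced_fill(disks)
print(f"target fill: {target:.1%}")             # 5.6TB used of 11TB, about 50.9%
print(f"per old disk: {deltas[0]:+.3f} TB")     # about -0.291 TB each
print(f"per new disk: {deltas[-1]:+.3f} TB")    # about +1.018 TB each
```

Under these assumptions, roughly 2TB in total would have to move from the old disks to the new ones by hand, since the balancer only works between datanodes.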

