hadoop-hdfs-dev mailing list archives

From Ajit Ratnaparkhi <ajit.ratnapar...@gmail.com>
Subject Re: What does hdfs balancer do after adding more disks to existing datanode.
Date Mon, 05 Dec 2011 09:29:46 GMT
Hi,

The dfs data directory on a datanode stores blocks in the following
directory structure.
All blocks are stored under:
<dfs.data.dir>/current/

This directory contains some blocks and some subdirectories named
'subdir*' (e.g. subdir0, subdir1, ..., subdir33, ..., subdir63).

To be precise, each directory in the hierarchy rooted at
<dfs.data.dir>/current/ contains at most 64 blocks (data + metadata) plus
at most 64 subdirectories (named subdir0 to subdir63).
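
To make the layout described above concrete, here is a minimal sketch that
walks a <dfs.data.dir>/current/ tree and reports any directory holding more
than 64 block files or more than 64 'subdir*' entries. The limit of 64 is
taken from the description in this message. The file name patterns
(blk_<id> for block data, with the matching .meta file not counted
separately) and the function name check_layout are assumptions made for
illustration, not something confirmed in this thread.

```python
import os
import re

# Per-directory limit as described in this message (treated as an assumption).
MAX_ENTRIES = 64

# Assumed naming: block data files look like blk_<id>, where <id> may be negative.
# The accompanying blk_<id>_<genstamp>.meta files do not match and are not counted.
BLOCK_RE = re.compile(r"^blk_-?\d+$")
SUBDIR_RE = re.compile(r"^subdir\d+$")


def check_layout(current_dir, max_entries=MAX_ENTRIES):
    """Return (directory, n_blocks, n_subdirs) for each directory over the limit."""
    violations = []
    for dirpath, dirnames, filenames in os.walk(current_dir):
        n_blocks = sum(1 for name in filenames if BLOCK_RE.match(name))
        n_subdirs = sum(1 for name in dirnames if SUBDIR_RE.match(name))
        if n_blocks > max_entries or n_subdirs > max_entries:
            violations.append((dirpath, n_blocks, n_subdirs))
    return violations
```

Calling check_layout on the 'current' directory of one data volume returns
an empty list when every directory in the tree is within the limit. It only
reads the tree and says nothing about whether the datanode actually requires
the constraint.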

So my question is: whenever I manually move blocks across disks to balance
load onto newly added disks, do I need to maintain this constraint on the
directory hierarchy, or will simply putting the blocks in
<dfs.data.dir>/current/ work?

thanks,
Ajit.

On Tue, Nov 22, 2011 at 11:04 PM, Ajit Ratnaparkhi <
ajit.ratnaparkhi@gmail.com> wrote:

> Thanks Harsh!
>
>
> On Tue, Nov 22, 2011 at 10:05 PM, Harsh J <harsh@cloudera.com> wrote:
>
>> Ajit / Inder,
>>
>> Please see
>> http://wiki.apache.org/hadoop/FAQ#On_an_individual_data_node.2C_how_do_you_balance_the_blocks_on_the_disk.3F
>>
>> On Tue, Nov 22, 2011 at 9:44 PM, Ajit Ratnaparkhi
>> <ajit.ratnaparkhi@gmail.com> wrote:
>> > Thanks for the help, Joey!
>> > Does just copying block files from one drive to another work?
>> > Isn't there metadata maintained at the datanode about block locations on
>> > that datanode? If not, then how does the datanode know which blocks are
>> > stored on it?
>> >
>> > -Ajit.
>> > On Tue, Nov 22, 2011 at 5:25 PM, Joey Echeverria <joey@cloudera.com> wrote:
>> >>
>> >> The balancer only balances between datanodes. This means the new
>> >> drives won't get used until you start writing new data to them. If you
>> >> want to balance the drives on a node, you need to
>> >>
>> >> 1) copy a bunch of block files from the old drives to the new drives
>> >> 2) shut down the datanode
>> >> 3) delete the old block files
>> >> 4) configure the datanode to see the new drives
>> >> 5) start the datanode
>> >>
>> >> -Joey
>> >>
>> >> On Tue, Nov 22, 2011 at 6:43 AM, Ajit Ratnaparkhi
>> >> <ajit.ratnaparkhi@gmail.com> wrote:
>> >> > Hi,
>> >> > If I add additional disks to an existing datanode (assume the existing
>> >> > datanode has seven 1TB disks which are already 80% full, and I then add
>> >> > two new 2TB disks which are 0% full) and then run the balancer, does the
>> >> > balancer balance data within a datanode? I.e., will it move data from
>> >> > the existing disks to the newly added disks so that all disks are
>> >> > approximately equally full?
>> >> > thanks,
>> >> > Ajit.
>> >>
>> >>
>> >>
>> >> --
>> >> Joseph Echeverria
>> >> Cloudera, Inc.
>> >> 443.305.9434
>> >
>> >
>>
>>
>>
>> --
>> Harsh J
>>
>
>
