hadoop-common-user mailing list archives

From Otis Gospodnetic <otis_gospodne...@yahoo.com>
Subject Adding new disk to DNs - FAQ #15 clarification
Date Tue, 03 Jun 2008 15:36:41 GMT

I'm about to add a new disk (under a new partition) to some existing DataNodes that are nearly
full.  I see FAQ #15:

15. HDFS. How do I set up a hadoop node to use multiple volumes? 
Data-nodes can store blocks in multiple directories typically allocated on different local
disk drives. In order to setup multiple directories one needs to specify a comma separated
list of pathnames as a value of the configuration parameter  dfs.data.dir. Data-nodes will
attempt to place equal amount of data in each of the directories. 
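For concreteness, here is what I understand the multi-volume setting to look like in hadoop-site.xml (the mount points below are placeholders, not my actual partitions):

```xml
<!-- hadoop-site.xml: dfs.data.dir takes a comma-separated list of local
     directories, typically one per disk/partition (paths are hypothetical) -->
<property>
  <name>dfs.data.dir</name>
  <value>/mnt/disk1/hdfs/data,/mnt/disk2/hdfs/data</value>
</property>
```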

I think some clarification around "will attempt to place equal amount of data in each of the
directories" is needed:

* Does that apply only if you have multiple disks in a DN from the beginning, and thus Hadoop
just tries to write to all of them equally?
* Or does that apply to situations like mine, where one disk is nearly completely full, and
then a new, empty disk is added?

Put another way, if I add the new disk via dfs.data.dir, will Hadoop:
1) try to write the same amount of data to both disks from now on, or
2) try to write exclusively to the new/empty disk first, in order to get it to roughly 95% full?
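To make alternative 1) concrete, here is a toy sketch (plain shell with placeholder directories, not Hadoop's actual code) of round-robin placement: each new block alternates between volumes regardless of how full each volume already is, which is exactly what would leave my nearly-full disk under pressure:

```shell
# Toy round-robin illustration (hypothetical dirs, not Hadoop internals).
base=$(mktemp -d)
mkdir -p "$base/old_disk" "$base/new_disk"

# Alternate each new "block" between the two volumes, ignoring fullness.
vols=("$base/old_disk" "$base/new_disk")
i=0
for blk in blk_1 blk_2 blk_3 blk_4; do
  touch "${vols[$i]}/$blk"
  i=$(( (i + 1) % 2 ))
done

# Both volumes receive the same number of new blocks.
echo "old=$(ls "$base/old_disk" | wc -l) new=$(ls "$base/new_disk" | wc -l)"
```

Under alternative 2), by contrast, every new block would land on new_disk until its usage caught up with old_disk.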

In my case I'd like to add the new mount point to dfs.data.dir and rely on Hadoop realizing
that it now has one disk partition that is nearly full and one that is completely empty,
and simply writing to the new partition until the two reach equilibrium.  If that's not
possible, is there a mechanism by which I can tell Hadoop to move some of the data from the
old partition to the new partition?  Something like a balancer tool, but applicable to a single
DN with multiple volumes...
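Failing such a tool, the workaround I imagine (and would like confirmed) is to stop the DataNode, move some block files together with their .meta companions from the full volume to the new one, and restart, letting the DN pick them up when it rescans its volumes. A sketch with placeholder paths, using temp dirs to stand in for the two dfs.data.dir volumes (the hadoop-daemon.sh lines are commented out and everything here is hypothetical):

```shell
# Hypothetical sketch: two fake dfs.data.dir volumes under a temp dir.
base=$(mktemp -d)
OLD="$base/data1"; NEW="$base/data2"
mkdir -p "$OLD/current" "$NEW/current"
touch "$OLD/current/blk_0001" "$OLD/current/blk_0001.meta"

# bin/hadoop-daemon.sh stop datanode    # stop the DN before touching blocks
# Move a block and its .meta file together, as a pair.
mv "$OLD/current/blk_0001" "$OLD/current/blk_0001.meta" "$NEW/current/"
# bin/hadoop-daemon.sh start datanode   # DN rescans its volumes on startup

ls "$NEW/current"
```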

Thank you,
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
