hadoop-common-user mailing list archives

From "Ted Dunning" <ted.dunn...@gmail.com>
Subject Re: Adding new disk to DNs - FAQ #15 clarification
Date Tue, 03 Jun 2008 15:59:08 GMT
I have had problems with multiple volumes while using ancient versions of
Hadoop.  If I put the smaller partition first, I would get an overfull
partition because Hadoop was allocating by machine rather than by partition.

If you feel energetic, go ahead and try putting the smaller partition first
in the list.  If not, put it second.

If you feel conservative, only use both partitions if they are of roughly
equal size.  Frankly, if one is much bigger than the other, then the smaller
one isn't going to help all that much anyway so you can go with just a
single partition without much loss.
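
To make the concern concrete, here is a toy Python model of a node that
hands blocks to its volumes in turn, which is one reading of the FAQ's
"equal amount of data in each of the directories".  This is not Hadoop's
actual allocation code; the function name, the sizes, and the rule of
skipping a volume once it is full are all assumptions for illustration.

```python
def fill_round_robin(capacities, used, block, n_blocks):
    """Toy model: place blocks on volumes in turn, skipping full volumes.

    capacities and used are per-volume sizes in the same (arbitrary) unit.
    Returns the per-volume usage after placing up to n_blocks blocks.
    """
    used = list(used)  # work on a copy so the caller's list is untouched
    next_volume = 0
    placed = 0
    while placed < n_blocks:
        # Try each volume once, starting where the previous block left off.
        for _ in range(len(capacities)):
            v = next_volume % len(capacities)
            next_volume += 1
            if used[v] + block <= capacities[v]:
                used[v] += block
                placed += 1
                break
        else:
            break  # no volume has room for another block
    return used


# A small, nearly full volume listed first and a large empty one second.
print(fill_round_robin([100, 500], [95, 0], 1, 20))  # prints [100, 15]
```

In this model the small volume keeps receiving every other block until it
is completely full, and only then does all new data go to the large one.
Whether a given Hadoop version behaves this way is exactly the open
question.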

I would very much like to hear whether this is an old problem that has
since been fixed.

On Tue, Jun 3, 2008 at 8:36 AM, Otis Gospodnetic <otis_gospodnetic@yahoo.com>
wrote:

> Hi,
>
> I'm about to add a new disk (under a new partition) to some existing
> DataNodes that are nearly full.  I see FAQ #15:
>
> 15. HDFS. How do I set up a hadoop node to use multiple volumes?
> Data-nodes can store blocks in multiple directories typically allocated on
> different local disk drives. In order to setup multiple directories one
> needs to specify a comma separated list of pathnames as a value of the
> configuration parameter  dfs.data.dir. Data-nodes will attempt to place
> equal amount of data in each of the directories.
>
> I think some clarification around "will attempt to place equal amount of
> data in each of the directories" is needed:
>
> * Does that apply only if you have multiple disks in a DN from the
> beginning, and thus Hadoop just tries to write to all of them equally?
> * Or does that apply to situations like mine, where one disk is nearly
> completely full, and then a new, empty disk is added?
>
> Put another way, if I add the new disk via dfs.data.dir, will Hadoop:
> 1) try to write the same amount of data to both disks from now on, or
> 2) try to write exclusively to the new/empty disk first, in order to get it
> to roughly 95% full?
>
> In my case I'd like to add the new mount point to dfs.data.dir and rely on
> Hadoop realizing that it now has one disk partition that is nearly full, and
> one that is completely empty, and just start writing to the new partition
> until it reaches the equilibrium.  If that's not possible, is there a
> mechanism by which I can tell Hadoop to move some of the data from the old
> partition to the new partition?  Something like a balancer tool, but
> applicable to a single DN with multiple volumes...
>
> Thank you,
> Otis
> --
> Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
>
>


-- 
ted
