hadoop-hdfs-user mailing list archives

From Jean-Marc Spaggiari <jean-m...@spaggiari.org>
Subject Re: adding space on existing datanode ?
Date Mon, 25 Feb 2013 12:15:18 GMT
Hi Brice,

Why are you saying it's incrementing replication? Is there anything
documented anywhere that is leading you in the wrong direction? Bejoy
below is right: the replication factor is not changed by the addition of a
new directory under dfs.data.dir. This will "simply" divide the load on this
specific datanode between all the directories you specified.
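
For reference, spreading a datanode over several directories is just a matter of listing them comma-separated in hdfs-site.xml (dfs.data.dir is the 1.x property name; the paths below are only placeholders for your actual mount points):

```xml
<property>
  <name>dfs.data.dir</name>
  <!-- one entry per local disk/mount; the datanode uses all of them -->
  <value>/data/disk1/dfs/data,/data/disk2/dfs/data</value>
</property>
```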


2013/2/25 <bejoy.hadoop@gmail.com>

> Hi Brice
> By adding a new storage location to dfs.data.dir you are not incrementing
> the replication factor.
> You are giving one more location for the blocks to be copied on that data
> node.
> There is no new DataNode added. A new data node would be live only if you
> tweak your configs and start a new DataNode daemon.
> Regards
> Bejoy KS
> Sent from remote device, Please excuse typos
> ------------------------------
> *From: * brice lecomte <blecomte@astek.fr>
> *Date: *Mon, 25 Feb 2013 09:50:29 +0100
> *To: *<user@hadoop.apache.org>
> *ReplyTo: * user@hadoop.apache.org
> *Subject: *Re: adding space on existing datanode ?
> Thanks for your reply. I'm running 1.1.1, hence dfs.data.dir looks to be
> the right property to add, but doing so, it would add another complete
> datanode (incrementing dfs.replication by 1), whereas here I'd just like to
> "extend" an existing one. Am I wrong?
> On 22/02/2013 19:56, Patai Sangbutsarakum wrote:
> Just want to add up from JM.
>  If you already have the balancer running in the cluster every day, that
> will help the new drive(s) get balanced.
>  P
>   From: Jean-Marc Spaggiari <jean-marc@spaggiari.org>
> Reply-To: <user@hadoop.apache.org>
> Date: Fri, 22 Feb 2013 13:14:14 -0500
> To: <user@hadoop.apache.org>
> Subject: Re: adding space on existing datanode ?
>   To add disk space to your datanode you simply need to add another drive,
> then add it to the dfs.data.dir or dfs.datanode.data.dir entry. After a
> datanode restart, Hadoop will start to use it.
> It will not balance the existing data between the directories. It will
> continue to add to the two. If one goes full, it will only continue with
> the other one. If required, you can balance the data manually. Or,
> depending on your use case and the options you have, you can stop the
> datanode, delete the content of the two data directories and restart it.
> It will start to receive data to replicate and will share it evenly
> between the two directories. This last solution is not recommended, but
> for a test environment it might be easier.
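
The "share it evenly" behaviour described above can be sketched as a round-robin pick over the configured directories. This is only a simplified model of new-block placement on one datanode, not Hadoop's actual volume-choosing code, and the directory names are illustrative:

```python
from itertools import cycle

def assign_blocks(data_dirs, num_blocks):
    """Place each incoming block on the next directory in round-robin
    order and return a count of blocks per directory."""
    picker = cycle(data_dirs)
    placement = {d: 0 for d in data_dirs}
    for _ in range(num_blocks):
        placement[next(picker)] += 1
    return placement

# Ten new blocks over two freshly emptied directories split evenly.
print(assign_blocks(["/data/disk1/dfs/data", "/data/disk2/dfs/data"], 10))
```

Note that this even split only applies to blocks written after the restart; as the thread says, pre-existing data is not rebalanced between directories.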
