hadoop-hdfs-user mailing list archives

From Harsh Chouraria <ha...@cloudera.com>
Subject Re: Remove one directory from multiple dfs.data.dir, how?
Date Mon, 04 Apr 2011 11:32:02 GMT
Ah, that thought completely slipped my mind! You can definitely merge
the data into another directory. But it could be cumbersome to balance
one directory's contents among all the others; no tool exists for
doing this automatically, AFAIK.

You're right, decommissioning could prove costly. I take that
suggestion back (although the simpler version still stands).
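For the manual merge, a rough, untested sketch of what that per-node procedure might look like (assuming a 0.20-era on-disk layout where each dfs.data.dir volume keeps its block files under current/; the paths here are examples, not your actual configuration):

```shell
# Run on each DataNode, with the DataNode process STOPPED.
# Assumption: /data4/hdfs-data is the volume being removed and
# /data1/hdfs-data is one of the surviving dfs.data.dir volumes.

# Move the block and metadata files into a surviving volume's
# current/ directory; the DN rescans its volumes on startup.
# Caveat: blocks may also live under current/subdir*/, and those
# subdir names can collide with ones already in the target volume,
# so check for collisions before moving.
mv /data4/hdfs-data/current/blk_* /data1/hdfs-data/current/

# Then remove /data4/hdfs-data from dfs.data.dir in hdfs-site.xml
# and restart the DataNode.
```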

On Mon, Apr 4, 2011 at 4:51 PM, elton sky <eltonsky9404@gmail.com> wrote:
> Thanks Harsh,
> I will give it a go as you suggested.
> But I feel it's not convenient in my case. Decommission is for taking down a
> node, whereas what I'm doing here is taking out a dir. In my case, all I need to do
> is copy the files in the dir I want to remove to the remaining dirs on the node,
> isn't it?
> Why doesn't Hadoop have this functionality?
> On Mon, Apr 4, 2011 at 5:05 PM, Harsh Chouraria <harsh@cloudera.com> wrote:
>> Hello Elton,
>> On Mon, Apr 4, 2011 at 11:44 AM, elton sky <eltonsky9404@gmail.com> wrote:
>> > Now I want to remove 1 disk from each node, say /data4/hdfs-data. What I
>> > should do to keep data integrity?
>> > -Elton
>> This can be done using the reliable 'decommission' process:
>> decommission the nodes, reconfigure them, then recommission them
>> (multiple nodes may be taken down per decommission round this way,
>> but be wary of your cluster's actual used data capacity and your
>> minimum replication factors). Read more about the decommission
>> process here:
>> http://hadoop.apache.org/hdfs/docs/r0.21.0/hdfs_user_guide.html#DFSAdmin+Command
>> and http://developer.yahoo.com/hadoop/tutorial/module2.html#decommission
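The decommission flow described above, sketched as commands (assuming a 0.20/0.21-style setup where dfs.hosts.exclude in hdfs-site.xml already points at an exclude file; the hostname and file path are examples):

```shell
# On the NameNode host: list the DataNodes to decommission in the
# exclude file referenced by dfs.hosts.exclude.
echo "dn4.example.com" >> /etc/hadoop/conf/dfs.exclude

# Tell the NameNode to re-read its include/exclude lists; the listed
# nodes then begin replicating their blocks elsewhere.
hadoop dfsadmin -refreshNodes

# Watch progress; once the nodes show as "Decommissioned", reconfigure
# dfs.data.dir on them, clear them from the exclude file, and run
# -refreshNodes again to recommission.
hadoop dfsadmin -report
```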
>> You may also have to run a cluster-wide balancer of DNs after the
>> entire process is done, to get rid of some skew in the distribution of
>> data across them.
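That cluster-wide balancer run might look like this (the threshold value is just an example):

```shell
# Re-spread blocks until each DataNode's usage is within 10% of the
# cluster average; this is safe to run while the cluster is live.
hadoop balancer -threshold 10
```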
>> (P.s. As an alternative solution, you may bring down one DataNode at a
>> time, reconfigure it individually, and bring it up again; then repeat
>> with the next one once the NN's fsck reports a healthy situation again
>> (no under-replicated blocks). But decommissioning is the guaranteed
>> safe way, and is easier to do for a bulk of nodes.)
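A sketch of that one-node-at-a-time alternative (script names as in a 0.20-era tarball install; the directory path is hypothetical):

```shell
# On one DataNode:
hadoop-daemon.sh stop datanode
# Edit hdfs-site.xml to drop /data4/hdfs-data from dfs.data.dir,
# then bring the node back up.
hadoop-daemon.sh start datanode

# From any client: wait for a healthy report (no under-replicated
# blocks) before moving on to the next node.
hadoop fsck /
```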
>> --
>> Harsh J
>> Support Engineer, Cloudera

Harsh J
Support Engineer, Cloudera
