hadoop-hdfs-user mailing list archives

From elton sky <eltonsky9...@gmail.com>
Subject Re: Remove one directory from multiple dfs.data.dir, how?
Date Mon, 04 Apr 2011 11:21:07 GMT
Thanks Harsh,

I will give it a go as you suggested.
But I feel it's not convenient in my case. Decommissioning is for taking down a
whole node, whereas what I am doing here is taking out a single directory. In my
case, all I need to do is copy the files from the directory I want to remove to
the remaining dirs on the node, isn't it?

Why doesn't Hadoop have this functionality?

On Mon, Apr 4, 2011 at 5:05 PM, Harsh Chouraria <harsh@cloudera.com> wrote:

> Hello Elton,
> On Mon, Apr 4, 2011 at 11:44 AM, elton sky <eltonsky9404@gmail.com> wrote:
> > Now I want to remove 1 disk from each node, say /data4/hdfs-data. What I
> > should do to keep data integrity?
> > -Elton
> This can be done using the reliable 'decommission' process: decommission
> the nodes, reconfigure them, then recommission them (multiple nodes may
> be taken down per decommission round this way, but be wary of your
> cluster's actual used data capacity and your minimum replication
> factors). Read more about the decommission process here:
> http://hadoop.apache.org/hdfs/docs/r0.21.0/hdfs_user_guide.html#DFSAdmin+Command
> and http://developer.yahoo.com/hadoop/tutorial/module2.html#decommission
> You may also have to run a cluster-wide balancer of DNs after the
> entire process is done, to get rid of some skew in the distribution of
> data across them.
> (P.s. As an alternative solution, you may bring down one DataNode at a
> time, reconfigure it individually, and bring it up again; then repeat
> with the next one once the NN's fsck reports a healthy situation again
> (no under-replicated blocks). But decommissioning is the guaranteed safe
> way, and is easier when doing this for a bulk of nodes.)
> --
> Harsh J
> Support Engineer, Cloudera

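For the alternative, one-node-at-a-time approach, the per-node change is just removing the retired directory from the comma-separated dfs.data.dir list in hdfs-site.xml before restarting the DataNode. Only /data4/hdfs-data is named in the thread; the remaining directory names below are illustrative assumptions.

```xml
<!-- hdfs-site.xml on the DataNode being reconfigured: drop the
     /data4/hdfs-data entry from dfs.data.dir (property name is the
     0.20/0.21-era one; the other paths are assumed for illustration). -->
<property>
  <name>dfs.data.dir</name>
  <value>/data1/hdfs-data,/data2/hdfs-data,/data3/hdfs-data</value>
</property>
```

After restarting the DataNode, running `hadoop fsck /` and waiting for it to report no under-replicated blocks (as Harsh suggests) is the cue that it is safe to move on to the next node.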