hadoop-hdfs-dev mailing list archives

From Harsh J <ha...@cloudera.com>
Subject Re: migration from hadoop cluster cdh3 to cdh4
Date Fri, 07 Dec 2012 13:41:39 GMT
Hi Shengjie,

This question is specific to CDH and so does not belong on the Apache
HDFS development list (which is for HDFS project developers). I've therefore
moved your question to CDH's own user list, cdh-user@cloudera.org (

My answers inline.

On Fri, Dec 7, 2012 at 6:57 PM, Shengjie Min <kelvin.msj@gmail.com> wrote:

> Hi,
> Is there any instructions or documents covering migration from hadoop hdfs
> cdh3 to cdh4 since all the docs I found are talking about in place
> upgrading ONLY?

You are correct that at present there is no migration guide. I'll reach out
to the docs team behind the site to add one in as it may be helpful to
others too.

> I have two hadoop clusters. My target is to use hadoop -cp to copy all the
> hdfs files from *cluster1* to *cluster2*
> *Cluster1:* Hadoop 0.20.2-cdh3u4
> *Cluster2:* Hadoop 2.0.0-cdh4.1.1
> Now, even just running dfs -ls command against *cluster1* remotely on
> *cluster2* as below:
> hadoop fs -ls hdfs://cluster1-namenode:8020/hbase

Regular FS commands (using the hdfs:// scheme) will not work between CDH3
and CDH4, because the two use different protocol versions that are
incompatible with one another over regular RPC calls. The exception you got
is expected when you attempt this.

> I think it's due to the hadoop version difference. In my case, cdh3 cluster
> doesn't have mapred deployed which rules out all the distcp, hbase
> copytable options. And the hbase replication ability is not available on
> cdh3 cluster either. I am struggling to think of a way to migrate the hdfs
> data from *cluster1* to *cluster2.*

Hadoop provides a DistCp tool that lets you do this. It runs as a MapReduce
job, so it copies the provided paths in parallel and in full. DistCp can
also read from the HFTP file system (hftp://), which HDFS exposes through
its web server (simple, read-only HTTP-based HDFS access).

You can run the following command on your CDH4 cluster to see its usage and
options:

$ hadoop distcp

What you probably need is:

$ hadoop distcp hftp://cdh3-namenode:50070/<path to copy> <destination on
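
As a rough sketch of how the placeholders above might be filled in: the
destination NameNode name (cdh4-namenode), its RPC port (8020) and the /hbase
path below are assumptions for illustration only, so substitute your own
values. The script only prints the command so you can review it first; it
would then be run on the CDH4 cluster, since that is where MapReduce is
available.

```shell
#!/bin/sh
# All values below are illustrative assumptions; replace them with your own.
SRC_NN="cdh3-namenode"      # source (CDH3) NameNode host
SRC_HTTP_PORT=50070         # NameNode web port used by hftp://
DST_NN="cdh4-namenode"      # hypothetical destination (CDH4) NameNode host
DST_RPC_PORT=8020           # assumed NameNode RPC port on the destination
COPY_PATH="/hbase"          # example path to copy

# Read from the old cluster over HFTP (read-only, HTTP based),
# write to the new cluster over regular hdfs://.
SRC_URI="hftp://${SRC_NN}:${SRC_HTTP_PORT}${COPY_PATH}"
DST_URI="hdfs://${DST_NN}:${DST_RPC_PORT}${COPY_PATH}"

# Dry run: print the command instead of executing it.
# Drop the leading "echo" to actually start the copy on the CDH4 cluster.
echo hadoop distcp "$SRC_URI" "$DST_URI"
```

Because HFTP is read-only, it can only be the source side of the copy; the
destination stays a regular hdfs:// path on the new cluster.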

> --
> All the best,
> Shengjie Min

Harsh J
