hadoop-hdfs-user mailing list archives

From raymond <rgbbo...@163.com>
Subject Any way to merge two large hdfs cluster without copy data?
Date Tue, 03 May 2016 07:25:32 GMT

It seems we are doing a lot of cluster migration and merge work.

We now have two large HDFS clusters (each holding PBs of data), and we need to merge them into
a single one.

I know we can do it by using distcp to copy the data from one cluster to the other, and by decommissioning
nodes from one cluster and joining them to the other cluster slowly, one node at a time. But this will be a
painful process for two large clusters.
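
For reference, the copy-based route above boils down to one distcp job per directory tree. This is only a minimal sketch: the NameNode addresses and the path are hypothetical placeholders, and the helper name build_distcp_command is made up for illustration. The -update and -m options are standard distcp options.

```python
import shlex


def build_distcp_command(src_nn, dst_nn, path, max_maps=20):
    """Build the argv list for copying `path` between two clusters with distcp.

    src_nn / dst_nn are "host:port" NameNode addresses (hypothetical values
    in the example below, not taken from any real cluster).
    """
    src = f"hdfs://{src_nn}{path}"
    dst = f"hdfs://{dst_nn}{path}"
    # -update copies only files that are missing or differ at the destination,
    # so the job can be re-run after a failure.
    # -m caps the number of simultaneous map tasks doing the copy.
    return ["hadoop", "distcp", "-update", "-m", str(max_maps), src, dst]


if __name__ == "__main__":
    cmd = build_distcp_command(
        "nn-a.example.com:8020",  # assumed NameNode of the cluster being moved
        "nn-b.example.com:8020",  # assumed NameNode of the cluster being merged into
        "/data/warehouse",        # assumed path, for illustration only
    )
    # Print the command instead of running it; run it from a node that can
    # reach both clusters.
    print(shlex.join(cmd))
```

Even with many maps, every block still crosses the network once, which is why this is slow at PB scale.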

So I am looking for a solution that does not copy the data but only copies the metadata, to speed up the process.
We can bring down one cluster (the one to be moved) for several days, and the other cluster (the one
being merged into) for a couple of hours.

I can imagine several possible approaches:

1. It might be possible to just replay cluster A’s fsimage on cluster B, then bring down
cluster A’s datanodes and join them to cluster B.
2. Somehow use the HDFS federation feature to achieve this goal.

From the information I have gathered so far, there are no ready-made solutions or tools for approach
1; we would need to do some dirty hacks and try not to mess everything up.

For approach 2, federation is not designed for this purpose, and there is very little information
about HDFS federation. distcp and the datanode also seem to lack features for transferring
data between federated clusters efficiently. So I am not sure whether this approach can
help, or how complicated it would be (say, we would need to upgrade the cluster to
support federation in the first place, then join the other cluster, then do the data transfer,
and then possibly remove the federation feature to clean up).

So, has anyone done this before? Or any suggestions? ;-)


