hadoop-common-user mailing list archives

From hadoopman <hadoop...@gmail.com>
Subject HDFS Rsync process??
Date Tue, 30 Nov 2010 03:59:22 GMT
We have two Hadoop clusters in two separate buildings.  Both clusters 
are loading the same data from the same sources (the second cluster is 
for DR).

We're looking at how we can recover the primary cluster and bring it 
back in sync while new data continues to feed into the DR cluster.  
It's been suggested that we use rsync across the network.  My concern 
is that the amount of data we would have to copy would take several days 
(at a minimum) to sync, even with our dual bonded 1 Gb network cards.
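
To put rough numbers on that concern, here is a minimal back-of-the-envelope 
sketch.  The data sizes and the 70% link efficiency are hypothetical 
placeholders, not measurements from our clusters, and it assumes the bonded 
pair behaves like a single 2 Gbit/s pipe, which a single copy stream 
may not actually achieve.

```python
def transfer_days(data_tb: float, link_gbps: float, efficiency: float = 0.7) -> float:
    """Estimate the days needed to copy data_tb terabytes over a network link.

    data_tb    -- data volume in decimal terabytes (1 TB = 1e12 bytes)
    link_gbps  -- nominal link speed in gigabits per second
    efficiency -- assumed fraction of nominal speed actually sustained
                  (0.7 is a guess, not a measured value)
    """
    if link_gbps <= 0 or efficiency <= 0:
        raise ValueError("link_gbps and efficiency must be positive")
    total_bits = data_tb * 1e12 * 8
    seconds = total_bits / (link_gbps * 1e9 * efficiency)
    return seconds / 86400  # seconds per day


if __name__ == "__main__":
    # Hypothetical data volumes; substitute the real size of the cluster.
    for size_tb in (10, 50, 100):
        days = transfer_days(size_tb, link_gbps=2.0)
        print(f"{size_tb} TB over 2 Gbit/s: about {days:.1f} days")
```

Under these assumptions, anything in the tens of terabytes already lands 
in the multi-day range, before counting any overhead from the copy tool 
itself.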

I'm curious whether anyone has come up with a solution short of just 
reloading the source logs into HDFS.  Is there even a way to rsync two 
clusters and get them in sync?  I've been googling around but haven't 
found anything of substance yet.

