hadoop-hdfs-user mailing list archives

From Mark Selby <mse...@unseelie.name>
Subject hadoop distcp and hbase ExportSnapshot hdfs replication factor question.
Date Thu, 25 Feb 2016 01:16:54 GMT
I have a primary Hadoop cluster (2.6.0) running MapReduce and HBase. I 
am backing up to a remote data center that has far fewer machines, each 
with a higher per-disk density.

The default HDFS replication factor on the primary is 3.
The default HDFS replication factor on the backup is 2.

When I run distcp on the primary cluster, specifying the remote as the 
destination, and I DO NOT specify preserve replication factor as an 
argument, I still get 3 replicas on the remote.

All my HBase snapshots that are copied from the primary to the backup 
also end up with HFiles that have a replication factor of 3.

As a test, I ran distcp from the backup, pulling from the primary, and 
this did result in a replication factor of 2. However, I have far fewer 
resources on the backup, and I think the large copy would be faster if 
it ran on the primary, which has more machines.

Also, I cannot pull HBase snapshots from the backup cluster. The 
ExportSnapshot utility does not support this.

Does anyone know if it is possible to distcp to another cluster that has 
a smaller replication factor and have that replication factor take effect?
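
One thing that may be worth trying is sketched below. It assumes that the replication factor of newly written files comes from the client-side dfs.replication setting of the job doing the copy, which would explain why a copy launched on the primary writes 3 replicas. Under that assumption, overriding dfs.replication with the -D generic option should give 2 replicas on the backup. The NameNode addresses, paths, and snapshot name are hypothetical placeholders, and whether ExportSnapshot honors the override is itself an assumption that needs checking on a test snapshot.

```python
from typing import List


def build_distcp_command(source: str, target: str, replication: int) -> List[str]:
    """Build a distcp command line that overrides dfs.replication for the copy.

    Assumption: files written by the copy pick up the client-side
    dfs.replication value, so overriding it changes the replica count
    on the target cluster.
    """
    if replication < 1:
        raise ValueError("replication must be at least 1")
    # Generic options such as -D go before the source and target paths.
    return ["hadoop", "distcp", f"-Ddfs.replication={replication}", source, target]


def build_export_snapshot_command(snapshot: str, copy_to: str, replication: int) -> List[str]:
    """Build an ExportSnapshot command line with the same override.

    Assumption (unverified): ExportSnapshot also writes files using the
    client-side dfs.replication value.
    """
    if replication < 1:
        raise ValueError("replication must be at least 1")
    return [
        "hbase",
        "org.apache.hadoop.hbase.snapshot.ExportSnapshot",
        f"-Ddfs.replication={replication}",
        "-snapshot", snapshot,
        "-copy-to", copy_to,
    ]


if __name__ == "__main__":
    # Hypothetical NameNode addresses, paths, and snapshot name.
    print(" ".join(build_distcp_command(
        "hdfs://primary-nn:8020/data", "hdfs://backup-nn:8020/data", 2)))
    print(" ".join(build_export_snapshot_command(
        "my_snapshot", "hdfs://backup-nn:8020/hbase", 2)))
```

The script only prints the command lines so they can be reviewed before being run on the primary cluster. Checking the replication column of hdfs dfs -ls on the backup after a small test copy would confirm whether the override took effect.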

