hbase-issues mailing list archives

From "Daniel Einspanjer (JIRA)" <j...@apache.org>
Subject [jira] Created: (HBASE-3451) Cluster migration best practices
Date Tue, 18 Jan 2011 17:05:46 GMT
Cluster migration best practices

                 Key: HBASE-3451
                 URL: https://issues.apache.org/jira/browse/HBASE-3451
             Project: HBase
          Issue Type: Brainstorming
    Affects Versions: 0.89.20100924, 0.20.6
            Reporter: Daniel Einspanjer
            Priority: Critical

Mozilla is in the process of migrating our HBase cluster to a new datacenter.

Our existing 25-node cluster is in our SJC datacenter and serves production traffic 24/7.
We can take downtime, but anything longer than a few hours in the evening is very costly
and difficult.

We have two new 30-node clusters in our PHX datacenter.  We want to cut production
over to one of these this week.

The old cluster is running 0.20.6.  The new clusters are running CDH3b3 with HBase 0.89.

We have tried running a pull distcp using hftp URLs.  If HBase is running, this causes SAX
XML parsing exceptions when a directory is removed during the scan.
If HBase is stopped, the directory compare takes hours to finish before any data is
copied.

We have tried a custom backup MR job.  This job uses the map phase to evaluate and copy changed
files. It can run while HBase is live, but that results in a dirty copy of the data.

We have also tried running the custom backup job while HBase is shut down.  Even then,
the second of two back-to-back runs still copies some data, so the result does not seem
to be an entirely clean copy.

When we had what we thought was a complete copy on the new cluster, we ran add_table
on it, but the resulting HBase table had holes.  Investigating the holes revealed
directories that had not been transferred.
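
A hole in this sense is a gap in the chain of region key ranges. The check can be sketched in Python as below. This is only an illustration, assuming the region boundaries have already been extracted as (start_key, end_key) byte-string pairs; the extraction itself is not shown, and the name find_holes is hypothetical rather than part of any HBase tool.

```python
def find_holes(regions):
    """Return (end_key, next_start_key) gaps in a table's region chain.

    `regions` is an iterable of (start_key, end_key) bytes pairs. As in HBase,
    an empty start key marks the first region and an empty end key the last.
    Overlapping regions are not reported; only gaps are.
    """
    ordered = sorted(regions, key=lambda r: r[0])
    holes = []
    if not ordered:
        return holes
    # The chain must begin with a region whose start key is empty.
    if ordered[0][0] != b"":
        holes.append((b"", ordered[0][0]))
    # Each region's end key should equal the next region's start key.
    for (_, prev_end), (next_start, _) in zip(ordered, ordered[1:]):
        if prev_end != b"" and prev_end < next_start:
            holes.append((prev_end, next_start))
    # The chain must finish with a region whose end key is empty.
    if ordered[-1][1] != b"":
        holes.append((ordered[-1][1], b""))
    return holes
```

Each reported pair gives the key range that no region covers, which can then be matched against the region directories missing from the copy.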

We had a meeting to brainstorm ideas, and three further suggestions came up:
1. Build a file list of files to transfer on the SJC side, transfer that file list to PHX
and then run distcp on it.
2. Try a full copy instead of an incremental one, skipping the expensive file compare step.
3. Evaluate copying from SJC to S3 then from S3 to PHX.
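
Suggestion 1 could be sketched roughly as follows in Python. This is a minimal sketch, not a tested migration tool: it assumes file listings for both sides are already available as dictionaries mapping an HDFS path to its length in bytes, and the function name build_transfer_list and the namenode URI in the usage note are hypothetical.

```python
def build_transfer_list(src_listing, dst_listing, src_prefix):
    """Return sorted source URIs that are missing or differ in size on the destination.

    Both listings map an HDFS path to its length in bytes. `src_prefix` is the
    scheme and authority to prepend to each path (an assumed value, see below).
    """
    to_copy = []
    for path, size in src_listing.items():
        # A missing path yields None, which never equals a real size.
        if dst_listing.get(path) != size:
            to_copy.append(src_prefix + path)
    return sorted(to_copy)
```

For example, calling it with a prefix such as "hftp://sjc-namenode:50070" (a placeholder host) yields one URI per file to transfer; the list could then be written to a file, shipped to PHX, and used to drive the copy. Comparing by size alone will miss files rewritten with the same length, and how distcp maps listed sources onto destination paths should be checked against the Hadoop version in use.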

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
