hbase-issues mailing list archives

From "Daniel Einspanjer (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HBASE-3451) Cluster migration best practices
Date Tue, 18 Jan 2011 20:53:47 GMT

    [ https://issues.apache.org/jira/browse/HBASE-3451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12983372#action_12983372 ]

Daniel Einspanjer commented on HBASE-3451:

Diffing Python script:
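The script itself is not included in this archive; a rough sketch of what a listing-diff script for two clusters might look like is below. All function names and the assumed `hadoop fs -lsr` column layout are illustrative assumptions, not the attached script.

```python
# Hypothetical sketch of a listing-diff script: compare two HDFS
# recursive listings (e.g. captured with `hadoop fs -lsr` on each
# cluster) and report files that are missing or differ in size.

def parse_lsr(lines):
    """Parse `hadoop fs -lsr`-style output lines into {path: size}.

    Assumed column layout: permissions, replication, owner, group,
    size, date, time, path. Directory entries (permissions start
    with 'd') and malformed lines are skipped.
    """
    listing = {}
    for line in lines:
        parts = line.split()
        if len(parts) < 8 or parts[0].startswith("d"):
            continue
        listing[parts[7]] = int(parts[4])
    return listing

def diff_listings(src, dst):
    """Return (missing, changed): paths absent from dst, and paths
    present in both listings whose sizes differ."""
    missing = sorted(p for p in src if p not in dst)
    changed = sorted(p for p in src if p in dst and src[p] != dst[p])
    return missing, changed
```

Feeding the output of such a diff into the copy job would avoid re-scanning the whole directory tree on every run.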

> Cluster migration best practices
> --------------------------------
>                 Key: HBASE-3451
>                 URL: https://issues.apache.org/jira/browse/HBASE-3451
>             Project: HBase
>          Issue Type: Brainstorming
>    Affects Versions: 0.20.6, 0.89.20100924
>            Reporter: Daniel Einspanjer
>            Priority: Critical
> Mozilla is currently in the process of trying to migrate our HBase cluster to a new datacenter.
> We have our existing 25 node cluster in our SJC datacenter.  It is serving production
traffic 24/7.  While we can take downtime, it is very costly and difficult to take it for
more than a few hours in the evening.
> We have two new 30 node clusters in our PHX datacenter.  We want to cut production
over to one of these this week.
> The old cluster is running 0.20.6.  The new clusters are running CDH3b3 with HBase 0.89.
> We have tried running a pull distcp using hftp URLs.  If HBase is running, this causes
SAX XML parsing exceptions when a directory is removed during the scan.
> If HBase is stopped, it takes hours for the directory compare to finish before it even
begins copying data.
> We have tried a custom backup MR job.  This job uses the map phase to evaluate and copy
changed files. It can run while HBase is live, but that results in a dirty copy of the data.
> We have tried running the custom backup job while HBase is shut down as well.  When we
do this, even on two back-to-back runs, it still copies over some data and does not appear
to be an entirely clean copy.
> When we got what we thought was an entire copy onto the new cluster, we ran add_table
on it, but the resulting HBase table had holes.  Investigating the holes revealed there were
directories that were not transferred.
> We had a meeting to brainstorm ideas, and three further suggestions came up:
> 1. Build a file list of files to transfer on the SJC side, transfer that file list to
PHX and then run distcp on it.
> 2. Try a full copy instead of an incremental one, skipping the expensive file compare step.
> 3. Evaluate copying from SJC to S3 then from S3 to PHX.
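Suggestion 1 above could be sketched roughly as follows. The namenode hostnames and the hftp port are placeholders rather than Mozilla's actual setup, and the assumed mechanism is distcp's `-f` option, which reads its source paths from a URI list file.

```python
# Rough sketch of suggestion 1: build the list of source URIs on the
# SJC side, ship it to PHX, and hand it to distcp via -f so distcp can
# skip the expensive recursive directory compare. Hostnames and the
# hftp port below are illustrative placeholders.

def build_uri_list(paths, namenode="hftp://sjc-namenode:50070"):
    """Prefix each HDFS path with the source cluster's hftp base URI."""
    return [namenode + path for path in paths]

uris = build_uri_list(["/hbase/mytable/region1/cf/file1"])
# The resulting list would be written to a file reachable from the PHX
# cluster and passed to distcp, e.g.:
#   hadoop distcp -f hdfs:///tmp/uri_list hdfs://phx-namenode:8020/hbase
```

Because the list is computed once on the source side, the copy job no longer has to walk a directory tree that HBase may be mutating underneath it.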

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
