hbase-issues mailing list archives

From "Karthik Ranganathan (Commented) (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-5509) MR based copier for copying HFiles (trunk version)
Date Tue, 06 Mar 2012 19:51:58 GMT

    [ https://issues.apache.org/jira/browse/HBASE-5509?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13223598#comment-13223598 ]

Karthik Ranganathan commented on HBASE-5509:
--------------------------------------------

@Zhihong Yu:
We use this code as the primary means to back up HFiles inside FB. We have made a lot of improvements
to the DFS copy underneath, and those have caused some bugs, but that's unrelated to this code.
Not too many issues otherwise, besides tuning the number of mappers to use so that we don't overwhelm
a running system.

@Lars:
You are correct about getStoreFileList(): it is passed from the command line, and it is overloaded
for a subset of the CFs or all of them. Zhihong, the list versus a comma-separated string is a trivial
point, since the list construction has to happen either in the RS or in the caller, so it should not
make much of a difference in practice.
                
> MR based copier for copying HFiles (trunk version)
> --------------------------------------------------
>
>                 Key: HBASE-5509
>                 URL: https://issues.apache.org/jira/browse/HBASE-5509
>             Project: HBase
>          Issue Type: Sub-task
>          Components: documentation, regionserver
>            Reporter: Karthik Ranganathan
>            Assignee: Lars Hofhansl
>             Fix For: 0.94.0, 0.96.0
>
>         Attachments: 5509-v2.txt, 5509.txt
>
>
> This copier is a modification of the distcp tool in HDFS. It does the following:
> 1. List out all the regions in the HBase cluster for the required table
> 2. Write the above out to a file
> 3. Each mapper 
>    3.1 lists all the HFiles for a given region by querying the regionserver
>    3.2 copies all the HFiles
>    3.3 outputs success if the copy succeeded, failure otherwise. Failed regions are retried in another loop
> 4. Mappers are placed on nodes which have maximum locality for a given region to speed up copying
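
The per-region copy, the retry loop for failed regions, and the locality-based placement described above can be sketched roughly as follows. This is a minimal illustration and not the code from the attached patch: `pick_host`, `copy_regions`, `list_hfiles`, `copy_file`, and `max_rounds` are hypothetical names, and a failed copy is assumed to surface as an `OSError`.

```python
from typing import Callable, Dict, Iterable, List


def pick_host(bytes_per_host: Dict[str, int]) -> str:
    """Return the host that stores the most bytes of a region's HFiles (step 4)."""
    return max(bytes_per_host, key=bytes_per_host.get)


def copy_regions(
    regions: Iterable[str],
    list_hfiles: Callable[[str], List[str]],
    copy_file: Callable[[str], None],
    max_rounds: int = 3,  # hypothetical cap on retry rounds
) -> List[str]:
    """Copy every region's HFiles; return the regions that still failed."""
    pending = list(regions)
    for _ in range(max_rounds):
        failed = []
        for region in pending:
            try:
                # Steps 3.1 and 3.2: list the region's HFiles, then copy each one.
                for hfile in list_hfiles(region):
                    copy_file(hfile)
            except OSError:
                # Step 3.3: record the failure so the region is retried next round.
                failed.append(region)
        if not failed:
            return []
        pending = failed
    return pending
```

A region is retried as a whole here, so files that were already copied in a failed round are copied again; a real job would likely skip files that already exist at the destination.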

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira