hbase-issues mailing list archives

From "huaxiang sun (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-18693) adding an option to restore_snapshot to move mob files from archive dir to working dir
Date Wed, 30 Aug 2017 17:50:00 GMT

    [ https://issues.apache.org/jira/browse/HBASE-18693?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16147677#comment-16147677 ]

huaxiang sun commented on HBASE-18693:

Hi Jingcheng,

Restoring a snapshot to the same table is okay. What if we try to restore the snapshot to
another table? Can the same MOB file be in different locations? No, right?

I see your concern. restore_snapshot always restores to the same table, which is why
I am adding the option here. clone_snapshot is a different story: it can clone to a different table.
If the option were added to clone_snapshot, it would corrupt the snapshot.

You are right, this is a problem. How about selecting files with multiple threads, with each
thread handling part of the file selection? Thanks.
HBASE-17043 has been created for this effort. I think this is not enough, and it still adds
overhead (pressure on the NN). We need to give users an option in this case.
If this option looks good to you, I will post a patch.
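
To illustrate the point about NN pressure, here is a minimal Python sketch (not HBase code) of splitting the file selection across threads. FakeNameNode, get_file_status and select_sizes are hypothetical names made up for this example, and the returned "size" is a placeholder. It only shows that threading divides the work per thread while the total number of status calls sent to the NN stays the same.

```python
from concurrent.futures import ThreadPoolExecutor
import threading


class FakeNameNode:
    """Hypothetical stand-in for the NN; it only counts status calls."""

    def __init__(self):
        self._lock = threading.Lock()
        self.calls = 0

    def get_file_status(self, path):
        with self._lock:
            self.calls += 1
        return len(path)  # placeholder value, not a real file size


def select_sizes(nn, links, num_threads):
    """Split links into num_threads chunks; each thread looks up its own chunk."""
    chunks = [links[i::num_threads] for i in range(num_threads)]

    def lookup(chunk):
        return {link: nn.get_file_status(link) for link in chunk}

    sizes = {}
    with ThreadPoolExecutor(max_workers=num_threads) as pool:
        for part in pool.map(lookup, chunks):
            sizes.update(part)
    return sizes


nn = FakeNameNode()
links = [f"mobfile-{i}" for i in range(1000)]
sizes = select_sizes(nn, links, num_threads=8)
print(len(sizes), nn.calls)  # prints "1000 1000"
```

With 8 threads each thread handles about 125 links, but the NN still serves one call per link, which is the overhead the option is meant to avoid.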


> adding an option to restore_snapshot to move mob files from archive dir to working dir
> --------------------------------------------------------------------------------------
>                 Key: HBASE-18693
>                 URL: https://issues.apache.org/jira/browse/HBASE-18693
>             Project: HBase
>          Issue Type: Improvement
>          Components: mob
>    Affects Versions: 2.0.0-alpha-2
>            Reporter: huaxiang sun
>            Assignee: huaxiang sun
> Today, there is a single mob region where mob files for all user regions are saved. There
> could be many files (one million) in a single mob directory. When a mob table is restored
> or cloned from a snapshot, links are created for these mob files. This creates a scaling issue
> for mob compaction. In mob compaction's select() logic, for each hFileLink, it needs to call
> the NN's getFileStatus() to get the size of the linked hfile. Assuming that one such call takes
> 20ms, 20ms * 1000000 = 20,000 seconds, or roughly 5.6 hours.
> To avoid this overhead, we want to add an option so that restore_snapshot can move mob
> files from the archive dir to the working dir. clone_snapshot is more complicated because it can
> clone a snapshot to a different table, so moving the files could destroy the snapshot. No option
> will be added for clone_snapshot.
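
The estimate in the description can be checked with a few lines of Python. This is only a back-of-the-envelope sketch: serial_lookup_hours is a hypothetical helper name, and the 20ms latency and one million links are the assumed figures from the description, not measurements.

```python
def serial_lookup_hours(num_links, latency_ms):
    """Wall-clock hours to issue one status call per link, one after another."""
    total_seconds = num_links * latency_ms / 1000
    return total_seconds / 3600


hours = serial_lookup_hours(1_000_000, 20)
print(f"{hours:.2f} hours")  # prints "5.56 hours"
```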

This message was sent by Atlassian JIRA
