hbase-issues mailing list archives

From "Jingcheng Du (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-17172) Optimize major mob compaction with _del files
Date Thu, 01 Dec 2016 09:54:58 GMT

    [ https://issues.apache.org/jira/browse/HBASE-17172?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15711517#comment-15711517 ]

Jingcheng Du commented on HBASE-17172:

Thanks [~huaxiang].
If we skip the compacted files, the threshold is no longer very useful. I see three options
for a solution.
The first is to decrease the threshold and use the compaction policy from HBASE-16981 in the
compaction.
The second is to skip the minor compaction when there is only one mob file (or two mob
files) and one _del file. We would still have to accept the unnecessary compaction during
major compaction (although major compaction is not recommended).
The last is to group the _del files by region, but it is very difficult to align the
keys in the _del files with the partitions in the mob files.
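
To make the second option concrete, here is a minimal sketch of the skip check. It is not
HBase code: the Partition class, its fields, and the maxMobFilesToSkip parameter are all
hypothetical names used only for illustration.

```java
public class MobCompactionSkipSketch {

    /** Hypothetical stand-in for one compaction partition: only file counts are modeled. */
    static final class Partition {
        final int mobFileCount;
        final int delFileCount;

        Partition(int mobFileCount, int delFileCount) {
            this.mobFileCount = mobFileCount;
            this.delFileCount = delFileCount;
        }
    }

    /**
     * Returns true when a minor compaction of this partition could be skipped:
     * exactly one _del file and at most maxMobFilesToSkip mob files (1 or 2 in
     * the option described above). The limit is an assumed tuning knob.
     */
    static boolean shouldSkipMinorCompaction(Partition p, int maxMobFilesToSkip) {
        return p.delFileCount == 1
            && p.mobFileCount >= 1
            && p.mobFileCount <= maxMobFilesToSkip;
    }

    public static void main(String[] args) {
        // One mob file and one _del file: skipped.
        System.out.println(shouldSkipMinorCompaction(new Partition(1, 1), 2));
        // Five mob files and one _del file: not skipped.
        System.out.println(shouldSkipMinorCompaction(new Partition(5, 1), 2));
    }
}
```

This only covers minor compaction; a major compaction would still rewrite these files.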

> Optimize major mob compaction with _del files
> ---------------------------------------------
>                 Key: HBASE-17172
>                 URL: https://issues.apache.org/jira/browse/HBASE-17172
>             Project: HBase
>          Issue Type: Improvement
>          Components: mob
>    Affects Versions: 2.0.0
>            Reporter: huaxiang sun
>            Assignee: huaxiang sun
> Today, when there is a _del file in mobdir, major mob compaction recompacts every mob
> file. This causes a lot of IO and slows down major mob compaction (it may take months
> to finish). This needs to be improved. A few ideas are:
> 1) Do not compact all _del files into one; instead, compact them in groups keyed by
> startKey. Then use each mob file's firstKey/startKey to check whether the _del file
> needs to be included for this partition.
> 2) Based on the time range of the _del file, compaction of files newer than that time
> range does not need to include the _del file.
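
The two ideas in the description can be sketched as a filter that decides whether a _del
file has to be included when compacting a given mob file. This is only an illustration
under stated assumptions: FileInfo and its fields are hypothetical, keys are modeled as
Strings rather than the byte[] row keys HBase uses, and each file is assumed to expose a
first/last key and a min/max timestamp.

```java
public class DelFileFilterSketch {

    /** Hypothetical metadata for a mob file or a _del file. */
    static final class FileInfo {
        final String firstKey;
        final String lastKey;
        final long minTimestamp;
        final long maxTimestamp;

        FileInfo(String firstKey, String lastKey, long minTimestamp, long maxTimestamp) {
            this.firstKey = firstKey;
            this.lastKey = lastKey;
            this.minTimestamp = minTimestamp;
            this.maxTimestamp = maxTimestamp;
        }
    }

    /** Idea 2: a mob file whose cells are all newer than the _del file cannot be affected. */
    static boolean timeRangeMayApply(FileInfo mob, FileInfo del) {
        return mob.minTimestamp <= del.maxTimestamp;
    }

    /** Idea 1: the _del file matters only if its key range overlaps the mob file's. */
    static boolean keyRangeOverlaps(FileInfo mob, FileInfo del) {
        return mob.firstKey.compareTo(del.lastKey) <= 0
            && del.firstKey.compareTo(mob.lastKey) <= 0;
    }

    /** The _del file is needed only when both checks say it might apply. */
    static boolean needsDelFile(FileInfo mob, FileInfo del) {
        return timeRangeMayApply(mob, del) && keyRangeOverlaps(mob, del);
    }

    public static void main(String[] args) {
        FileInfo del = new FileInfo("b", "d", 100L, 200L);
        FileInfo newerMob = new FileInfo("c", "e", 250L, 300L);
        FileInfo olderMob = new FileInfo("c", "e", 50L, 150L);
        System.out.println(needsDelFile(newerMob, del)); // newer than the _del file
        System.out.println(needsDelFile(olderMob, del)); // overlaps in key and time
    }
}
```

Both checks are conservative: they can only rule a _del file out, never apply a delete.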

This message was sent by Atlassian JIRA
