hbase-issues mailing list archives

From "Jingcheng Du (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-17172) Optimize mob compaction with _del files
Date Mon, 20 Feb 2017 03:49:44 GMT

    [ https://issues.apache.org/jira/browse/HBASE-17172?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15874003#comment-15874003 ]

Jingcheng Du commented on HBASE-17172:

Thanks [~huaxiang] for the patch. Sorry for the late response.
bq. // TODO: is it possible to skip read of most hfiles?
Most mob files are compacted by the same start key, and many different first/last keys
are grouped together. So using both the first and last keys to group del files is unnecessary
and wasteful, right?
Is there a way to group them only by the start key taken from the file name? Since we now always
retain the delete markers in hfiles, I think it is okay to do it this way.
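
To make the suggestion concrete, here is a minimal sketch (in Python, not HBase code) of grouping del files by a start-key prefix taken from the file name. It assumes the first 32 characters of a mob file name encode the start key and that del files end in "_del"; the constant and both function names are hypothetical.

```python
from collections import defaultdict

# Assumption: the first 32 characters of a mob file name encode the start key.
# The rest of the name (date, unique id) is ignored by this sketch.
START_KEY_LEN = 32


def group_del_files_by_start_key(file_names):
    """Map each start-key prefix to the sorted list of _del files carrying it."""
    groups = defaultdict(list)
    for name in file_names:
        if name.endswith("_del"):
            groups[name[:START_KEY_LEN]].append(name)
    return {key: sorted(names) for key, names in groups.items()}


def del_files_for(mob_file_name, groups):
    """Return only the _del files that share the mob file's start-key prefix."""
    return groups.get(mob_file_name[:START_KEY_LEN], [])
```

With this grouping, a partition only has to read the del files that share its start key, with no need to compare first/last keys.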

> Optimize mob compaction with _del files
> ---------------------------------------
>                 Key: HBASE-17172
>                 URL: https://issues.apache.org/jira/browse/HBASE-17172
>             Project: HBase
>          Issue Type: Improvement
>          Components: mob
>    Affects Versions: 2.0.0
>            Reporter: huaxiang sun
>            Assignee: huaxiang sun
>             Fix For: 2.0.0
>         Attachments: HBASE-17172-master-001.patch, HBASE-17172.master.001.patch, HBASE-17172.master.002.patch,
> Today, when there is a _del file in mobdir, major mob compaction recompacts every mob file.
> This causes a lot of IO and slows down major mob compaction (it may take months
> to finish). This needs to be improved. A few ideas are:
> 1) Do not compact all _del files into one; instead, compact them in groups keyed by
> startKey. Then use each mob file's firstKey/startKey to check whether the _del file
> needs to be included for this partition.
> 2) Based on the time range of the _del file, compaction of files newer than that time range
> does not need to include the _del file.
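
Idea 2 can be sketched as a simple time-range check. This is a minimal illustration in Python, not HBase code. It assumes each file exposes the minimum and maximum cell timestamps it contains, and that a delete marker only masks cells with a timestamp at or below its own; `TimeRange` and `del_file_applies` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TimeRange:
    min_ts: int  # smallest cell timestamp in the file (inclusive)
    max_ts: int  # largest cell timestamp in the file (inclusive)


def del_file_applies(mob_range, del_range):
    """True if the _del file could mask at least one cell of the mob file.

    Assumption: a delete marker only affects cells whose timestamp is at or
    below the marker's timestamp, so a mob file whose oldest cell is newer
    than every marker in the _del file can skip that _del file.
    """
    return mob_range.min_ts <= del_range.max_ts
```

A compaction would then pass to a partition only the _del files for which this check returns true.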

This message was sent by Atlassian JIRA
