hbase-issues mailing list archives

From "huaxiang sun (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-17172) Optimize major mob compaction with _del files
Date Fri, 02 Dec 2016 05:40:59 GMT

    [ https://issues.apache.org/jira/browse/HBASE-17172?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15714142#comment-15714142 ]

huaxiang sun commented on HBASE-17172:

Thanks Jingcheng. Regarding "If we skip the compacted files, the threshold is not that
useful anymore.": today, if there is only one file in a partition and there are no _del files,
that file is skipped. With a _del file, the current logic compacts the already-compacted
file together with the _del file. Say there is one mob file, regionA20161101****, which was already compacted.
On 12/1/2016 there is a _del file, regionB20161201****_del. When mob compaction kicks in, regionA20161101****
is below the threshold, so it is picked for compaction. Because there is a _del file, regionA20161101****
and regionB20161201****_del are compacted into regionA20161101****_1. After that, regionB20161201****_del
cannot be deleted, since this was not an allFile compaction. On the next mob compaction, regionA20161101****_1
and regionB20161201****_del will be picked up again and compacted into regionA20161101****_2.
So in this case the same data is rewritten on every run, causing unnecessary IO. Could you double-check whether this is the
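The repeated rewrite described above can be sketched with a toy model of a single partition that holds one already-compacted mob file and one _del file. This is a hypothetical simulation, not HBase's actual mob compactor: the file size, the threshold value, and the `all_file` flag are assumptions standing in for the real selection rules.

```python
from typing import NamedTuple


class Outcome(NamedTuple):
    bytes_rewritten: int  # total IO spent rewriting the mob file
    generation: int       # N in regionA20161101****_N
    del_file_left: bool   # is regionB20161201****_del still on disk?


def simulate(runs: int, file_size: int, threshold: int, all_file: bool) -> Outcome:
    """Hypothetical model: one partition with a single, already compacted
    mob file plus one _del file, observed over `runs` mob compactions."""
    generation = 0
    del_file_left = True
    bytes_rewritten = 0
    for _ in range(runs):
        picked = file_size < threshold  # files below the threshold are selected
        if picked and del_file_left:
            # the file is merged with the _del file into a new generation
            generation += 1
            bytes_rewritten += file_size
            if all_file:
                # in this model, only an allFile compaction deletes the _del file
                del_file_left = False
        # a single file with no _del file in its partition is skipped
    return Outcome(bytes_rewritten, generation, del_file_left)


if __name__ == "__main__":
    # partial compaction: the _del file survives, so every run rewrites the file
    print(simulate(runs=3, file_size=100, threshold=1000, all_file=False))
    # allFile compaction: the _del file is consumed after the first run
    print(simulate(runs=3, file_size=100, threshold=1000, all_file=True))
```

In this model the IO of a non-allFile compaction grows linearly with the number of runs, while the underlying data never changes after the first pass.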

> Optimize major mob compaction with _del files
> ---------------------------------------------
>                 Key: HBASE-17172
>                 URL: https://issues.apache.org/jira/browse/HBASE-17172
>             Project: HBase
>          Issue Type: Improvement
>          Components: mob
>    Affects Versions: 2.0.0
>            Reporter: huaxiang sun
>            Assignee: huaxiang sun
> Today, when there is a _del file in mobdir, every mob file is recompacted during major
> mob compaction. This causes lots of IO and slows down major mob compaction (it may take
> months to finish). This needs to be improved. A few ideas are:
> 1) Do not compact all _del files into one; instead, compact them in groups keyed by
> startKey. Then match each mob file's firstKey/startKey against those groups to see whether
> the _del file needs to be included for that partition.
> 2) Based on the time range of the _del file, compaction of files newer than that time range
> does not need to include the _del file.
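The two ideas can be sketched as a selection function that decides which _del files a given mob file's compaction has to read. The `DelFile`/`MobFile` records, the exact start-key match, and the yyyymmdd-style integer timestamps are illustrative assumptions; real HBase mob partitions are identified differently. Idea 2 relies on delete-marker semantics in which a marker masks only cells whose timestamp is not newer than its own.

```python
from __future__ import annotations

from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class DelFile:
    start_key: bytes  # hypothetical: start key of the region that wrote it
    min_ts: int       # earliest delete-marker timestamp in the file
    max_ts: int       # latest delete-marker timestamp in the file


@dataclass(frozen=True)
class MobFile:
    start_key: bytes  # hypothetical: start key of the partition
    min_ts: int       # earliest cell timestamp in the file
    max_ts: int       # latest cell timestamp in the file


def group_del_files(del_files: list[DelFile]) -> dict[bytes, list[DelFile]]:
    """Idea 1: keep _del files grouped by start key instead of merging all."""
    groups: dict[bytes, list[DelFile]] = defaultdict(list)
    for d in del_files:
        groups[d.start_key].append(d)
    return groups


def del_files_for(mob: MobFile, groups: dict[bytes, list[DelFile]]) -> list[DelFile]:
    """Return only the _del files that a compaction of `mob` must include."""
    relevant = []
    for d in groups.get(mob.start_key, []):  # idea 1: same partition only
        if mob.min_ts > d.max_ts:
            continue  # idea 2: every cell in `mob` is newer than every marker
        relevant.append(d)
    return relevant


if __name__ == "__main__":
    groups = group_del_files([
        DelFile(b"a", 20161201, 20161201),
        DelFile(b"b", 20161115, 20161120),
    ])
    print(del_files_for(MobFile(b"a", 20161101, 20161101), groups))  # older file
    print(del_files_for(MobFile(b"a", 20161205, 20161210), groups))  # newer file
    print(del_files_for(MobFile(b"c", 20161101, 20161101), groups))  # other key
```

Grouping keeps a _del file from forcing rewrites in unrelated partitions, and the time-range check lets newer mob files skip the _del file entirely.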

This message was sent by Atlassian JIRA
