hbase-issues mailing list archives

From "huaxiang sun (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-17172) Optimize mob compaction with _del files
Date Fri, 31 Mar 2017 17:35:41 GMT

    [ https://issues.apache.org/jira/browse/HBASE-17172?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15951360#comment-15951360 ]

huaxiang sun commented on HBASE-17172:

[~jingcheng.du], a late follow-up on this. Grouping delete files by their first/last key is
meant to keep delete files out of the set of files-to-be-compacted as much as possible. If only
the start key is used, there is one case I am not sure how to handle (assuming I am following
your idea correctly).

Let's say region 1 starts at key0 and ends at key2, and it has one delete file, key0***_del.
After that, the region may split into region1-0 and region1-1. For region1-1, key0***_del may
need to be included in compaction, as it may contain keys that fall in region1-1's range. My
understanding is that if we only use startKey to group files, key0***_del will not be included
in region1-1's mob compaction.
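
To make the case concrete, here is a minimal sketch of the two selection rules. It is only a model, not the mob compactor code: DelFile, Region, selected_by_start_key and selected_by_key_range are hypothetical names, the split point key1 and the last key key1z are made up, and row keys are compared as plain strings.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class DelFile:
    name: str
    start_key: str  # start key of the region that wrote the file (assumed name prefix)
    first_key: str  # smallest row key stored in the file
    last_key: str   # largest row key stored in the file


@dataclass(frozen=True)
class Region:
    name: str
    start_key: str
    end_key: Optional[str]  # exclusive; None means unbounded


def selected_by_start_key(del_file: DelFile, region: Region) -> bool:
    # Group only by start key: the del file belongs to the region
    # whose start key matches the one recorded for the file.
    return del_file.start_key == region.start_key


def selected_by_key_range(del_file: DelFile, region: Region) -> bool:
    # Group by first/last key: the del file is selected when its
    # [first_key, last_key] range overlaps the region's [start_key, end_key).
    begins_before_region_end = (
        region.end_key is None or del_file.first_key < region.end_key
    )
    ends_at_or_after_region_start = del_file.last_key >= region.start_key
    return begins_before_region_end and ends_at_or_after_region_start


# Region 1 covered [key0, key2) and wrote one del file before splitting.
del_file = DelFile("key0***_del", start_key="key0", first_key="key0", last_key="key1z")
region1_0 = Region("region1-0", start_key="key0", end_key="key1")
region1_1 = Region("region1-1", start_key="key1", end_key="key2")

for region in (region1_0, region1_1):
    print(
        region.name,
        "by start key:", selected_by_start_key(del_file, region),
        "by key range:", selected_by_key_range(del_file, region),
    )
```

In this model the start-key rule selects the del file only for region1-0, while the first/last key rule also selects it for region1-1.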

Maybe, as you said, since we now always retain the delete markers in hfiles,
it is OK not to include the delete file for region1-1. Data for the deleted cells will still
be kept and will be bulkloaded after mob compaction, but since the delete markers are still
in the hfiles, those cells will not show up.

Is my understanding correct? Thanks [~jingcheng.du]!

> Optimize mob compaction with _del files
> ---------------------------------------
>                 Key: HBASE-17172
>                 URL: https://issues.apache.org/jira/browse/HBASE-17172
>             Project: HBase
>          Issue Type: Improvement
>          Components: mob
>    Affects Versions: 2.0.0
>            Reporter: huaxiang sun
>            Assignee: huaxiang sun
>             Fix For: 2.0.0
>         Attachments: HBASE-17172-master-001.patch, HBASE-17172.master.001.patch, HBASE-17172.master.002.patch,
> Today, when there is a _del file in mobdir, a major mob compaction recompacts every mob
> file. This causes a lot of IO and slows down major mob compaction (it may take months
> to finish). This needs to be improved. A few ideas are:
> 1) Do not compact all _del files into one; instead, compact them in groups keyed by
> startKey. Then use the firstKey/startKey of each mob file to see if the _del file
> needs to be included for this partition.
> 2) Based on the timerange of the _del file, compaction for files after that timerange
> does not need to include the _del file, as these are newer files.
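
Idea 2 can be sketched roughly as follows. This is an illustration under assumptions, not the actual compaction code: FileTimeRange, del_file_needed and the timestamps are hypothetical, and the rule assumes a delete marker only affects cells whose timestamp is at or below the marker's timestamp.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FileTimeRange:
    name: str
    min_ts: int  # smallest cell timestamp in the file
    max_ts: int  # largest cell timestamp in the file


def del_file_needed(del_file: FileTimeRange, mob_file: FileTimeRange) -> bool:
    # If every cell in the mob file is newer than the newest delete marker,
    # no marker in the del file can apply to it, so the del file can be skipped.
    return mob_file.min_ts <= del_file.max_ts


# Hypothetical file names and timestamps, for illustration only.
del_file = FileTimeRange("example_del", min_ts=100, max_ts=200)
mob_files = [
    FileTimeRange("mob_old", min_ts=50, max_ts=150),
    FileTimeRange("mob_overlapping", min_ts=180, max_ts=300),
    FileTimeRange("mob_new", min_ts=250, max_ts=400),
]

# Only these mob files would need the del file in their compaction.
need_del = [m.name for m in mob_files if del_file_needed(del_file, m)]
print(need_del)
```

Here only mob_new is written entirely after the del file's timerange, so it is the only file whose compaction could leave the del file out.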
