hbase-issues mailing list archives

From "Jingcheng Du (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-17172) Optimize major mob compaction with _del files
Date Fri, 02 Dec 2016 08:35:58 GMT

    [ https://issues.apache.org/jira/browse/HBASE-17172?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15714463#comment-15714463 ]

Jingcheng Du commented on HBASE-17172:
--------------------------------------

We designed mob to reduce IO amplification. The design tries to guarantee read performance
no matter how many mob files there are. So by setting such a threshold we can reduce how much
data gets recompacted, even though this leaves more files behind. We don't need to keep the
number of files small to make reads fast. That is why the default threshold is small, and that
is why your compaction policy JIRA is so important :)
The threshold is key to reducing IO amplification, so we don't recommend setting it to a very
large number. Otherwise, mob is not much different from storing cells directly in HBase.
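To make the threshold trade-off concrete, here is a minimal Java sketch. It assumes the threshold is a file-size cutoff: a mob file smaller than the threshold is treated as small and selected for merging, while larger files are left alone. The class, record, and method names are hypothetical (not HBase APIs), and the file sizes are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical illustration of threshold-based file selection; not HBase code. */
public class MobMergeSelection {

    /** Minimal stand-in for a mob file: a name and its size in bytes (assumed fields). */
    public record MobFile(String name, long sizeBytes) {}

    /**
     * Returns only the files smaller than the threshold. Files at or above the
     * threshold are not selected, so they are not read and rewritten again.
     */
    public static List<MobFile> selectMergeable(List<MobFile> files, long thresholdBytes) {
        List<MobFile> selected = new ArrayList<>();
        for (MobFile f : files) {
            if (f.sizeBytes() < thresholdBytes) {
                selected.add(f);
            }
        }
        return selected;
    }

    /** Total bytes a compaction of the given files would have to read and rewrite. */
    public static long bytesRewritten(List<MobFile> files) {
        long total = 0;
        for (MobFile f : files) {
            total += f.sizeBytes();
        }
        return total;
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        List<MobFile> files = List.of(
                new MobFile("a", 10 * mb),
                new MobFile("b", 20 * mb),
                new MobFile("c", 900 * mb),
                new MobFile("d", 2000 * mb));
        // Small threshold: only the two small files are selected for rewriting.
        List<MobFile> small = selectMergeable(files, 100 * mb);
        // Huge threshold: every file is selected, i.e. the IO amplification mob tries to avoid.
        List<MobFile> all = selectMergeable(files, Long.MAX_VALUE);
        System.out.println(small.size() + " files, " + bytesRewritten(small) / mb + " MB");
        System.out.println(all.size() + " files, " + bytesRewritten(all) / mb + " MB");
    }
}
```

Running it prints `2 files, 30 MB` and then `4 files, 2930 MB`: with the small cutoff only 30 MB of the 2930 MB in this example is rewritten, whereas a cutoff that covers every file rewrites all of it.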

> Optimize major mob compaction with _del files
> ---------------------------------------------
>
>                 Key: HBASE-17172
>                 URL: https://issues.apache.org/jira/browse/HBASE-17172
>             Project: HBase
>          Issue Type: Improvement
>          Components: mob
>    Affects Versions: 2.0.0
>            Reporter: huaxiang sun
>            Assignee: huaxiang sun
>
> Today, when there is a _del file in mobdir, major mob compaction recompacts every mob file.
> This causes lots of IO and slows down major mob compaction (it may take months to finish).
> This needs to be improved. A few ideas:
> 1) Do not compact all _del files into one; instead, compact them in groups keyed by
> startKey. Then use firstKey/startKey to match each mob file to its partition and decide
> whether the _del file needs to be included for that partition.
> 2) Based on the timerange of the _del file, compaction of files after that timerange
> does not need to include the _del file, as these are newer files.
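The two ideas in the description can be sketched as a filter that decides which _del files a given mob file's compaction has to read. This is a minimal Java sketch under simplifying assumptions: a partition is identified only by a start-key string, each _del file records the newest timestamp it covers, and each mob file records the oldest cell timestamp it holds. The records, fields, and methods are hypothetical and are not the HBase implementation.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of the two _del file ideas; not HBase code. */
public class DelFileSelection {

    /** A _del file: its partition start key and the max timestamp it covers (assumed fields). */
    public record DelFile(String name, String startKey, long maxTimestamp) {}

    /** A mob file: its partition start key and the min cell timestamp it holds (assumed fields). */
    public record MobFile(String name, String startKey, long minTimestamp) {}

    /** Idea 1: group _del files by start key instead of compacting them all into one. */
    public static Map<String, List<DelFile>> groupByStartKey(List<DelFile> delFiles) {
        Map<String, List<DelFile>> groups = new HashMap<>();
        for (DelFile d : delFiles) {
            groups.computeIfAbsent(d.startKey(), k -> new ArrayList<>()).add(d);
        }
        return groups;
    }

    /**
     * Returns the _del files that must be applied when compacting one mob file.
     * Idea 1: only _del files from the same partition are considered.
     * Idea 2: a _del file is skipped when every cell in the mob file is newer than it.
     */
    public static List<DelFile> relevantDelFiles(MobFile mob, Map<String, List<DelFile>> groups) {
        List<DelFile> result = new ArrayList<>();
        for (DelFile d : groups.getOrDefault(mob.startKey(), List.of())) {
            if (mob.minTimestamp() <= d.maxTimestamp()) {
                result.add(d);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, List<DelFile>> groups = groupByStartKey(List.of(
                new DelFile("del-1", "aaa", 100L),
                new DelFile("del-2", "bbb", 100L)));
        MobFile oldA = new MobFile("mob-1", "aaa", 50L);  // same partition, older cells
        MobFile newA = new MobFile("mob-2", "aaa", 150L); // same partition, only newer cells
        MobFile oldC = new MobFile("mob-3", "ccc", 50L);  // partition with no _del file
        System.out.println("mob-1: " + relevantDelFiles(oldA, groups).size());
        System.out.println("mob-2: " + relevantDelFiles(newA, groups).size());
        System.out.println("mob-3: " + relevantDelFiles(oldC, groups).size());
    }
}
```

In this example only mob-1 needs del-1. mob-2 holds only cells newer than every delete marker in its partition, and mob-3 has no _del file in its partition, so neither compaction has to read a _del file.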



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
