hbase-issues mailing list archives

From "huaxiang sun (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-17172) Optimize major mob compaction with _del files
Date Wed, 30 Nov 2016 00:49:59 GMT

    [ https://issues.apache.org/jira/browse/HBASE-17172?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15707080#comment-15707080 ]

huaxiang sun commented on HBASE-17172:

Hi [~jingcheng.du] and [~anoop.hbase], I did some more code reading and found that _del files
can be included in minor mob compaction when a file's size is less than the threshold. If the
user sets a high threshold value, even already-compacted files can be added to the compaction
list again and recompacted with the del files. If we want to deal with _del files mainly in
major mob compaction, can we skip these already-compacted files in minor compaction? Something
like the following in select(), after files are added to the filesToCompact map. This is to
speed up minor compaction when del files are present.

diff --git a/hbase-server/src/main/java/org/apache/hadoop/hbase/mob/compactions/PartitionedMobCompactor.java
index 33aecc0..dab05d2 100644
--- a/hbase-server/src/main/java/org/apache/hadoop/hbase/mob/compactions/PartitionedMobCompactor.java
+++ b/hbase-server/src/main/java/org/apache/hadoop/hbase/mob/compactions/PartitionedMobCompactor.java
@@ -25,6 +25,7 @@ import java.util.Collection;
 import java.util.Collections;
 import java.util.Date;
 import java.util.HashMap;
+import java.util.Iterator;
 import java.util.List;
 import java.util.Map;
 import java.util.Map.Entry;
@@ -179,6 +180,23 @@ public class PartitionedMobCompactor extends MobCompactor {
+    /*
+     * If it is not a major mob compaction with del files, and the file number in Partition is 1,
+     * remove the partition from filesToCompact list to avoid re-compacting files which have
+     * already been compacted with del files.
+     */
+    if (!allFiles && (allDelFiles.size() > 0)) {
+      for(Iterator<Map.Entry<CompactionPartitionId, CompactionPartition>> it =
+          filesToCompact.entrySet().iterator(); it.hasNext(); ) {
+        Map.Entry<CompactionPartitionId, CompactionPartition> entry = it.next();
+        if (entry.getValue().getFileNumbers() <= 1) {
+          it.remove();
+          --selectedFileCount;
+        }
+      }
+    }
     PartitionedMobCompactionRequest request = new PartitionedMobCompactionRequest(
       filesToCompact.values(), allDelFiles);
     if (candidates.size() == (allDelFiles.size() + selectedFileCount + irrelevantFileCount))
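
For readers who want to try the pruning step outside HBase, here is a minimal standalone sketch of the same idea. It is an illustration under assumptions, not the HBase code: the class and method names are hypothetical, a plain Map<String, List<String>> stands in for the filesToCompact map of CompactionPartitionId to CompactionPartition, and the return value plays the role of the amount to subtract from selectedFileCount.

```java
import java.util.Iterator;
import java.util.List;
import java.util.Map;

/** Hypothetical stand-in for the pruning step in the patch; not the HBase classes. */
final class MinorMobCompactionPruneSketch {

  /**
   * For a minor compaction (allFiles == false) with del files present, removes every
   * partition that holds at most one file, since such a file was most likely already
   * compacted. Returns the number of files dropped from the selection.
   */
  static int pruneSingleFilePartitions(Map<String, List<String>> filesToCompact,
      boolean allFiles, int delFileCount) {
    if (allFiles || delFileCount == 0) {
      return 0; // major compaction, or no del files: keep every partition
    }
    int dropped = 0;
    Iterator<Map.Entry<String, List<String>>> it = filesToCompact.entrySet().iterator();
    while (it.hasNext()) {
      List<String> files = it.next().getValue();
      if (files.size() <= 1) {
        dropped += files.size();
        it.remove(); // safe removal while iterating over the entry set
      }
    }
    return dropped;
  }
}
```

Removing through the iterator, as the patch does, avoids a ConcurrentModificationException that would result from calling remove() on the map inside the loop.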

> Optimize major mob compaction with _del files
> ---------------------------------------------
>                 Key: HBASE-17172
>                 URL: https://issues.apache.org/jira/browse/HBASE-17172
>             Project: HBase
>          Issue Type: Improvement
>          Components: mob
>    Affects Versions: 2.0.0
>            Reporter: huaxiang sun
>            Assignee: huaxiang sun
> Today, when there is a _del file in mobdir, with major mob compaction, every mob file
> will be recompacted. This causes lots of IO and slows down major mob compaction (it may take
> months to finish). This needs to be improved. A few ideas are:
> 1) Do not compact all _del files into one; instead, compact them in groups with
> startKey as the key. Then use the firstKey/startKey of each mob file to see if the _del file
> needs to be included for this partition.
> 2) Based on the timerange of the _del file, compaction for files after that timerange
> does not need to include the _del file as these are newer files.
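
The two ideas in the description can be sketched together as follows. This is only an illustration under stated assumptions, not an HBase implementation: the DelFile class, its fields, and both method names are hypothetical, and matching on an exact start key is a simplification of the firstKey/startKey check mentioned in idea 1.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of per-partition _del file selection; not the HBase classes. */
final class DelFileSelectionSketch {

  /** Assumed description of a _del file: its start key and the newest timestamp it holds. */
  static final class DelFile {
    final String startKey;
    final long maxTimestamp;

    DelFile(String startKey, long maxTimestamp) {
      this.startKey = startKey;
      this.maxTimestamp = maxTimestamp;
    }
  }

  /** Idea 1: keep del files grouped by start key instead of merging them into one. */
  static Map<String, List<DelFile>> groupByStartKey(List<DelFile> delFiles) {
    Map<String, List<DelFile>> groups = new HashMap<>();
    for (DelFile f : delFiles) {
      groups.computeIfAbsent(f.startKey, k -> new ArrayList<>()).add(f);
    }
    return groups;
  }

  /**
   * Ideas 1 and 2 combined: pick the del files a mob file needs. Only del files with
   * the same start key are considered, and a del file whose newest entry is older than
   * the mob file's oldest cell is skipped because it cannot affect that file.
   */
  static List<DelFile> delFilesFor(String mobStartKey, long mobMinTimestamp,
      Map<String, List<DelFile>> groups) {
    List<DelFile> selected = new ArrayList<>();
    for (DelFile f : groups.getOrDefault(mobStartKey, Collections.<DelFile>emptyList())) {
      if (f.maxTimestamp >= mobMinTimestamp) {
        selected.add(f);
      }
    }
    return selected;
  }
}
```

With this kind of selection, a partition whose result list is empty would not need to be rewritten because of del files at all, which is the IO saving the description is after.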

This message was sent by Atlassian JIRA
