accumulo-commits mailing list archives

From: ktur...@apache.org
Subject: accumulo git commit: ACCUMULO-1787: created TwoTierCompactionStrategy and its Test
Date: Wed, 07 Sep 2016 18:12:14 GMT
Repository: accumulo
Updated Branches:
  refs/heads/master 24432a899 -> f41181190


ACCUMULO-1787: created TwoTierCompactionStrategy and its Test

closes apache/accumulo#135


Project: http://git-wip-us.apache.org/repos/asf/accumulo/repo
Commit: http://git-wip-us.apache.org/repos/asf/accumulo/commit/f4118119
Tree: http://git-wip-us.apache.org/repos/asf/accumulo/tree/f4118119
Diff: http://git-wip-us.apache.org/repos/asf/accumulo/diff/f4118119

Branch: refs/heads/master
Commit: f411811902f696f5f5a7124a6ea978c3b3c2483d
Parents: 24432a8
Author: milleruntime <michaelpmiller@gmail.com>
Authored: Wed Jul 27 14:46:29 2016 -0400
Committer: Keith Turner <kturner@apache.org>
Committed: Wed Sep 7 13:39:51 2016 -0400

----------------------------------------------------------------------
 .../asciidoc/chapters/table_configuration.txt   |  22 ++++
 .../examples/README.compactionStrategy          |  65 ++++++++++
 .../compaction/TwoTierCompactionStrategy.java   | 114 +++++++++++++++++
 .../TwoTierCompactionStrategyTest.java          | 127 +++++++++++++++++++
 4 files changed, 328 insertions(+)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/accumulo/blob/f4118119/docs/src/main/asciidoc/chapters/table_configuration.txt
----------------------------------------------------------------------
diff --git a/docs/src/main/asciidoc/chapters/table_configuration.txt b/docs/src/main/asciidoc/chapters/table_configuration.txt
index 5c62ccf..28075e2 100644
--- a/docs/src/main/asciidoc/chapters/table_configuration.txt
+++ b/docs/src/main/asciidoc/chapters/table_configuration.txt
@@ -454,6 +454,28 @@ table. In 1.4 the ability to compact a range of a table was added. To use this
 feature specify start and stop rows for the compact command. This will only
 compact tablets that overlap the given row range.
 
+==== Compaction Strategies
+
+The default behavior of major compactions is defined in the class DefaultCompactionStrategy.
+This behavior can be changed by overriding the following property with a fully qualified class name:
+
+  table.majc.compaction.strategy
+
+Custom compaction strategies can have additional properties that are specified following the prefix property:
+
+  table.majc.compaction.strategy.opts.*
+
+Accumulo provides a few classes that can be used as an alternative compaction strategy. These classes are located in the
+org.apache.accumulo.tserver.compaction.* package. EverythingCompactionStrategy will simply compact all files. This is the
+strategy used by the user "compact" command. SizeLimitCompactionStrategy compacts files no bigger than the limit set in the
+property table.majc.compaction.strategy.opts.sizeLimit.
+
+TwoTierCompactionStrategy is a hybrid compaction strategy that supports two types of compression. If the total size of
+files being compacted is larger than table.majc.compaction.strategy.opts.file.large.compress.threshold, then a larger
+compression type will be used. The larger compression type is specified in table.majc.compaction.strategy.opts.file.large.compress.type.
+Otherwise, the configured table compression will be used. To use this strategy with minor compactions, set table.file.compress.type=snappy
+and set a different compress type in table.majc.compaction.strategy.opts.file.large.compress.type for larger files.
+
 === Pre-splitting tables
 
 Accumulo will balance and distribute tables across servers. Before a

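For reference, a strategy never sees the full property names above: the options handed to its init() method are the table.majc.compaction.strategy.opts.* values with that prefix stripped, which is why the constants in TwoTierCompactionStrategy.java below are just "file.large.compress.threshold" and "file.large.compress.type". A minimal sketch (not part of this commit; the class name is hypothetical):

    import java.util.HashMap;
    import java.util.Map;

    import org.apache.accumulo.tserver.compaction.TwoTierCompactionStrategy;

    public class TwoTierOptionsSketch {
      public static void main(String[] args) {
        // Keys mirror table.majc.compaction.strategy.opts.* with the prefix removed.
        Map<String,String> opts = new HashMap<>();
        opts.put("file.large.compress.threshold", "1M"); // ...opts.file.large.compress.threshold=1M
        opts.put("file.large.compress.type", "gz");      // ...opts.file.large.compress.type=gz

        TwoTierCompactionStrategy strategy = new TwoTierCompactionStrategy();
        strategy.init(opts); // throws IllegalArgumentException if either option is missing
      }
    }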
http://git-wip-us.apache.org/repos/asf/accumulo/blob/f4118119/docs/src/main/resources/examples/README.compactionStrategy
----------------------------------------------------------------------
diff --git a/docs/src/main/resources/examples/README.compactionStrategy b/docs/src/main/resources/examples/README.compactionStrategy
new file mode 100644
index 0000000..344080b
--- /dev/null
+++ b/docs/src/main/resources/examples/README.compactionStrategy
@@ -0,0 +1,65 @@
+Title: Apache Accumulo Customizing the Compaction Strategy 
+Notice:    Licensed to the Apache Software Foundation (ASF) under one
+           or more contributor license agreements.  See the NOTICE file
+           distributed with this work for additional information
+           regarding copyright ownership.  The ASF licenses this file
+           to you under the Apache License, Version 2.0 (the
+           "License"); you may not use this file except in compliance
+           with the License.  You may obtain a copy of the License at
+           .
+             http://www.apache.org/licenses/LICENSE-2.0
+           .
+           Unless required by applicable law or agreed to in writing,
+           software distributed under the License is distributed on an
+           "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+           KIND, either express or implied.  See the License for the
+           specific language governing permissions and limitations
+           under the License.
+
+This tutorial uses the following Java classes, which can be found in org.apache.accumulo.tserver.compaction:
+
+ * DefaultCompactionStrategy.java - determines which files to compact based on table.compaction.major.ratio and table.file.max
+ * EverythingCompactionStrategy.java - compacts all files
+ * SizeLimitCompactionStrategy.java - compacts files no bigger than table.majc.compaction.strategy.opts.sizeLimit
+ * TwoTierCompactionStrategy.java - uses default compression for smaller files and table.majc.compaction.strategy.opts.file.large.compress.type for larger files
+
+This is an example of how to configure a compaction strategy. By default Accumulo will always use the DefaultCompactionStrategy, unless
+these steps are taken to change the configuration. Use the strategy and settings that best fit your Accumulo setup. This example shows
+how to configure and test one of the more complicated strategies, the TwoTierCompactionStrategy. Note that this example requires Hadoop
+native libraries built with snappy in order to use snappy compression.
+
+To begin, run the command to create a table for testing:
+
+    $ ./bin/accumulo shell -u root -p secret -e "createtable test1"
+
+The command below sets the compression for smaller files and minor compactions for that table.
+
+    $ ./bin/accumulo shell -u root -p secret -e "config -s table.file.compress.type=snappy
-t test1"
+
+The commands below will configure the TwoTierCompactionStrategy to use gz compression for
files larger than 1M. 
+
+    $ ./bin/accumulo shell -u root -p secret -e "config -s table.majc.compaction.strategy.opts.file.large.compress.threshold=1M
-t test1"
+    $ ./bin/accumulo shell -u root -p secret -e "config -s table.majc.compaction.strategy.opts.file.large.compress.type=gz
-t test1"
+    $ ./bin/accumulo shell -u root -p secret -e "config -s table.majc.compaction.strategy=org.apache.accumulo.tserver.compaction.TwoTierCompactionStrategy
-t test1"
+
+Generate some data and files in order to test the strategy:
+
+    $ ./bin/accumulo org.apache.accumulo.examples.simple.client.SequentialBatchWriter -i instance17 -z localhost:2181 -u root -p secret -t test1 --start 0 --num 10000 --size 50 --batchMemory 20M --batchLatency 500 --batchThreads 20
+    $ ./bin/accumulo shell -u root -p secret -e "flush -t test1"
+    $ ./bin/accumulo org.apache.accumulo.examples.simple.client.SequentialBatchWriter -i instance17 -z localhost:2181 -u root -p secret -t test1 --start 0 --num 11000 --size 50 --batchMemory 20M --batchLatency 500 --batchThreads 20
+    $ ./bin/accumulo shell -u root -p secret -e "flush -t test1"
+    $ ./bin/accumulo org.apache.accumulo.examples.simple.client.SequentialBatchWriter -i instance17 -z localhost:2181 -u root -p secret -t test1 --start 0 --num 12000 --size 50 --batchMemory 20M --batchLatency 500 --batchThreads 20
+    $ ./bin/accumulo shell -u root -p secret -e "flush -t test1"
+    $ ./bin/accumulo org.apache.accumulo.examples.simple.client.SequentialBatchWriter -i instance17 -z localhost:2181 -u root -p secret -t test1 --start 0 --num 13000 --size 50 --batchMemory 20M --batchLatency 500 --batchThreads 20
+    $ ./bin/accumulo shell -u root -p secret -e "flush -t test1"
+
+View the tserver log in <accumulo_home>/logs for the compaction and find the name of the <rfile> that was compacted for your table. Print info about this file using the PrintInfo tool:
+
+    $ ./bin/accumulo rfile-info <rfile>
+
+Details about the rfile will be printed and the compression type should match the type used in the compaction:
+Meta block     : RFile.index
+      Raw size             : 512 bytes
+      Compressed size      : 278 bytes
+      Compression type     : gz
+

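The shell commands in the README above can also be issued through the Java client API. A rough equivalent (not part of this commit; it assumes an existing Connector named conn, and the class name is hypothetical):

    import org.apache.accumulo.core.client.Connector;

    public class ConfigureTwoTierSketch {
      // Applies the same settings as the shell commands above for the given table.
      public static void configure(Connector conn, String table) throws Exception {
        conn.tableOperations().create(table);
        conn.tableOperations().setProperty(table, "table.file.compress.type", "snappy");
        conn.tableOperations().setProperty(table,
            "table.majc.compaction.strategy.opts.file.large.compress.threshold", "1M");
        conn.tableOperations().setProperty(table,
            "table.majc.compaction.strategy.opts.file.large.compress.type", "gz");
        conn.tableOperations().setProperty(table, "table.majc.compaction.strategy",
            "org.apache.accumulo.tserver.compaction.TwoTierCompactionStrategy");
      }
    }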
http://git-wip-us.apache.org/repos/asf/accumulo/blob/f4118119/server/tserver/src/main/java/org/apache/accumulo/tserver/compaction/TwoTierCompactionStrategy.java
----------------------------------------------------------------------
diff --git a/server/tserver/src/main/java/org/apache/accumulo/tserver/compaction/TwoTierCompactionStrategy.java b/server/tserver/src/main/java/org/apache/accumulo/tserver/compaction/TwoTierCompactionStrategy.java
new file mode 100644
index 0000000..a3877b0
--- /dev/null
+++ b/server/tserver/src/main/java/org/apache/accumulo/tserver/compaction/TwoTierCompactionStrategy.java
@@ -0,0 +1,114 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.accumulo.tserver.compaction;
+
+import java.io.IOException;
+import java.util.Map;
+
+import org.apache.accumulo.core.conf.AccumuloConfiguration;
+import org.apache.accumulo.core.conf.Property;
+import org.apache.accumulo.core.metadata.schema.DataFileValue;
+import org.apache.accumulo.server.fs.FileRef;
+import org.apache.log4j.Logger;
+
+/**
+ * A hybrid compaction strategy that supports two types of compression. If the total size of files being compacted is larger than
+ * <tt>table.majc.compaction.strategy.opts.file.large.compress.threshold</tt>, then the larger compression type will be used. The larger compression type is
+ * specified in <tt>table.majc.compaction.strategy.opts.file.large.compress.type</tt>. Otherwise, the configured table compression will be used.
+ *
+ * NOTE: To use this strategy with minor compactions, set <tt>table.file.compress.type=snappy</tt> and set a different compress type in
+ * <tt>table.majc.compaction.strategy.opts.file.large.compress.type</tt> for larger files.
+ */
+public class TwoTierCompactionStrategy extends DefaultCompactionStrategy {
+  private final Logger log = Logger.getLogger(TwoTierCompactionStrategy.class);
+  /**
+   * Threshold memory in bytes. Files larger than this threshold will use <tt>table.majc.compaction.strategy.opts.file.large.compress.type</tt> for compression
+   */
+  public static final String LARGE_FILE_COMPRESSION_THRESHOLD = "file.large.compress.threshold";
+  private Long largeFileCompressionThreshold;
+
+  /**
+   * Type of compression to use if large threshold is surpassed. One of "gz", "lzo", "snappy", or "none"
+   */
+  public static final String LARGE_FILE_COMPRESSION_TYPE = "file.large.compress.type";
+  private String largeFileCompressionType;
+
+  /**
+   * Helper method to check for required table properties.
+   *
+   * @param objectsToVerify
+   *          any objects that shouldn't be null
+   * @throws IllegalArgumentException
+   *           if any object in {@code objectsToVerify} is null
+   *
+   */
+  public void verifyRequiredProperties(Object... objectsToVerify) throws IllegalArgumentException {
+    for (Object obj : objectsToVerify) {
+      if (obj == null) {
+        throw new IllegalArgumentException("Missing required " + Property.TABLE_COMPACTION_STRATEGY_PREFIX + " (" + LARGE_FILE_COMPRESSION_TYPE + " and/or "
+            + LARGE_FILE_COMPRESSION_THRESHOLD + ") for " + this.getClass().getName());
+      }
+    }
+  }
+
+  /**
+   * Calculates the total size of input files in the compaction plan
+   */
+  private Long calculateTotalSize(MajorCompactionRequest request, CompactionPlan plan) {
+    long totalSize = 0;
+    Map<FileRef,DataFileValue> allFiles = request.getFiles();
+    for (FileRef fileRef : plan.inputFiles) {
+      totalSize += allFiles.get(fileRef).getSize();
+    }
+    return totalSize;
+  }
+
+  @Override
+  public void init(Map<String,String> options) {
+    String threshold = options.get(LARGE_FILE_COMPRESSION_THRESHOLD);
+    largeFileCompressionType = options.get(LARGE_FILE_COMPRESSION_TYPE);
+    verifyRequiredProperties(threshold, largeFileCompressionType);
+    largeFileCompressionThreshold = AccumuloConfiguration.getMemoryInBytes(threshold);
+  }
+
+  @Override
+  public boolean shouldCompact(MajorCompactionRequest request) {
+    return super.shouldCompact(request);
+  }
+
+  @Override
+  public void gatherInformation(MajorCompactionRequest request) throws IOException {
+    super.gatherInformation(request);
+  }
+
+  @Override
+  public CompactionPlan getCompactionPlan(MajorCompactionRequest request) {
+    CompactionPlan plan = super.getCompactionPlan(request);
+    plan.writeParameters = new WriteParameters();
+    Long totalSize = calculateTotalSize(request, plan);
+
+    if (totalSize > largeFileCompressionThreshold) {
+      if (log.isDebugEnabled()) {
+        log.debug("Changed compressType to " + largeFileCompressionType + ": totalSize(" + totalSize + ") was greater than threshold "
+            + largeFileCompressionThreshold);
+      }
+      plan.writeParameters.setCompressType(largeFileCompressionType);
+    }
+    return plan;
+  }
+
+}

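One detail worth noting about the class above: the threshold option is parsed with AccumuloConfiguration.getMemoryInBytes, so the size suffixes are binary multiples (the "1M" used in the README is 1048576 bytes; the "500M" in the test below is 524288000). A small sketch, with a hypothetical class name:

    import org.apache.accumulo.core.conf.AccumuloConfiguration;

    public class ThresholdSketch {
      public static void main(String[] args) {
        // Same helper init() uses to turn the option string into a byte count.
        long threshold = AccumuloConfiguration.getMemoryInBytes("1M");
        System.out.println(threshold); // 1048576
        // getCompactionPlan() switches to the large compression type only when the
        // combined size of the plan's input files exceeds this value.
      }
    }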
http://git-wip-us.apache.org/repos/asf/accumulo/blob/f4118119/server/tserver/src/test/java/org/apache/accumulo/tserver/compaction/TwoTierCompactionStrategyTest.java
----------------------------------------------------------------------
diff --git a/server/tserver/src/test/java/org/apache/accumulo/tserver/compaction/TwoTierCompactionStrategyTest.java b/server/tserver/src/test/java/org/apache/accumulo/tserver/compaction/TwoTierCompactionStrategyTest.java
new file mode 100644
index 0000000..6fb37da
--- /dev/null
+++ b/server/tserver/src/test/java/org/apache/accumulo/tserver/compaction/TwoTierCompactionStrategyTest.java
@@ -0,0 +1,127 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.accumulo.tserver.compaction;
+
+import java.io.IOException;
+import java.util.HashMap;
+import java.util.HashSet;
+import java.util.List;
+import java.util.Map;
+
+import org.apache.accumulo.core.conf.AccumuloConfiguration;
+import org.apache.accumulo.core.data.impl.KeyExtent;
+import org.apache.accumulo.core.metadata.schema.DataFileValue;
+import org.apache.accumulo.server.fs.FileRef;
+import org.junit.Assert;
+import org.junit.Before;
+import org.junit.Test;
+
+/**
+ * Tests org.apache.accumulo.tserver.compaction.TwoTierCompactionStrategy
+ */
+public class TwoTierCompactionStrategyTest {
+  private String largeCompressionType = "gz";
+  private TwoTierCompactionStrategy ttcs = null;
+  private MajorCompactionRequest mcr = null;
+  private AccumuloConfiguration conf = null;
+  private HashMap<String,String> opts = new HashMap<>();
+
+  private Map<FileRef,DataFileValue> createFileMap(String... sa) {
+
+    HashMap<FileRef,DataFileValue> ret = new HashMap<>();
+    for (int i = 0; i < sa.length; i += 2) {
+      ret.put(new FileRef("hdfs://nn1/accumulo/tables/5/t-0001/" + sa[i]), new DataFileValue(AccumuloConfiguration.getMemoryInBytes(sa[i + 1]), 1));
+    }
+
+    return ret;
+  }
+
+  @Before
+  public void setup() {
+    opts.put(TwoTierCompactionStrategy.LARGE_FILE_COMPRESSION_TYPE, largeCompressionType);
+    opts.put(TwoTierCompactionStrategy.LARGE_FILE_COMPRESSION_THRESHOLD, "500M");
+    ttcs = new TwoTierCompactionStrategy();
+  }
+
+  @Test
+  public void testDefaultCompaction() throws IOException {
+    ttcs.init(opts);
+    conf = AccumuloConfiguration.getDefaultConfiguration();
+    KeyExtent ke = new KeyExtent("0", null, null);
+    mcr = new MajorCompactionRequest(ke, MajorCompactionReason.NORMAL, null, conf);
+    Map<FileRef,DataFileValue> fileMap = createFileMap("f1", "10M", "f2", "10M", "f3", "10M", "f4", "10M", "f5", "100M", "f6", "100M", "f7", "100M", "f8",
+        "100M");
+    mcr.setFiles(fileMap);
+
+    Assert.assertTrue(ttcs.shouldCompact(mcr));
+    Assert.assertEquals(8, mcr.getFiles().size());
+
+    List<FileRef> filesToCompact = ttcs.getCompactionPlan(mcr).inputFiles;
+    Assert.assertEquals(fileMap.keySet(), new HashSet<>(filesToCompact));
+    Assert.assertEquals(8, filesToCompact.size());
+    Assert.assertEquals(null, ttcs.getCompactionPlan(mcr).writeParameters.getCompressType());
+  }
+
+  @Test
+  public void testLargeCompaction() throws IOException {
+    ttcs.init(opts);
+    conf = AccumuloConfiguration.getDefaultConfiguration();
+    KeyExtent ke = new KeyExtent("0", null, null);
+    mcr = new MajorCompactionRequest(ke, MajorCompactionReason.NORMAL, null, conf);
+    Map<FileRef,DataFileValue> fileMap = createFileMap("f1", "2G", "f2", "2G", "f3", "2G", "f4", "2G");
+    mcr.setFiles(fileMap);
+
+    Assert.assertTrue(ttcs.shouldCompact(mcr));
+    Assert.assertEquals(4, mcr.getFiles().size());
+
+    List<FileRef> filesToCompact = ttcs.getCompactionPlan(mcr).inputFiles;
+    Assert.assertEquals(fileMap.keySet(), new HashSet<>(filesToCompact));
+    Assert.assertEquals(4, filesToCompact.size());
+    Assert.assertEquals(largeCompressionType, ttcs.getCompactionPlan(mcr).writeParameters.getCompressType());
+  }
+
+  @Test
+  public void testMissingConfigProperties() {
+    try {
+      opts.clear();
+      ttcs.init(opts);
+      Assert.assertTrue("IllegalArgumentException should have been thrown.", false);
+    } catch (IllegalArgumentException iae) {} catch (Throwable t) {
+      Assert.assertTrue("IllegalArgumentException should have been thrown.", false);
+    }
+  }
+
+  @Test
+  public void testFileSubsetCompaction() throws IOException {
+    ttcs.init(opts);
+    conf = AccumuloConfiguration.getDefaultConfiguration();
+    KeyExtent ke = new KeyExtent("0", null, null);
+    mcr = new MajorCompactionRequest(ke, MajorCompactionReason.NORMAL, null, conf);
+    Map<FileRef,DataFileValue> fileMap = createFileMap("f1", "1G", "f2", "10M", "f3", "10M", "f4", "10M", "f5", "10M", "f6", "10M", "f7", "10M");
+    Map<FileRef,DataFileValue> filesToCompactMap = createFileMap("f2", "10M", "f3", "10M", "f4", "10M", "f5", "10M", "f6", "10M", "f7", "10M");
+    mcr.setFiles(fileMap);
+
+    Assert.assertTrue(ttcs.shouldCompact(mcr));
+    Assert.assertEquals(7, mcr.getFiles().size());
+
+    List<FileRef> filesToCompact = ttcs.getCompactionPlan(mcr).inputFiles;
+    Assert.assertEquals(filesToCompactMap.keySet(), new HashSet<>(filesToCompact));
+    Assert.assertEquals(6, filesToCompact.size());
+    Assert.assertEquals(null, ttcs.getCompactionPlan(mcr).writeParameters.getCompressType());
+  }
+
+}

