accumulo-dev mailing list archives

From phrocker <...@git.apache.org>
Subject [GitHub] accumulo pull request #106: ACCUMULO-4153: Update the getCodec method to no ...
Date Tue, 07 Jun 2016 13:05:48 GMT
Github user phrocker commented on a diff in the pull request:

    https://github.com/apache/accumulo/pull/106#discussion_r66066164
  
    --- Diff: core/src/main/java/org/apache/accumulo/core/file/rfile/bcfile/Compression.java
---
    @@ -78,41 +86,90 @@ public void flush() throws IOException {
       public static final String COMPRESSION_NONE = "none";
     
       /**
    -   * Compression algorithms.
    +   * Compression algorithms. There is a static initializer, below the values defined in the enumeration, that calls the initializer of all defined codecs within
    +   * the Algorithm enum. This promotes a model of the following call graph of initialization by the static initializer, followed by calls to getCodec() and
    +   * createCompressionStream/DecompressionStream. In some cases, the compression and decompression call methods will include a different buffer size for the
    +   * stream. Note that if the compressed buffer size requested in these calls is zero, we will not set the buffer size for that algorithm. Instead, we will use
    +   * the default within the codec.
    +   *
    +   * The buffer size is configured in the Codec by way of a Hadoop Configuration reference. One approach may be to use the same Configuration object, but when
    +   * calls are made to createCompressionStream and DecompressionStream with non-default buffer sizes, the configuration object must be changed. In this case,
    +   * concurrent calls to createCompressionStream and DecompressionStream would mutate the configuration object beneath each other, requiring synchronization to
    +   * avoid undesirable activity via co-modification. To avoid synchronization entirely, we will create Codecs with their own Configuration object and cache them
    +   * for re-use. A default codec will be statically created, as mentioned above, to ensure we always have a codec available at loader initialization.
    +   *
    +   * There is a Guava cache defined within Algorithm that allows us to cache Codecs for re-use. Since they will have their own configuration object and thus do
    +   * not need to be mutable, there is no concern for using them concurrently; however, the Guava cache exists to ensure a maximal size of the cache and
    +   * efficient and concurrent read/write access to the cache itself.
    +   *
    +   * To provide Algorithm specific details and to describe what is in code:
    +   *
    +   * LZO will always have the default LZO codec because the buffer size is never overridden within it.
    +   *
    +   * GZ will use the default GZ codec for the compression stream, but can potentially use a different codec instance for the decompression stream if the
    +   * requested buffer size does not match the default GZ buffer size of 32k.
    +   *
    +   * Snappy will use the default Snappy codec with the default buffer size of 64k for the compression stream, but will use a cached codec if the buffer size
    +   * differs from the default.
        */
       public static enum Algorithm {
    +
         LZO(COMPRESSION_LZO) {
    -      private transient boolean checked = false;
    +      /**
    +       * Determines if we've checked the codec status; ensures we don't recreate the default codec.
    +       */
    +      private transient AtomicBoolean checked = new AtomicBoolean(false);
           private static final String defaultClazz = "org.apache.hadoop.io.compress.LzoCodec";
           private transient CompressionCodec codec = null;
     
    +      /**
    +       * Configuration option for LZO buffer size
    +       */
    +      private static final String BUFFER_SIZE_OPT = "io.compression.codec.lzo.buffersize";
    +
    +      /**
    +       * Default buffer size
    +       */
    +      private static final int DEFAULT_BUFFER_SIZE = 64 * 1024;
    +
           @Override
    -      public synchronized boolean isSupported() {
    -        if (!checked) {
    -          checked = true;
    -          String extClazz = (conf.get(CONF_LZO_CLASS) == null ? System.getProperty(CONF_LZO_CLASS) : null);
    -          String clazz = (extClazz != null) ? extClazz : defaultClazz;
    -          try {
    -            LOG.info("Trying to load Lzo codec class: " + clazz);
    -            codec = (CompressionCodec) ReflectionUtils.newInstance(Class.forName(clazz), conf);
    -          } catch (ClassNotFoundException e) {
    -            // that is okay
    -          }
    -        }
    +      public boolean isSupported() {
             return codec != null;
           }
     
    +      public void initializeDefaultCodec() {
    +        if (!checked.get()) {
    +          checked.set(true);
    +          codec = createNewCodec(DEFAULT_BUFFER_SIZE);
    +        }
    +      }
    +
           @Override
    -      CompressionCodec getCodec() throws IOException {
    -        if (!isSupported()) {
    -          throw new IOException("LZO codec class not specified. Did you forget to set property " + CONF_LZO_CLASS + "?");
    +      CompressionCodec createNewCodec(int bufferSize) {
    +        String extClazz = (conf.get(CONF_LZO_CLASS) == null ? System.getProperty(CONF_LZO_CLASS) : null);
    +        String clazz = (extClazz != null) ? extClazz : defaultClazz;
    +        try {
    +          LOG.info("Trying to load Lzo codec class: " + clazz);
    +          Configuration myConf = new Configuration(conf);
    +          // only use the buffersize if > 0, otherwise we'll use
    +          // the default defined within the codec
    +          if (bufferSize > 0)
    +            myConf.setInt(BUFFER_SIZE_OPT, bufferSize);
    +          codec = (CompressionCodec) ReflectionUtils.newInstance(Class.forName(clazz), myConf);
    +          return codec;
    +        } catch (ClassNotFoundException e) {
    +          // that is okay
             }
    +        return null;
    +      }
     
    +      @Override
    +      CompressionCodec getCodec() throws IOException {
    --- End diff --
    
    They exist because the codecs that aren't the default (gz) will throw an IOException when getCodec is called while isSupported is false (i.e., getCodec was called even though a check of isSupported would have returned false).
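
    The per-buffer-size codec caching described in the Javadoc above, together with the getCodec contract discussed here, can be sketched roughly as follows. This is a minimal, hypothetical illustration, not the Accumulo implementation: the `CodecCacheSketch` class, the `Codec` stand-in type, and the use of `ConcurrentHashMap` in place of the Guava cache are all assumptions made so the example is self-contained.
    
    ```java
    import java.io.IOException;
    import java.util.Map;
    import java.util.concurrent.ConcurrentHashMap;
    import java.util.concurrent.atomic.AtomicInteger;
    
    public class CodecCacheSketch {
      // Stand-in for a Hadoop CompressionCodec that was built with its own
      // Configuration carrying this buffer size.
      static final class Codec {
        final int bufferSize;
        Codec(int bufferSize) { this.bufferSize = bufferSize; }
      }
    
      static final int DEFAULT_BUFFER_SIZE = 64 * 1024;
      static final AtomicInteger codecsCreated = new AtomicInteger();
    
      // One codec per requested buffer size; each codec owns its configuration,
      // so concurrent callers never mutate a shared Configuration object.
      static final Map<Integer, Codec> cache = new ConcurrentHashMap<>();
    
      // Eagerly created default codec, available from class initialization on.
      static final Codec defaultCodec = createNewCodec(DEFAULT_BUFFER_SIZE);
    
      static Codec createNewCodec(int bufferSize) {
        codecsCreated.incrementAndGet();
        // bufferSize <= 0 means "use the codec's own default".
        return new Codec(bufferSize > 0 ? bufferSize : DEFAULT_BUFFER_SIZE);
      }
    
      static Codec getCodec(int requestedBufferSize) throws IOException {
        // Always non-null in this sketch; in the real code the codec class may
        // fail to load, and unsupported codecs throw from getCodec.
        if (defaultCodec == null)
          throw new IOException("codec not supported");
        if (requestedBufferSize <= 0 || requestedBufferSize == DEFAULT_BUFFER_SIZE)
          return defaultCodec; // default codec satisfies the request
        return cache.computeIfAbsent(requestedBufferSize, CodecCacheSketch::createNewCodec);
      }
    
      public static void main(String[] args) throws IOException {
        Codec a = getCodec(0);         // default buffer size -> shared default codec
        Codec b = getCodec(32 * 1024); // non-default size -> cached per-size codec
        Codec c = getCodec(32 * 1024); // same size -> same cached instance
        System.out.println(a == defaultCodec);
        System.out.println(b == c);
      }
    }
    ```
    
    The design choice being illustrated: rather than synchronizing on one shared Configuration whose buffer-size entry changes between calls, each distinct buffer size gets its own immutable codec, so the only concurrency concern is the cache itself.
    
    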


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
