accumulo-notifications mailing list archives

From "John Vines (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (ACCUMULO-351) Add support for LZ4 compression
Date Wed, 04 Sep 2013 23:50:51 GMT

    [ https://issues.apache.org/jira/browse/ACCUMULO-351?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13758529#comment-13758529 ]

John Vines commented on ACCUMULO-351:
-------------------------------------

As a quick test to see if this is still worthwhile, I made a 574M RFile with the following code:
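{code}
import java.io.IOException;
import java.util.Random;

import org.apache.accumulo.core.data.Key;
import org.apache.accumulo.core.data.Value;
import org.apache.accumulo.core.file.blockfile.impl.CachableBlockFile;
import org.apache.accumulo.core.file.rfile.RFile;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// The class and method wrapper are placeholders; the original snippet was a bare method body.
public class BigRFileTest {
  public static void main(String[] args) throws IOException {
    // Write to the local filesystem with "none" as the codec, so the RFile is uncompressed.
    CachableBlockFile.Writer _cbw = new CachableBlockFile.Writer(
        FileSystem.getLocal(new Configuration()).create(new Path("/tmp/bigTest.rf"), false, 4096,
            (short) -1, 1 << 26), "none", new Configuration());
    // 100K data blocks, 128K index blocks
    RFile.Writer writer = new RFile.Writer(_cbw, 100 * 1024, 128 * 1024);

    Random r = new Random();
    byte[] colfb = new byte[128];
    byte[] colqb = new byte[128];
    byte[] value = new byte[128];

    String colf, colq;
    Value val = new Value();
    writer.startDefaultLocalityGroup();
    for (int i = 0; i < 1000000; i++) {
      r.nextBytes(colfb);
      r.nextBytes(colqb);
      // new String(byte[]) decodes with the platform default charset
      colf = new String(colfb);
      colq = new String(colqb);
      // Space-padded row ids keep the keys in sorted order as i increases
      Key k = new Key(String.format("%128d", i), colf, colq);

      r.nextBytes(value);
      val.set(value);
      writer.append(k, val);
    }

    writer.close();
  }
}
{code}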

So these are uncompressed RFiles.

I then tried a few different compression codecs on it for an easy comparison:
Gzip - 265M compressed (2.166 ratio), compression time 50.79s, decompression time 4.57s
lz4 fast compression - 435M compressed (1.319 ratio), compression time 1.98s, decompression time 0.41s
lz4 high compression - 352M compressed (1.630 ratio), compression time 29.66s, decompression time 0.32s
lzo default compression - 398M compressed (1.442 ratio), compression time 2.24s, decompression time 1.36s
lzo fast compression - 400M compressed (1.435 ratio), compression time 2.12s, decompression time 0.21s
Snappy - 418M compressed (1.373 ratio), compression time 4.06s, decompression time 2.18s
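
For anyone rechecking the ratios, they are just the uncompressed size divided by the compressed size. A minimal sketch, assuming the rounded megabyte figures listed here (the class and method names are made up for illustration, and because the inputs are rounded the last digit can differ slightly from the posted ratios):

```java
public class CompressionRatio {
  /** Ratio of original size to compressed size; larger means better compression. */
  static double ratio(double originalSize, double compressedSize) {
    return originalSize / compressedSize;
  }

  public static void main(String[] args) {
    double original = 574; // uncompressed RFile size in MB, as reported in the test
    String[] codecs = {"gzip", "lz4 fast", "lz4 high", "lzo default", "lzo fast", "snappy"};
    double[] compressed = {265, 435, 352, 398, 400, 418}; // compressed sizes in MB
    for (int i = 0; i < codecs.length; i++) {
      System.out.printf("%-12s %.3f%n", codecs[i], ratio(original, compressed[i]));
    }
  }
}
```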

Compared to the others, lz4 in its fast mode has the lowest compression ratio, for starters. At its fastest, it compresses a negligible amount faster than lzo but takes almost twice as long to decompress (0.41s vs. 0.21s for lzo fast), although timings this short are near the limit of measurement resolution, so that may not be accurate. All in all, I say the difference is negligible enough that I'm not going to bother, but it would be a good exercise for a first-time contributor.
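
On the resolution concern: one way to get steadier sub-second numbers is to repeat the measurement and take the median rather than trusting a single run. A rough sketch of that idea, using the JDK's built-in gzip streams as a stand-in codec (this is not how the numbers above were gathered; the class name, buffer size, and run count are arbitrary assumptions):

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.Arrays;
import java.util.Random;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

public class CodecTiming {
  static byte[] gzip(byte[] input) throws IOException {
    ByteArrayOutputStream bos = new ByteArrayOutputStream();
    // Closing the gzip stream flushes the trailer into bos.
    try (GZIPOutputStream out = new GZIPOutputStream(bos)) {
      out.write(input);
    }
    return bos.toByteArray();
  }

  static byte[] gunzip(byte[] compressed) throws IOException {
    ByteArrayOutputStream bos = new ByteArrayOutputStream();
    try (GZIPInputStream in = new GZIPInputStream(new ByteArrayInputStream(compressed))) {
      byte[] buf = new byte[8192];
      int n;
      while ((n = in.read(buf)) != -1) {
        bos.write(buf, 0, n);
      }
    }
    return bos.toByteArray();
  }

  /** Middle element of the sorted timings (upper middle for even counts). */
  static long median(long[] nanos) {
    long[] sorted = nanos.clone();
    Arrays.sort(sorted);
    return sorted[sorted.length / 2];
  }

  public static void main(String[] args) throws IOException {
    byte[] data = new byte[1 << 20]; // 1 MB of random bytes, like the random test data
    new Random(42).nextBytes(data);
    byte[] compressed = gzip(data);

    int runs = 11; // arbitrary; odd so the median is a single sample
    long[] times = new long[runs];
    for (int i = 0; i < runs; i++) {
      long start = System.nanoTime();
      gunzip(compressed);
      times[i] = System.nanoTime() - start;
    }
    System.out.printf("median decompression time: %.3f ms%n", median(times) / 1e6);
  }
}
```

The same loop could wrap any codec's decompress call; the early runs also act as JVM warm-up, which is another reason not to rely on a single timing.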
                
> Add support for LZ4 compression
> -------------------------------
>
>                 Key: ACCUMULO-351
>                 URL: https://issues.apache.org/jira/browse/ACCUMULO-351
>             Project: Accumulo
>          Issue Type: Improvement
>            Reporter: John Vines
>            Assignee: John Vines
>             Fix For: 1.6.0
>
>
> LZ4 is like LZO, but with better decompression rates, and it's BSD licensed, which means we can incorporate it in svn. Information about it is found here: http://code.google.com/p/lz4/ . Additionally, there exists a JNI library for it (and snappy, for ACCUMULO-139) at https://github.com/decster/jnicompressions . I did not find the license for that, but it's a potential option.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
