avro-dev mailing list archives

From "Ted Malaska (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (AVRO-1243) Avro support for all compression codecs
Date Fri, 08 Feb 2013 18:21:13 GMT

    [ https://issues.apache.org/jira/browse/AVRO-1243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13574687#comment-13574687
] 

Ted Malaska commented on AVRO-1243:
-----------------------------------

Interesting: I'm trying to find a good Maven dependency for BZip2. I found a couple of options,
but none is perfect. I would love feedback on which would be the right fit for Avro.

1. Hadoop re-implements BZip2 in the package org.apache.hadoop.io.compress.bzip2,
so one option is to re-implement it again in Avro.
2. There are implementations of org.apache.tools.bzip2 included in other Maven artifacts,
such as org.apache.ant.
3. There is an implementation of BZip2 in the Maven artifact groupId:org.apache.commons,
artifactId:commons-compress.
 

For now I'm going to try a first cut with option 3, because it means we don't have
to re-implement BZip2 in Avro, and because adding a dependency on Ant just seems odd.

The good thing is that the new implementation is a lot less complex than my first reflection
version, so whichever BZip2 direction we go, it will be easy to switch if needed.

                
> Avro support for all compression codecs
> ---------------------------------------
>
>                 Key: AVRO-1243
>                 URL: https://issues.apache.org/jira/browse/AVRO-1243
>             Project: Avro
>          Issue Type: Improvement
>          Components: java
>    Affects Versions: 1.7.3
>            Reporter: Ted Malaska
>            Priority: Minor
>         Attachments: AVRO-1243.not-ready.1.patch, AVRO-1243.not-ready.patch
>
>
> I may be reading this wrong, but at this time org.apache.avro.file.CodecFactory only supports
> the null, deflate, and snappy compression codecs.
> I would like to change the fromString method to use Class.forName(codec).newInstance()
> after the codec is not found in the REGISTERED map but before the AvroRuntimeException is
> thrown.
> Here are some of my supporting thoughts:
> 1. This should not introduce much slowness because it will only be called at initialization.
> 2. This will allow support for GZip, BZip2, and LZO without adding more dependencies
> to the Maven pom file.
> 3. This will allow for a future JIRA I would like to do that would let AvroOutputFormat
> use the following configs: mapred.output.compress and mapred.output.compression.codec
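
The lookup order proposed in the issue description (registered map first, then a reflective
fallback, then an exception) could be sketched roughly as below. This is only an illustration
under stated assumptions: the Codec interface, the two codec classes, and the class name
CodecLookupSketch are hypothetical stand-ins rather than Avro's actual types, and
IllegalArgumentException stands in for AvroRuntimeException so the example compiles without
Avro on the classpath. It also uses getDeclaredConstructor().newInstance() in place of
Class.newInstance(), which is deprecated in newer JDKs.

```java
import java.util.HashMap;
import java.util.Map;

public class CodecLookupSketch {

    /** Hypothetical stand-in for a codec type; not Avro's real class. */
    public interface Codec {
        String name();
    }

    /** Hypothetical built-in codec registered under the name "null". */
    public static class NullCodec implements Codec {
        public String name() { return "null"; }
    }

    /** Hypothetical built-in codec registered under the name "deflate". */
    public static class DeflateCodec implements Codec {
        public String name() { return "deflate"; }
    }

    // Assumed shape of the REGISTERED map: short codec name -> codec instance.
    private static final Map<String, Codec> REGISTERED = new HashMap<>();
    static {
        REGISTERED.put("null", new NullCodec());
        REGISTERED.put("deflate", new DeflateCodec());
    }

    /**
     * Looks the name up in REGISTERED first; if it is absent, treats the name
     * as a fully qualified class name and instantiates it reflectively; only
     * then gives up with an exception.
     */
    public static Codec fromString(String name) {
        Codec registered = REGISTERED.get(name);
        if (registered != null) {
            return registered;
        }
        try {
            Object instance = Class.forName(name).getDeclaredConstructor().newInstance();
            if (instance instanceof Codec) {
                return (Codec) instance;
            }
            throw new IllegalArgumentException("Class is not a codec: " + name);
        } catch (ReflectiveOperationException e) {
            // Stand-in for the AvroRuntimeException mentioned in the issue.
            throw new IllegalArgumentException("Unrecognized codec: " + name, e);
        }
    }

    public static void main(String[] args) {
        // Found in the registered map.
        System.out.println(fromString("deflate").name());
        // Not registered under this name, so resolved through the reflective fallback.
        System.out.println(fromString(DeflateCodec.class.getName()).name());
    }
}
```

Because the reflective path runs only when the name is missing from the map, registered codecs
pay no extra cost, which matches the first supporting point in the description.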

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira