avro-dev mailing list archives

From "Ryan Blue (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (AVRO-1873) avro gem doesn't compatible with other languages with snappy compression
Date Sat, 10 Sep 2016 23:04:20 GMT

    [ https://issues.apache.org/jira/browse/AVRO-1873?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15480604#comment-15480604 ]

Ryan Blue commented on AVRO-1873:
---------------------------------

I wrote the same content from Java and from Ruby and hexdumped the result. The problem was
that the last 4 bytes were missing from the Ruby payload, but the rest of the Snappy-encoded
data looked identical. From looking at [Java's SnappyCodec|https://github.com/apache/avro/blob/master/lang/java/avro/src/main/java/org/apache/avro/file/SnappyCodec.java],
it looks like those last 4 bytes are a CRC32 checksum. Adding the checksum (using Zlib.crc32)
fixed compatibility and made it so Avro blocks written by Java and Ruby are identical.
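
The write-side framing described above can be sketched roughly as follows. This is a minimal illustration, not the actual patch: the helper name append_crc32 is hypothetical, and a placeholder string stands in for real Snappy output so the snippet needs only the standard library. It assumes the checksum is the CRC32 of the uncompressed data, stored as 4 big-endian bytes after the compressed block, which is how the Avro specification describes the snappy codec.

```ruby
require 'zlib'

# Hypothetical helper (not part of the avro gem): append the CRC32 of the
# *uncompressed* data to an already-compressed block as 4 big-endian bytes.
def append_crc32(compressed, uncompressed)
  [compressed, Zlib.crc32(uncompressed)].pack('a*N')
end

uncompressed = 'Avro'
# Placeholder bytes standing in for the Snappy-compressed form of the data.
compressed = 'placeholder-compressed-bytes'.b

framed = append_crc32(compressed, uncompressed)
trailer = framed.byteslice(-4, 4).unpack('N').first

puts framed.bytesize - compressed.bytesize    # number of trailing checksum bytes
puts trailer == Zlib.crc32(uncompressed)      # trailer holds the CRC32
```

In a real codec the compressed argument would come from the Snappy library rather than a placeholder.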

For the read path, I implemented the check but the code doesn't throw an error if the checksum
doesn't match. Instead, it assumes that it is reading an older Ruby file and decompresses
the entire incoming buffer and passes the result along. I don't think there's a way to both
validate the checksum and detect old files, so this seems reasonable to me.
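
A rough sketch of that lenient read path is shown below. The names decompress_block and inflate are hypothetical, and Zlib is used as a stand-in compressor so the example runs without the snappy gem; the real codec would call the Snappy library instead. The fallback on a decompression error is an added assumption: stripping 4 bytes from a block that never had a checksum may leave a truncated stream.

```ruby
require 'zlib'

# Hypothetical read-path sketch. `inflate` is any callable that decompresses
# a string and raises a StandardError on malformed input.
def decompress_block(data, inflate)
  if data.bytesize > 4
    expected = data.byteslice(-4, 4).unpack('N').first
    begin
      body = inflate.call(data.byteslice(0, data.bytesize - 4))
      return body if Zlib.crc32(body) == expected
    rescue StandardError
      # The block may have had no checksum, so truncating it broke the
      # stream; fall through to the legacy path.
    end
  end
  # Checksum absent or mismatched: assume an older file without a
  # checksum and decompress the entire buffer.
  inflate.call(data)
end

inflate = ->(bytes) { Zlib::Inflate.inflate(bytes) }
payload = 'Avro' * 10

new_style = [Zlib::Deflate.deflate(payload), Zlib.crc32(payload)].pack('a*N')
old_style = Zlib::Deflate.deflate(payload)

puts decompress_block(new_style, inflate) == payload
puts decompress_block(old_style, inflate) == payload
```

The trade-off is the one noted above: a block whose checksum genuinely fails is indistinguishable from an old checksum-less block, so corruption is not reported on this path.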

> avro gem doesn't compatible with other languages with snappy compression
> ------------------------------------------------------------------------
>
>                 Key: AVRO-1873
>                 URL: https://issues.apache.org/jira/browse/AVRO-1873
>             Project: Avro
>          Issue Type: Bug
>          Components: ruby
>    Affects Versions: 1.8.1
>         Environment: CentOS 6.8 64bit, Snappy 1.1.0, Python 3.5, Ruby 2.2.3
>            Reporter: Pumsuk Cho
>            Priority: Blocker
>             Fix For: 1.8.2
>
>
> I tested the avro gem today and found some weird results.
> I generated a Snappy-compressed Avro file with a Python library, "fastavro". This file
> works fine with avro-tools-1.8.1.jar.
> java -jar avro-tools-1.8.1.jar tojson testing.avro returns what I expected.
> But the file is NOT compatible with Ruby: reading it with the avro gem returns an
> "Invalid Input" message. And a Snappy-compressed Avro file made with the avro gem doesn't
> work with avro-tools, nor in Python with avro-python3 and fastavro.
> My Ruby code is below:
> require 'avro'
> schema = Avro::Schema.parse(File.open('test.avsc', 'r').read)
> avrofile = File.open('test.avro', 'wb')
> writer = Avro::IO::DatumWriter.new(schema)
> datawriter = Avro::DataFile::Writer.new avrofile, writer, schema, 'snappy'
> datawriter << {"title" => "Avro", "author" => "Apache Foundation"}
> datawriter.close



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)