avro-dev mailing list archives

From "Doug Cutting (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (AVRO-1387) Avro container file format update to write checksums for individual record
Date Fri, 18 Oct 2013 17:54:46 GMT

    [ https://issues.apache.org/jira/browse/AVRO-1387?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13799343#comment-13799343 ]

Doug Cutting commented on AVRO-1387:
------------------------------------

If you're willing to write a new block per record, then you might consider just using the
Snappy codec, which includes a checksum for each block.  Alternatively, you could define a meta-codec
that wraps other codecs in checksums, e.g., we might have codecs like deflate+md5 or null+crc32.
The point is that we already have a pluggable per-block extension point in codecs, and
one of the standard implementations already includes checksums.
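
As a rough illustration of the null+crc32 idea, here is a minimal standalone sketch of per-block checksum framing. It is not Avro's codec API: the class name Crc32BlockWrapper and its wrap/unwrap methods are hypothetical, and the layout (payload followed by a 4-byte big-endian CRC32) is an assumption modeled on how the Snappy codec appends its checksum. It uses only java.util.zip.CRC32 from the JDK.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.zip.CRC32;

/** Hypothetical helper (not part of Avro): frames one block's bytes with a trailing CRC32. */
class Crc32BlockWrapper {

    /** Returns the payload followed by the 4-byte big-endian CRC32 of the payload. */
    static byte[] wrap(byte[] payload) {
        CRC32 crc = new CRC32();
        crc.update(payload, 0, payload.length);
        // ByteBuffer is big-endian by default.
        ByteBuffer out = ByteBuffer.allocate(payload.length + 4);
        out.put(payload);
        out.putInt((int) crc.getValue());
        return out.array();
    }

    /** Verifies the trailing CRC32 and returns the payload; throws if the block is corrupt. */
    static byte[] unwrap(byte[] framed) {
        if (framed.length < 4) {
            throw new IllegalArgumentException("Block too short to hold a checksum");
        }
        int payloadLength = framed.length - 4;
        byte[] payload = Arrays.copyOfRange(framed, 0, payloadLength);
        int stored = ByteBuffer.wrap(framed, payloadLength, 4).getInt();
        CRC32 crc = new CRC32();
        crc.update(payload, 0, payloadLength);
        if ((int) crc.getValue() != stored) {
            throw new IllegalArgumentException("Checksum mismatch: block is corrupt");
        }
        return payload;
    }

    public static void main(String[] args) {
        byte[] block = "one serialized record".getBytes(StandardCharsets.UTF_8);
        byte[] framed = wrap(block);
        System.out.println("round trip ok: " + Arrays.equals(unwrap(framed), block));
    }
}
```

A real meta-codec would apply the inner codec (deflate, null, and so on) before or after this framing step and would plug in through the existing codec extension point; with one record per block, each record then gets its own checksum.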

> Avro container file format update to write checksums for individual record
> --------------------------------------------------------------------------
>
>                 Key: AVRO-1387
>                 URL: https://issues.apache.org/jira/browse/AVRO-1387
>             Project: Avro
>          Issue Type: Bug
>            Reporter: Hari Shreedharan
>
> We are considering changes in Flume's file channel to use Avro. One of the requirements
> is that each event (which maps to one Avro record) be checksummed so we know if the data is
> corrupt.
> We'd probably have to add a new version for this, since this will change the data format
> on disk. I can start working on a Java version if there are no objections.



--
This message was sent by Atlassian JIRA
(v6.1#6144)
