avro-dev mailing list archives

From "Doug Cutting (JIRA)" <j...@apache.org>
Subject [jira] Commented: (AVRO-134) Mismatch between the spec and implementation of metadata blocks in files
Date Tue, 06 Oct 2009 19:23:31 GMT

    [ https://issues.apache.org/jira/browse/AVRO-134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12762748#action_12762748 ]

Doug Cutting commented on AVRO-134:

> For Avro's purposes, zlib or raw deflate seem most appropriate to me.

I agree.  I don't have a strong preference.  The 6-byte overhead of zlib seems acceptable,
but I don't see that it provides much benefit.  We generally expect the filesystem to provide
checksums, so I'd argue against adding one if we use raw deflate.  If folks are more comfortable
having a checksum, then we should perhaps use zlib.
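
To make the 6-byte figure concrete, here is a minimal Java sketch (not Avro code; the class and method names are made up for illustration) that compresses the same bytes once with zlib framing and once as raw deflate, using java.util.zip.Deflater. The zlib framing adds a 2-byte header and a 4-byte Adler-32 trailer around the same deflate stream.

```java
import java.nio.charset.StandardCharsets;
import java.util.zip.Deflater;

// Hypothetical helper, not part of Avro: measures zlib vs. raw deflate output size.
public class DeflateOverhead {

    // Compresses the whole input and returns the number of output bytes.
    // nowrap = true  -> raw deflate (no header, no checksum)
    // nowrap = false -> zlib framing (2-byte header + 4-byte Adler-32 trailer)
    static int compressedSize(byte[] input, boolean nowrap) {
        Deflater deflater = new Deflater(Deflater.DEFAULT_COMPRESSION, nowrap);
        deflater.setInput(input);
        deflater.finish();
        byte[] buf = new byte[input.length + 64];
        int total = 0;
        while (!deflater.finished()) {
            // buf is reused on each pass; only the byte count matters here.
            total += deflater.deflate(buf);
        }
        deflater.end();
        return total;
    }

    public static void main(String[] args) {
        byte[] data = "some Avro block payload, some Avro block payload"
                .getBytes(StandardCharsets.UTF_8);
        int zlib = compressedSize(data, false);
        int raw = compressedSize(data, true);
        System.out.println("zlib=" + zlib + " raw=" + raw
                + " overhead=" + (zlib - raw));
    }
}
```

Since the overhead is fixed per stream, its relative cost depends on how large each compressed block is.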

> how do we control the compression level?

This does not affect the format, so it doesn't need to be part of the Avro specification, right?
Rather, it can be part of the API, which varies by programming language.  So, in Java, we
might add a method, DataFileWriter#setCodecCompressionLevel(int), that must be called before
the first entry is appended to a file.  Does that sound right?
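
A rough sketch of what that contract could look like follows. This is not the actual DataFileWriter; SketchWriter and everything in it other than the proposed method name are hypothetical, and it assumes raw deflate via java.util.zip.Deflater. The point is only that the level is a writer-side setting that gets locked in once the first entry is appended.

```java
import java.io.ByteArrayOutputStream;
import java.util.zip.Deflater;

// Hypothetical stand-in for a file writer; not Avro's DataFileWriter.
public class SketchWriter {
    private int level = Deflater.DEFAULT_COMPRESSION;
    private boolean started = false;
    private final ByteArrayOutputStream out = new ByteArrayOutputStream();

    // Must be called before the first append(); the level is not stored in the output.
    public void setCodecCompressionLevel(int level) {
        if (started) {
            throw new IllegalStateException(
                    "compression level must be set before the first entry is appended");
        }
        if (level < Deflater.DEFAULT_COMPRESSION || level > Deflater.BEST_COMPRESSION) {
            throw new IllegalArgumentException("level must be in -1..9: " + level);
        }
        this.level = level;
    }

    // Compresses one entry as a raw deflate stream and buffers the result.
    public void append(byte[] entry) {
        started = true;
        Deflater deflater = new Deflater(level, true); // true = raw deflate
        deflater.setInput(entry);
        deflater.finish();
        byte[] buf = new byte[256];
        while (!deflater.finished()) {
            int n = deflater.deflate(buf);
            out.write(buf, 0, n);
        }
        deflater.end();
    }

    // Number of compressed bytes written so far.
    public int size() {
        return out.size();
    }
}
```

A caller would invoke setCodecCompressionLevel(9) and then append(...); a later call to setCodecCompressionLevel would throw IllegalStateException. A reader needs no knowledge of the level, since any deflate stream can be inflated the same way.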

> Mismatch between the spec and implementation of metadata blocks in files
> ------------------------------------------------------------------------
>                 Key: AVRO-134
>                 URL: https://issues.apache.org/jira/browse/AVRO-134
>             Project: Avro
>          Issue Type: Bug
>            Reporter: Thiruvalluvan M. G.
>         Attachments: AVRO-134.patch
> The spec says there are three keys in metadata blocks - schema, count and _codec_. But
> the code in DataFileWriter adds schema, count and _sync_. The sync field is used by the DataFileReader.
> We need to do the following:
>    - Add the key sync in the specification.
>    - Either drop the key codec in the specification or add code to support codec in DataFileReader/DataFileWriter.
>      If we decide to have codec, we need to also publish in the spec the list of supported codecs
>      with their names to use in the metadata block.

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
