avro-dev mailing list archives

From "Scott Carey (JIRA)" <j...@apache.org>
Subject [jira] Updated: (AVRO-380) Avro Container File format change: add block size to block descriptor
Date Fri, 29 Jan 2010 08:05:34 GMT

     [ https://issues.apache.org/jira/browse/AVRO-380?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Scott Carey updated AVRO-380:

    Attachment: AVRO-380.patch

Updated patch:

* Block size and block record count are longs in both the format and the code. This reader
implementation throws an exception if a block size is larger than Integer.MAX_VALUE.
* When a block is read from the underlying stream, the reader checks that the number of bytes
read equals the block size (i.e., that the block is not truncated).
* When a block is finished (blockCount records read), the reader checks that all of the block's
bytes have been consumed. This is currently done by forcing an EOFException, which is ugly; I plan
to change it later as an optimization, along with other planned changes.
* DeflateCodec now writes and reads RFC 1951 'raw' deflate, in line with the documentation.
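To make the three reader checks above concrete, here is a minimal Java sketch of reading one block. It is not the code in the patch: `BlockCheckSketch`, `readBlock`, and `writeBlock` are hypothetical names, the records are assumed to be single Avro longs so the example can decode them, and the leftover-bytes check uses `available()` on an in-memory buffer instead of forcing an EOFException.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.OutputStream;
import java.util.Arrays;

/** Hypothetical sketch of the block checks listed above; not the code in AVRO-380.patch. */
public class BlockCheckSketch {

    /** Reads a zig-zag, variable-length long (Avro's binary encoding for longs). */
    static long readLong(DataInputStream in) throws IOException {
        long raw = 0;
        int shift = 0;
        int b;
        do {
            b = in.readUnsignedByte(); // EOFException if the stream ends mid-value
            raw |= (long) (b & 0x7F) << shift;
            shift += 7;
        } while ((b & 0x80) != 0);
        return (raw >>> 1) ^ -(raw & 1); // undo zig-zag
    }

    /** Writes a long with the same zig-zag varint encoding. */
    static void writeLong(OutputStream out, long n) throws IOException {
        long z = (n << 1) ^ (n >> 63);
        while ((z & ~0x7FL) != 0) {
            out.write((int) ((z & 0x7F) | 0x80));
            z >>>= 7;
        }
        out.write((int) z);
    }

    /** Writes one block: record count, byte size, then the encoded records. */
    static void writeBlock(OutputStream out, long[] records) throws IOException {
        ByteArrayOutputStream body = new ByteArrayOutputStream();
        for (long r : records) writeLong(body, r);
        writeLong(out, records.length);
        writeLong(out, body.size());
        body.writeTo(out);
    }

    /** Reads one block of (hypothetical) single-long records, applying the three checks. */
    static long[] readBlock(DataInputStream in) throws IOException {
        long count = readLong(in); // both header fields are longs in the format
        long size = readLong(in);
        // Check 1: this reader buffers a block in a byte[], so size must fit in an int.
        if (size < 0 || size > Integer.MAX_VALUE) {
            throw new IOException("Unsupported block size: " + size);
        }
        // Each record here takes at least one byte, so a larger count is corrupt.
        if (count < 0 || count > size) {
            throw new IOException("Invalid record count: " + count);
        }
        // Check 2: readFully throws EOFException if the block is truncated.
        byte[] data = new byte[(int) size];
        in.readFully(data);
        // Check 3: after `count` records, no bytes of the block may be left over.
        DataInputStream block = new DataInputStream(new ByteArrayInputStream(data));
        long[] records = new long[(int) count];
        for (int i = 0; i < records.length; i++) {
            records[i] = readLong(block);
        }
        if (block.available() != 0) {
            throw new IOException(block.available() + " unread bytes at end of block");
        }
        return records;
    }

    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        writeBlock(out, new long[] {1, -2, 300});
        byte[] bytes = out.toByteArray();
        long[] back = readBlock(new DataInputStream(new ByteArrayInputStream(bytes)));
        System.out.println(Arrays.toString(back)); // [1, -2, 300]

        byte[] cut = Arrays.copyOf(bytes, bytes.length - 1); // drop the last byte
        try {
            readBlock(new DataInputStream(new ByteArrayInputStream(cut)));
        } catch (EOFException e) {
            System.out.println("truncated block detected");
        }
    }
}
```

On the fourth point: in `java.util.zip`, raw RFC 1951 deflate (no zlib header or checksum) is selected with the `nowrap` flag, i.e. `new Deflater(level, true)` and `new Inflater(true)`.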

> Avro Container File format change:  add block size to block descriptor
> ----------------------------------------------------------------------
>                 Key: AVRO-380
>                 URL: https://issues.apache.org/jira/browse/AVRO-380
>             Project: Avro
>          Issue Type: Improvement
>          Components: doc, java, spec
>    Affects Versions: 1.3.0
>            Reporter: Scott Carey
>             Fix For: 1.3.0
>         Attachments: AVRO-380.patch, AVRO-380.patch
> The new file format in AVRO-160 limits a few use cases that I have found to be important.
> A block currently contains a count of the number of records, the block data, and a sync marker.
> This change would add the block size, in bytes, alongside the number of records.
> This allows efficient access to a block's data without the need to decode the data into
> individual Datums, which is useful for various use cases.
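The point of the description is that once each block header carries its size in bytes, a reader can step over a block without decoding any datum. A minimal Java sketch, assuming the stream is already positioned just after the file header and that every block is followed by a 16-byte sync marker; `BlockSkipSketch` and `countRecords` are hypothetical names, not Avro API.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;

/** Hypothetical sketch: walk container-file blocks using only their headers. */
public class BlockSkipSketch {
    static final int SYNC_SIZE = 16; // assumed: each block is followed by a 16-byte sync marker

    /** Decodes a zig-zag varint long whose first byte has already been read. */
    static long readLong(DataInputStream in, int first) throws IOException {
        long raw = first & 0x7F;
        int shift = 7;
        int b = first;
        while ((b & 0x80) != 0) {
            b = in.readUnsignedByte(); // EOFException if the header is cut off
            raw |= (long) (b & 0x7F) << shift;
            shift += 7;
        }
        return (raw >>> 1) ^ -(raw & 1);
    }

    /** Skips exactly n bytes, failing if the stream ends first. */
    static void skipFully(InputStream in, long n) throws IOException {
        while (n > 0) {
            long skipped = in.skip(n);
            if (skipped <= 0) { // skip() may return 0 without being at EOF
                if (in.read() == -1) throw new EOFException("Truncated block");
                skipped = 1;
            }
            n -= skipped;
        }
    }

    /**
     * Sums the record counts of all blocks without decoding a single datum.
     * Assumes `body` is positioned just after the file header.
     */
    static long countRecords(InputStream body) throws IOException {
        DataInputStream in = new DataInputStream(body);
        long total = 0;
        int first;
        while ((first = in.read()) != -1) { // -1 here means a clean end between blocks
            long count = readLong(in, first);
            long size = readLong(in, in.readUnsignedByte());
            if (count < 0 || size < 0) throw new IOException("Corrupt block header");
            skipFully(in, size + SYNC_SIZE); // step over the data and the sync marker
            total += count;
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        byte[] sync = new byte[SYNC_SIZE]; // all-zero placeholder marker for the demo
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        out.write(new byte[] {0x06, 0x08, 1, 2, 3, 4}); // count 3, size 4, opaque data
        out.write(sync);
        out.write(new byte[] {0x04, 0x04, 5, 6});       // count 2, size 2, opaque data
        out.write(sync);
        System.out.println(countRecords(new ByteArrayInputStream(out.toByteArray()))); // 5
    }
}
```

The same header walk lets a tool hand a block's raw (possibly compressed) bytes to another consumer as a unit, since the size field says exactly where the data ends.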

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
