avro-dev mailing list archives

From "Scott Carey (JIRA)" <j...@apache.org>
Subject [jira] Updated: (AVRO-380) Avro Container File format change: add block size to block descriptor
Date Fri, 29 Jan 2010 08:05:34 GMT

     [ https://issues.apache.org/jira/browse/AVRO-380?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Scott Carey updated AVRO-380:
-----------------------------

    Attachment: AVRO-380.patch

Updated patch:

* Block size and block record count are longs in both the format and the code. This reader
implementation throws an exception if a block size is larger than Integer.MAX_VALUE.
* When a block is read from the underlying stream, the reader checks that the number of bytes
read equals the block size (that is, that the block is not truncated).
* When a block is finished (blockCount records read), the reader checks that all bytes have
been read. This is done by forcing an EOFException, which is ugly; I plan to change it
later as an optimization, along with other planned changes.
* DeflateCodec now writes and reads RFC-1951 'raw' deflate, in line with the documentation.
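
For readers without the patch at hand, the size check, the truncation check, and the raw deflate change can be sketched roughly as below. This is not the code in AVRO-380.patch: the class and method names (BlockReadSketch, readBlockData, deflateRaw, inflateRaw) are hypothetical, and the sketch assumes the block size follows the record count, with both written as Avro's variable-length zig-zag longs. Only java.util.zip and java.io calls from the JDK are used.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

public class BlockReadSketch {

    /** Decodes one Avro long (variable-length, zig-zag encoded). */
    static long readLong(InputStream in) throws IOException {
        long raw = 0;
        int shift = 0;
        while (true) {
            int b = in.read();
            if (b < 0) throw new EOFException("truncated long");
            raw |= (long) (b & 0x7F) << shift;
            if ((b & 0x80) == 0) break;          // high bit clear: last byte
            shift += 7;
            if (shift > 63) throw new IOException("malformed long");
        }
        return (raw >>> 1) ^ -(raw & 1);         // undo zig-zag
    }

    /** Encodes one Avro long; used here only to build sample input. */
    static void writeLong(long n, ByteArrayOutputStream out) {
        long v = (n << 1) ^ (n >> 63);           // zig-zag
        while ((v & ~0x7FL) != 0) {
            out.write((int) ((v & 0x7F) | 0x80));
            v >>>= 7;
        }
        out.write((int) v);
    }

    /** Reads the block descriptor (count, size), then exactly `size` bytes. */
    static byte[] readBlockData(InputStream in) throws IOException {
        long count = readLong(in);               // record count, unused in this sketch
        long size = readLong(in);                // block size in bytes
        if (size < 0 || size > Integer.MAX_VALUE) {
            throw new IOException("Unsupported block size: " + size);
        }
        byte[] data = new byte[(int) size];
        // readFully throws EOFException if fewer than `size` bytes remain,
        // which is how this sketch detects a truncated block.
        new DataInputStream(in).readFully(data);
        return data;
    }

    /** Compresses to RFC 1951 'raw' deflate (nowrap = true: no zlib header or checksum). */
    static byte[] deflateRaw(byte[] input) {
        Deflater d = new Deflater(Deflater.DEFAULT_COMPRESSION, true);
        d.setInput(input);
        d.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[4096];
        while (!d.finished()) {
            out.write(buf, 0, d.deflate(buf));
        }
        d.end();
        return out.toByteArray();
    }

    /** Decompresses RFC 1951 'raw' deflate data. */
    static byte[] inflateRaw(byte[] input) throws DataFormatException {
        Inflater inf = new Inflater(true);
        inf.setInput(input);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[4096];
        while (!inf.finished()) {
            int n = inf.inflate(buf);
            if (n == 0 && inf.needsInput()) {
                throw new DataFormatException("truncated deflate stream");
            }
            out.write(buf, 0, n);
        }
        inf.end();
        return out.toByteArray();
    }
}
```

The nowrap flag is what distinguishes raw deflate from the zlib-wrapped form (RFC 1950); a reader built with new Inflater() instead of new Inflater(true) would reject the output of deflateRaw.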


> Avro Container File format change:  add block size to block descriptor
> ----------------------------------------------------------------------
>
>                 Key: AVRO-380
>                 URL: https://issues.apache.org/jira/browse/AVRO-380
>             Project: Avro
>          Issue Type: Improvement
>          Components: doc, java, spec
>    Affects Versions: 1.3.0
>            Reporter: Scott Carey
>             Fix For: 1.3.0
>
>         Attachments: AVRO-380.patch, AVRO-380.patch
>
>
> The new file format in AVRO-160 limits a few use cases that I have found to be important.
> A block currently contains a count of the number of records, the block data, and a sync marker.
> This change would add the block size, in bytes, alongside the number of records.
> This allows efficient access to a block's data without the need to decode the data into
> individual Datums, which is useful for various use cases.
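
One such use case, counting records by skipping over block data, might look like the sketch below. It is an illustration under stated assumptions rather than code from the patch: the names (BlockSkipSketch, countRecords, skipFully) are hypothetical, the stream is assumed to be positioned just past the file header, each block is assumed to be laid out as count, size, data, sync marker, and the sync marker is assumed to be 16 bytes.

```java
import java.io.ByteArrayOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.io.PushbackInputStream;

public class BlockSkipSketch {

    static final int SYNC_SIZE = 16;             // assumed sync marker length

    /** Decodes one Avro long (variable-length, zig-zag encoded). */
    static long readLong(InputStream in) throws IOException {
        long raw = 0;
        int shift = 0;
        while (true) {
            int b = in.read();
            if (b < 0) throw new EOFException("truncated long");
            raw |= (long) (b & 0x7F) << shift;
            if ((b & 0x80) == 0) break;
            shift += 7;
            if (shift > 63) throw new IOException("malformed long");
        }
        return (raw >>> 1) ^ -(raw & 1);
    }

    /** Encodes one Avro long; used here only to build sample input. */
    static void writeLong(long n, ByteArrayOutputStream out) {
        long v = (n << 1) ^ (n >> 63);
        while ((v & ~0x7FL) != 0) {
            out.write((int) ((v & 0x7F) | 0x80));
            v >>>= 7;
        }
        out.write((int) v);
    }

    /** Skips exactly n bytes, or throws EOFException if the stream ends first. */
    static void skipFully(InputStream in, long n) throws IOException {
        while (n > 0) {
            long skipped = in.skip(n);
            if (skipped <= 0) {
                // skip() may return 0 without being at EOF; probe with read().
                if (in.read() < 0) throw new EOFException("truncated block");
                skipped = 1;
            }
            n -= skipped;
        }
    }

    /** Sums the record counts of all blocks without decoding any datum. */
    static long countRecords(InputStream blocks) throws IOException {
        PushbackInputStream in = new PushbackInputStream(blocks);
        long total = 0;
        while (true) {
            int first = in.read();
            if (first < 0) break;                // clean end of stream
            in.unread(first);
            long count = readLong(in);
            long size = readLong(in);
            if (size < 0) throw new IOException("negative block size: " + size);
            skipFully(in, size + SYNC_SIZE);     // jump over data and sync marker
            total += count;
        }
        return total;
    }
}
```

Without the size in the descriptor, the only way to reach the next block is to decode every record in the current one, which is the cost the proposed change avoids.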

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

