avro-dev mailing list archives

From "Douglas Creager (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (AVRO-818) C data file writer produces corrupt blocks
Date Thu, 12 May 2011 13:49:47 GMT

     [ https://issues.apache.org/jira/browse/AVRO-818?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Douglas Creager updated AVRO-818:

    Attachment: quickstop.c

Here's a modified examples/quickstop.c that shows the error: it adds enough records to
the file to spill over into a new block.  The resulting file is unreadable by any of the Avro
implementations.

> C data file writer produces corrupt blocks
> ------------------------------------------
>                 Key: AVRO-818
>                 URL: https://issues.apache.org/jira/browse/AVRO-818
>             Project: Avro
>          Issue Type: Bug
>          Components: c
>    Affects Versions: 1.5.1
>            Reporter: Douglas Creager
>            Assignee: Douglas Creager
>         Attachments: quickstop.c
> The data file writer in the C library can produce corrupt blocks.  The logic in datafile.c
> is that we have a fixed-buffer in-memory avro_writer_t instance.  When you append records
> to the data file, they go into this memory buffer.  If we get an error serializing into the
> memory buffer, it's presumably because we've filled it, so we write out the memory buffer's
> contents as a new block in the file, clear the buffer, and try again.
>
> The problem is that the failed serialization into the memory buffer isn't atomic; some
> of the serialization will have made it into the buffer before we discover that there's not
> enough room.  And this incomplete record will then make it into the file.

This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
