avro-dev mailing list archives

From "Lars Francke (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (AVRO-1302) Files written via Python and Avro 1.7.4 on Windows can't be read using Java program
Date Wed, 15 Oct 2014 17:26:35 GMT

     [ https://issues.apache.org/jira/browse/AVRO-1302?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Lars Francke updated AVRO-1302:
-------------------------------
    Attachment: AVRO-1302.1.patch

It took a day of debugging but we found the solution for our problem.

It is caused by copying and pasting the example code from the documentation verbatim. That
code opens files using the {{w}} and {{r}} modes respectively, which are text modes. Text modes
replace newline characters with their platform-specific representations: on Windows, every
{{\n}} written is replaced by {{\r\n}}. Avro data files are binary, so this corrupts the data.

I've attached a patch that changes the documentation to always use binary mode ({{wb}} and {{rb}})
and adds a note explaining why binary mode is required.
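
To illustrate the effect described above, here is a minimal sketch that does not use the Avro library at all. It assumes Python 3 (the original report used Python 2.7.4), and {{PAYLOAD}}, {{write_text_mode}}, {{write_binary_mode}}, {{read_raw}} and {{demo}} are hypothetical names made up for this example. Passing {{newline="\r\n"}} forces the translation that Windows text mode applies, so the corruption can be reproduced on any platform.

```python
import os
import tempfile

# Hypothetical stand-in for Avro container bytes: arbitrary binary data
# that happens to contain the byte 0x0A ("\n") twice.
PAYLOAD = b"Obj\x01\x0a\x02fake-sync-marker\x0a"


def write_text_mode(path, data):
    # newline="\r\n" forces the "\n" -> "\r\n" translation that Windows text
    # mode applies, so the effect is reproducible on any platform.
    # latin-1 maps every byte to exactly one character and back.
    with open(path, "w", encoding="latin-1", newline="\r\n") as f:
        f.write(data.decode("latin-1"))


def write_binary_mode(path, data):
    # Binary mode writes the bytes exactly as given.
    with open(path, "wb") as f:
        f.write(data)


def read_raw(path):
    with open(path, "rb") as f:
        return f.read()


def demo():
    # Write the same payload both ways and return the raw bytes on disk.
    with tempfile.TemporaryDirectory() as tmp:
        text_path = os.path.join(tmp, "text_mode.bin")
        binary_path = os.path.join(tmp, "binary_mode.bin")
        write_text_mode(text_path, PAYLOAD)
        write_binary_mode(binary_path, PAYLOAD)
        return read_raw(text_path), read_raw(binary_path)


if __name__ == "__main__":
    from_text, from_binary = demo()
    print("binary mode intact:", from_binary == PAYLOAD)
    print("text mode intact:  ", from_text == PAYLOAD)
    print("extra bytes added: ", len(from_text) - len(PAYLOAD))
```

The file written in text mode gains one extra {{\r}} byte for every {{\n}} byte in the payload, which shifts everything that follows. A reader that expects a marker at a fixed offset will no longer find it there.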

> Files written via Python and Avro 1.7.4 on Windows can't be read using Java program
> -----------------------------------------------------------------------------------
>
>                 Key: AVRO-1302
>                 URL: https://issues.apache.org/jira/browse/AVRO-1302
>             Project: Avro
>          Issue Type: Bug
>          Components: python
>    Affects Versions: 1.7.4
>            Reporter: Christopher Conner
>            Priority: Minor
>         Attachments: AVRO-1302.1.patch
>
>
> I'm not sure if this is a Python issue, an Avro issue, or a Windows issue. However, if I create
> an Avro file on Windows using Python 2.7.4 and Avro 1.7.4 and then try to read it with a Java
> program, it fails with:
> Successfully opened the Python avro file now I'm going to attempt to read from it
> Exception in thread "main" org.apache.avro.AvroRuntimeException: java.io.IOException: Invalid sync!
> 	at org.apache.avro.file.DataFileStream.hasNext(DataFileStream.java:210)
> 	at JavaPythonAvroExample.main(JavaPythonAvroExample.java:27)
> Caused by: java.io.IOException: Invalid sync!
> 	at org.apache.avro.file.DataFileStream.nextRawBlock(DataFileStream.java:293)
> 	at org.apache.avro.file.DataFileStream.hasNext(DataFileStream.java:198)
> 	... 1 more



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
