avro-dev mailing list archives

From "Thiruvalluvan M. G. (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (AVRO-1058) invalid int encoding with binary format
Date Mon, 21 May 2012 08:11:41 GMT

    [ https://issues.apache.org/jira/browse/AVRO-1058?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13280009#comment-13280009
] 

Thiruvalluvan M. G. commented on AVRO-1058:
-------------------------------------------

I've created a separate JIRA, AVRO-1097, for the first problem.

The second issue arises because certain record schemas have no fields. Since Avro writes
nothing for an empty record, there is nothing to read, so no EOFException is thrown even when
the stream is at its end. In Avro there is no way to distinguish a stream holding zero empty
records from one holding any other number of empty records. For empty records, that
information has to come from "out-of-band". I don't think it is a bug.
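
To make the point concrete, here is a minimal sketch in plain Java. It is not Avro code: the
class EmptyRecordDemo and its methods are hypothetical, and one byte per field is a
simplification. It only shows that when a record is encoded as its fields back to back, any
number of field-less records produces the same (empty) byte stream, and that a count stored
outside the record data removes the ambiguity.

```java
import java.io.ByteArrayOutputStream;
import java.util.Collections;
import java.util.List;

public class EmptyRecordDemo {

    // Hypothetical stand-in for a binary encoder: each record is written as
    // its field values back to back, with no per-record marker.
    static byte[] encodeRecords(int recordCount, List<Integer> fieldValues) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (int r = 0; r < recordCount; r++) {
            for (int v : fieldValues) {
                out.write(v & 0xFF); // simplification: one byte per field
            }
        }
        return out.toByteArray();
    }

    // One possible "out-of-band" fix (an assumption, not Avro's format):
    // prepend the record count as a single byte, so counts must be below 256.
    static byte[] encodeWithCount(int recordCount, List<Integer> fieldValues) {
        byte[] body = encodeRecords(recordCount, fieldValues);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        out.write(recordCount & 0xFF);
        out.write(body, 0, body.length);
        return out.toByteArray();
    }

    public static void main(String[] args) {
        List<Integer> noFields = Collections.emptyList();
        // Both streams are empty, so a reader cannot tell them apart.
        System.out.println(encodeRecords(0, noFields).length);
        System.out.println(encodeRecords(5, noFields).length);
        // With the count prefix the two cases differ in their first byte.
        System.out.println(encodeWithCount(0, noFields)[0]);
        System.out.println(encodeWithCount(5, noFields)[0]);
    }
}
```

A reader of the unprefixed stream has no byte to consume per record, so it can keep
"reading" empty records forever without hitting end-of-stream.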

The third problem seems to have been addressed already. All strings and char-sequences are
converted into Utf8 for comparison in GenericData.compare().
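
As a rough illustration of that idea, the sketch below uses plain Java only. MiniUtf8 is a
hypothetical, simplified stand-in and not Avro's Utf8 class, and compare() here is not the
actual GenericData.compare() implementation. It shows why String.equals() fails against
another CharSequence type, and how normalising both sides to one type before comparing
avoids the asymmetry.

```java
public class CharSequenceEquality {

    // Hypothetical, simplified Utf8-like wrapper (not Avro's class).
    static final class MiniUtf8 implements CharSequence, Comparable<MiniUtf8> {
        private final String value;

        MiniUtf8(String value) { this.value = value; }

        @Override public int length() { return value.length(); }
        @Override public char charAt(int index) { return value.charAt(index); }
        @Override public CharSequence subSequence(int start, int end) {
            return value.subSequence(start, end);
        }
        @Override public String toString() { return value; }
        @Override public int compareTo(MiniUtf8 other) {
            return value.compareTo(other.value);
        }
        @Override public boolean equals(Object o) {
            return o instanceof MiniUtf8 && value.equals(((MiniUtf8) o).value);
        }
        @Override public int hashCode() { return value.hashCode(); }
    }

    // Convert any CharSequence to the single comparison type.
    static MiniUtf8 normalise(CharSequence cs) {
        return cs instanceof MiniUtf8 ? (MiniUtf8) cs : new MiniUtf8(cs.toString());
    }

    // Compare after normalising both operands, so argument order and
    // concrete type no longer matter.
    static int compare(CharSequence a, CharSequence b) {
        return normalise(a).compareTo(normalise(b));
    }

    public static void main(String[] args) {
        CharSequence wrapped = new MiniUtf8("abc");
        // String.equals() is false for any non-String argument.
        System.out.println("abc".equals(wrapped));
        // After normalisation the two values compare as equal.
        System.out.println(compare("abc", wrapped) == 0);
    }
}
```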

                
> invalid int encoding with binary format
> ---------------------------------------
>
>                 Key: AVRO-1058
>                 URL: https://issues.apache.org/jira/browse/AVRO-1058
>             Project: Avro
>          Issue Type: Bug
>          Components: java
>    Affects Versions: 1.6.2, 1.6.3, 1.7.0
>            Reporter: wolfgang hoschek
>             Fix For: 1.7.0
>
>         Attachments: TestRandomRecord.java
>
>
> The java binary format sometimes generates an "invalid int encoding" exception and fails
> to roundtrip a record even though the json format roundtrips the same record just fine.
> In addition, there is a separate bug in that both binary and JSON format sometimes lead
> to an infinite loop when read() always returns null and never throws EOFException to indicate
> end-of-stream. This causes an OutOfMemoryError in the test driver because it forever adds
> null to a list of records.
> The attached test case java file demonstrates the problems. It walks all *.avsc and *.avpr
> files in the code base, generates random records based on those schemas, roundtrips the records,
> and then compares records pre and post roundtrip. To see it fail, comment out portions of the
> following snippet:
> if (roundtripType == RoundtripType.BINARY_AVRO && schemaFile.getName().equals("weather.avsc") && i >= 350) {
> 	continue; // FIXME tmp work-around for avro bug (invalid int encoding on large string)
> }
> if (roundtripType == RoundtripType.BINARY_AVRO && schemaFile.getName().equals("Json.avsc") && i >= 1) {
> 	continue; // FIXME tmp work-around for avro bug (invalid int encoding on large string)
> }
> if (roundtripType == RoundtripType.BINARY_AVRO && schemaFile.getName().equals("WordCount.avsc") && i >= 2) {
> 	continue; // FIXME tmp work-around for avro bug (invalid int encoding on large string)
> }
> if (roundtripType == RoundtripType.BINARY_AVRO && schemaFile.getName().equals("mr_events.avpr") && i >= 0) {
> 	continue; // FIXME tmp work-around for avro bug (invalid int encoding on large string)
> }
> if (schemaFile.getName().equals("OnTheClasspath.avsc")) {
> 	continue; // FIXME tmp work-around for avro bug (OutOfMemoryError)
> }
> if (schemaFile.getName().equals("OnTheClasspath.avpr")) {
> 	continue; // FIXME tmp work-around for avro bug (OutOfMemoryError)
> }
> if (schemaFile.getName().equals("import.avpr")) {
> 	continue; // FIXME tmp work-around for avro bug (OutOfMemoryError)
> }
> if (schemaFile.getName().equals("namespaces.avpr")) {
> 	continue; // FIXME tmp work-around for avro bug (OutOfMemoryError)
> }
> Finally, there is a third separate issue, which is described in the javadoc for test
> method fixup():
> 	/**
> 	 * You can trigger Record.equals() failures by modifying RandomData to spit
> 	 * out Strings rather than Utf8 objects.
> 	 * 
> 	 * This hack replaces all occurrences of Utf8 objects with String objects in
> 	 * the given avro record tree. This is sometimes necessary to make
> 	 * Record.equals() work correctly because Avro deserialization deserializes
> 	 * String objects as Utf8 objects, and String.equals(Utf8) returns false
> 	 * even if Utf8.equals(String) would return true.
> 	 * 
> 	 * In this particular test scenario this fixup hack might not be necessary
> 	 * because the RandomData class always generates Utf8 instead of Strings.
> 	 * 
> 	 * Nonetheless, perhaps Record.equals() and descendants including Map
> 	 * equality, etc, should treat any two pairs of String and Utf8 as equal if
> 	 * string.equals(utf8.toString()). Perhaps Avro internals should arrange to
> 	 * have the utf8 object always on the left hand side of equality
> 	 * comparisons, like utf8.equals(obj).
> 	 */
> 	private void fixup(Object obj) { ... }
> To summarize, there are really three separate issues here. I'm submitting them all in
> one bug report. Feel free to open separate JIRA issues if that's deemed more appropriate.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
