avro-dev mailing list archives

From "Catalin Alexandru Zamfir (JIRA)" <j...@apache.org>
Subject [jira] [Created] (AVRO-1093) DataFileWriter, appendEncoded causes AvroRuntimeException when read back
Date Wed, 16 May 2012 14:37:03 GMT
Catalin Alexandru Zamfir created AVRO-1093:

             Summary: DataFileWriter, appendEncoded causes AvroRuntimeException when read back
                 Key: AVRO-1093
                 URL: https://issues.apache.org/jira/browse/AVRO-1093
             Project: Avro
          Issue Type: Bug
    Affects Versions: 1.6.3
            Reporter: Catalin Alexandru Zamfir

We're doing this:
// Create a buffer for this shard on first use
if (!objRecordsBuffer.containsKey (objShardPath)) {
    objRecordsBuffer.put (objShardPath, new ByteBufferOutputStream ());
}

// Encode the record into the shard's buffer
Encoder objEncoder = EncoderFactory.get ()
        .binaryEncoder (objRecordsBuffer.get (objShardPath), null);
objGenericDatumWriter.write (objRecordConstructor.build (), objEncoder);
objEncoder.flush ();

// Append every buffer held for this key to the data file
for (ByteBuffer objRecord : objRecordsBuffer.get (objKey).getBufferList ()) {
    objRecordWriter.appendEncoded (objRecord);
}

// Flush and close the writer
objRecordWriter.flush ();
objRecordWriter.close ();

It writes the data to HDFS. Reading it back produces the following exception:
Caused by: org.apache.avro.AvroRuntimeException: java.io.IOException: Block read partially, the data may be corrupt
        at org.apache.avro.file.DataFileStream.hasNext(DataFileStream.java:210)
        at net.RnD.FileUtils.TimestampedReader.hasNext(TimestampedReader.java:113)
        at net.RnD.Hadoop.App.read1BAvros(App.java:131)
        at net.RnD.Hadoop.App.executeCode(App.java:534)
        at net.RnD.Hadoop.App.main(App.java:453)
        ... 5 more
Caused by: java.io.IOException: Block read partially, the data may be corrupt
        at org.apache.avro.file.DataFileStream.hasNext(DataFileStream.java:194)
        ... 9 more

The objRecordWriter is a DataFileWriter obtained from DataFileWriter.create or DataFileWriter.appendTo (SeekableInput).
This issue is related to ticket AVRO-1090.

Instead of keeping big hash maps in memory, we decided to serialize the data into
in-memory byte buffers, because it's faster. Although appendEncoded does seem to write
something to HDFS, reading the data back triggers this error.

Help would be appreciated. I've looked at appendEncoded in DataFileWriter but could not figure
out whether it's our job to add a sync marker or whether appendEncoded does that for us.

Must the ByteBuffer we pass in contain exactly one record?
Examples and documentation for this method would be welcome.
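
To make the record-boundary question concrete, here is a minimal sketch that uses only the Java standard library and does not call Avro. It assumes, without confirmation in this report, that each appendEncoded call is counted as exactly one record. Under that assumption, buffers cut at a fixed size would give a different count than buffers holding one record each. The class ChunkBoundaryDemo, its two methods, the 20-byte record size and the 8192-byte chunk size are all hypothetical and chosen only for illustration.

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;

class ChunkBoundaryDemo {

    // Hypothetical stand-in for a buffering output stream: concatenates all
    // encoded records, then exposes the bytes as fixed-size chunks that
    // ignore where one record ends and the next begins.
    static List<ByteBuffer> fixedSizeChunks(List<byte[]> records, int chunkSize) {
        ByteArrayOutputStream all = new ByteArrayOutputStream();
        for (byte[] record : records) {
            all.write(record, 0, record.length);
        }
        byte[] bytes = all.toByteArray();
        List<ByteBuffer> chunks = new ArrayList<>();
        for (int offset = 0; offset < bytes.length; offset += chunkSize) {
            int length = Math.min(chunkSize, bytes.length - offset);
            chunks.add(ByteBuffer.wrap(bytes, offset, length));
        }
        return chunks;
    }

    // One ByteBuffer per encoded record, so the number of buffers always
    // equals the number of records.
    static List<ByteBuffer> onePerRecord(List<byte[]> records) {
        List<ByteBuffer> buffers = new ArrayList<>();
        for (byte[] record : records) {
            buffers.add(ByteBuffer.wrap(record));
        }
        return buffers;
    }

    public static void main(String[] args) {
        // 1000 placeholder records of 20 bytes each (illustrative sizes only)
        List<byte[]> records = new ArrayList<>();
        for (int i = 0; i < 1000; i++) {
            records.add(new byte[20]);
        }
        System.out.println("records:            " + records.size());
        System.out.println("fixed-size chunks:  " + fixedSizeChunks(records, 8192).size());
        System.out.println("per-record buffers: " + onePerRecord(records).size());
    }
}
```

With these sizes the 1000 records occupy 20000 bytes, which a fixed 8192-byte chunking splits into 3 buffers, while the per-record variant yields 1000. If the assumption holds, a writer fed the chunked buffers would record 3 entries for data that actually contains 1000, and two of the chunks would end in the middle of a record.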
