avro-user mailing list archives

From Matt Pouttu-Clarke <Matt.Pouttu-Cla...@icrossing.com>
Subject Re: Invalid sync error when reading Avro file (Amazon EMR Hadoop)
Date Thu, 26 May 2011 14:46:59 GMT
Hi Scott,

Thanks for the response.  It turns out that in part of our code we were
concatenating smaller files into larger files using i/o streams.  This used
to work fine when the files were JSON text files.  However, we learned the
hard way that with Avro you cannot concatenate files in the traditional
sense.  Unless you parse the inputs and merge them using the Avro APIs, you
get the 'Invalid sync' error when attempting to read the concatenated file.
In retrospect, this is because the header at the beginning of each file (the
JSON schema plus that file's own sync marker) is not valid in the middle of
the concatenated file.
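
To illustrate the failure mode, here is a minimal Java sketch.  It assumes a
toy container layout rather than Avro's real encoding: a header of
[int length][schema bytes][16-byte random sync marker], followed by blocks of
[int length][payload][sync marker].  The class and method names
(SyncMarkerDemo, writeFile, readFile, concatBytes, mergeFiles) are
hypothetical and are not part of the Avro API.  A reader that hits the second
file's header mid-stream treats it as a data block and then finds a sync
marker that does not match the first file's.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.security.SecureRandom;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Toy container format for illustration only; this is NOT Avro's real layout.
public class SyncMarkerDemo {
    static final int SYNC_SIZE = 16;

    // Writes a header (schema + fresh random sync marker), then one block per record.
    public static byte[] writeFile(String schema, List<String> records) throws IOException {
        byte[] sync = new byte[SYNC_SIZE];
        new SecureRandom().nextBytes(sync);
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(bytes);
        writeChunk(out, schema, sync);
        for (String record : records) {
            writeChunk(out, record, sync);
        }
        out.flush();
        return bytes.toByteArray();
    }

    private static void writeChunk(DataOutputStream out, String text, byte[] sync) throws IOException {
        byte[] payload = text.getBytes(StandardCharsets.UTF_8);
        out.writeInt(payload.length);
        out.write(payload);
        out.write(sync);
    }

    // Reads the header once, then checks every block's trailing marker against it.
    public static List<String> readFile(byte[] file) throws IOException {
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(file));
        byte[] schema = new byte[in.readInt()];
        in.readFully(schema);
        byte[] sync = new byte[SYNC_SIZE];
        in.readFully(sync);

        byte[] syncBuffer = new byte[SYNC_SIZE];
        List<String> records = new ArrayList<>();
        while (in.available() > 0) {
            byte[] payload = new byte[in.readInt()];
            in.readFully(payload);
            in.readFully(syncBuffer);
            if (!Arrays.equals(syncBuffer, sync)) {
                throw new IOException("Invalid sync!");
            }
            records.add(new String(payload, StandardCharsets.UTF_8));
        }
        return records;
    }

    // Naive stream-style concatenation: the second header lands mid-file.
    public static byte[] concatBytes(byte[] a, byte[] b) {
        byte[] out = Arrays.copyOf(a, a.length + b.length);
        System.arraycopy(b, 0, out, a.length, b.length);
        return out;
    }

    // Merge by parsing each input and rewriting all records under one header.
    public static byte[] mergeFiles(String schema, List<byte[]> files) throws IOException {
        List<String> all = new ArrayList<>();
        for (byte[] file : files) {
            all.addAll(readFile(file));
        }
        return writeFile(schema, all);
    }

    public static void main(String[] args) throws IOException {
        String schema = "{\"type\":\"string\"}";
        byte[] a = writeFile(schema, Arrays.asList("r1", "r2"));
        byte[] b = writeFile(schema, Arrays.asList("r3"));
        try {
            readFile(concatBytes(a, b));
        } catch (IOException e) {
            System.out.println("naive concatenation failed: " + e.getMessage());
        }
        System.out.println("merged records: " + readFile(mergeFiles(schema, Arrays.asList(a, b))));
    }
}
```

With the real library the same idea applies: read each input through the
Avro file reader (for example DataFileStream) and append the records through
DataFileWriter, so that the output has a single header and a single sync
marker.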

The reason we didn't see this on our Apache Hadoop dev cluster was that the
data size was smaller and the concatenation was 1-to-1.

Maybe a better error message would have led us to this conclusion sooner?
Other than that it's not really Avro's problem.

-Matt

On 5/25/11 4:40 PM, "Scott Carey" <scott@richrelevance.com> wrote:

> The svn change you note is from AVRO-160.  Avro's file format changed between
> Avro 1.2 and 1.3.
> Recent versions (Avro 1.5.x and perhaps 1.4.1) have a file reader class for
> Avro 1.2 that is separate in case old format files need to be read.
> 
> We weren't aware of anyone using the 1.2 format at the time we changed (see
> AVRO-160).
> 
> I'm not sure your error below is due to that change, however.  Does the error
> below occur before any records are retrieved, or part-way through after some
> have been accessed?
> 
>   
> 
> On 5/25/11 2:34 PM, "Matt Pouttu-Clarke" <Matt.Pouttu-Clarke@icrossing.com>
> wrote:
> 
>> Getting this error when reading an Avro file on Amazon EMR Hadoop.  Does not
>> occur on any recent Apache Hadoop build.
>> 
>> Exception org.apache.avro.AvroRuntimeException: java.io.IOException: Invalid
>> sync!
>> org.apache.avro.AvroRuntimeException: java.io.IOException: Invalid sync!
>>     at org.apache.avro.file.DataFileStream.hasNext(DataFileStream.java:176)
>>     at Abc.readAvroFile(Abc.java:28)
>>     at Abc.main(Abc.java:65)
>> Caused by: java.io.IOException: Invalid sync!
>>     at org.apache.avro.file.DataFileStream.nextRawBlock(DataFileStream.java:258)
>>     at org.apache.avro.file.DataFileStream.hasNext(DataFileStream.java:164)
>>     ... 2 more
>> 
>> The source code that throws the Invalid sync! exception suggests a
>> low-level I/O problem:
>> {code}
>> 244      DataBlock nextRawBlock(DataBlock reuse) throws IOException {
>> 245        if (!hasNextBlock()) {
>> 246          throw new NoSuchElementException();
>> 247        }
>> 248        if (reuse == null || reuse.data.length < (int) blockSize) {
>> 249          reuse = new DataBlock(blockRemaining, (int) blockSize);
>> 250        } else {
>> 251          reuse.numEntries = blockRemaining;
>> 252          reuse.blockSize = (int)blockSize;
>> 253        }
>> 254        // throws if it can't read the size requested
>> 255        vin.readFixed(reuse.data, 0, reuse.blockSize);
>> 256        vin.readFixed(syncBuffer);
>> 257        if (!Arrays.equals(syncBuffer, sync))
>> 258          throw new IOException("Invalid sync!");
>> 259        availableBlock = false;
>> 260        return reuse;
>> 261      }
>> {code}
>> 
>> Looks like this commit from Doug Cutting removed those error messages:
>> http://www.mail-archive.com/avro-commits@hadoop.apache.org/msg00218.html
>> 
>> Anyone have any clue as to what could cause these errors?
>> 
>> Thanks,
>> Matt
>> 

