avro-user mailing list archives

From ÐΞ€ρ@Ҝ (๏̯͡๏) <deepuj...@gmail.com>
Subject Re: Avro Read with sync() {java.io.IOException: Invalid sync}
Date Tue, 24 Dec 2013 04:49:47 GMT
Hi Doug,
Do you want me to raise a bug against Avro or Hadoop-Core? My guess is Avro.
Regards,
Deepak


On Tue, Dec 24, 2013 at 12:10 AM, Doug Cutting <cutting@apache.org> wrote:

> This sounds like a bug.
>
> I wonder if it is similar to a related bug in Hadoop?
>
> https://issues.apache.org/jira/browse/HADOOP-9307
>
> If so, please file an issue in Jira.
>
> Doug
>
> On Sat, Dec 21, 2013 at 4:35 AM, ÐΞ€ρ@Ҝ (๏̯͡๏) <deepujain@gmail.com> wrote:
> > Hello,
> > I have a 340 MB Avro data file whose records are sorted and identified
> > by a unique id (duplicate records exist). At the beginning of every unique
> > record a synchronization point is created with DataFileWriter.sync(). (I
> > cannot, or do not want to, save the sync points, and I do not want to use
> > SortedKeyValueFile as the output format for the M/R job.)
> >
> > There are at least 25k synchronization points in the 340 MB file.
> >
> > Ex:
> > Marker1_RecordA1_RecordA2_RecordA3_Marker2_RecordB1_RecordB2
> >
> >
> > As the records are sorted, a binary search is performed for efficient
> > retrieval, using the attached code.
> >
> > Most of the time the search is successful, but at times the code throws
> > the following exception:
> > ------
> > org.apache.avro.AvroRuntimeException: java.io.IOException: Invalid sync!
> >     at org.apache.avro.file.DataFileStream.hasNext(DataFileStream.java:210)
> > ------
> >
> >
> >
> > Questions
> > 1) Is it OK to have 25k sync points in a 300 MB file? Does it cost
> > performance while reading?
> > 2) I note down the position that was used to invoke fileReader.sync(mid);.
> > If I catch the AvroRuntimeException, close and reopen the file, and call
> > sync(mid) again, I do not see the exception. Why would Avro throw the
> > exception the first time and not later?
> > 3) Is there a limit on the number of times sync() can be invoked?
> > 4) When sync(position) is invoked, is any position with
> > 0 <= position <= file.size() valid? If yes, why do I see the
> > AvroRuntimeException (#2)?
> >
> > Regards,
> > Deepak
> >
>
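
For readers without the attachment: the search described in the quoted message (pick a byte offset, sync forward to the next marker, compare the id found there) can be modeled without Avro at all. The sketch below is not the code attached to the original message, and it does not call the Avro API. It stands in two sorted arrays for the file and assumes every marker starts a new id, which a real file does not guarantee, because DataFileWriter also writes a marker whenever a block fills up. The names SyncSearchSketch, syncTo and findBlockOffset are made up for illustration.

```java
/** Toy model of binary search over byte offsets in a sorted, sync-marked file. */
final class SyncSearchSketch {

    /**
     * Models "seek to pos, then scan forward to the next sync marker".
     * Returns the index of the first marker at or after pos, or offsets.length if none.
     * (A real reader has no index and scans bytes; a linear scan keeps the model simple.)
     */
    static int syncTo(long[] offsets, long pos) {
        for (int i = 0; i < offsets.length; i++) {
            if (offsets[i] >= pos) {
                return i;
            }
        }
        return offsets.length;
    }

    /**
     * offsets[i] is the byte offset of the i-th marker and ids[i] the id of the
     * records that follow it. Both arrays are ascending (assumption: one marker per id).
     * Returns the marker offset for targetId, or -1 if the id is absent.
     */
    static long findBlockOffset(long[] offsets, long[] ids, long fileLength, long targetId) {
        long lo = 0;          // invariant: the target marker, if any, lies in [lo, hi)
        long hi = fileLength;
        while (lo < hi) {
            long mid = lo + (hi - lo) / 2;
            int i = syncTo(offsets, mid);
            if (i == offsets.length || offsets[i] >= hi) {
                // No marker in [mid, hi), so the target can only be left of mid.
                hi = mid;
            } else if (ids[i] == targetId) {
                return offsets[i];
            } else if (ids[i] < targetId) {
                // Everything up to and including this marker is too small.
                lo = offsets[i] + 1;
            } else {
                // This is the first marker at or after mid and it is already too
                // large, so the target must start before mid.
                hi = mid;
            }
        }
        return -1;
    }
}
```

With offsets {16, 120, 450, 900}, ids {10, 20, 30, 40} and a file length of 1000, findBlockOffset(offsets, ids, 1000, 30) returns 450, and looking up id 25 returns -1. Against a real file, syncTo would correspond to DataFileReader.sync(mid) followed by reading one record, and the "no marker before hi" test is roughly what DataFileReader.pastSync(hi) is meant for.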



-- 
Deepak
