avro-dev mailing list archives

From "ey-chih chow (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (AVRO-792) map reduce job for avro 1.5 generates ArrayIndexOutOfBoundsException
Date Fri, 20 May 2011 17:27:47 GMT

     [ https://issues.apache.org/jira/browse/AVRO-792?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ey-chih chow updated AVRO-792:
------------------------------

    Attachment: part-00001.avro
                part-00000.avro

We still haven't had enough time to create a test case that reproduces the problem.  What I
have done, given our MR job mentioned above, is turn it into a map-only job with the argument
-D mapred.reduce.tasks=0 to capture the data produced by the mapper.  Attached are the data
files generated.  I think the problem arises when the reducer de-serializes this data.  Can
someone help me create a test case that simulates the reducer's de-serialization step by,
perhaps, writing an MR job with an identity mapper against the attached data files?  Thanks.
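For anyone attempting this, a minimal sketch of such a job using the org.apache.avro.mapred API might look like the following. The attached part-*.avro files should contain Pair<Utf8, GenericRecord> records (the captured map output), so an identity mapper plus an identity reducer should push them back through the shuffle and exercise the suspected de-serialization path. Class names, argument layout, and the assumption that args[2] points at a file holding the value schema are all hypothetical; the real record schema from the attached files would be needed.

```java
import java.io.File;
import java.io.IOException;

import org.apache.avro.Schema;
import org.apache.avro.generic.GenericRecord;
import org.apache.avro.mapred.AvroCollector;
import org.apache.avro.mapred.AvroJob;
import org.apache.avro.mapred.AvroMapper;
import org.apache.avro.mapred.AvroReducer;
import org.apache.avro.mapred.Pair;
import org.apache.avro.util.Utf8;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.Reporter;

// Hypothetical test harness for AVRO-792: re-reads the captured map
// output and forces it through a shuffle so the reducer side has to
// de-serialize it -- the step where the AIOOBE was reported.
public class Avro792Repro {

  // Identity mapper: re-emit each Pair unchanged.
  public static class IdentityAvroMapper
      extends AvroMapper<Pair<Utf8, GenericRecord>, Pair<Utf8, GenericRecord>> {
    @Override
    public void map(Pair<Utf8, GenericRecord> datum,
                    AvroCollector<Pair<Utf8, GenericRecord>> collector,
                    Reporter reporter) throws IOException {
      collector.collect(datum);
    }
  }

  // Identity reducer: iterating over values drives the
  // ResolvingDecoder de-serialization path seen in the stack traces.
  public static class IdentityAvroReducer
      extends AvroReducer<Utf8, GenericRecord, GenericRecord> {
    @Override
    public void reduce(Utf8 key, Iterable<GenericRecord> values,
                       AvroCollector<GenericRecord> collector,
                       Reporter reporter) throws IOException {
      for (GenericRecord value : values) {
        collector.collect(value);
      }
    }
  }

  public static void main(String[] args) throws Exception {
    JobConf conf = new JobConf(Avro792Repro.class);
    conf.setJobName("AVRO-792 repro");

    // args[2]: file containing the value schema of the attached data
    // (an assumption -- substitute however the real schema is obtained).
    Schema recordSchema = new Schema.Parser().parse(new File(args[2]));
    Schema pairSchema =
        Pair.getPairSchema(Schema.create(Schema.Type.STRING), recordSchema);

    AvroJob.setInputSchema(conf, pairSchema);
    AvroJob.setMapOutputSchema(conf, pairSchema);
    AvroJob.setOutputSchema(conf, recordSchema);
    AvroJob.setMapperClass(conf, IdentityAvroMapper.class);
    AvroJob.setReducerClass(conf, IdentityAvroReducer.class);

    FileInputFormat.setInputPaths(conf, new Path(args[0]));  // part-0000*.avro
    FileOutputFormat.setOutputPath(conf, new Path(args[1]));

    JobClient.runJob(conf);
  }
}
```

Running it with one or more reducers against the attached files should either succeed (pointing at something specific to the original job) or fail with the same exception, giving a self-contained reproduction.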

> map reduce job for avro 1.5 generates ArrayIndexOutOfBoundsException
> --------------------------------------------------------------------
>
>                 Key: AVRO-792
>                 URL: https://issues.apache.org/jira/browse/AVRO-792
>             Project: Avro
>          Issue Type: Bug
>          Components: java
>    Affects Versions: 1.5.0, 1.5.1
>         Environment: Mac with VMWare running Linux training-vm-Ubuntu
>            Reporter: ey-chih chow
>            Priority: Blocker
>             Fix For: 1.5.2
>
>         Attachments: AVRO-792-2.patch, AVRO-792-3.patch, AVRO-792.patch, part-00000.avro,
part-00000.avro, part-00001.avro, part-00001.avro
>
>   Original Estimate: 504h
>  Remaining Estimate: 504h
>
> We have an Avro map/reduce job that used to work with Avro 1.4 but is broken with Avro
> 1.5.  The M/R job with Avro 1.5 worked fine in our debugging environment, but broke when
> we moved to a real cluster.  In one instance of testing, the job had 23 reducers.  Four of
> them succeeded and the rest failed because of the ArrayIndexOutOfBoundsException generated.
> Here are two instances of the stack traces:
> =================================================================================
> java.lang.ArrayIndexOutOfBoundsException: -1576799025
> 	at org.apache.avro.io.parsing.Symbol$Alternative.getSymbol(Symbol.java:364)
> 	at org.apache.avro.io.ResolvingDecoder.doAction(ResolvingDecoder.java:229)
> 	at org.apache.avro.io.parsing.Parser.advance(Parser.java:88)
> 	at org.apache.avro.io.ResolvingDecoder.readIndex(ResolvingDecoder.java:206)
> 	at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:142)
> 	at org.apache.avro.generic.GenericDatumReader.readMap(GenericDatumReader.java:232)
> 	at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:141)
> 	at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:142)
> 	at org.apache.avro.generic.GenericDatumReader.readRecord(GenericDatumReader.java:166)
> 	at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:138)
> 	at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:129)
> 	at org.apache.avro.mapred.AvroSerialization$AvroWrapperDeserializer.deserialize(AvroSerialization.java:86)
> 	at org.apache.avro.mapred.AvroSerialization$AvroWrapperDeserializer.deserialize(AvroSerialization.java:68)
> 	at org.apache.hadoop.mapred.Task$ValuesIterator.readNextValue(Task.java:1136)
> 	at org.apache.hadoop.mapred.Task$ValuesIterator.next(Task.java:1076)
> 	at org.apache.hadoop.mapred.ReduceTask$ReduceValuesIterator.moveToNext(ReduceTask.java:246)
> 	at org.apache.hadoop.mapred.ReduceTask$ReduceValuesIterator.next(ReduceTask.java:242)
> 	at org.apache.avro.mapred.HadoopReducerBase$ReduceIterable.next(HadoopReducerBase.java:47)
> 	at com.ngmoco.ngpipes.etl.NgEventETLReducer.reduce(NgEventETLReducer.java:46)
> 	at com.ngmoco.ngpipes.etl.NgEventETLReducer.reduce(NgEventETLReducer.java:1)
> 	at org.apache.avro.mapred.HadoopReducerBase.reduce(HadoopReducerBase.java:60)
> 	at org.apache.avro.mapred.HadoopReducerBase.reduce(HadoopReducerBase.java:30)
> 	at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:468)
> 	at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:416)
> 	at org.apache.hadoop.mapred.Child$4.run(Child.java:240)
> 	at java.security.AccessController.doPrivileged(Native Method)
> 	at javax.security.auth.Subject.doAs(Subject.java:396)
> 	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1115)
> 	at org.apache.hadoop.mapred.Child.main(Child.java:234)
> =====================================================================================================
> java.lang.ArrayIndexOutOfBoundsException: 40
> 	at org.apache.avro.io.parsing.Symbol$Alternative.getSymbol(Symbol.java:364)
> 	at org.apache.avro.io.ResolvingDecoder.doAction(ResolvingDecoder.java:229)
> 	at org.apache.avro.io.parsing.Parser.advance(Parser.java:88)
> 	at org.apache.avro.io.ResolvingDecoder.readIndex(ResolvingDecoder.java:206)
> 	at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:142)
> 	at org.apache.avro.generic.GenericDatumReader.readRecord(GenericDatumReader.java:166)
> 	at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:138)
> 	at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:129)
> 	at org.apache.avro.mapred.AvroSerialization$AvroWrapperDeserializer.deserialize(AvroSerialization.java:86)
> 	at org.apache.avro.mapred.AvroSerialization$AvroWrapperDeserializer.deserialize(AvroSerialization.java:68)
> 	at org.apache.hadoop.mapred.Task$ValuesIterator.readNextValue(Task.java:1136)
> 	at org.apache.hadoop.mapred.Task$ValuesIterator.next(Task.java:1076)
> 	at org.apache.hadoop.mapred.ReduceTask$ReduceValuesIterator.moveToNext(ReduceTask.java:246)
> 	at org.apache.hadoop.mapred.ReduceTask$ReduceValuesIterator.next(ReduceTask.java:242)
> 	at org.apache.avro.mapred.HadoopReducerBase$ReduceIterable.next(HadoopReducerBase.java:47)
> 	at com.ngmoco.ngpipes.sourcing.sessions.NgSessionReducer.reduce(NgSessionReducer.java:74)
> 	at com.ngmoco.ngpipes.sourcing.sessions.NgSessionReducer.reduce(NgSessionReducer.java:1)
> 	at org.apache.avro.mapred.HadoopReducerBase.reduce(HadoopReducerBase.java:60)
> 	at org.apache.avro.mapred.HadoopReducerBase.reduce(HadoopReducerBase.java:30)
> 	at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:468)
> 	at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:416)
> 	at org.apache.hadoop.mapred.Child$4.run(Child.java:240)
> 	at java.security.AccessController.doPrivileged(Native Method)
> 	at javax.security.auth.Subject.doAs(Subject.java:396)
> 	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1115)
> 	at org.apache.hadoop.mapred.Child.main(Child.java:234)
> =====================================================================================================
> The signature of our map() is:
>     public void map(Utf8 input, AvroCollector<Pair<Utf8, GenericRecord>> collector,
>                     Reporter reporter) throws IOException;
> and reduce() is:
>     public void reduce(Utf8 key, Iterable<GenericRecord> values,
>                        AvroCollector<GenericRecord> collector, Reporter reporter) throws IOException;
> All the GenericRecords are of the same schema.
> There are many changes in the area of serialization/de-serialization between Avro 1.4
> and 1.5, but we could not figure out why these exceptions were generated.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
