avro-dev mailing list archives

From "Doug Cutting (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (AVRO-792) map reduce job for avro 1.5 generates ArrayIndexOutOfBoundsException
Date Thu, 07 Apr 2011 18:34:06 GMT

    [ https://issues.apache.org/jira/browse/AVRO-792?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13017076#comment-13017076 ]

Doug Cutting commented on AVRO-792:
-----------------------------------

With this patch the value of applyAliases() is no longer cached.  Some of the cost of applyAliases()
is incurred whenever the reader and writer schemas are not identical, and I'm pleased to see
that this (a walk of the schema looking for aliases and the creation of tables to hold them)
doesn't slow things down.  But in the case where an alias exists, applyAliases() must create
a copy of the schema, something Perf.java doesn't currently test.  Is it worth adding a test for this?

> map reduce job for avro 1.5 generates ArrayIndexOutOfBoundsException
> --------------------------------------------------------------------
>
>                 Key: AVRO-792
>                 URL: https://issues.apache.org/jira/browse/AVRO-792
>             Project: Avro
>          Issue Type: Bug
>          Components: java
>    Affects Versions: 1.5.0
>         Environment: Mac with VMWare running Linux training-vm-Ubuntu
>            Reporter: ey-chih chow
>            Assignee: Thiruvalluvan M. G.
>            Priority: Blocker
>             Fix For: 1.5.1
>
>         Attachments: AVRO-792-2.patch, AVRO-792.patch
>
>   Original Estimate: 504h
>  Remaining Estimate: 504h
>
> We have an Avro map/reduce job that used to work with Avro 1.4 but is broken with Avro
> 1.5.  The M/R job with Avro 1.5 worked fine in our debugging environment, but broke when
> we moved to a real cluster.  In one instance of testing, the job had 23 reducers.  Four of
> them succeeded and the rest failed with an ArrayIndexOutOfBoundsException.
> Here are two instances of the stack traces:
> =================================================================================
> java.lang.ArrayIndexOutOfBoundsException: -1576799025
> 	at org.apache.avro.io.parsing.Symbol$Alternative.getSymbol(Symbol.java:364)
> 	at org.apache.avro.io.ResolvingDecoder.doAction(ResolvingDecoder.java:229)
> 	at org.apache.avro.io.parsing.Parser.advance(Parser.java:88)
> 	at org.apache.avro.io.ResolvingDecoder.readIndex(ResolvingDecoder.java:206)
> 	at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:142)
> 	at org.apache.avro.generic.GenericDatumReader.readMap(GenericDatumReader.java:232)
> 	at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:141)
> 	at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:142)
> 	at org.apache.avro.generic.GenericDatumReader.readRecord(GenericDatumReader.java:166)
> 	at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:138)
> 	at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:129)
> 	at org.apache.avro.mapred.AvroSerialization$AvroWrapperDeserializer.deserialize(AvroSerialization.java:86)
> 	at org.apache.avro.mapred.AvroSerialization$AvroWrapperDeserializer.deserialize(AvroSerialization.java:68)
> 	at org.apache.hadoop.mapred.Task$ValuesIterator.readNextValue(Task.java:1136)
> 	at org.apache.hadoop.mapred.Task$ValuesIterator.next(Task.java:1076)
> 	at org.apache.hadoop.mapred.ReduceTask$ReduceValuesIterator.moveToNext(ReduceTask.java:246)
> 	at org.apache.hadoop.mapred.ReduceTask$ReduceValuesIterator.next(ReduceTask.java:242)
> 	at org.apache.avro.mapred.HadoopReducerBase$ReduceIterable.next(HadoopReducerBase.java:47)
> 	at com.ngmoco.ngpipes.etl.NgEventETLReducer.reduce(NgEventETLReducer.java:46)
> 	at com.ngmoco.ngpipes.etl.NgEventETLReducer.reduce(NgEventETLReducer.java:1)
> 	at org.apache.avro.mapred.HadoopReducerBase.reduce(HadoopReducerBase.java:60)
> 	at org.apache.avro.mapred.HadoopReducerBase.reduce(HadoopReducerBase.java:30)
> 	at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:468)
> 	at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:416)
> 	at org.apache.hadoop.mapred.Child$4.run(Child.java:240)
> 	at java.security.AccessController.doPrivileged(Native Method)
> 	at javax.security.auth.Subject.doAs(Subject.java:396)
> 	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1115)
> 	at org.apache.hadoop.mapred.Child.main(Child.java:234)
> =====================================================================================================
> java.lang.ArrayIndexOutOfBoundsException: 40
> 	at org.apache.avro.io.parsing.Symbol$Alternative.getSymbol(Symbol.java:364)
> 	at org.apache.avro.io.ResolvingDecoder.doAction(ResolvingDecoder.java:229)
> 	at org.apache.avro.io.parsing.Parser.advance(Parser.java:88)
> 	at org.apache.avro.io.ResolvingDecoder.readIndex(ResolvingDecoder.java:206)
> 	at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:142)
> 	at org.apache.avro.generic.GenericDatumReader.readRecord(GenericDatumReader.java:166)
> 	at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:138)
> 	at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:129)
> 	at org.apache.avro.mapred.AvroSerialization$AvroWrapperDeserializer.deserialize(AvroSerialization.java:86)
> 	at org.apache.avro.mapred.AvroSerialization$AvroWrapperDeserializer.deserialize(AvroSerialization.java:68)
> 	at org.apache.hadoop.mapred.Task$ValuesIterator.readNextValue(Task.java:1136)
> 	at org.apache.hadoop.mapred.Task$ValuesIterator.next(Task.java:1076)
> 	at org.apache.hadoop.mapred.ReduceTask$ReduceValuesIterator.moveToNext(ReduceTask.java:246)
> 	at org.apache.hadoop.mapred.ReduceTask$ReduceValuesIterator.next(ReduceTask.java:242)
> 	at org.apache.avro.mapred.HadoopReducerBase$ReduceIterable.next(HadoopReducerBase.java:47)
> 	at com.ngmoco.ngpipes.sourcing.sessions.NgSessionReducer.reduce(NgSessionReducer.java:74)
> 	at com.ngmoco.ngpipes.sourcing.sessions.NgSessionReducer.reduce(NgSessionReducer.java:1)
> 	at org.apache.avro.mapred.HadoopReducerBase.reduce(HadoopReducerBase.java:60)
> 	at org.apache.avro.mapred.HadoopReducerBase.reduce(HadoopReducerBase.java:30)
> 	at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:468)
> 	at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:416)
> 	at org.apache.hadoop.mapred.Child$4.run(Child.java:240)
> 	at java.security.AccessController.doPrivileged(Native Method)
> 	at javax.security.auth.Subject.doAs(Subject.java:396)
> 	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1115)
> 	at org.apache.hadoop.mapred.Child.main(Child.java:234)
> =====================================================================================================
> The signature of our map() is:
> public void map(Utf8 input, AvroCollector<Pair<Utf8, GenericRecord>> collector, Reporter reporter) throws IOException;
> and reduce() is:
> public void reduce(Utf8 key, Iterable<GenericRecord> values, AvroCollector<GenericRecord> collector, Reporter reporter) throws IOException;
> All the GenericRecords are of the same schema.
> There are many changes in the area of serialization/deserialization between Avro 1.4
> and 1.5, but we could not figure out why the exceptions were generated.
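
For context on the symptom: Avro's binary format writes a union branch as a zigzag-encoded varint index followed by the branch value, and Symbol$Alternative.getSymbol() (the frame at the top of both traces) uses that index to look up a symbol in an array. If the resolving decoder gets out of step with the writer's bytes, as a reader/writer schema mismatch can cause, arbitrary payload bytes get decoded as the union index, which is consistent with garbage values like the -1576799025 above. A self-contained sketch of the varint decoding (it does not use Avro itself; the byte values here are chosen purely for illustration so that they decode to that exact garbage index):

```java
// Illustration only: decode a zigzag varint the way Avro's
// BinaryDecoder.readLong() does, and show that five arbitrary-looking
// payload bytes, misread as a union index, yield a huge negative value.
public class UnionIndexSketch {

    // Zigzag varint decode: 7 data bits per byte, high bit = continuation,
    // then undo the zigzag mapping (0->0, -1->1, 1->2, -2->3, ...).
    static long readZigZagLong(byte[] buf, int pos) {
        long n = 0;
        int shift = 0;
        byte b;
        do {
            b = buf[pos++];
            n |= (long) (b & 0x7f) << shift;
            shift += 7;
        } while ((b & 0x80) != 0);
        return (n >>> 1) ^ -(n & 1);  // undo zigzag
    }

    public static void main(String[] args) {
        // Bytes that were really part of some other field's payload,
        // misinterpreted as a union-index varint:
        byte[] misaligned = { (byte) 0xE1, (byte) 0xAC, (byte) 0xE0, (byte) 0xDF, 0x0B };
        long bogusIndex = readZigZagLong(misaligned, 0);
        System.out.println(bogusIndex);  // prints -1576799025
        // Indexing the union's symbols array with this value throws
        // ArrayIndexOutOfBoundsException, as in the reported traces.
    }
}
```

This only illustrates why a misaligned decode surfaces as an out-of-range array index; the actual bug is in the schema-resolution path, not in the varint reading.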

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
