flink-dev mailing list archives

From Andrew Palumbo <ap....@outlook.com>
Subject Re: Kryo StackOverflowError
Date Sun, 10 Apr 2016 16:24:44 GMT
Hi Stephan,

thanks for answering.

This is not from a recursive object. (It is used in a recursive method in the test that is throwing
this error, but the depth is only 2 and there are no other Flink DataSet operations before
execution is triggered, so it is trivial.)

Here is a Gist of the code, and the full output and stack trace:

https://gist.github.com/andrewpalumbo/40c7422a5187a24cd03d7d81feb2a419

The error begins at line 178 of the "Output" file.

Thanks

________________________________________
From: ewenstephan@gmail.com <ewenstephan@gmail.com> on behalf of Stephan Ewen <sewen@apache.org>
Sent: Sunday, April 10, 2016 9:39 AM
To: dev@flink.apache.org
Subject: Re: Kryo StackOverflowError

Hi!

Sorry, I don't fully understand the diagnosis.
You say that this stack overflow is not from a recursive/object type?

Long graphs of operations in Flink usually do not cause
StackOverflowErrors, because the whole graph is not processed
recursively.

Can you paste the entire Stack Trace (for example to a gist)?

Greetings,
Stephan


On Sun, Apr 10, 2016 at 4:42 AM, Andrew Palumbo <ap.dev@outlook.com> wrote:

> Hi all,
>
>
> I am working on a matrix multiplication operation for the Mahout Flink
> bindings that uses quite a few chained Flink DataSet operations.
>
>
> When testing, I am getting the following error:
>
>
> {...}
>
> 04/09/2016 22:30:35    CHAIN Reduce (Reduce at
> org.apache.mahout.flinkbindings.blas.FlinkOpABt$.abt_nograph(FlinkOpABt.scala:147))
> -> FlatMap (FlatMap at
> org.apache.mahout.flinkbindings.drm.BlockifiedFlinkDrm.asRowWise(FlinkDrm.scala:93))(1/1)
> switched to CANCELED
> 04/09/2016 22:30:35    CHAIN Partition -> Map (Map at
> org.apache.mahout.flinkbindings.blas.FlinkOpABt$.pairwiseApply(FlinkOpABt.scala:240))
> -> GroupCombine (GroupCombine at
> org.apache.mahout.flinkbindings.blas.FlinkOpABt$.abt_nograph(FlinkOpABt.scala:129))
> -> Combine (Reduce at
> org.apache.mahout.flinkbindings.blas.FlinkOpABt$.abt_nograph(FlinkOpABt.scala:147))(3/3)
> switched to FAILED
> java.lang.StackOverflowError
>     at
> com.esotericsoftware.kryo.serializers.ObjectField.write(ObjectField.java:48)
>     at
> com.esotericsoftware.kryo.serializers.FieldSerializer.write(FieldSerializer.java:495)
>     at com.esotericsoftware.kryo.Kryo.writeObject(Kryo.java:523)
>     at
> com.esotericsoftware.kryo.serializers.ObjectField.write(ObjectField.java:61)
>     at
> com.esotericsoftware.kryo.serializers.FieldSerializer.write(FieldSerializer.java:495)
>     at com.esotericsoftware.kryo.Kryo.writeObject(Kryo.java:523)
>     at
> com.esotericsoftware.kryo.serializers.ObjectField.write(ObjectField.java:61)
>     at
> com.esotericsoftware.kryo.serializers.FieldSerializer.write(FieldSerializer.java:495)
>     at com.esotericsoftware.kryo.Kryo.writeObject(Kryo.java:523)
>     at
> com.esotericsoftware.kryo.serializers.ObjectField.write(ObjectField.java:61)
>     at
> com.esotericsoftware.kryo.serializers.FieldSerializer.write(FieldSerializer.java:495)
> {...}
>
>
> I've seen similar issues on the dev@flink list (and other places), but I
> believe that they were from recursive calls and objects which pointed back
> to themselves somehow.
>
>
> This is a relatively straightforward method; it just has several Flink
> operations before execution is triggered. If I remove some operations,
> e.g. a reduce, I can get the method to complete on a simple test; however, the
> result will then, of course, be numerically incorrect.
>
>
> I am wondering if there is any workaround for this type of problem?
>
>
> Thank You,
>
>
> Andy
>
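
The repeating ObjectField.write / FieldSerializer.write / Kryo.writeObject frames in the quoted trace are consistent with a serializer that descends one call level per nested object field. The following is a minimal sketch in plain Java of that mechanism. It assumes a hypothetical Node type and hand-written writers that stand in for the serializer; none of it is Kryo or Flink code. It shows that this kind of recursion can overflow on a deep but acyclic structure, while an explicit-stack traversal of the same structure does not.

```java
import java.util.ArrayDeque;
import java.util.Deque;

class NestedWriteDemo {
    // Hypothetical type: each node holds exactly one nested object field.
    static final class Node {
        final int value;
        Node next; // nested field; the chain never points back to itself
        Node(int value) { this.value = value; }
    }

    // Builds an acyclic chain of `depth` nodes with a loop (no recursion).
    static Node chain(int depth) {
        Node head = null;
        for (int i = 0; i < depth; i++) {
            Node n = new Node(i);
            n.next = head;
            head = n;
        }
        return head;
    }

    // Stand-in for a field-by-field writer: one call frame per nesting level.
    // Returns the number of nodes "written".
    static long writeRecursive(Node n) {
        if (n == null) {
            return 0;
        }
        return 1 + writeRecursive(n.next);
    }

    // Same traversal with an explicit stack, so the call depth stays constant.
    static long writeIterative(Node root) {
        Deque<Node> pending = new ArrayDeque<>();
        if (root != null) {
            pending.push(root);
        }
        long written = 0;
        while (!pending.isEmpty()) {
            Node n = pending.pop();
            written++;
            if (n.next != null) {
                pending.push(n.next);
            }
        }
        return written;
    }

    // Reports whether the recursive writer runs out of stack on this input.
    static boolean overflows(Node root) {
        try {
            writeRecursive(root);
            return false;
        } catch (StackOverflowError e) {
            return true;
        }
    }

    public static void main(String[] args) {
        Node shallow = chain(2);
        Node deep = chain(1_000_000);
        System.out.println("shallow, recursive: " + writeRecursive(shallow));
        System.out.println("deep, recursive overflows: " + overflows(deep));
        System.out.println("deep, iterative: " + writeIterative(deep));
    }
}
```

Whether the object in the failing test is actually nested this deeply is not established in this thread; the sketch only illustrates how nesting depth, rather than a self-reference, can exhaust the stack.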
