flink-dev mailing list archives

From Till Rohrmann <trohrm...@apache.org>
Subject Re: Kryo StackOverflowError
Date Tue, 12 Apr 2016 15:18:56 GMT
+1

On Tue, Apr 12, 2016 at 1:13 PM, Robert Metzger <rmetzger@apache.org> wrote:

> Good catch Till!
>
> I just checked it with the Mahout source code, and the issue is gone with
> reference tracking enabled.
>
> I would just re-enable it in Flink.
>
> On Tue, Apr 12, 2016 at 10:20 AM, Till Rohrmann <trohrmann@apache.org>
> wrote:
>
> > Hey guys,
> >
> > I have a suspicion about what could be the culprit: could you change line
> > KryoSerializer.java:328 to kryo.setReferences(true) and check whether the
> > error still remains? We deactivated reference tracking, and without it
> > Kryo can't resolve cyclic references properly.
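Till's explanation can be illustrated without Kryo itself. The sketch below is a toy serializer, not Kryo's implementation; the `Node` type and the `writeNaive`/`writeTracked` methods are hypothetical. With reference tracking off, a writer that follows every field keeps revisiting a cycle and ends in a StackOverflowError. With tracking on, an identity map records each object already written and later encounters become back-references, which is roughly the idea behind `kryo.setReferences(true)`.

```java
import java.util.ArrayList;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Map;

public class ReferenceTrackingSketch {

    // Hypothetical node type whose "next" field may point back into the graph.
    static final class Node {
        final String name;
        Node next;

        Node(String name) {
            this.name = name;
        }
    }

    // Naive writer: follows every reference and never remembers what it has
    // already written, so on a cycle it recurses until the stack runs out.
    static void writeNaive(Node n, List<String> out) {
        if (n == null) {
            out.add("null");
            return;
        }
        out.add("node:" + n.name);
        writeNaive(n.next, out);
    }

    // Reference-tracking writer: the first time an object is seen it gets an
    // id (keyed by identity, not equals) and its fields are written; any later
    // encounter emits only a back-reference to that id.
    static void writeTracked(Node n, List<String> out, Map<Node, Integer> seen) {
        if (n == null) {
            out.add("null");
            return;
        }
        Integer id = seen.get(n);
        if (id != null) {
            out.add("ref:" + id);
            return;
        }
        seen.put(n, seen.size());
        out.add("node:" + n.name);
        writeTracked(n.next, out, seen);
    }

    static List<String> serializeTracked(Node root) {
        List<String> out = new ArrayList<>();
        writeTracked(root, out, new IdentityHashMap<>());
        return out;
    }

    static boolean naiveOverflows(Node root) {
        try {
            writeNaive(root, new ArrayList<>());
            return false;
        } catch (StackOverflowError e) {
            return true;
        }
    }

    public static void main(String[] args) {
        Node a = new Node("a");
        Node b = new Node("b");
        a.next = b;
        b.next = a; // cycle: a -> b -> a

        System.out.println(serializeTracked(a)); // [node:a, node:b, ref:0]
        System.out.println(naiveOverflows(a));   // true
    }
}
```

Keying the table by identity (`IdentityHashMap`) rather than `equals()` is the important design choice: the same object must be written once, while two distinct but equal objects must still be written separately.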
> >
> > Cheers,
> > Till
> > ​
> >
> > On Mon, Apr 11, 2016 at 11:42 PM, Lisonbee, Todd <todd.lisonbee@intel.com>
> > wrote:
> >
> > > Hi,
> > >
> > > I also got this error message when I had private inner classes:
> > >
> > > public class A {
> > >     private class B {
> > >     }
> > > }
> > >
> > > I was able to fix by making the inner classes public static:
> > >
> > > public class A {
> > >     public static class B {
> > >     }
> > > }
> > >
> > > While debugging, it seemed that this error message can be caused by
> > > several different things.
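One plausible reason Todd's fix helps: an instance of a non-static inner class carries a hidden reference to its enclosing instance, so a serializer that walks fields may also reach the outer object and everything it references. The sketch below uses only JDK reflection, not Kryo; the class names mirror Todd's example and `ctorParams` is a hypothetical helper.

```java
import java.lang.reflect.Modifier;

public class InnerClassSketch {

    // Mirrors the shape of Todd's example; the names are illustrative only.
    static class A {
        // Non-static inner class: every instance is bound to an enclosing A.
        class B {
            // Exposes the otherwise hidden reference to the enclosing instance.
            A outer() {
                return A.this;
            }
        }

        // Static nested class: has no enclosing instance at all.
        static class StaticB {
        }
    }

    // Parameter count of the class's only (compiler-generated) constructor.
    static int ctorParams(Class<?> c) {
        return c.getDeclaredConstructors()[0].getParameterCount();
    }

    public static void main(String[] args) {
        A a = new A();
        A.B b = a.new B(); // an inner instance cannot exist without an enclosing A

        System.out.println(b.outer() == a);                                    // true
        System.out.println(Modifier.isStatic(A.B.class.getModifiers()));       // false
        System.out.println(Modifier.isStatic(A.StaticB.class.getModifiers())); // true
        System.out.println(ctorParams(A.B.class));       // 1: the enclosing A
        System.out.println(ctorParams(A.StaticB.class)); // 0
    }
}
```

The extra constructor parameter is how the compiler passes the enclosing instance in; a `static` nested class has no such parameter, so nothing beyond its own fields is reachable from it.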
> > >
> > > Thanks,
> > >
> > > Todd
> > >
> > >
> > > -----Original Message-----
> > > From: Hilmi Yildirim [mailto:Hilmi.Yildirim@dfki.de]
> > > Sent: Sunday, April 10, 2016 11:36 AM
> > > To: dev@flink.apache.org
> > > Subject: Re: Kryo StackOverflowError
> > >
> > > Hi,
> > > I also had this problem and solved it.
> > >
> > > In my case I had multiple objects that were created via anonymous
> > > classes. When I broadcast these objects, the serializer tried to
> > > serialize the objects, and to do that it tried to serialize the anonymous
> > > classes. This caused the problem.
> > >
> > > For example,
> > >
> > > class A {
> > >
> > >   def createObjects(): Array[Object] = {
> > >     val objects = scala.collection.mutable.ArrayBuffer[Object]()
> > >     for (...) {
> > >       val obj = new Class {
> > >         ...
> > >       }
> > >       objects += obj
> > >     }
> > >     objects.toArray
> > >   }
> > > }
> > >
> > > It tried to serialize "new Class". For that it tried to serialize the
> > > method createObjects(). And then it tried to serialize class A. To
> > > serialize class A it tried to serialize the method createObjects. Or
> > > something like that, I do not remember the details. This caused the
> > > recursion.
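Hilmi's example is Scala and his recollection is approximate, so the following Java sketch only shows the general mechanism he describes: an anonymous class created in an instance method that uses instance state keeps a reference to the enclosing object. Java's built-in serialization stands in for Kryo here because it fails loudly (`NotSerializableException`) when it reaches the non-serializable enclosing object; `Shape`, `Factory` and `serializes` are hypothetical names.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.NotSerializableException;
import java.io.ObjectOutputStream;
import java.io.Serializable;

public class AnonymousCaptureSketch {

    // Hypothetical stand-in for the "Class" in Hilmi's example.
    interface Shape extends Serializable {
        String describe();
    }

    // Hypothetical enclosing class; deliberately NOT Serializable.
    static class Factory {
        private final String prefix;

        Factory(String prefix) {
            this.prefix = prefix;
        }

        // Anonymous class created in an instance method that reads an instance
        // field: the object keeps a reference to this Factory.
        Shape createCapturing(int i) {
            return new Shape() {
                @Override
                public String describe() {
                    return prefix + i;
                }
            };
        }

        // Anonymous class created in a static method: there is no enclosing
        // instance to capture, only the two arguments.
        static Shape createStandalone(String prefix, int i) {
            return new Shape() {
                @Override
                public String describe() {
                    return prefix + i;
                }
            };
        }
    }

    // True if Java serialization succeeds; false if it reaches a
    // non-serializable object anywhere in the graph it traverses.
    static boolean serializes(Object o) {
        try (ObjectOutputStream out = new ObjectOutputStream(new ByteArrayOutputStream())) {
            out.writeObject(o);
            return true;
        } catch (NotSerializableException e) {
            return false;
        } catch (IOException e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        Factory f = new Factory("shape-");
        System.out.println(serializes(f.createCapturing(1)));                  // false: drags in Factory
        System.out.println(serializes(Factory.createStandalone("shape-", 1))); // true
    }
}
```

Both objects behave identically when called; they differ only in what is reachable from them, which is exactly what a serializer sees.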
> > >
> > > BR,
> > > Hilmi
> > >
> > > On 10.04.2016 at 19:18, Stephan Ewen wrote:
> > > > Hi!
> > > >
> > > > Is it possible that some datatype has a recursive structure
> > > > nonetheless? Something like a linked list or so, which would create a
> > > > large object graph?
> > > >
> > > > There seems to be a large object graph that the Kryo serializer
> > > > traverses, which causes the StackOverflowError.
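Stephan's point holds even without a cycle: a serializer that descends recursively into each referenced object (as the repeated ObjectField.write / FieldSerializer.write / Kryo.writeObject frames in the trace suggest) needs stack space for every level, so a long enough chain such as a linked list can exhaust the stack. The sketch below uses a hypothetical `Cell` type in plain Java, without Kryo, and contrasts a recursive walk with an iterative one over the same list.

```java
public class DeepGraphSketch {

    // Minimal immutable singly linked list cell (hypothetical type).
    static final class Cell {
        final int value;
        final Cell next;

        Cell(int value, Cell next) {
            this.value = value;
            this.next = next;
        }
    }

    // Builds the list back to front, so construction itself needs no recursion.
    static Cell buildList(int length) {
        Cell head = null;
        for (int i = length - 1; i >= 0; i--) {
            head = new Cell(i, head);
        }
        return head;
    }

    // Recursive walk: one stack frame per cell, like a serializer that writes
    // a field by descending into the referenced object before returning.
    static int countRecursive(Cell c) {
        return c == null ? 0 : 1 + countRecursive(c.next);
    }

    // Iterative walk: constant stack depth regardless of list length.
    static int countIterative(Cell c) {
        int n = 0;
        for (Cell cur = c; cur != null; cur = cur.next) {
            n++;
        }
        return n;
    }

    static boolean recursiveOverflows(Cell head) {
        try {
            countRecursive(head);
            return false;
        } catch (StackOverflowError e) {
            return true;
        }
    }

    public static void main(String[] args) {
        Cell shortList = buildList(1_000);
        Cell longList = buildList(1_000_000);

        System.out.println(countRecursive(shortList));    // 1000
        System.out.println(recursiveOverflows(longList)); // true with default thread stack sizes
        System.out.println(countIterative(longList));     // 1000000
    }
}
```

The depth at which the recursive walk fails depends on the thread's stack size (for example the JVM's `-Xss` setting), which is why the comment only claims an overflow for typical defaults.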
> > > >
> > > > Greetings,
> > > > Stephan
> > > >
> > > >
> > > > On Sun, Apr 10, 2016 at 6:24 PM, Andrew Palumbo <ap.dev@outlook.com>
> > > > wrote:
> > > >
> > > >> Hi Stephan,
> > > >>
> > > >> Thanks for answering.
> > > >>
> > > >> This is not from a recursive object. (It is used in a recursive method
> > > >> in the test that is throwing this error, but the depth is only 2, and
> > > >> there are no other Flink DataSet operations before execution is
> > > >> triggered, so it is trivial.)
> > > >>
> > > >> Here is a Gist of the code, and the full output and stack trace:
> > > >>
> > > >> https://gist.github.com/andrewpalumbo/40c7422a5187a24cd03d7d81feb2a419
> > > >>
> > > >> The error begins at line 178 of the "Output" file.
> > > >>
> > > >> Thanks
> > > >>
> > > >> ________________________________________
> > > >> From: ewenstephan@gmail.com <ewenstephan@gmail.com> on behalf of
> > > >> Stephan Ewen <sewen@apache.org>
> > > >> Sent: Sunday, April 10, 2016 9:39 AM
> > > >> To: dev@flink.apache.org
> > > >> Subject: Re: Kryo StackOverflowError
> > > >>
> > > >> Hi!
> > > >>
> > > >> Sorry, I don't fully understand the diagnosis.
> > > >> You say that this stack overflow is not from a recursive object or
> > > >> type?
> > > >>
> > > >> Long graphs of operations in Flink usually do not cause
> > > >> StackOverflowErrors, because the whole graph is not processed
> > > >> recursively.
> > > >>
> > > >> Can you paste the entire Stack Trace (for example to a gist)?
> > > >>
> > > >> Greetings,
> > > >> Stephan
> > > >>
> > > >>
> > > >> On Sun, Apr 10, 2016 at 4:42 AM, Andrew Palumbo <ap.dev@outlook.com>
> > > >> wrote:
> > > >>
> > > >>> Hi all,
> > > >>>
> > > >>>
> > > >>> I am working on a matrix multiplication operation for the Mahout Flink
> > > >>> Bindings that uses quite a few chained Flink DataSet operations.
> > > >>>
> > > >>>
> > > >>> When testing, I am getting the following error:
> > > >>>
> > > >>>
> > > >>> {...}
> > > >>>
> > > >>> 04/09/2016 22:30:35    CHAIN Reduce (Reduce at org.apache.mahout.flinkbindings.blas.FlinkOpABt$.abt_nograph(FlinkOpABt.scala:147)) -> FlatMap (FlatMap at org.apache.mahout.flinkbindings.drm.BlockifiedFlinkDrm.asRowWise(FlinkDrm.scala:93))(1/1) switched to CANCELED
> > > >>> 04/09/2016 22:30:35    CHAIN Partition -> Map (Map at org.apache.mahout.flinkbindings.blas.FlinkOpABt$.pairwiseApply(FlinkOpABt.scala:240)) -> GroupCombine (GroupCombine at org.apache.mahout.flinkbindings.blas.FlinkOpABt$.abt_nograph(FlinkOpABt.scala:129)) -> Combine (Reduce at org.apache.mahout.flinkbindings.blas.FlinkOpABt$.abt_nograph(FlinkOpABt.scala:147))(3/3) switched to FAILED
> > > >>> java.lang.StackOverflowError
> > > >>>      at com.esotericsoftware.kryo.serializers.ObjectField.write(ObjectField.java:48)
> > > >>>      at com.esotericsoftware.kryo.serializers.FieldSerializer.write(FieldSerializer.java:495)
> > > >>>      at com.esotericsoftware.kryo.Kryo.writeObject(Kryo.java:523)
> > > >>>      at com.esotericsoftware.kryo.serializers.ObjectField.write(ObjectField.java:61)
> > > >>>      at com.esotericsoftware.kryo.serializers.FieldSerializer.write(FieldSerializer.java:495)
> > > >>>      at com.esotericsoftware.kryo.Kryo.writeObject(Kryo.java:523)
> > > >>>      at com.esotericsoftware.kryo.serializers.ObjectField.write(ObjectField.java:61)
> > > >>>      at com.esotericsoftware.kryo.serializers.FieldSerializer.write(FieldSerializer.java:495)
> > > >>>      at com.esotericsoftware.kryo.Kryo.writeObject(Kryo.java:523)
> > > >>>      at com.esotericsoftware.kryo.serializers.ObjectField.write(ObjectField.java:61)
> > > >>>      at com.esotericsoftware.kryo.serializers.FieldSerializer.write(FieldSerializer.java:495)
> > > >>> {...}
> > > >>>
> > > >>>
> > > >>> I've seen similar issues on the dev@flink list (and other places), but
> > > >>> I believe that they were caused by recursive calls and objects that
> > > >>> pointed back to themselves somehow.
> > > >>>
> > > >>>
> > > >>> This is a relatively straightforward method; it just has several Flink
> > > >>> operations before execution is triggered. If I remove some operations,
> > > >>> e.g. a reduce, I can get the method to complete on a simple test, but
> > > >>> then, of course, the result is numerically incorrect.
> > > >>>
> > > >>>
> > > >>> I am wondering if there is any workaround for this type of problem?
> > > >>>
> > > >>>
> > > >>> Thank You,
> > > >>>
> > > >>>
> > > >>> Andy
> > > >>>
> > >
> > >
> > > --
> > > ==================================================================
> > > Hilmi Yildirim, M.Sc.
> > > Researcher
> > >
> > > DFKI GmbH
> > > Intelligente Analytik für Massendaten
> > > DFKI Projektbüro Berlin
> > > Alt-Moabit 91c
> > > D-10559 Berlin
> > > Phone: +49 30 23895 1814
> > >
> > > E-Mail: Hilmi.Yildirim@dfki.de
> > >
> > >
> > >
> >
>
