spark-dev mailing list archives

From Tathagata Das <tathagata.das1...@gmail.com>
Subject Re: Stackoverflow after a small change by me
Date Tue, 24 Dec 2013 19:33:14 GMT
Hello Dachuan,

RDDs generated by StateDStream are checkpointed because the tree of RDD
dependencies (i.e. the RDD lineage) can grow indefinitely as each state RDD
depends on the state RDD from the previous batch of data. Checkpointing
saves an RDD to HDFS and cuts off all ties to its parent RDDs (i.e. truncates
the lineage). If you do not periodically checkpoint the state RDDs,
these really large lineages can lead to all sorts of problems. The
"mustCheckpoint" field ensures that state RDDs are automatically
checkpointed with some periodicity even if the user does not explicitly
specify one. Setting mustCheckpoint to false disables this automatic
checkpointing. I think that is leading to really large lineages, and
serializing the RDD with its lineage is causing the stack to overflow.
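
To make the failure mode concrete, here is a rough, Spark-free sketch in plain
Java. It is not taken from the Spark code base: the LineageDemo and Node
classes, and the checkpoint helper, are hypothetical stand-ins I made up for
illustration. Each Node points at the node from the previous batch, the way a
state RDD depends on its parent, and default Java serialization walks that
chain recursively.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

// Hypothetical model of a state RDD lineage; none of this is Spark API.
final class LineageDemo {

    // One node per batch; `parent` plays the role of the RDD dependency.
    static final class Node implements Serializable {
        private static final long serialVersionUID = 1L;
        final int batch;
        final Node parent; // null means "no lineage beyond this point"

        Node(int batch, Node parent) {
            this.batch = batch;
            this.parent = parent;
        }
    }

    // Build a chain of `batches` nodes, each depending on the previous one.
    static Node buildChain(int batches) {
        Node current = null;
        for (int i = 0; i < batches; i++) {
            current = new Node(i, current);
        }
        return current;
    }

    // Model of a checkpoint: keep the newest node's data, drop its parent link.
    static Node checkpoint(Node node) {
        return new Node(node.batch, null);
    }

    // Returns true if Java serialization completes, false if the stack overflows.
    static boolean serializes(Node node) {
        try {
            ObjectOutputStream out = new ObjectOutputStream(new ByteArrayOutputStream());
            out.writeObject(node); // recurses once per parent link
            return true;
        } catch (StackOverflowError e) {
            return false;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Node deep = buildChain(1_000_000); // lineage that was never truncated
        System.out.println("long lineage serializes: " + serializes(deep));
        System.out.println("after checkpoint serializes: " + serializes(checkpoint(deep)));
    }
}
```

With a typical JVM stack size, the million-node chain should overflow the
stack during writeObject, while the "checkpointed" node, which no longer
references its parents, serializes without trouble. The real checkpoint also
writes the RDD data to HDFS; the sketch only models the lineage truncation.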

On that note, what are you trying to achieve by setting mustCheckpoint =
false? Maybe there is another way of achieving what you are trying to
achieve.

TD


On Tue, Dec 24, 2013 at 9:05 AM, Dachuan Huang
<huangda@cse.ohio-state.edu> wrote:

> Hello, developers,
>
> Just out of curiosity, I changed "mustCheckpoint" in
> StateDStream.scala to "false" by default and ran the
> StatefulNetworkWordCount.scala example.
>
> My input is a ServerSocket sending about 3 MB/s.
>
> After some time it reports the following error. The exception trace does not
> mention any Spark code, so I don't know how to nail down the
> root cause. Can anybody help me with this? Thanks.
>
> Exception in thread "DAGScheduler" java.lang.StackOverflowError
> at java.io.ObjectStreamClass.getPrimFieldValues(ObjectStreamClass.java:1233)
> at java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1532)
> at java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1508)
> at java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1431)
> at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1177)
> at java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1547)
> at java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1508)
> at java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1431)
> at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1177)
> at java.io.ObjectOutputStream.writeObject(ObjectOutputStream.java:347)
> at scala.collection.immutable.$colon$colon.writeObject(List.scala:430)
> at sun.reflect.GeneratedMethodAccessor5.invoke(Unknown Source)
> at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:606)
> at java.io.ObjectStreamClass.invokeWriteObject(ObjectStreamClass.java:988)
> at java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1495)
> at java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1431)
> at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1177)
> at
>
