spark-user mailing list archives

From Tathagata Das <>
Subject Re: NullPointerException on reading checkpoint files
Date Tue, 23 Sep 2014 20:56:03 GMT
This is actually very tricky, as there are two pretty big challenges that
need to be solved.
(i) Checkpointing for broadcast variables: Unlike RDDs, broadcast variables
don't have checkpointing support (that is, you cannot write the content of a
broadcast variable to HDFS and recover it automatically when needed).
(ii) Remembering the checkpoint info of the broadcast vars used in every
batch, and recovering those vars from the checkpoint info. This also has to
be exposed in the API in such a way that all the checkpointing/recovering
can be done by Spark Streaming seamlessly, without the user's knowledge.

I have some thoughts on it, but nothing concrete yet. The first, that is,
broadcast checkpointing, should be straightforward, and may be useful
outside streaming as well.
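
To make the two challenges above concrete, here is a minimal sketch in plain
Python. It does not use Spark and does not reflect any existing or planned
Spark API: CheckpointedValue, record_batch and recover_batch are hypothetical
names, a local directory stands in for HDFS, and pickle stands in for whatever
serialization Spark would use. It only illustrates the idea of (i) writing a
shared value to durable storage and reloading it on demand, and (ii) keeping a
per-batch record of which values were used so they can be recovered later.

```python
import os
import pickle
import tempfile

_MISSING = object()  # sentinel: the value is not currently held in memory


class CheckpointedValue:
    """Hypothetical stand-in for a broadcast variable with checkpoint support."""

    def __init__(self, name, checkpoint_dir, value=_MISSING):
        self.name = name
        self.path = os.path.join(checkpoint_dir, name + ".pkl")
        self._value = value

    def checkpoint(self):
        """Challenge (i): persist the value so it survives a restart."""
        if self._value is _MISSING:
            raise ValueError("nothing to checkpoint for " + self.name)
        # Write to a temporary file, then rename, so a crash mid-write
        # never leaves a truncated checkpoint at the final path.
        tmp_path = self.path + ".tmp"
        with open(tmp_path, "wb") as f:
            pickle.dump(self._value, f)
        os.replace(tmp_path, self.path)
        return self.path

    @property
    def value(self):
        # Reload lazily from the checkpoint if the in-memory copy is gone.
        if self._value is _MISSING:
            with open(self.path, "rb") as f:
                self._value = pickle.load(f)
        return self._value


def record_batch(batch_log, batch_id, variables):
    """Challenge (ii): remember which checkpointed values a batch used."""
    batch_log[batch_id] = [v.name for v in variables]


def recover_batch(batch_log, batch_id, checkpoint_dir):
    """Rebuild handles for a batch; values load from disk on first access."""
    return {name: CheckpointedValue(name, checkpoint_dir)
            for name in batch_log[batch_id]}


if __name__ == "__main__":
    ckpt_dir = tempfile.mkdtemp()
    ref = CheckpointedValue("ref_data", ckpt_dir, {"US": "United States"})
    ref.checkpoint()

    batch_log = {}
    record_batch(batch_log, 1, [ref])

    # Simulate a driver restart: only batch_log and the files survive.
    recovered = recover_batch(batch_log, 1, ckpt_dir)
    print(recovered["ref_data"].value)
```

In a real system the batch log itself would also have to be written to the
checkpoint, which is part of what makes (ii) the harder of the two problems.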


On Tue, Sep 23, 2014 at 4:22 PM, RodrigoB <>

> Hi TD,
> This is actually an important requirement (recovery of shared variables)
> for us, as we need to spread some referential data across the Spark nodes
> on application startup. I just bumped into this issue on Spark version
> 1.0.1. I assume the latest one also doesn't include this capability. Are
> there any plans to do so?
> If not, could you give me your opinion on how difficult it would be to
> implement this? If it's nothing too complex, I could consider contributing
> on that level.
> BTW, regarding recovery, I have posted a topic on which I would very much
> appreciate your comments.
> Thanks,
> Rod