spark-dev mailing list archives

From Mridul Muralidharan <mri...@gmail.com>
Subject Re: [jira] [Created] (SPARK-1855) Provide memory-and-local-disk RDD checkpointing
Date Sat, 17 May 2014 08:36:53 GMT
We don't have 3x replication in Spark :-)
And if we use a replicated storage level, it decreases the odds of failure
but does not eliminate it (since we are not doing a great job with
replication anyway from a fault-tolerance point of view).
Also, replicated levels take a nontrivial performance hit.

Regards,
Mridul
 On 17-May-2014 8:16 am, "Xiangrui Meng" <mengxr@gmail.com> wrote:

> With 3x replication, we should be able to achieve fault tolerance.
> This checkPointed RDD can be cleared if we have another in-memory
> checkPointed RDD down the line. It can avoid hitting disk if we have
> enough memory to use. We need to investigate more to find a good
> solution. -Xiangrui
>
> On Fri, May 16, 2014 at 4:00 PM, Mridul Muralidharan <mridul@gmail.com>
> wrote:
> > Effectively this is persist without fault tolerance.
> > Failure of any node means complete lack of fault tolerance.
> > I would be very skeptical of truncating lineage if it is not reliable.
> >  On 17-May-2014 3:49 am, "Xiangrui Meng (JIRA)" <jira@apache.org> wrote:
> >
> >> Xiangrui Meng created SPARK-1855:
> >> ------------------------------------
> >>
> >>              Summary: Provide memory-and-local-disk RDD checkpointing
> >>                  Key: SPARK-1855
> >>                  URL: https://issues.apache.org/jira/browse/SPARK-1855
> >>              Project: Spark
> >>           Issue Type: New Feature
> >>           Components: MLlib, Spark Core
> >>     Affects Versions: 1.0.0
> >>             Reporter: Xiangrui Meng
> >>
> >>
> >> Checkpointing is used to cut long lineage while maintaining fault
> >> tolerance. The current implementation is HDFS-based. Using the BlockRDD
> >> we can create in-memory-and-local-disk (with replication) checkpoints
> >> that are not as reliable as HDFS-based solution but faster.
> >>
> >> It can help applications that require many iterations.
> >>
> >>
> >>
> >> --
> >> This message was sent by Atlassian JIRA
> >> (v6.2#6252)
> >>
>
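
The argument in this thread, that replication lowers the odds of losing a checkpoint but does not eliminate them, can be illustrated with a rough back-of-the-envelope model. The sketch below is not taken from Spark and makes strong simplifying assumptions: each node fails independently with the same probability, every replica of a block sits on a distinct node, and a checkpoint is lost if all replicas of any one block are lost. The function names and the example numbers (1% node failure probability, 1000 blocks) are hypothetical and chosen only for illustration.

```python
def block_loss_probability(node_failure_prob: float, replicas: int) -> float:
    """Probability that all replicas of a single block are lost.

    Assumes independent node failures and one replica per distinct node.
    """
    return node_failure_prob ** replicas


def checkpoint_loss_probability(
    node_failure_prob: float, replicas: int, num_blocks: int
) -> float:
    """Probability that at least one block of the checkpoint is lost.

    With truncated lineage, losing any single block makes the
    checkpointed data unrecoverable, so this is the failure probability
    of the checkpoint as a whole under the stated assumptions.
    """
    per_block = block_loss_probability(node_failure_prob, replicas)
    return 1.0 - (1.0 - per_block) ** num_blocks


if __name__ == "__main__":
    # Hypothetical numbers, for illustration only.
    for replicas in (1, 2, 3):
        p = checkpoint_loss_probability(0.01, replicas, 1000)
        print(f"replicas={replicas}: P(lose at least one block) = {p:.6f}")
```

Under this model the loss probability shrinks quickly as replicas are added, yet stays above zero for any finite replication factor, which is the concern about truncating lineage on top of unreliable storage. Real failures are often correlated (for example, many blocks sharing the same nodes), so the model probably understates the risk.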
