spark-dev mailing list archives

From Reynold Xin <r...@databricks.com>
Subject Re: Any suggestion about JIRA 1006 "MLlib ALS gets stack overflow with too many iterations"?
Date Sun, 26 Jan 2014 06:03:51 GMT
I'm not entirely sure, but there are two candidates:

  - the visit function in stageDependsOn
  - submitStage
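To make the recursion concern concrete, here is a minimal Python sketch (not Spark code) of a depth-first walk over a lineage chain. It is assumed to resemble what a recursive `visit` helper does when it follows RDD dependencies. `Node`, `build_chain`, `ancestors_recursive` and `ancestors_iterative` are hypothetical names, and Python's recursion limit stands in for the JVM thread stack. The recursive version uses one stack frame per lineage level, so a long enough chain overflows. The iterative version keeps pending nodes in an explicit stack on the heap instead.

```python
import sys
from dataclasses import dataclass, field
from typing import List


@dataclass(eq=False)
class Node:
    """Hypothetical stand-in for an RDD: an id plus its parent dependencies."""
    id: int
    parents: List["Node"] = field(default_factory=list)


def build_chain(length: int) -> Node:
    """Build a linear lineage chain, e.g. one new RDD per ALS iteration."""
    node = Node(0)
    for i in range(1, length):
        node = Node(i, [node])
    return node


def ancestors_recursive(node: Node) -> set:
    """Recursive depth-first visit: needs one call frame per lineage level."""
    visited = set()

    def visit(n: Node) -> None:
        if n.id not in visited:
            visited.add(n.id)
            for p in n.parents:
                visit(p)

    visit(node)
    return visited


def ancestors_iterative(node: Node) -> set:
    """Same traversal with an explicit stack, so depth is not bounded by the call stack."""
    visited = set()
    stack = [node]
    while stack:
        n = stack.pop()
        if n.id not in visited:
            visited.add(n.id)
            stack.extend(n.parents)
    return visited


if __name__ == "__main__":
    deep = build_chain(sys.getrecursionlimit() * 2)
    print(len(ancestors_iterative(deep)))  # visits every node in the chain
    try:
        ancestors_recursive(deep)
    except RecursionError:
        print("recursive visit overflowed the stack")
```

Replacing recursion with an explicit stack removes the depth limit. The visiting order can differ slightly, which does not matter when only the set of reachable nodes is needed.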

On Sat, Jan 25, 2014 at 10:01 PM, Aaron Davidson <ilikerps@gmail.com> wrote:

> I'm an idiot, but which part of the DAGScheduler is recursive here? Seems
> like processEvent shouldn't have inherently recursive properties.
>
>
> On Sat, Jan 25, 2014 at 9:57 PM, Reynold Xin <rxin@databricks.com> wrote:
>
> > It seems to me fixing DAGScheduler to make it not recursive is the better
> > solution here, given the cost of checkpointing.
> >
> > On Sat, Jan 25, 2014 at 9:49 PM, Xia, Junluan <junluan.xia@intel.com> wrote:
> >
> > > Hi all
> > >
> > > The description of this bug, as submitted by Matei, is as follows:
> > >
> > >
> > > The tipping point seems to be around 50 iterations. We should fix this by
> > > checkpointing the RDDs every 10-20 iterations to break the lineage chain,
> > > but checkpointing currently requires HDFS to be installed, which not all
> > > users will have.
> > >
> > > We might also be able to fix DAGScheduler to not be recursive.
> > >
> > >
> > > regards,
> > > Andrew
> > >
> > >
> >
>
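The checkpointing workaround described in the bug report can be modeled the same way. In this hypothetical sketch (plain Python, no Spark), each iteration derives a new `Dataset` from the previous one, and `checkpoint()` keeps the data but drops the parent link, so the lineage depth stays below the checkpoint interval. In Spark itself the corresponding calls are `SparkContext.setCheckpointDir` and `RDD.checkpoint()`. The names `Dataset`, `run_iterations` and `checkpoint_every`, and the halving update rule, are assumptions for illustration only.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(eq=False)
class Dataset:
    """Hypothetical lineage node: a value plus an optional parent link."""
    value: float
    parent: Optional["Dataset"] = None

    def lineage_depth(self) -> int:
        """Count parent links back to the root (iteratively)."""
        depth, node = 0, self
        while node.parent is not None:
            depth += 1
            node = node.parent
        return depth

    def checkpoint(self) -> None:
        """Model of checkpointing: the data is kept, the lineage is truncated."""
        self.parent = None


def run_iterations(num_iters: int, checkpoint_every: int) -> Dataset:
    """Each iteration derives a new Dataset from the previous one, like one ALS step."""
    current = Dataset(1.0)
    for i in range(1, num_iters + 1):
        # Hypothetical update rule; only the parent link matters here.
        current = Dataset(current.value * 0.5, parent=current)
        if checkpoint_every > 0 and i % checkpoint_every == 0:
            current.checkpoint()
    return current


if __name__ == "__main__":
    print(run_iterations(100, 0).lineage_depth())   # no checkpointing
    print(run_iterations(100, 15).lineage_depth())  # checkpoint every 15 iterations
```

With 100 iterations and `checkpoint_every=15`, the last checkpoint happens at iteration 90, so the final lineage depth is 10 instead of 100. The cost raised in the thread still applies: a real checkpoint writes the data to reliable storage, which is why making the DAGScheduler non-recursive may be the cheaper fix.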
