reef-dev mailing list archives

From "Julia Wang (QIUHE)" <Qiuhe.W...@microsoft.com>
Subject RE: Issues with IActiveContext.SubmitContextAndService
Date Tue, 22 Mar 2016 17:41:08 GMT
For this phase, what we want to handle is: if evaluators or contexts fail, request new evaluators,
stop the unaffected tasks, and then restart the entire group.

Do we want to distinguish evaluator failure from context failure? We can discuss it to
see whether it brings enough benefit at an acceptable cost.

Thanks,
Julia
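The phase-one recovery policy above can be sketched as a small planning function. This is a minimal model in plain Python, not the REEF .NET driver API: `RecoveryPlan`, `plan_recovery`, and the `task-on-<id>` naming are hypothetical, and it assumes a context failure is handled exactly like an evaluator failure, as proposed for this phase.

```python
from dataclasses import dataclass, field


@dataclass
class RecoveryPlan:
    """Actions the driver would take after a failure (hypothetical model)."""
    evaluators_to_request: int
    tasks_to_stop: list[str] = field(default_factory=list)
    restart_group: bool = True


def plan_recovery(all_evaluators: list[str], failed: set[str]) -> RecoveryPlan:
    """Phase-one policy: treat any evaluator or context failure the same way.

    Request one replacement per failed evaluator, stop the tasks still running
    on the surviving (unaffected) evaluators, then restart the whole group.
    """
    unaffected = [e for e in all_evaluators if e not in failed]
    return RecoveryPlan(
        evaluators_to_request=len(failed),
        tasks_to_stop=[f"task-on-{e}" for e in unaffected],  # hypothetical task ids
        restart_group=True,
    )


plan = plan_recovery(["e1", "e2", "e3"], failed={"e2"})
print(plan.evaluators_to_request, plan.tasks_to_stop)
```

The appeal of this policy is its simplicity: every failure takes the same path, at the cost of restarting work on evaluators that were still healthy.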

-----Original Message-----
From: Dhruv Mahajan [mailto:dhruv.mahajan@gmail.com] 
Sent: Tuesday, March 22, 2016 10:26 AM
To: dev@reef.apache.org
Subject: Re: Issues with IActiveContext.SubmitContextAndService

So, we need to shut down an evaluator only if it fails completely. As I said, one evaluator
failing will lead to a series of context failures in all the other evaluators, because they depend
on each other via streams. So we need proper handlers for IFailedContext and IFailedTasks,
rather than simply discarding the evaluator when this happens.

Dhruv
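The distinction argued for here can be sketched as a per-evaluator decision. This is a hypothetical model, not REEF code: `FailureKind` and `plan_actions` are illustrative names, and it assumes each evaluator holds a DataLoading context with the Group Comm context stacked on top, so a cascaded context failure can be repaired without touching the data.

```python
from enum import Enum


class FailureKind(Enum):
    EVALUATOR = "evaluator"  # the whole evaluator process is gone
    CONTEXT = "context"      # only the Group Comm context failed (e.g. a broken stream)


def plan_actions(failures: dict[str, FailureKind]) -> dict[str, str]:
    """Map each evaluator id to a recovery action (hypothetical policy).

    Only a total evaluator failure costs a new evaluator and a data reload;
    a cascaded context failure keeps the evaluator and its DataLoading
    context and resubmits just the Group Comm context on top of it.
    """
    actions = {}
    for evaluator_id, kind in failures.items():
        if kind is FailureKind.EVALUATOR:
            actions[evaluator_id] = "request new evaluator and reload data"
        else:
            actions[evaluator_id] = "resubmit Group Comm context on existing data"
    return actions


# e2 dies; the stream breakage then fails the Group Comm contexts on e1 and e3.
failures = {"e1": FailureKind.CONTEXT, "e2": FailureKind.EVALUATOR, "e3": FailureKind.CONTEXT}
for evaluator_id, action in plan_actions(failures).items():
    print(evaluator_id, "->", action)
```

Under this policy only the evaluator that actually died pays for a data reload. Resubmitting a context on a surviving evaluator is the case that needs IActiveContext.SubmitContextAndService.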

On Tue, Mar 22, 2016 at 10:19 AM, Julia Wang (QIUHE) <Qiuhe.Wang@microsoft.com> wrote:

> Do you mean you want to separate evaluator failure from context failure?
> At this stage, we will restart the evaluator in this case.
>
> -----Original Message-----
> From: Dhruv Mahajan [mailto:dhruv.mahajan@gmail.com]
> Sent: Tuesday, March 22, 2016 10:16 AM
> To: dev@reef.apache.org
> Cc: dev@reef.incubator.apache.org; Jaliya Ekanayake <jaliyaek@microsoft.com>
> Subject: Re: Issues with IActiveContext.SubmitContextAndService
>
> Another critical reason for not doing it:
>
> Assuming some evaluator fails, its streams will close and there will be
> stream failures in the other evaluators, which will lead to failed contexts
> in the Group Comm. service. Since DataLoading would be part of the same
> context, we would have to load the data again even for evaluators that
> only suffered failed contexts. Does that make sense?
>
> Dhruv
>
> > On Tue, Mar 22, 2016 at 10:07 AM, Dhruv Mahajan <dhruv.mahajan@gmail.com> wrote:
>
> > In the near future, we will need to build ML workflows where the driver has
> > to execute a series of jobs, say IMRU followed by a parameter server, and so on.
> > What all of them have in common will be the data, not Group
> > Comm. Hence, I would like to keep it as a separate layer.
> >
> > Dhruv
> >
> > On Tue, Mar 22, 2016 at 10:04 AM, Julia Wang (QIUHE) <Qiuhe.Wang@microsoft.com> wrote:
> >
> >> Are you submitting a second context? Why not submit everything
> >> with IAllocatedEvaluator.SubmitContextAndService()? What is
> >> the reason for using a second context?
> >>
> >> -----Original Message-----
> >> From: Dhruv Mahajan [mailto:dhruv.mahajan@gmail.com]
> >> Sent: Tuesday, March 22, 2016 10:00 AM
> >> To: dev@reef.apache.org
> >> Cc: dev@reef.incubator.apache.org; Jaliya Ekanayake <jaliyaek@microsoft.com>
> >> Subject: Re: Issues with IActiveContext.SubmitContextAndService
> >>
> >> So, the steps are:
> >>
> >> a) When we get the IAllocatedEvaluator, we first submit the DataLoading
> >> context and service, and the data gets read.
> >> b) Then, once we get the IActiveContext with the data loaded, we need to
> >> call IActiveContext.SubmitContextAndService() for the Group Comm. service.
> >> c) Finally, once we get the IActiveContext back with the Group Comm. service
> >> also instantiated, we submit the IMRU task.
> >>
> >> We do not want to mix the two. Data Loading has to be separate from
> >> everything else.
> >>
> >> Dhruv
> >>
> >> On Tue, Mar 22, 2016 at 9:56 AM, Julia Wang (QIUHE) <Qiuhe.Wang@microsoft.com> wrote:
> >>
> >> > In which scenario do you need to use IActiveContext to call
> >> > SubmitContextAndService?
> >> >
> >> > My understanding is that in the fault-tolerant case, you call
> >> > SubmitContextAndService on the IAllocatedEvaluator the first time.
> >> > If any evaluator fails, we resubmit on a new evaluator.
> >> >
> >> > Thanks,
> >> > Julia
> >> >
> >> > -----Original Message-----
> >> > From: Dhruv Mahajan [mailto:dhruv.mahajan@gmail.com]
> >> > Sent: Tuesday, March 22, 2016 9:28 AM
> >> > To: dev@reef.apache.org
> >> > Cc: dev@reef.incubator.apache.org
> >> > Subject: Re: Issues with IActiveContext.SubmitContextAndService
> >> >
> >> > So are we stuck on REEF-1224 until this is resolved? Or, for now, will
> >> > passing whatever configuration the Group Communication service generates
> >> > as part of the context service work just fine? We are not going to
> >> > stack contexts any further once we put the GC context on.
> >> >
> >> > Dhruv
> >> >
> >> > On Tue, Mar 22, 2016 at 8:44 AM, Markus Weimer <markus@weimo.de> wrote:
> >> >
> >> > > On 2016-03-21 23:46, Julia Wang (QIUHE) wrote:
> >> > >
> >> > >> With IActiveContext, we support SubmitContext() but not
> >> > >> SubmitContextAndService() yet, I believe.
> >> > >>
> >> > >
> >> > > OK. I filed https://issues.apache.org/jira/browse/REEF-1267
> >> > > for this.
> >> > >
> >> > > Markus
> >> > >
> >> >
> >>
> >
> >
>
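The three-step context stacking described in the 10:00 AM message can be sketched as a simple state machine. This is a hypothetical model rather than REEF driver code: `Stage`, `NEXT_STEP`, and `run_pipeline` are illustrative names, and each transition stands in for the handler that receives an IAllocatedEvaluator or IActiveContext.

```python
from enum import Enum, auto


class Stage(Enum):
    ALLOCATED = auto()         # IAllocatedEvaluator received
    DATA_LOADED = auto()       # IActiveContext for the DataLoading context
    GROUP_COMM_READY = auto()  # IActiveContext for the Group Comm context
    TASK_RUNNING = auto()      # IMRU task submitted


# Each driver event triggers exactly one submission and advances the stage.
NEXT_STEP = {
    Stage.ALLOCATED: ("submit DataLoading context and service", Stage.DATA_LOADED),
    Stage.DATA_LOADED: ("submit Group Comm context and service", Stage.GROUP_COMM_READY),
    Stage.GROUP_COMM_READY: ("submit IMRU task", Stage.TASK_RUNNING),
}


def run_pipeline(start: Stage = Stage.ALLOCATED) -> list[str]:
    """Walk the context stack from `start` until the task is running."""
    actions = []
    stage = start
    while stage in NEXT_STEP:
        action, stage = NEXT_STEP[stage]
        actions.append(action)
    return actions


# Fresh evaluator: all three steps.
print(run_pipeline())
# Group Comm context failed but the data survived: restart above the DataLoading layer.
print(run_pipeline(Stage.DATA_LOADED))
```

Keeping DataLoading as the bottom layer is what makes the second call useful: a driver that still holds the data-loaded context can restart from `Stage.DATA_LOADED` instead of from a fresh evaluator, and the same layer could later serve other jobs, such as a parameter server.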