reef-dev mailing list archives

From "Dhruv Mahajan (JIRA)" <>
Subject [jira] [Commented] (REEF-1392) Adding IObserver<ICloseEvent> for IMRU tasks
Date Wed, 18 May 2016 22:52:12 GMT


Dhruv Mahajan commented on REEF-1392:

[~markus.weimer] So the semantics of the IMap and IUpdate tasks need to be discussed and changed
now. A few points:

a) We assume that state in the mappers is helpful but either recoverable or not critical. For
example, for LBFGS it is fine if we lose the history, because the update task can communicate
the necessary model to the master again for the computations.

b) State maintenance and preservation are left entirely to the corresponding map and update
tasks. For example, they perform synchronized checkpointing every few iterations, and when
they restart it is their job to recover the checkpoints and resume from the appropriate point.
A simpler subcase is to allow checkpointing only in the Update function.
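For illustration only, here is a minimal sketch of the simpler subcase in b), where only the update side checkpoints every few iterations and, on restart, recovers the latest checkpoint itself and resumes from the iteration after it. It is written in Python rather than the C# used by IMRU, and every name in it (`CheckpointStore`, `run_update_task`, the `interval` parameter, the in-memory store standing in for durable storage) is a hypothetical assumption, not REEF API.

```python
from typing import Callable, Optional, Tuple


class CheckpointStore:
    """Hypothetical durable store; an in-memory field stands in for e.g. a file or blob store."""

    def __init__(self) -> None:
        self._latest: Optional[Tuple[int, dict]] = None

    def save(self, iteration: int, state: dict) -> None:
        # Copy so that later changes to the live state do not alter the checkpoint.
        self._latest = (iteration, dict(state))

    def load(self) -> Optional[Tuple[int, dict]]:
        if self._latest is None:
            return None
        iteration, state = self._latest
        return iteration, dict(state)


def run_update_task(store: CheckpointStore, total_iterations: int, interval: int,
                    step: Callable[[int, dict], dict],
                    fail_at: Optional[int] = None) -> dict:
    """Runs the update loop, checkpointing after every `interval` iterations.

    On (re)start the task recovers its own latest checkpoint, as point b) proposes.
    `fail_at` simulates an evaluator failure at the start of that iteration.
    """
    restored = store.load()
    if restored is None:
        start, state = 0, {"model": 0.0}
    else:
        last_done, state = restored
        start = last_done + 1  # resume after the last checkpointed iteration
    for it in range(start, total_iterations):
        if fail_at is not None and it == fail_at:
            raise RuntimeError(f"simulated evaluator failure at iteration {it}")
        state = step(it, state)
        if (it + 1) % interval == 0:
            store.save(it, state)
    return state
```

With `interval=3`, a failure at iteration 7 leaves the checkpoint from iteration 5, so the restarted task redoes iterations 6 and 7. This is the price of checkpointing only every few iterations instead of every iteration.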

c) We give the tasks a chance to preserve some state, if necessary, before closing. In my
view this is tricky, since the state on the failed evaluator has already been lost, so the
full global state cannot be reconstructed.

> Adding IObserver<ICloseEvent> for IMRU tasks
> --------------------------------------------
>                 Key: REEF-1392
>                 URL:
>             Project: REEF
>          Issue Type: Task
>            Reporter: Julia
>            Assignee: Julia
>              Labels: FT
> For fault tolerance, the IMRU tasks MapTaskHost and UpdateTaskHost should implement IObserver<ICloseEvent>.
When a task receives an ICloseEvent, it will verify, based on the message in the event, whether
the close event was sent from the driver, and then throw an IMRUTaskException with a defined
message to inform the driver that it is closed.
> The change should be backward compatible: if the IMRU tasks are not bound to
TaskConfiguration.OnClose in the task configuration, the event won't be received.
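The close handling described in the issue can be sketched roughly as follows. The sketch is in Python rather than C#, and it models only the `OnNext` part of `IObserver<ICloseEvent>`. All names are hypothetical assumptions: `CloseAwareTaskHost` stands in for MapTaskHost/UpdateTaskHost, `CloseEvent.value` for the message in ICloseEvent, and `DRIVER_CLOSE_MESSAGE`/`TASK_CLOSED_MESSAGE` for the driver's marker and the "defined message". The issue does not say what happens to close events that do not come from the driver, so this sketch simply ignores them. The optional `preserve_state` hook illustrates option c) from the comment.

```python
from typing import Callable, Optional

# Hypothetical marker the driver puts into the close event's message.
DRIVER_CLOSE_MESSAGE = b"IMRU_CLOSE_FROM_DRIVER"
# Hypothetical "defined message" carried back to the driver by the exception.
TASK_CLOSED_MESSAGE = "IMRU task closed by driver"


class CloseEvent:
    """Stand-in for ICloseEvent: an optional byte payload sent with the close."""

    def __init__(self, value: Optional[bytes] = None) -> None:
        self.value = value


class IMRUTaskException(Exception):
    """Stand-in for the exception type named in the issue."""


class CloseAwareTaskHost:
    """Hypothetical task host acting as an observer of close events."""

    def __init__(self, preserve_state: Optional[Callable[[], None]] = None) -> None:
        # Optional hook for option c): save what state the task can before it exits.
        self._preserve_state = preserve_state

    def on_next(self, event: CloseEvent) -> None:
        if event.value != DRIVER_CLOSE_MESSAGE:
            return  # not a driver-initiated close; ignored in this sketch
        if self._preserve_state is not None:
            self._preserve_state()
        # Surface the close to the driver as a recognizable task failure.
        raise IMRUTaskException(TASK_CLOSED_MESSAGE)


def deliver_close(handler: Optional[CloseAwareTaskHost], event: CloseEvent) -> bool:
    """Mimics the runtime: the event reaches the task only if a handler is bound
    (the analogue of binding TaskConfiguration.OnClose). Returns whether it was delivered."""
    if handler is None:
        return False  # backward compatible: unbound tasks never see the event
    handler.on_next(event)
    return True
```

The driver side would then match on the exception's message to tell a deliberate close apart from a genuine task failure.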

This message was sent by Atlassian JIRA
