reef-dev mailing list archives

From Andrey Meleshko <andr...@microsoft.com>
Subject how to cancel task initialization
Date Tue, 19 Jul 2016 23:15:18 GMT
I ran into this during IMRU testing, but it seems to touch a problem in REEF core.


1)      As part of ContextManager.StartTask() we call OperatorTopology.Initialize(),
which blocks the thread while waiting for all child tasks to register.
If any child node fails to register (for instance, because of an evaluator failure on startup),
the master node hangs until the configured timeout expires (the current default timeout in
reef-1251 is ~1 hour).
As I understand it, in IMRU the UpdateTask node has the Map nodes as children. So if any of
the Map evaluators crashes during task start,
the IMRU driver restart is blocked for at least 1 hour (the default), or for whatever timeout
the job definition specifies, because the IMRU driver waits for all tasks to reach a final state.

Is there a way, or an existing JIRA, to shut down the master node so that it doesn't wait for
the child registration timeout?
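
To make the question concrete, here is a minimal sketch of the behavior I have in mind: a wait for child registration that can be aborted before the timeout. It is written in Java against the JDK only. All names in it (CancellableRegistrationWait, childRegistered, cancel, await, the Result values) are hypothetical and are not REEF APIs; how a failure notification would actually reach the waiting master is an assumption.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical illustration only; none of these names are REEF APIs.
final class CancellableRegistrationWait {
    enum Result { REGISTERED, CANCELLED, TIMED_OUT }

    private final CountDownLatch pendingChildren;
    private final AtomicBoolean cancelled = new AtomicBoolean(false);

    CancellableRegistrationWait(int childCount) {
        this.pendingChildren = new CountDownLatch(childCount);
    }

    // Called once for each child task that registers successfully.
    void childRegistered() {
        pendingChildren.countDown();
    }

    // Called by whatever detects a failed child (assumed: an evaluator-failure handler).
    void cancel() {
        cancelled.set(true);
    }

    // Waits in short slices so that a cancel request is noticed within roughly pollMillis,
    // instead of only after the full timeout. An interrupt is treated as a cancellation.
    Result await(long timeoutMillis, long pollMillis) {
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMillis);
        while (true) {
            if (cancelled.get()) {
                return Result.CANCELLED;
            }
            long remainingNanos = deadline - System.nanoTime();
            if (remainingNanos <= 0) {
                return Result.TIMED_OUT;
            }
            long slice = Math.min(remainingNanos, TimeUnit.MILLISECONDS.toNanos(pollMillis));
            try {
                if (pendingChildren.await(slice, TimeUnit.NANOSECONDS)) {
                    return Result.REGISTERED;
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt for the caller
                return Result.CANCELLED;
            }
        }
    }
}
```

With something like this, the master would stop waiting shortly after cancel() is called, even when the configured timeout is an hour. Polling is only the simplest option; a condition variable signaled on cancel would avoid the polling delay.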


2)      Side observation: currently ContextManager.StartTask() holds a lock on the (_contextlock)
object for its whole duration.
The same object is used to lock the Dispose() method. So if StartTask() takes a long time, Dispose()
will be blocked as well.
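
One possible way around this, sketched in Java with JDK classes only, is to hold the lock just long enough to update state and to run the slow initialization outside of it. ContextManagerSketch and its members are hypothetical names for illustration; this is not the REEF ContextManager, and whether the real StartTask() can be restructured this way is an assumption.

```java
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical illustration only; not the REEF ContextManager.
final class ContextManagerSketch {
    private final ReentrantLock contextLock = new ReentrantLock();
    private boolean taskStarting = false;
    private boolean disposed = false;

    // Holds the lock only to check and update state, not while slowInitialize runs.
    // Returns false if the context is already disposed or a task start is in progress.
    boolean startTask(Runnable slowInitialize) {
        contextLock.lock();
        try {
            if (disposed || taskStarting) {
                return false;
            }
            taskStarting = true;
        } finally {
            contextLock.unlock();
        }
        try {
            slowInitialize.run(); // may block for a long time; the lock is not held here
        } finally {
            contextLock.lock();
            try {
                taskStarting = false;
            } finally {
                contextLock.unlock();
            }
        }
        return true;
    }

    // Can complete even while another thread is blocked inside slowInitialize.
    void dispose() {
        contextLock.lock();
        try {
            disposed = true;
        } finally {
            contextLock.unlock();
        }
    }

    boolean isDisposed() {
        contextLock.lock();
        try {
            return disposed;
        } finally {
            contextLock.unlock();
        }
    }
}
```

This only removes the lock contention. A real Dispose() would still need a way to tell the in-progress initialization to stop.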

Thank you,
/andrey
