reef-dev mailing list archives

From "Mariia Mykhailova (JIRA)" <>
Subject [jira] [Commented] (REEF-1482) IMRU driver does not exit even if all the task exit normally
Date Fri, 30 Sep 2016 18:21:20 GMT


Mariia Mykhailova commented on REEF-1482:

From the discussion on the mailing list (to keep relevant information in one place):

The problem with the previous Driver code was that, when checking whether an Evaluator had
ended, it only looked at the Evaluator's event queue for events still waiting there. If the
queue still held events, the EvaluatorManager reported not-idle to the Driver, which prevented
shutdown. However, the previous code did not account for events that had already left the
queue and were still being processed by an EventHandler, so an Evaluator could be reported
idle while work was still in flight.
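
To make that gap concrete, here is a minimal Java sketch of an idleness check that only inspects the queue. It is not REEF code: the class `QueueOnlyIdleCheck` and its methods are hypothetical. Once the worker thread takes an event off the queue, `isIdle()` reports true even though the handler for that event is still running.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.function.Consumer;

/** Hypothetical dispatcher whose idleness check looks only at its pending-event queue. */
public class QueueOnlyIdleCheck {
    private final BlockingQueue<String> queue = new LinkedBlockingQueue<>();

    public QueueOnlyIdleCheck(Consumer<String> handler) {
        Thread worker = new Thread(() -> {
            try {
                while (true) {
                    String event = queue.take(); // the event leaves the queue here...
                    handler.accept(event);       // ...but is still being handled here
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        worker.setDaemon(true);
        worker.start();
    }

    public void onNext(String event) {
        queue.add(event);
    }

    /** Flawed check: events taken off the queue but still inside the handler are ignored. */
    public boolean isIdle() {
        return queue.isEmpty();
    }

    /** Returns what isIdle() reports while a handler is provably still running. */
    public static boolean idleWhileHandling() throws InterruptedException {
        CountDownLatch entered = new CountDownLatch(1);
        CountDownLatch release = new CountDownLatch(1);
        QueueOnlyIdleCheck dispatcher = new QueueOnlyIdleCheck(event -> {
            entered.countDown();
            try {
                release.await(); // simulate a slow handler
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        dispatcher.onNext("EvaluatorCompleted");
        entered.await(); // the handler has started, so the queue is already empty
        boolean idle = dispatcher.isIdle();
        release.countDown();
        return idle;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("idle while handler runs: " + idleWhileHandling()); // true
    }
}
```
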
The PR fixed this by incrementing a counter when entering an EventHandler and decrementing it
when exiting (see `ThreadPoolStage` in the PR; the relevant calls are `beforeOnNext()` and
`afterOnNext()`). When an Evaluator is shut down, a thread repeatedly checks this counter until
the Evaluator can be declared completed. `EvaluatorIdlenessThreadPoolSize` is the size of the
thread pool that checks the counters on all `EvaluatorManager`s. This code seems the most likely
cause of the Driver failing to shut down after all Evaluators have completed. It might be
useful to add logging at a higher log level around the Evaluator completion check to confirm
whether this is really the problem.
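
The counter-and-poller approach can be sketched in Java as below. This is a hypothetical illustration, not the code from the PR: `InFlightTrackingStage`, `awaitIdle`, and the 10 ms polling period are assumptions, and only the `beforeOnNext()`/`afterOnNext()` names echo `ThreadPoolStage`. In this sketch the counter is incremented in `onNext` before the event is handed to the executor, so an event is counted from the moment it is queued until its handler returns; a small scheduled pool (standing in for the pool that `EvaluatorIdlenessThreadPoolSize` sizes) polls the counter until it reaches zero.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Consumer;

/** Hypothetical stage that counts events from submission until their handler returns. */
public class InFlightTrackingStage<T> implements AutoCloseable {
    private final AtomicInteger inFlight = new AtomicInteger();
    private final ExecutorService executor = Executors.newSingleThreadExecutor();
    private final Consumer<T> handler;

    public InFlightTrackingStage(Consumer<T> handler) {
        this.handler = handler;
    }

    private void beforeOnNext() {
        inFlight.incrementAndGet();
    }

    private void afterOnNext() {
        inFlight.decrementAndGet();
    }

    public void onNext(T event) {
        beforeOnNext(); // counted before queuing, so there is no gap between "queued" and "running"
        executor.submit(() -> {
            try {
                handler.accept(event);
            } finally {
                afterOnNext(); // always runs, even if the handler throws
            }
        });
    }

    /** Idle only when no event is queued or being handled. */
    public boolean isIdle() {
        return inFlight.get() == 0;
    }

    @Override
    public void close() {
        executor.shutdown();
    }

    /** Polls the stage on a shared pool until it is idle; the returned future completes then. */
    public static CompletableFuture<Void> awaitIdle(
            InFlightTrackingStage<?> stage, ScheduledExecutorService pool, long periodMs) {
        CompletableFuture<Void> done = new CompletableFuture<>();
        ScheduledFuture<?> poll = pool.scheduleAtFixedRate(() -> {
            if (stage.isIdle()) {
                done.complete(null);
            }
        }, 0, periodMs, TimeUnit.MILLISECONDS);
        done.whenComplete((ignored, error) -> poll.cancel(false)); // stop polling once completed
        return done;
    }

    public static void main(String[] args) throws Exception {
        CountDownLatch release = new CountDownLatch(1);
        ScheduledExecutorService idlenessPool = Executors.newScheduledThreadPool(1);
        try (InFlightTrackingStage<String> stage = new InFlightTrackingStage<>(event -> {
            try {
                release.await(); // simulate a slow handler
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        })) {
            stage.onNext("EvaluatorCompleted");
            System.out.println("idle while handler runs: " + stage.isIdle()); // false
            release.countDown();
            awaitIdle(stage, idlenessPool, 10).get(5, TimeUnit.SECONDS);
            System.out.println("idle after handler returns: " + stage.isIdle()); // true
        } finally {
            idlenessPool.shutdownNow();
        }
    }
}
```

In a scheme like this, a decrement that is skipped (for example, when a handler throws and there is no `finally`) leaves the counter above zero forever, so the poller never declares the Evaluator idle. That is one kind of failure the extra logging suggested above could help confirm or rule out.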

> IMRU driver does not exit even if all the task exit normally
> ------------------------------------------------------------
>                 Key: REEF-1482
>                 URL:
>             Project: REEF
>          Issue Type: Bug
>          Components: REEF.NET
>         Environment: C#
>            Reporter: Dhruv Mahajan
> Recently, upon running IMRU with a large number of mappers, it is observed intermittently
> that the IMRU driver does not exit while all other tasks (map and update) exit normally without any
> The aim of this JIRA is to fix it.

This message was sent by Atlassian JIRA