reef-dev mailing list archives

From Andrew Chung <afchun...@gmail.com>
Subject Re: Test failure in master
Date Wed, 18 May 2016 17:18:24 GMT
Hi,

Yes, that is the main issue.

Right now, when checking for idleness, we only check the `ThreadPool`
queue; we never check whether the EventHandler is still *active*. Thus, if
we call `Thread.sleep` within the `FailedEvaluatorHandler`, the
`EvaluatorManager` will check the `ThreadPool`, which will report that it
is idle (as the `ThreadPool` queue is now empty), and shut down. This might
be fine in Java, where we don't go through the InterOp layer and the
EventHandler always finishes before an idleness check, but the problem
shows up when going through the C# code.
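
The difference between the two checks can be sketched as follows. This is a
minimal illustration, not REEF's actual code: `IdlenessTracker` and all of
its method names are hypothetical, and it assumes the handler machinery can
report when an event is enqueued, picked up, and finished.

```java
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical sketch; these names do not come from the REEF code base. */
final class IdlenessTracker {
  private final AtomicInteger queued = new AtomicInteger();
  private final AtomicInteger active = new AtomicInteger();

  /** An event was placed on the thread pool queue. */
  void onEnqueue() {
    queued.incrementAndGet();
  }

  /** A handler picked the event up. Mark it active before removing it from
   *  the queued count, so there is no window in which both counts are zero. */
  void onHandlerStart() {
    active.incrementAndGet();
    queued.decrementAndGet();
  }

  /** The handler returned (for example, after a Thread.sleep inside it). */
  void onHandlerFinish() {
    active.decrementAndGet();
  }

  /** The check described above: looks at the queue only. */
  boolean isIdleQueueOnly() {
    return queued.get() == 0;
  }

  /** A check that also requires that no handler is still running. */
  boolean isIdle() {
    return queued.get() == 0 && active.get() == 0;
  }
}
```

While a handler is sleeping, `isIdleQueueOnly()` already returns `true`,
whereas `isIdle()` stays `false` until `onHandlerFinish()` is called.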

There are trickier issues going on, namely when the following sequence of
events occurs:
1. We call `close` on an Evaluator, in *any* part of the code.
2. `close` triggers an idleness check, but with the fix for REEF-1393, the
EventHandler is still active, so the check does not report idle.
3. No more calls to check idleness are made for a while.
4. We don't exit before the test times out.

I'm still running tests and checking different scenarios, but one fix
(albeit ugly and potentially resource-consuming) *should* work: run a
Thread at the end of `close` that repeatedly checks whether all Evaluator
messages have been handled, and only then trigger an idleness check. Please
let me know if a better fix exists.
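
A rough sketch of that polling approach, under stated assumptions:
`DeferredIdleCheck`, `startAfterDrain`, and the two callbacks are
hypothetical names, not REEF APIs. The caller is assumed to supply a
predicate that says whether all Evaluator messages have been handled and a
callback that performs the idleness check.

```java
import java.util.function.BooleanSupplier;

/** Hypothetical sketch of the polling fix; not REEF's actual code. */
final class DeferredIdleCheck {
  private DeferredIdleCheck() {
  }

  /**
   * Starts a daemon thread that polls until all messages are handled and
   * then runs the idleness check exactly once.
   *
   * @param allMessagesHandled assumed predicate: true once nothing is pending
   * @param idlenessCheck      assumed callback that performs the idleness check
   * @param pollMillis         sleep between polls; smaller values cost more CPU
   */
  static Thread startAfterDrain(final BooleanSupplier allMessagesHandled,
                                final Runnable idlenessCheck,
                                final long pollMillis) {
    final Thread poller = new Thread(() -> {
      try {
        while (!allMessagesHandled.getAsBoolean()) {
          Thread.sleep(pollMillis);
        }
        idlenessCheck.run();
      } catch (final InterruptedException e) {
        // Stop polling quietly and preserve the interrupt status.
        Thread.currentThread().interrupt();
      }
    }, "deferred-idle-check");
    poller.setDaemon(true);
    poller.start();
    return poller;
  }
}
```

The cost mentioned above is visible here: every `close` spawns one extra
thread that wakes up every `pollMillis` milliseconds until the messages
drain.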

Thanks,
Andrew

On Tue, May 17, 2016 at 5:56 PM, Markus Weimer <markus@weimo.de> wrote:

> Andrew, is REEF-1393 the root cause / fix for this?
>
> Markus
>
