reef-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Markus Weimer <>
Subject Re: reef examples
Date Wed, 29 Oct 2014 18:45:40 GMT

> A. Is heartbeating inherent to WAKE and automatically enabled on driver? Does it require
configuration (like interval length)?
REEF always sets up the communications between Evaluators and the 
Driver. The heartbeat mechanism is configurable somewhere, but we don't 
really expose that at this point (we should, though).

> B. Is heartbeating taking care of all driver/task handlers including ON_SEND_MESSAGE?
Why are call(), getMessage() become synchronized with task/driver messaging?

This is where a lot of confusion enters the picture. So allow me to be 

Driver-to-Evaluator communication is always push based. That is, calling 
RunningTask.send() will immediately proceed to send that message to the 
Task in question. On the Evaluator side, the message is received and 
routed to the handler configured in TaskConfiguration.ON_MESSAGE.

Evaluator-to-Driver communication is heartbeat based, for the most part 
(I'll talk about the exceptions to this below). When you want to send 
something from the Task to the Driver, you register a handler with 
TaskConfiguration.ON_SEND_MESSAGE. This handler will be invoked by the 
Evaluator whenever a heartbeat is about to be sent. In other words: The 
Task has no way of initiating the communication. This seems (and in fact 
is) annoying, but it serves a purpose: When dealing with thousands of 
Tasks, a single Driver can easily be overwhelmed if they all choose to 
send large messages at once, leading to congestion and ultimately poor 
performance. Allowing the framework to be in charge of the sending 
allows us to pace things just right. Ideally, we'd have a dynamic 
heartbeat interval that takes the current load on the Driver into 
account, for instance. We currently don't have that, though 
(contributions welcome ;-)).

Now the exception to that setup: State changes of an Evaluator are in 
fact pushed. That is: When e.g. a Task finishes, we immediately send a 
message to the Driver. This drastically reduces the latency compared to 
pull based control flows such as the one found in Hadoop 1. It is safe, 
because we still control the size and timing of these messages.


View raw message