geode-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Alberto Gomez <alberto.go...@est.tech>
Subject Re: [DISCUSS] Improvements on client function execution API
Date Thu, 05 Sep 2019 16:25:12 GMT
Hi all,

First of all, thanks a lot Dan and Jacob for your feedback.

As we are getting close to the deadline I am adding here some conclusions and a refined proposal
in order to get some more feedback and if possible some voting on the two alternatives proposed
(or any other in between if you feel any of them is lacking something).

I also add some draft code to try to clarify a bit the more complex of the alternatives.


Proposal summary (needs a decision on which option to implement):
-------------------------------------------------------------------------------------------

In order to make the API more coherent two alternatives are proposed:

a) Remove the timeout from the ResultCollector::getResult() / document that the timeout has
no effect, taking into account that Execution::execute() is always blocking.
Additionally we could add the timeout parameter to the Execution::execute() method of the
Java API in order to align it with the native client APIs. This timeout would not be the read
timeout on the socket but a timeout for the execution of the operation.

b) Change the implementation of the Execution::execute() method without timeout to be non-blocking
on both the Java and native APIs. This change has backward compatibility implications, would
probably bring some performance decrease and could pose some difficulties in the implementation
on the C++ side (in the  handling of timed out operations that hold resources).


The first option (a) is less risky and does not have impacts regarding backward compatibility
and performance.

The second one (b) is the preferred alternative in terms of the expected behavior from the
users of the API. This option is more complex to implement and as mentioned above has performance
and backward compatibility issues not easy to be solved.

Following is a draft version of the implementation of b) on the Java client:
https://github.com/Nordix/geode/commit/507a795e34c6083c129bda7e976b9223d1a893da

Following is a draft version of the implementation of b) on the C++ native client:
https://github.com/apache/geode-native/commit/a03a56f229bb8d75ee71044cf6196df07f43150d

Note that the above implementation of b) in the C++ client implies that the Execution object
returned by the FunctionService cannot be destroyed until the thread executing the function
asynchronously has finished. If the function times out, the Execution object must be kept
until the thread finishes.


Other considerations
-------------------------

* Currently, in the function execution Java client there is not a possibility to set a timeout
for the execution of functions. The closest to this is the read timeout that may be set globally
for function executions but this is not really a timeout for operations.

* Even if the API for function execution is the same on clients and servers, the implementation
is different between them. On the clients, the execute() methods are blocking while on the
servers it is non-blocking and the invoker of the function blocks on the getResult() method
of the ResultCollector returned by the execute() method.
Even if having both blocking and non-blocking implementation of execute() in both clients
and servers sounds desirable from the point of view of orthogonality, this  could bring complications
in terms of backward compatibility. Besides, a need for a blocking version of function execution
has not been found.

-Alberto G.

On 29/8/19 16:48, Alberto Gomez wrote:

Sorry, some corrections on my comments after revisiting the native
client code.

When I said that the timeout used in the execution() method (by means of
a system property) was to set a read timeout on the socket, I was only
talking about the Java client. In the case of the native clients, the
timeout set in the execute() method is not translated into a socket
timeout but it is the time to wait for the operation to complete, i.e.,
to get all the results back.

Things being so, I would change my proposal to:

- Change the implementation of execute() on both Java and native clients
to be non-blocking (having the blocking/non-blocking behavior
configurable in the release this is introduced and leaving only the
non-blocking behavior in the next release).

- Either remove the execute() with timeout methods in the native clients
(with a deprecation release) or implement the execute(timeout) method in
the Java client to be blocking (to work as the native client does
today). In case the method times out, the connection will not be closed.
If the operation times out due to the socket timeout (system property),
then the connection will be closed as it is now done in the Java client.

- Do not implement the blocking execute(timeout) method on the server
and leave the current execute() implementation on the server as it is
(non-blocking)

Does this make sense?

-Alberto

On 29/8/19 12:56, Alberto Gómez wrote:


Hi Dan,

Discussing these matters by e-mail is getting tricky.

Let's see if I understand you correctly and also if I am being clear
enough.

Please, see my comments inline.

On 29/8/19 0:49, Dan Smith wrote:


Sorry for the slow response, I've been trying to decide what I think
is the
right approach here.

For (1) - conceptually, I don't have a problem with having both blocking
and non blocking methods on Execution. So adding blocking versions of
execute() with a timeout seems ok. But I do think if we add them we
need to
implement them on both the client and the server to behave the same way.
That shouldn't be too hard on the server since execute(timeout) can just
call getResult(timeout) internally.



We have to take into account that, currently, the timeout in execute()
is not the same thing as the timeout in getResult().

On the one hand, the timeout set in execute() (via System property in
the Java client, and with a parameter in the native client) sets a
readtimeout on the socket which just means that if nothing is read
from the socket after sending the request to the server for the given
timeout, the corresponding exception will be thrown. It looks to me
more like a protection against possible communication failures rather
than a mechanism to decide if results took too long to be provided. So
I would not link the presence of the timeout parameter in the method
to the nature of the method (blocking or non-blocking). I think we
could have this read timeout set and at the same time keep that method
as non-blocking.

On the other hand, the timeout set in getResult() is a timeout to wait
for all the results to be received from the moment the method is invoked.

Therefore, I would not implement the blocking version of execute() on
the server by calling getResult() at the end.

Apart from that, I doubt if it would make sense to set this
readTimeout in the execute() methods from servers given that the
communication is very different to the one done with clients. I also
doubt that anyone would be interested in the blocking version of
execute() on the server.

My proposal was to add the readtimeout to the execute() methods in the
Java client in order to align the Java client and the native client.
This change would be independent to the decision we make regarding the
change of execute() to non-blocking. To achieve this alignment,
alternatively, we could remove the timeout parameter in execute() from
the native clients and have it as a global property for the client to
be set by whatever mechanism available as it is done in the Java
client today.

Were you proposing that the execute() methods with timeout were
blocking and the ones without timeout non-blocking? Not sure if this
is something you meant.




For (2) - Although I think the original authors of this API probably did
intend for execute() to be non-blocking, the fact is that it does
block on
the client and most users are probably calling execute from a client.
So I
do agree we probably shouldn't change the behavior at this point.
Perhaps
we can just clearly document the current behavior of execute() as
part of
adding these new methods. Going forward we can add new methods to
Execution
that are clearly non-blocking (submit?, invoke?) and implement them
consistently on *both* the client in the server, but that doesn't
have to
be in the scope of this proposal.



The problem I see with adding new non-blocking methods (new/submit...)
is that it would be a solution for the current users of the client
regarding backwards compatibility. But, on the server side, we would
have to move the current logic of execute() which is non-blocking to
the new methods and change the current execute() behavior to blocking.
We would not impact the users of the client but we would impact the
users of the server.

Again, I would propose to aim at:

a) either leave execute() on the client as blocking

b) or change execute() on the client to be non-blocking but on the
Geode release this is introduced, have it configurable. The default
behavior would be blocking (deprecated behavior) but could be set to
non-blocking with a system property. On the next release, the blocking
behavior would be removed.

- Alberto G.



-Dan

On Fri, Aug 23, 2019 at 4:28 AM Alberto Gomez <alberto.gomez@est.tech><mailto:alberto.gomez@est.tech>
wrote:



Hi Jake,

Please, see my answers below.

On 22/8/19 21:16, Jacob Barrett wrote:


On Aug 21, 2019, at 8:49 AM, Alberto Gomez <alberto.gomez@est.tech><mailto:alberto.gomez@est.tech>


wrote:


2. Timeout in ResultCollector::getResult() and Execution::execute()


blocking


Regarding the timeout in the ResultCollector::getResult() method


problem and the blocking/non-blocking confusion for
Execution::execute()
two alternatives are considered:


a) Remove the possibility of setting a timeout on the


ResultCollector::getResult() method on the client side as with the
current
client implementation it is useless. This could be done by removing the
method with the timeout parameter from the public API.


It would be advisable to make explicit in the documentation that the


getResult() method does not wait for results to arrive as that
should have
already been done in the Execution::execute() invocation.


This alternative is very simple and would keep things pretty much as


they are today.


To be honest I think approach would go against what a user “thinks” is


going on. Given that there hasn’t been a timeout on execute I
assumed it
was asynchronous and that the getResult blocked until timeout or
results
arrived. Typically these two calls were done back to back.

You are right if you look at the Java client. But if you look at the
native clients, the timeout is there, both in the C++ and the C# cases
which would indicate that it is a blocking call.

See

https://geode.apache.org/releases/latest/cppdocs/a00725.html#aa918a5e193745950e12ca4feb9c5d776

and

https://geode.apache.org/releases/latest/dotnetdocs/a00882.html#ae0a814049482ca424f89c13ab1099c3d



b) Transform the Execution::execute() method on the client side
into a


non-blocking method.


This alternative is more complex and requires changes in all the


clients. Apart from that it has implications on the public client
API it
requires moving the exceptions offered currently by the
Execution::execute() method to the ResultCollector::getResult() and new
threads will have to be managed.


I think this is more in line with what users expect is going on
based on


the current API, I know I have. If were are going to make any change I
think this is the one. I don’t think the behavior change is a
problem since
it's what is expected to be happening anyway.


An outline of a possible implementation for option b) would be:

   *   Instead of invoking the ServerRegionProxy::executeFunction()


directly as it is done today, create a Future that invokes this
method and
returns the resultCollector passed as parameter.


Do you really think we need to introduce Futures here into the API? I


feel like the ResultCollector acts as the Future. I don’t think any
change
needs to be made to the API in this regard. The ResultCollector
implementation would just simply block as implied by the api for the
timeout period. I would change the method to have units though and
deprecate the method without units.

I did not mean to introduce Futures in the API. My idea was to use Java
Futures internally so that the ResultCollector returned by the
getResult() method would wrap the Java Future with the ResultCollector
that would actually hold the result.

An alternative would be to leave the logic of blocking to each
implementation ResultCollector. In the case of the
DefaultResultCollector, we could use a CountDownLatch that would be
decremented when endResults() is called and that would make getResult()
block by using CountDown.await(...).

The advantage of using Futures internally is that the blocking logic
would not have to be implemented on every ResultCollector
implementation.



-Jake




- Alberto


Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message