ignite-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Vladimir Ozerov (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (IGNITE-5056) Implement communication backpressure control
Date Mon, 10 Jul 2017 08:00:03 GMT

     [ https://issues.apache.org/jira/browse/IGNITE-5056?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Vladimir Ozerov updated IGNITE-5056:
------------------------------------
    Fix Version/s:     (was: 2.1)
                   2.2

> Implement communication backpressure control
> --------------------------------------------
>
>                 Key: IGNITE-5056
>                 URL: https://issues.apache.org/jira/browse/IGNITE-5056
>             Project: Ignite
>          Issue Type: Improvement
>            Reporter: Yakov Zhdanov
>            Assignee: Yakov Zhdanov
>            Priority: Critical
>             Fix For: 2.2
>
>
> Problem
> Currently backpressure control relies on semaphore on sending side that ensures that
sending queue cannot be  overflown and a special counter on receiving side that stops reading
from the socket when unprocessed message count outgrows limit config parameter.
> In some scenarios it may lead to a distributed deadlock. E.g. we send many async jobs
to remote nodes which in turn do sync cache operations. If task master node is a backup or
primary for some cache updates and has already scheduled too many job requests for send it
will not be able to respond to cache requests thus remote jobs would never complete.
> Solution
> Reading from socket should never stop
> Design notes
> * add IgniteConfiguration.maxAsyncRequests and propagate it via node attributes to all
nodes of the cluster. All nodes may have different value (however this is unlikely).
> * add a flag to GridIoMessage.async. If flag is false then sender node assumed to synchronously
wait for response and does not wait otherwise.
> * all sent async messages should be tracked on sender node on per-receiver basis.
> * all received async messages should be tracked on receiver nodes
> * nodes should add flag to communication acks on whether they can more async messages
or not 
> * sender should never exceed IgniteConfiguration.maxAsyncRequests async requests per
node
> * if IgniteConfiguration.maxAsyncRequests is exceeded or node sets flag in communication
ack then all async messages become sync
> * above means:
> ** next compute job from the task is sent to the node only after response for some previous
comes
> ** next dht update request (for primary sync or full async) is sent, but node doesn't
send response to near node unless it has not received response for former operation from remote
backup or for this operation
> ** next cache operation becomes sync - we force user code to wait on operation future.
 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

Mime
View raw message