ignite-issues mailing list archives

From "Alexey Popov (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (IGNITE-7704) Document IgniteConfiguration, TcpDiscoverySpi, TcpCommunicationSpi timeouts and their relations
Date Thu, 01 Mar 2018 13:35:00 GMT

    [ https://issues.apache.org/jira/browse/IGNITE-7704?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16363879#comment-16363879
] 

Alexey Popov edited comment on IGNITE-7704 at 3/1/18 1:34 PM:
--------------------------------------------------------------

Sample description:

GLOBAL:

IgniteConfiguration.setNetworkTimeout:
It is a global timeout for high-level operations that involve the network. For instance,
IgniteMessaging delivery and the DiscoverySpi handshake use this timeout.

IgniteConfiguration.setFailureDetectionTimeout:
It is a global timeout for detecting failures in IgniteSpi implementations (including DiscoverySpi
and CommunicationSpi).
The failure detection algorithm bounds the whole series of simple network operations that make
up a single logical operation (for instance, reliable delivery of a DiscoverySpi message
within a cluster).
The failure detection timeout is cumulative: it covers establishing the socket connection, sending
and receiving data, and all socket retries made after a failure.
It is intended to simplify the failure detection condition from the user's perspective.
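
The cumulative nature of this timeout can be illustrated with a minimal sketch. The class
FailureDetectionBudget below is hypothetical and is not part of Ignite; it only models the idea
that connect, write, read and every retry of one logical operation draw from a single shared
time budget. Timestamps are passed in explicitly so the example is deterministic.

```java
import java.util.concurrent.TimeUnit;

/** Hypothetical helper (not Ignite code): one time budget shared by all steps of a logical operation. */
final class FailureDetectionBudget {
    private final long deadlineNanos;

    FailureDetectionBudget(long timeoutMillis, long startNanos) {
        this.deadlineNanos = startNanos + TimeUnit.MILLISECONDS.toNanos(timeoutMillis);
    }

    /** Milliseconds left for the next connect/write/read step; 0 means the budget is spent. */
    long remainingMillis(long nowNanos) {
        return Math.max(0L, TimeUnit.NANOSECONDS.toMillis(deadlineNanos - nowNanos));
    }

    /** True once the whole logical operation should be treated as failed. */
    boolean expired(long nowNanos) {
        return remainingMillis(nowNanos) == 0L;
    }

    public static void main(String[] args) {
        long start = 0L;
        // Illustrative 10-second budget; the value is an assumption, not an Ignite default.
        FailureDetectionBudget budget = new FailureDetectionBudget(10_000, start);

        long afterConnect = start + TimeUnit.SECONDS.toNanos(4);        // connect took 4 s
        long afterWrite = afterConnect + TimeUnit.SECONDS.toNanos(5);   // failed write took 5 s
        long afterRetry = afterWrite + TimeUnit.SECONDS.toNanos(2);     // retry took 2 s more

        System.out.println(budget.remainingMillis(afterConnect)); // 6000
        System.out.println(budget.remainingMillis(afterWrite));   // 1000
        System.out.println(budget.expired(afterRetry));           // true
    }
}
```

Each step would use remainingMillis() as its own socket timeout, so no individual retry can
extend the operation beyond the overall limit.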

IgniteConfiguration.setClientFailureDetectionTimeout:
It is a special case of the failure detection timeout that DiscoverySpi applies to Ignite
client nodes.

TCP DISCOVERY SPI:

If you need more control over the failure detection algorithm for TcpDiscoverySpi, you can explicitly
set the following low-level options (doing so disables the failureDetectionTimeout logic):

1. TcpDiscoverySpi.setConnectTimeout - socket connection timeout
2. TcpDiscoverySpi.setReconnectCount - number of reconnect attempts used when establishing
a connection with the remote node and sending messages to it
3. TcpDiscoverySpi.setSocketTimeout - socket write timeout. A write operation that exceeds
this timeout is retried up to getReconnectCount() times
4. TcpDiscoverySpi.setAckTimeout - message acknowledgment timeout. If a message acknowledgment
is not received within this timeout, the send is considered failed and the SPI retries it.
The timeout is automatically doubled on each successive retry, up to the getMaxAckTimeout
value.
5. TcpDiscoverySpi.setMaxAckTimeout - maximum acknowledgment timeout. Once the doubled getAckTimeout
reaches getMaxAckTimeout, the SPI gives up retrying the send
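
As a rough illustration of items 4 and 5, the sketch below lists the acknowledgment waits
produced by the doubling rule. AckTimeoutSketch and ackWaits are hypothetical names, this is
not Ignite code, and the exact boundary handling (stopping as soon as the doubled value reaches
the maximum) is an assumption based on the description above.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical model of the ack-timeout doubling described above; not Ignite code. */
final class AckTimeoutSketch {
    /**
     * Returns the ack wait (in ms) of each send attempt: the first attempt waits ackTimeout,
     * every retry doubles the wait, and retries stop once the doubled value reaches maxAckTimeout.
     */
    static List<Long> ackWaits(long ackTimeout, long maxAckTimeout) {
        if (ackTimeout <= 0)
            throw new IllegalArgumentException("ackTimeout must be positive");

        List<Long> waits = new ArrayList<>();
        long t = ackTimeout;

        do {
            waits.add(t);
            t *= 2;
        }
        while (t > 0 && t < maxAckTimeout); // "t > 0" guards against overflow when doubling

        return waits;
    }

    public static void main(String[] args) {
        // Illustrative values only; they are not claimed to be Ignite defaults.
        List<Long> waits = ackWaits(5_000, 40_000);

        long total = 0;
        for (long w : waits)
            total += w;

        System.out.println(waits + " total=" + total + " ms"); // [5000, 10000, 20000] total=35000 ms
    }
}
```

Under these assumptions the worst-case time spent waiting for acknowledgments is the sum of
the list, which grows quickly as maxAckTimeout is raised.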

Other important TcpDiscoverySpi timeouts:

TcpDiscoverySpi.setJoinTimeout - timeout for the join process when a new or restarted node
joins a cluster. Within this timeout, the node tries to connect to all available IP addresses
provided by the ipFinder.
If the timeout is exceeded, the node gives up and throws an exception from Ignition.start().

TcpDiscoverySpi.setNetworkTimeout - timeout for high-level operations such as the handshake. It
looks like it should be deprecated in favor of IgniteConfiguration.getNetworkTimeout.

TCP COMMUNICATION SPI:

If you need more control over the failure detection algorithm for TcpCommunicationSpi, you can
explicitly set the following low-level options (doing so disables the failureDetectionTimeout
logic):

1. TcpCommunicationSpi.setConnectTimeout - socket connection timeout. It is automatically
doubled on each successive retry (up to getReconnectCount retries) within a single logical operation
2. TcpCommunicationSpi.setMaxConnectTimeout - maximum connection timeout, that is, the upper limit
for the repeatedly doubled getConnectTimeout
3. TcpCommunicationSpi.setReconnectCount - number of reconnect attempts used when establishing
a connection with the remote node and sending messages to it
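
The interplay of these three options can be sketched as follows. ConnectTimeoutSketch and
connectTimeouts are hypothetical names and this is not Ignite code; it assumes one reading of
the description above, in which attempts stop either after getReconnectCount tries or when the
next doubled timeout would exceed getMaxConnectTimeout, whichever comes first.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical model of connect-timeout doubling bounded by a retry count and a maximum; not Ignite code. */
final class ConnectTimeoutSketch {
    /** Returns the connect timeout (in ms) used by each attempt under the assumptions stated above. */
    static List<Long> connectTimeouts(long connectTimeout, long maxConnectTimeout, int reconnectCount) {
        if (connectTimeout <= 0)
            throw new IllegalArgumentException("connectTimeout must be positive");

        List<Long> timeouts = new ArrayList<>();
        long t = connectTimeout;

        // Stop on whichever bound is hit first; "t > 0" guards against overflow when doubling.
        for (int attempt = 0; attempt < reconnectCount && t > 0 && t <= maxConnectTimeout; attempt++) {
            timeouts.add(t);
            t *= 2;
        }

        return timeouts;
    }

    public static void main(String[] args) {
        // Illustrative values only; they are not claimed to be Ignite defaults.
        System.out.println(connectTimeouts(1_000, 6_000, 10)); // [1000, 2000, 4000]
        System.out.println(connectTimeouts(1_000, 6_000, 2));  // [1000, 2000]
    }
}
```

In the first call the maximum timeout is the effective bound; in the second the reconnect
count is. Tuning only one of the two options may therefore have no visible effect.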

Other important TcpCommunicationSpi timeouts:

TcpCommunicationSpi.setSocketWriteTimeout - timeout for sending a message.
TcpCommunicationSpi.setIdleConnectionTimeout - maximum idle time after which a connection
is closed.

 


> Document IgniteConfiguration, TcpDiscoverySpi, TcpCommunicationSpi timeouts and their
relations
> -----------------------------------------------------------------------------------------------
>
>                 Key: IGNITE-7704
>                 URL: https://issues.apache.org/jira/browse/IGNITE-7704
>             Project: Ignite
>          Issue Type: Improvement
>          Components: documentation
>    Affects Versions: 2.3
>            Reporter: Alexey Popov
>            Priority: Major
>
> We often see similar questions about IgniteConfiguration, TcpDiscoverySpi, and TcpCommunicationSpi
> timeouts and their relations, and we see several side effects of incorrect timeout configuration.
> It looks like this topic is not well documented.
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
