ignite-dev mailing list archives

From Stanislav Lukyanov <stanlukya...@gmail.com>
Subject RE: IgniteConfiguration, TcpDiscoverySpi, TcpCommunicationSpi timeouts
Date Mon, 09 Jul 2018 12:47:34 GMT
A server uses its failureDetectionTimeout when talking to other servers and its
clientFailureDetectionTimeout when talking to clients.
E.g. a communication link from server to server uses failureDetectionTimeout, and a link
from server to client uses clientFailureDetectionTimeout.

A client uses its failureDetectionTimeout all the time, ignoring clientFailureDetectionTimeout.

Asymmetric settings are even possible.
Say, a server and a client use the same config, failureDetectionTimeout=10 and clientFailureDetectionTimeout=20.
When these two nodes communicate, the server will use a timeout of 20 seconds and the client
will use a timeout of 10 seconds.
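
The selection rule described above can be modeled in a few lines of Java. This is only a sketch of the logic as stated in this message, not Ignite code: the class FailureDetectionModel and the method effectiveTimeout are hypothetical names, and the values 10 and 20 are the ones from the example.

```java
/** Hypothetical model of which timeout a node applies to a peer; not Ignite code. */
final class FailureDetectionModel {
    /**
     * Returns the timeout the local node uses for a link to the remote node.
     * Both timeout arguments come from the local node's own configuration.
     */
    static long effectiveTimeout(boolean localIsClient, boolean remoteIsClient,
                                 long failureDetectionTimeout,
                                 long clientFailureDetectionTimeout) {
        // A client always uses failureDetectionTimeout and ignores the client-specific one.
        if (localIsClient)
            return failureDetectionTimeout;

        // A server picks the timeout by the type of the remote node.
        return remoteIsClient ? clientFailureDetectionTimeout : failureDetectionTimeout;
    }

    public static void main(String[] args) {
        // Same config on both nodes, as in the example (values in seconds).
        System.out.println("server -> client: " + effectiveTimeout(false, true, 10, 20));
        System.out.println("client -> server: " + effectiveTimeout(true, false, 10, 20));
        System.out.println("server -> server: " + effectiveTimeout(false, false, 10, 20));
    }
}
```

With the shared config from the example, the server side of a server-client link resolves to 20 and the client side to 10, which is the asymmetry mentioned above.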

Stan

From: Valentin Kulichenko
Sent: July 6, 2018, 23:17
To: dev@ignite.apache.org
Subject: Re: IgniteConfiguration, TcpDiscoverySpi, TcpCommunicationSpi timeouts

Stan,

Can you explain the semantics of both parameters? How do they behave when
set on client or on server?

-Val

On Fri, Jul 6, 2018 at 6:12 AM Stanislav Lukyanov <stanlukyanov@gmail.com>
wrote:

> We could just use failureDetectionTimeout all the time I guess.
> The only benefit of clientFailureDetectionTimeout is that it may allow
> clients to be slower/on a slower network than servers.
>
> Do you think it isn’t worth having a separate setting just for that?
>
> Thanks,
> Stan
>
> From: Valentin Kulichenko
> Sent: July 5, 2018, 18:16
> To: dev@ignite.apache.org
> Subject: Re: IgniteConfiguration, TcpDiscoverySpi,
> TcpCommunicationSpi timeouts
>
> Stan,
>
> What is the purpose of clientFailureDetectionTimeout? Why can't we just
> always use failureDetectionTimeout? Is there any difference between these
> two timeouts?
>
> -Val
>
>
>
> On Wed, Jul 4, 2018 at 7:00 AM Stanislav Lukyanov <stanlukyanov@gmail.com>
> wrote:
>
> > Hi,
> >
> > I’ve updated the proposed documentation update with a description of
> > metricsUpdateFrequency and a detailed description of
> > failureDetectionTimeout and clientFailureDetectionTimeout relations. The
> > draft is attached to https://issues.apache.org/jira/browse/IGNITE-7704.
> >
> > It seems that the relation between failureDetectionTimeout and
> > clientFailureDetectionTimeout is currently too tricky and should also be
> > changed in the future.
> > The problem is that in a server-client connection the server will use
> > clientFailureDetectionTimeout but the client will use
> > failureDetectionTimeout.
> > In other words, clients ignore clientFailureDetectionTimeout and just use
> > failureDetectionTimeout. Because of that, one has to provide different
> > values of failureDetectionTimeout in server and client configs, which
> > seems confusing and inconvenient.
> > So I’d like to add one more point to my earlier proposal:
> >
> > 5. Always use clientFailureDetectionTimeout on clients instead of
> > failureDetectionTimeout
> > *What*: change code to use clientFailureDetectionTimeout on clients
> > *When*: update code and readme.io docs in 2.7
> >
> > Thanks,
> > Stan
> >
> > From: Valentin Kulichenko
> > Sent: May 30, 2018, 19:09
> > To: dev@ignite.apache.org
> > Subject: Re: IgniteConfiguration, TcpDiscoverySpi,
> > TcpCommunicationSpi timeouts
> >
> > Stan,
> >
> > It looks like you suggest changing only the default. If so, it's OK. But
> > let's not change the behavior of these timeouts for the case when they are
> > explicitly set in the config.
> >
> > Thanks,
> > Val
> >
> > On Wed, May 30, 2018 at 1:06 AM, Stanislav Lukyanov <
> > stanlukyanov@gmail.com>
> > wrote:
> >
> > > On networkTimeout: no, we don’t have anything like that in
> > > TcpCommunicationSpi.
> > >
> > > On socketWriteTimeout:
> > > First, its semantics are very close to those of
> > > TcpDiscoverySpi.socketTimeout (with the exception that communication
> > > uses NIO), and the latter defaults to failureDetectionTimeout,
> > > so I think it would help to avoid confusion.
> > > Second, I think we can’t deprecate something without an alternative
> > > that would work for most users.
> > > On the other hand, if we do default socketWriteTimeout to
> > > failureDetectionTimeout then we reach a pretty decent API state
> > > where one only needs two properties in IgniteConfiguration, neither of
> > > which we’re considering for deprecation and removal in 3.0.
> > >
> > > Stan
> > >
> > > From: Valentin Kulichenko
> > > Sent: May 29, 2018, 22:17
> > > To: dev@ignite.apache.org
> > > Subject: Re: IgniteConfiguration, TcpDiscoverySpi,
> > > TcpCommunicationSpi timeouts
> > >
> > > Stan,
> > >
> > > OK, I got confused a little :)
> > >
> > > I do agree that TcpDiscoverySpi.networkTimeout should inherit from
> > > IgniteConfiguration.networkTimeout if not set explicitly. Do we have
> > > the same setting for TcpCommunicationSpi, BTW? If yes, the behavior
> > > should be consistent.
> > >
> > > As for TcpCommunicationSpi.socketWriteTimeout, I'm not sure why you
> > > want to change its behavior. Can we just deprecate it and eventually
> > > remove it, just as we plan to do for all timeouts from #2?
> > >
> > > -Val
> > >
> > > On Tue, May 29, 2018 at 3:50 AM, Stanislav Lukyanov <
> > > stanlukyanov@gmail.com>
> > > wrote:
> > >
> > > > Val,
> > > >
> > > > Which timeouts do you mean?
> > > >
> > > > In #2 I don’t propose to change behavior.
> > > >
> > > > I propose to change behavior for a couple of settings in #3 though.
> > > > I believe the correct approach here would be to target the behavior
> > > > change for 2.6,
> > > > but keep in mind that we’ll need to carefully analyze the impact
> > > > before actually making the changes.
> > > >
> > > > Thanks,
> > > > Stan
> > > >
> > > > From: Valentin Kulichenko
> > > > Sent: May 29, 2018, 0:57
> > > > To: dev@ignite.apache.org
> > > > Subject: Re: IgniteConfiguration, TcpDiscoverySpi,
> > > > TcpCommunicationSpi timeouts
> > > >
> > > > Hi Stan,
> > > >
> > > > I'm 100% for this activity, however I don't think we should change
> > > > the behavior of the timeouts you listed in #2 - this can lead to
> > > > unexpected behavior for users who already use them. I would just
> > > > deprecate them and eventually remove them.
> > > >
> > > > -Val
> > > >
> > > > On Mon, May 28, 2018 at 1:29 PM, Stanislav Lukyanov <
> > > > stanlukyanov@gmail.com>
> > > > wrote:
> > > >
> > > > > Hi folks,
> > > > >
> > > > > It looks like we stopped half-way with this activity. I’d like to
> > > > > pick it up.
> > > > >
> > > > > All seem to agree that we should simplify the timeout settings.
> > > > > Here are the specific actions I’d like to propose:
> > > > >
> > > > > 1. Promote the use of global timeouts as the best practice
> > > > > *What*: update the docs to encourage users to rely on the following
> > > > > timeouts for their “network stability” settings
> > > > > IgniteConfiguration.failureDetectionTimeout
> > > > > IgniteConfiguration.clientFailureDetectionTimeout
> > > > > IgniteConfiguration.networkTimeout
> > > > > *When*: update readme.io docs for 2.5 and Javadoc for 2.6
> > > > >
> > > > > 2. Discourage the use of finer timeouts
> > > > > *What*:
> > > > > - update the docs to discourage users from using the following
> > > > > timeouts and announce their upcoming deprecation and removal
> > > > > TcpDiscoverySpi.socketTimeout
> > > > > TcpDiscoverySpi.ackTimeout
> > > > > TcpDiscoverySpi.maxAckTimeout
> > > > > TcpDiscoverySpi.reconnectCount
> > > > > TcpCommunicationSpi.connectTimeout
> > > > > TcpCommunicationSpi.maxConnectTimeout
> > > > > TcpCommunicationSpi.reconnectCount
> > > > > - deprecate the properties in code
> > > > > - remove the properties in code
> > > > > *When*:
> > > > > - readme.io update with deprecation announcement for 2.5
> > > > > - @Deprecated in code + Javadoc update + respective readme.io
> > > > > rewording for 2.6
> > > > > - properties removal in 3.0
> > > > >
> > > > > 3. Make “orphan” timeouts rely on global timeouts, then deprecate
> > > > > and remove
> > > > > *What*:
> > > > > Two settings currently don’t default to the global equivalents,
> > > > > although they should:
> > > > > - TcpCommunicationSpi.socketWriteTimeout should default to
> > > > > failureDetectionTimeout
> > > > > - TcpDiscoverySpi.networkTimeout should default to
> > > > > IgniteConfiguration.networkTimeout
> > > > > So the course of action would be:
> > > > > - update the docs to explain that these timeouts have to be used
> > > > > for now, but announce their upcoming deprecation and removal
> > > > > - change the properties to default to their global counterparts and
> > > > > deprecate them in code
> > > > > - remove the properties in code
> > > > > *When*:
> > > > > - readme.io update with deprecation announcement for 2.5
> > > > > - changing defaults + @Deprecated in code + Javadoc update +
> > > > > respective readme.io rewording for 2.6
> > > > > - properties removal in 3.0
> > > > >
> > > > > 4. Don’t touch other timeouts
> > > > > Other timeouts, like TcpDiscoverySpi.joinTimeout or
> > > > > TcpCommunicationSpi.idleConnectionTimeout, are orthogonal to the
> > > > > whole “network stability” theme discussed above, and don’t have to
> > > > > be changed.
> > > > >
> > > > > Finally, I’ve prepared a draft of the docs page that may be used
> > > > > as a base for the readme.io update.
> > > > > This email is pretty long already, so please find the draft
> > > > > attached to the JIRA issue
> > > > > https://issues.apache.org/jira/browse/IGNITE-7704.
> > > > >
> > > > > Please share your thoughts.
> > > > >
> > > > > Thanks,
> > > > > Stan
> > > > >
> > > > > From: Alexey Popov
> > > > > Sent: March 1, 2018, 17:01
> > > > > To: dev@ignite.apache.org
> > > > > Subject: IgniteConfiguration, TcpDiscoverySpi, TcpCommunicationSpi
> > > > > timeouts
> > > > >
> > > > > Hi Igniters,
> > > > >
> > > > > We often see similar questions from users and customers related to
> > > > > IgniteConfiguration, TcpDiscoverySpi, TcpCommunicationSpi timeouts
> > > > > and their relations. And we see several side-effects after incorrect
> > > > > timeout configuration.
> > > > >
> > > > > I tried to briefly describe these timeout settings (please see
> > > > > below) and found out that most of them do not make sense in terms
> > > > > of cluster functions/operations and could not be explained to the
> > > > > users.
> > > > >
> > > > > I propose to deprecate most of them and leave only the timeouts we
> > > > > can explain in common terms (setFailureDetectionTimeout,
> > > > > setNetworkTimeout, setJoinTimeout and some others).
> > > > >
> > > > > Please let me know your thoughts.
> > > > >
> > > > > Thanks,
> > > > > Alexey
> > > > >
> > > > > GLOBAL:
> > > > >
> > > > > IgniteConfiguration.setNetworkTimeout:
> > > > > It is a global timeout for high-level operations where a network
> > > > > is involved. For instance, IgniteMessaging delivery or the
> > > > > DiscoverySpi handshake uses this timeout.
> > > > >
> > > > > IgniteConfiguration.setFailureDetectionTimeout:
> > > > > It is a global timeout for detecting failures at IgniteSpi
> > > > > implementations (including DiscoverySpi and CommunicationSpi).
> > > > > The failure detection algorithm actually limits a range of simple
> > > > > network operations related to a single logical operation (for
> > > > > instance, a reliable delivery of some DiscoverySpi message within a
> > > > > cluster).
> > > > > Failure detection timeout is a cumulative timeout for a socket
> > > > > connection, sending and receiving data bytes and all possible
> > > > > socket retries (if some failure happens).
> > > > > This timeout is intended to simplify the failure detection
> > > > > condition from a user perspective.
> > > > >
> > > > > IgniteConfiguration.setClientFailureDetectionTimeout:
> > > > > It is a special case for DiscoverySpi client nodes.
> > > > >
> > > > > TCP DISCOVERY SPI:
> > > > >
> > > > > If you need more control over the failure detection algorithm for
> > > > > TcpDiscoverySpi you can explicitly use the following low-level
> > > > > options (that will disable the failureDetectionTimeout logic):
> > > > >
> > > > > 1. TcpDiscoverySpi.setConnectTimeout - socket connection timeout
> > > > > 2. TcpDiscoverySpi.setReconnectCount - number of reconnect attempts
> > > > > used when establishing a connection with the remote node and
> > > > > sending messages to it
> > > > > 3. TcpDiscoverySpi.setSocketTimeout - socket write timeout. The
> > > > > write operation will be repeated getReconnectCount() times if it
> > > > > exceeds this timeout
> > > > > 4. TcpDiscoverySpi.setAckTimeout - message acknowledgment timeout.
> > > > > If a message acknowledgment is not received within this timeout,
> > > > > sending is considered failed and the SPI will try to repeat the
> > > > > send operation. It is automatically doubled for simultaneous
> > > > > retries up to the getMaxAckTimeout value.
> > > > > 5. TcpDiscoverySpi.setMaxAckTimeout - maximum acknowledgment
> > > > > timeout; if getAckTimeout reaches getMaxAckTimeout, the SPI gives
> > > > > up on retries
> > > > >
> > > > > Other important TcpDiscoverySpi timeouts:
> > > > >
> > > > > TcpDiscoverySpi.setJoinTimeout - It is a timeout for the join
> > > > > process when a new/restarted node joins a cluster. The node tries
> > > > > to connect to all available IP addresses provided by ipFinder
> > > > > within this timeout.
> > > > > If the timeout is exceeded, the node will give up and throw an
> > > > > exception from Ignition.start().
> > > > >
> > > > > TcpDiscoverySpi.setNetworkTimeout - timeout for high-level
> > > > > operations like handshake. It looks like it should be deprecated
> > > > > and IgniteConfiguration.getNetworkTimeout should be used here.
> > > > >
> > > > > TCP COMMUNICATION SPI:
> > > > >
> > > > > If you need more control over the failure detection algorithm for
> > > > > TcpCommunicationSpi you can explicitly use the following low-level
> > > > > options (that will disable the failureDetectionTimeout logic):
> > > > >
> > > > > 1. TcpCommunicationSpi.setConnectTimeout - socket connection
> > > > > timeout; it will be automatically doubled for simultaneous retries
> > > > > (up to getReconnectCount) related to a single logical operation
> > > > > 2. TcpCommunicationSpi.setMaxConnectTimeout - maximum connection
> > > > > timeout, the upper limit of the getReconnectCount-times doubled
> > > > > getConnectTimeout
> > > > > 3. TcpCommunicationSpi.setReconnectCount - number of reconnect
> > > > > attempts used when establishing a connection with the remote node
> > > > > and sending messages to it
> > > > >
> > > > > Other important TcpCommunicationSpi timeouts:
> > > > >
> > > > > TcpCommunicationSpi.setSocketWriteTimeout - timeout to send a message.
> > > > > TcpCommunicationSpi.setIdleConnectionTimeout - maximum idle connection
> > > > > timeout upon which a connection will be closed.
> > > > >
> > > > >
> > > > >
> > > > >
> > > > > --
> > > > > Sent from: http://apache-ignite-developers.2346864.n4.nabble.com/
> > > > >
> > > > >
> > > >
> > > >
> > >
> > >
> >
> >
>
>

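The ackTimeout/maxAckTimeout behaviour described in Alexey's original message (the acknowledgment timeout is doubled on retries until it reaches the maximum) can be sketched as follows. This is a minimal illustration, not TcpDiscoverySpi code: AckTimeoutSchedule and schedule are hypothetical names, the values are arbitrary, and the boundary handling (a timeout equal to the maximum is still tried) is an assumption.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of a doubling acknowledgment timeout; not Ignite code. */
final class AckTimeoutSchedule {
    /**
     * Returns the timeouts that successive send attempts would wait for,
     * starting at ackTimeout and doubling until maxAckTimeout is exceeded.
     * Assumption: a timeout equal to maxAckTimeout is still attempted.
     */
    static List<Long> schedule(long ackTimeout, long maxAckTimeout) {
        if (ackTimeout <= 0)
            throw new IllegalArgumentException("ackTimeout must be positive");

        List<Long> timeouts = new ArrayList<>();

        // The "t > 0" check guards against long overflow when doubling.
        for (long t = ackTimeout; t > 0 && t <= maxAckTimeout; t *= 2)
            timeouts.add(t);

        return timeouts;
    }

    public static void main(String[] args) {
        // Illustrative values in milliseconds.
        System.out.println(schedule(100, 1000));
    }
}
```

For the illustrative values 100 and 1000 the schedule is 100, 200, 400, 800; the next doubling would exceed the maximum, at which point the sender gives up.
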

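Proposal #3 in the thread, together with Val's request not to change behavior for values that are set explicitly, amounts to a simple resolution rule: use the SPI-level value if the user set one, otherwise fall back to the global counterpart. A minimal sketch, assuming an unset property is represented as null; TimeoutDefaults and resolve are hypothetical names and this is not how Ignite stores its settings.

```java
/** Hypothetical sketch of "SPI timeout defaults to the global one"; not Ignite code. */
final class TimeoutDefaults {
    /**
     * @param explicitSpiTimeout value set on the SPI, or null if the user did not set it
     * @param globalTimeout value of the global counterpart in the node configuration
     * @return the timeout that would actually be applied
     */
    static long resolve(Long explicitSpiTimeout, long globalTimeout) {
        // An explicit SPI-level value always wins, so existing configs keep their behavior.
        return explicitSpiTimeout != null ? explicitSpiTimeout : globalTimeout;
    }

    public static void main(String[] args) {
        long failureDetectionTimeout = 10_000; // illustrative value, ms

        // socketWriteTimeout not set: inherits the global value.
        System.out.println(resolve(null, failureDetectionTimeout));

        // socketWriteTimeout set explicitly: the explicit value is kept.
        System.out.println(resolve(2_000L, failureDetectionTimeout));
    }
}
```
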