ignite-issues mailing list archives

From "Yakov Zhdanov (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (IGNITE-752) Speed up failure detection
Date Thu, 23 Apr 2015 10:26:38 GMT

    [ https://issues.apache.org/jira/browse/IGNITE-752?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14508797#comment-14508797 ]

Yakov Zhdanov commented on IGNITE-752:
--------------------------------------

Also consider this Stack Overflow question: http://stackoverflow.com/questions/26134208/gridgain-node-disconnect

> Speed up failure detection
> --------------------------
>
>                 Key: IGNITE-752
>                 URL: https://issues.apache.org/jira/browse/IGNITE-752
>             Project: Ignite
>          Issue Type: Bug
>            Reporter: Yakov Zhdanov
>            Assignee: Yakov Zhdanov
>            Priority: Critical
>             Fix For: sprint-4
>
>
> I think we can (1) make grid configuration significantly easier and (2) speed up failure detection.
> Here are the discovery SPI configuration properties that are responsible for failure detection:
> # reconnectCount,
> # sockTimeout,
> # networkTimeout,
> # ackTimeout,
> # maxAckTimeout,
> # heartbeatFrequency,
> # maxMissedHeartbeats
> The same applies to the communication SPI:
> # reconnectCount, 
> # maxConnTimeout, 
> # connTimeout
> So, we have 10 or even more properties.
> We did this to address the half-open socket problem (which is pretty common in cloud environments) and GC pauses that may happen on cluster nodes: we can increase ack timeouts to prevent paused nodes from being kicked out of the topology.
> By setting values for these properties I am effectively setting the timeout for failure detection. Why do we need so many parameters instead of a single one on IgniteConfiguration, such as nodeResponseThreshold (or failureDetectionThreshold; can anyone propose a better name)?
> All other parameters would be calculated automatically (I think the user could still set some of them for full control over the situation; we need to decide whether this is needed).
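
To make the single-threshold idea in the description above concrete, here is a minimal sketch of how the discovery properties it lists might be derived from one value. The property names come from the issue; the class name, the method name, the fixed counts and the way the threshold is split are all assumptions made for illustration. This is not Ignite's API or its actual calculation.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Illustrative only: hypothetical derivation of discovery timeouts from one threshold. */
public class FailureDetectionSketch {
    /**
     * Derives the per-property values from a single threshold in milliseconds.
     * The counts and the division rules below are assumptions, not Ignite behavior.
     */
    static Map<String, Long> derive(long failureDetectionThresholdMs) {
        if (failureDetectionThresholdMs <= 0)
            throw new IllegalArgumentException("threshold must be positive");

        int reconnectCount = 2;      // assumed fixed number of connection attempts
        int maxMissedHeartbeats = 2; // assumed fixed number of tolerated misses

        // Split the total budget evenly across attempts (never below 1 ms).
        long perAttempt = Math.max(1, failureDetectionThresholdMs / reconnectCount);

        Map<String, Long> props = new LinkedHashMap<>();
        props.put("reconnectCount", (long) reconnectCount);
        props.put("sockTimeout", perAttempt);
        props.put("ackTimeout", perAttempt);
        props.put("maxAckTimeout", failureDetectionThresholdMs);
        props.put("networkTimeout", failureDetectionThresholdMs);
        props.put("maxMissedHeartbeats", (long) maxMissedHeartbeats);
        props.put("heartbeatFrequency",
            Math.max(1, failureDetectionThresholdMs / maxMissedHeartbeats));
        return props;
    }

    public static void main(String[] args) {
        // Print the derived values for a 10-second threshold.
        derive(10_000).forEach((k, v) -> System.out.println(k + " = " + v));
    }
}
```

With a scheme like this the user reasons about one number (how long a node may stay unresponsive) while the individual timeouts stay mutually consistent. Explicit overrides for full control could be layered on top, which is the open question raised in the description.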



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
