ignite-dev mailing list archives

From Denis Magda <dma...@gridgain.com>
Subject Re: Stopped working on IGNITE-752 (speed up failure detection)
Date Mon, 27 Jul 2015 14:07:27 GMT
Sorry, forgot that attachments are not allowed.

Mapping of the attachments to public URLs:
1) ignite-results-failure-detection.zip -> https://goo.gl/5mitfS
2) ignite-results-no-failure-detection-explicit-timeouts.zip -> https://goo.gl/as4qph
3) ignite-results-1.3.0.zip -> https://goo.gl/m8lbiR

--
Denis

On 7/27/2015 4:54 PM, Denis Magda wrote:
> Dmitriy, Igniters,
>
> I've got the first yardstick benchmarking results on Amazon EC2. 
> Thanks to Nikolay for the guidance and the ready-to-use yardstick 
> Docker image.
>
> The configuration used was the following: c4.xlarge instances, 5 server 
> nodes, 1 backup, running the put/get benchmark, with one instance 
> manually stopped during the execution.
> Warmup time 60 seconds, execution time 150 seconds, 64 threads.
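>
> The put/get workload itself is essentially the following loop (a minimal 
> sketch, not the actual yardstick driver; the cache name and key range 
> here are assumptions):
>
> import java.util.concurrent.ThreadLocalRandom;
> import org.apache.ignite.Ignite;
> import org.apache.ignite.IgniteCache;
> import org.apache.ignite.Ignition;
> import org.apache.ignite.configuration.CacheConfiguration;
>
> public class PutGetSketch {
>     public static void main(String[] args) {
>         try (Ignite ignite = Ignition.start()) {
>             // One backup copy per partition, as in the benchmarked setup.
>             CacheConfiguration<Integer, Integer> ccfg =
>                 new CacheConfiguration<>("atomic");
>             ccfg.setBackups(1);
>
>             IgniteCache<Integer, Integer> cache = ignite.getOrCreateCache(ccfg);
>
>             // Each of the 64 benchmark threads runs a loop like this
>             // for the 150-second execution window.
>             while (!Thread.currentThread().isInterrupted()) {
>                 int key = ThreadLocalRandom.current().nextInt(100000);
>                 cache.put(key, key);
>                 cache.get(key);
>             }
>         }
>     }
> }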
>
> 1) Failure detection timeout set to 300 ms.
> Unfortunately, the throughput drop during the kill of one server node is 
> significant. Please see the resulting plot in 
> ignite-results-failure-detection.zip.
>
> Making the timeout lower doesn't improve the situation.
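>
> For reference, the only failure-detection setting in this run was the 
> new flag itself (a minimal sketch; everything else was left at defaults):
>
> import org.apache.ignite.Ignite;
> import org.apache.ignite.Ignition;
> import org.apache.ignite.configuration.IgniteConfiguration;
>
> public class FailureDetectionSketch {
>     public static void main(String[] args) {
>         IgniteConfiguration cfg = new IgniteConfiguration();
>
>         // Single knob covering discovery- and communication-level
>         // failure detection.
>         cfg.setFailureDetectionTimeout(300);
>
>         Ignite ignite = Ignition.start(cfg);
>     }
> }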
>
> Right after that I decided to run the same benchmark with the failure 
> detection timeout ignored, by setting several network-related timeouts 
> explicitly (these are the timeouts that were used before, when the drop 
> was insignificant):
> TcpCommunicationSpi.setSocketWriteTimeout(200)
> TcpDiscoverySpi.setAckTimeout(50)
> TcpDiscoverySpi.setNetworkTimeout(5000)
> TcpDiscoverySpi.setHeartbeatFrequency(100)
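>
> In Java config these map onto the SPIs like this (a sketch; setting the 
> timeouts explicitly is what makes the SPIs ignore failureDetectionTimeout):
>
> import org.apache.ignite.configuration.IgniteConfiguration;
> import org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi;
> import org.apache.ignite.spi.discovery.tcp.TcpDiscoverySpi;
>
> public class ExplicitTimeoutsSketch {
>     public static IgniteConfiguration config() {
>         IgniteConfiguration cfg = new IgniteConfiguration();
>
>         // Communication: consider a connection broken if a socket
>         // write hangs longer than 200 ms.
>         TcpCommunicationSpi commSpi = new TcpCommunicationSpi();
>         commSpi.setSocketWriteTimeout(200);
>
>         // Discovery: message ack timeout, overall network operation
>         // timeout, and heartbeat issuing frequency.
>         TcpDiscoverySpi discoSpi = new TcpDiscoverySpi();
>         discoSpi.setAckTimeout(50);
>         discoSpi.setNetworkTimeout(5000);
>         discoSpi.setHeartbeatFrequency(100);
>
>         cfg.setCommunicationSpi(commSpi);
>         cfg.setDiscoverySpi(discoSpi);
>
>         return cfg;
>     }
> }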
>
> 2) The timeouts above set explicitly, run against the latest changes, 
> including mine.
> Here I saw pretty much the same result: the drop is again significant. 
> Have a look at the plot in 
> ignite-results-no-failure-detection-explicit-timeouts.zip.
>
> 3) Well, the final sanity check was done against the latest release, 
> ignite-1.3.0-incubating, which does NOT contain my changes. The timeouts 
> were the same as in 2).
> Unfortunately, here I see the same drop as well. Look into 
> ignite-results-1.3.0.zip.
>
> Judging by 3), it seems we had this drop even before my 'failure 
> detection timeout' changes were merged. I'll try to debug all of this in 
> more detail tomorrow.
>
> --
> Denis
>
> On 7/24/2015 7:15 PM, Dmitriy Setrakyan wrote:
>> Thanks Denis!
>>
>> This feature significantly simplifies failure detection configuration in
>> Ignite - just one configuration flag now vs. I don't even remember how
>> many before.
>>
>> Have you run a yardstick test on Amazon EC2 with this new configuration
>> flag? If we kill a node in the middle, the drop should be insignificant.
>>
>> Also, I want to note your excellent handling of Jira communication. The
>> ticket has been thoroughly updated every step of the way.
>>
>> D.
>>
>> On Fri, Jul 24, 2015 at 5:37 AM, Denis Magda <dmagda@gridgain.com> wrote:
>>
>>> Igniters,
>>>
>>> I have just merged the changes back into the main development branch. 
>>> Thanks to Yakov and Dmitriy for spending your time on the review!
>>>
>>> From now on it’s possible to detect failures at the cluster nodes'
>>> discovery/communication/network levels by altering a single parameter:
>>> IgniteConfiguration.failureDetectionTimeout.
>>>
>>> By setting the failure detection timeout on a server node, it becomes
>>> possible to detect failed nodes in the cluster topology within a time
>>> equal to the timeout's value and keep working with only the alive nodes.
>>> Setting the timeout on a client node lets us detect failures between
>>> the client and its router node (a server node that is part of the
>>> topology), as sketched below.
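>>>
>>> For example (a sketch; the 10000 ms value is only an illustration):
>>>
>>> import org.apache.ignite.configuration.IgniteConfiguration;
>>>
>>> public class FdTimeoutSketch {
>>>     public static IgniteConfiguration serverConfig() {
>>>         // Server node: failed topology nodes are detected within ~10 s.
>>>         IgniteConfiguration cfg = new IgniteConfiguration();
>>>         cfg.setFailureDetectionTimeout(10000);
>>>         return cfg;
>>>     }
>>>
>>>     public static IgniteConfiguration clientConfig() {
>>>         // Client node: the same single parameter bounds detection of
>>>         // failures between the client and its router server node.
>>>         IgniteConfiguration cfg = new IgniteConfiguration();
>>>         cfg.setClientMode(true);
>>>         cfg.setFailureDetectionTimeout(10000);
>>>         return cfg;
>>>     }
>>> }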
>>>
>>> In addition, a bunch of other improvements and simplifications were
>>> made at the level of TcpDiscoverySpi and TcpCommunicationSpi. The
>>> changes are aggregated here:
>>> https://issues.apache.org/jira/browse/IGNITE-752
>>>
>>> —
>>> Denis
>

