ignite-issues mailing list archives

From "Ignite TC Bot (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (IGNITE-5569) TCP Discovery SPI allows multiple NODE_JOINED / NODE_FAILED leading to a cluster DDoS
Date Thu, 14 Feb 2019 21:43:00 GMT

    [ https://issues.apache.org/jira/browse/IGNITE-5569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16768740#comment-16768740 ]

Ignite TC Bot commented on IGNITE-5569:

{panel:title=--> Run :: All: No blockers found!|borderStyle=dashed|borderColor=#ccc|titleBGColor=#D6F7C1}{panel}
[TeamCity *--> Run :: All* Results|https://ci.ignite.apache.org/viewLog.html?buildId=3085164&buildTypeId=IgniteTests24Java8_RunAll]

> TCP Discovery SPI allows multiple NODE_JOINED / NODE_FAILED leading to a cluster DDoS
> -------------------------------------------------------------------------------------
>                 Key: IGNITE-5569
>                 URL: https://issues.apache.org/jira/browse/IGNITE-5569
>             Project: Ignite
>          Issue Type: Bug
>          Components: general
>    Affects Versions: 1.7
>            Reporter: Alexey Goncharuk
>            Assignee: Sergey Chugunov
>            Priority: Major
>             Fix For: 2.8
>          Time Spent: 0.5h
>  Remaining Estimate: 0h
> A firewall configuration issue may effectively lead to a cluster DDoS. The scheme is as follows:
> 1) A node G joins the cluster, and a firewall rule forbids incoming connections from the cluster to this node
> 2) The cluster successfully processes NodeAddedMessage and fires a discovery NODE_JOINED event (not sure why?)
> 4) The last node in the ring fails to connect to the newly joined node and generates
> 5) The coordinator drops the connection, and the joining node attempts to connect again
> The issues I see here:
> 1) Neither the coordinator nor the joining node prints the reason why the joining node failed or did not join. A slight hint (failed to send message to the next node) is printed on the node with the largest order (the one that attempted to close the ring), but the root cause (connection refused) is not printed there either
> 2) The joining node attempts to connect to the cluster with the same node ID. This violates an invariant we heavily rely on: once a node ID leaves a cluster, that ID never comes back
> 3) Each discovery event leads to a partition exchange, which blocks all cache operations for at least the full ring latency. If several nodes are started on a malicious host, this may lead to almost full cluster degradation

This message was sent by Atlassian JIRA
