Date: Tue, 5 Feb 2019 10:00:09 +0000 (UTC)
From: "Alexey Goncharuk (JIRA)"
To: issues@ignite.apache.org
Reply-To: dev@ignite.apache.org
Subject: [jira] [Commented] (IGNITE-5569) TCP Discovery SPI allows multiple NODE_JOINED / NODE_FAILED leading to a cluster DDoS

    [ https://issues.apache.org/jira/browse/IGNITE-5569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16760650#comment-16760650 ]

Alexey Goncharuk commented on IGNITE-5569:
------------------------------------------

I recently stumbled upon another case of duplicate discovery notifications, this time with no firewall involved. It looks like the ring can "forget" about a node failure event and process the node's join request again. I think we can introduce a limited history of nodes that have ever joined and refuse to let a node join (send an error response to the join request and drop the node added message) if that node is present in the history.

> TCP Discovery SPI allows multiple NODE_JOINED / NODE_FAILED leading to a cluster DDoS
> -------------------------------------------------------------------------------------
>
>                 Key: IGNITE-5569
>                 URL: https://issues.apache.org/jira/browse/IGNITE-5569
>             Project: Ignite
>          Issue Type: Bug
>          Components: general
>   Affects Versions: 1.7
>            Reporter: Alexey Goncharuk
>            Assignee: Dmitry Karachentsev
>            Priority: Major
>             Fix For: 2.8
>
>
> A firewall configuration issue may effectively lead to a cluster DDoS.
> The scheme is as follows:
> 1) A node G joins the cluster, and a firewall rule forbids incoming connections from the cluster to this node
> 2) The cluster successfully processes NodeAddedMessage and fires a discovery NODE_JOINED event (not sure why?)
> 4) The last node in the ring fails to connect to the newly joined node and generates a NODE_FAILED event
> 5) The coordinator drops the connection, and the joining node attempts to connect again
> The issues I see here:
> 1) Neither the coordinator nor the joining node prints the reason why the joining node failed / did not join. A slight hint (failed to send message to the next node) is printed on the node with the largest order (the one that attempted to close the ring), but the root cause (connection refused) is not printed either
> 2) The joining node attempts to connect to the cluster with the same node ID. This violates an invariant we heavily rely on: once a node ID leaves a cluster, this ID never comes back again
> 3) Each discovery event leads to a partition exchange, which blocks all cache operations for a time interval at least equal to the full ring latency. If several nodes are started on a malicious host, this may lead to almost full cluster degradation

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
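
The limited join history proposed in the comment above could be sketched roughly as follows. This is only an illustration of the idea, not Ignite code: the class JoinedNodeHistory, the method tryRegisterJoin and the maxSize parameter are hypothetical names, and how such a check would be wired into the discovery SPI (sending the error response, dropping the node added message) is left out.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;
import java.util.UUID;

/** Hypothetical helper (not part of Ignite): remembers the most recent node IDs that ever joined. */
final class JoinedNodeHistory {
    /** Insertion-ordered set of node IDs, bounded by the size passed to the constructor. */
    private final Set<UUID> seen;

    JoinedNodeHistory(final int maxSize) {
        if (maxSize <= 0)
            throw new IllegalArgumentException("maxSize must be positive");

        // LinkedHashMap keeps insertion order; removeEldestEntry evicts the
        // oldest ID once the history grows beyond maxSize.
        this.seen = Collections.newSetFromMap(new LinkedHashMap<UUID, Boolean>() {
            @Override protected boolean removeEldestEntry(Map.Entry<UUID, Boolean> eldest) {
                return size() > maxSize;
            }
        });
    }

    /**
     * Records a join attempt.
     *
     * @return true if this ID was not in the history (join may proceed),
     *         false if the ID joined before and the request should be rejected.
     */
    synchronized boolean tryRegisterJoin(UUID nodeId) {
        return seen.add(nodeId);
    }
}
```

Because the history is bounded, an ID that has been evicted would be accepted again, so in such a sketch the size would have to be chosen large enough to cover the window in which a repeated join with the same ID is realistic.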