Date: Tue, 27 Jun 2017 08:18:00 +0000 (UTC)
From: "Alexey Goncharuk (JIRA)"
To: dev@ignite.apache.org
Reply-To: dev@ignite.apache.org
Subject: [jira] [Created] (IGNITE-5593) Affinity change message leak on massive topology updates

Alexey Goncharuk created IGNITE-5593:
----------------------------------------

             Summary: Affinity change message leak on massive topology updates
                 Key: IGNITE-5593
                 URL: https://issues.apache.org/jira/browse/IGNITE-5593
             Project: Ignite
          Issue Type: Bug
          Components: cache
    Affects Versions: 1.7
            Reporter: Alexey Goncharuk
            Priority: Critical
             Fix For: 2.1

When late affinity assignment is enabled, we complete the exchange future with a custom discovery event. Because discovery topology events are usually processed much faster than exchange futures complete, a newly joined node can 'see' affinity change messages that relate to previous topology versions, from before this node was even present in the topology.

When such a message is received, an exchange future is created and the message is added to its discoEvts list. However, this future never completes on this node because init() is never called. As a result, the exchange future stays in the exchange set, holding on to the affinity change message.
Since the number of topology changes (and thus messages) can be quite large, this leads to excessive memory consumption on the starting node. I observed ~3 GB of heap wasted on one of the nodes when more than 200 nodes were restarted.

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
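The leak described above can be illustrated with a small toy model. This is a sketch under stated assumptions, not Ignite code: `ExchangeLeakSketch`, `ToyExchangeFuture`, `onAffinityChangeMessage`, and the `dropStale` guard are all hypothetical names invented for illustration, and topology versions are reduced to plain `long` values. The guard shows one conceivable way to avoid retaining stale messages; it is not necessarily the fix applied for IGNITE-5593.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentSkipListMap;

/** Toy model of the reported leak. None of these names are Ignite classes. */
final class ExchangeLeakSketch {
    /** Hypothetical stand-in for an exchange future that queues discovery events. */
    static final class ToyExchangeFuture {
        final long topVer;
        final List<String> discoEvts = new ArrayList<>();

        ToyExchangeFuture(long topVer) {
            this.topVer = topVer;
        }
    }

    /** Topology version at which this node joined (assumed to be known locally). */
    private final long joinTopVer;

    /** When true, messages for versions older than the join version are ignored. */
    private final boolean dropStale;

    /** Stand-in for the "exchange set": futures keyed by topology version. */
    private final Map<Long, ToyExchangeFuture> exchangeFutures = new ConcurrentSkipListMap<>();

    ExchangeLeakSketch(long joinTopVer, boolean dropStale) {
        this.joinTopVer = joinTopVer;
        this.dropStale = dropStale;
    }

    /** Called for every affinity change message the node observes. */
    void onAffinityChangeMessage(long msgTopVer, String msg) {
        // Assumed guard: a message for a version that predates this node's join
        // can never be completed here, so it is not stored at all.
        if (dropStale && msgTopVer < joinTopVer)
            return;

        // Without the guard, a future is created and keeps the message,
        // even though nothing will ever complete or remove it.
        exchangeFutures
            .computeIfAbsent(msgTopVer, ver -> new ToyExchangeFuture(ver))
            .discoEvts.add(msg);
    }

    /** Number of futures currently retained in the toy exchange set. */
    int pendingFutures() {
        return exchangeFutures.size();
    }

    public static void main(String[] args) {
        ExchangeLeakSketch leaky = new ExchangeLeakSketch(201, false);
        ExchangeLeakSketch guarded = new ExchangeLeakSketch(201, true);

        // Replay messages for 200 topology versions that predate the join.
        for (long ver = 1; ver <= 200; ver++) {
            leaky.onAffinityChangeMessage(ver, "affinity-change-" + ver);
            guarded.onAffinityChangeMessage(ver, "affinity-change-" + ver);
        }

        System.out.println("futures retained without guard: " + leaky.pendingFutures());
        System.out.println("futures retained with guard:    " + guarded.pendingFutures());
    }
}
```

In this model the unguarded node retains one future per stale topology version, so retained memory grows with the number of topology changes, which matches the behaviour reported above.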