From: Pavel Kovalenko
Date: Mon, 19 Mar 2018 20:15:44 +0300
Subject: 2 phase waiting for partitions release
To: dev@ignite.apache.org

Hello Igniters,

The current implementation of
GridDhtPartitionsExchangeFuture#waitPartitionRelease does not give us a 100% guarantee that, once the method completes, there are no ongoing atomic or transactional updates on the current node during the main stage of PME. It only guarantees that all primary updates have finished on that node; the node can still receive and process backup updates after the method returns. An example of such a case is described in https://issues.apache.org/jira/browse/IGNITE-7871

To avoid such situations, we would like to implement a second phase of the waitPartitionRelease method. In this phase, every server node participating in PME waits until all other server nodes have finished their ongoing updates.

Here is a brief description of the algorithm:

Non-coordinator node:
1) Finish all ongoing atomic & transactional updates.
2) Send an acknowledgement to the coordinator.
3) Wait for the final acknowledgement from the coordinator, confirming that all nodes have finished their updates.
4) Continue PME.

Coordinator node:
1) Finish all ongoing atomic & transactional updates.
2) Wait for acknowledgements from all server nodes.
3) Send the final acknowledgement to all server nodes.
4) Continue PME.

Acknowledgement messages are tiny, so the added network pressure and the overall performance drop will be minimal.

Another solution is simply to cancel atomic backup updates and transactional backup updates in the PREPARED phase if the topology version has changed. But from the user's perspective it is not correct to get transaction errors even when a node is merely joining the cluster.

Any thoughts?
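
To make the two-phase handshake described above easier to discuss, here is a minimal single-JVM sketch of it. It is only an illustration under assumptions, not Ignite code: the class TwoPhaseReleaseSketch and the method simulate are hypothetical names, threads stand in for server nodes, node 0 plays the coordinator, two CountDownLatch objects stand in for the acknowledgement messages, and a short sleep stands in for finishing ongoing updates.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical illustration only; none of these names exist in Ignite.
class TwoPhaseReleaseSketch {
    /**
     * Simulates one exchange with nodeCnt server nodes (node 0 is the coordinator).
     * Returns the smallest number of nodes that had finished their updates
     * at the moment any node continued PME.
     */
    static int simulate(int nodeCnt) {
        // Phase 1: acknowledgements sent by non-coordinator nodes to the coordinator.
        CountDownLatch acks = new CountDownLatch(nodeCnt - 1);
        // Phase 2: final acknowledgement sent by the coordinator to everyone.
        CountDownLatch finalAck = new CountDownLatch(1);
        AtomicInteger finished = new AtomicInteger();

        // One thread per node, so every node can block while waiting for the others.
        ExecutorService pool = Executors.newFixedThreadPool(nodeCnt);
        List<Future<Integer>> seen = new ArrayList<>();

        try {
            for (int i = 0; i < nodeCnt; i++) {
                final int idx = i;
                seen.add(pool.submit(() -> {
                    // Step 1: finish all ongoing updates (a sleep stands in for this).
                    Thread.sleep(10L * idx);
                    finished.incrementAndGet();

                    if (idx == 0) {
                        acks.await();          // Coordinator step 2: wait for all acks.
                        finalAck.countDown();  // Coordinator step 3: send final ack.
                    } else {
                        acks.countDown();      // Non-coordinator step 2: send ack.
                        finalAck.await();      // Non-coordinator step 3: wait for final ack.
                    }

                    // Step 4: continue PME; report how many nodes are known to be done.
                    return finished.get();
                }));
            }

            int min = nodeCnt;
            for (Future<Integer> f : seen)
                min = Math.min(min, f.get(5, TimeUnit.SECONDS));
            return min;
        } catch (Exception e) {
            throw new IllegalStateException("Simulated exchange failed", e);
        } finally {
            pool.shutdownNow();
        }
    }

    public static void main(String[] args) {
        System.out.println(simulate(4)); // prints 4
    }
}
```

In this sketch no node reaches step 4 before every node has completed step 1, which is the property the second phase is meant to add. A real implementation would also have to handle nodes leaving during the wait, which the sketch ignores.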