From: Sergey Chugunov
Date: Thu, 3 Aug 2017 16:46:25 +0300
Subject: Re: Cluster auto activation design proposal
To: dev@ignite.apache.org

I would also like to provide more use cases showing how BLT is supposed to
work (let me call it that until we come up with a better name):

1. User creates a new BLT using WebConsole or another tool and "applies" it
   to a brand-new cluster.
2. User starts up a brand-new cluster with the desired number of nodes and
   activates it. At the moment of activation, a BLT is created from all
   server non-daemon nodes present in the cluster.
3. User starts up a cluster with a previously prepared BLT -> when the set
   of nodes in the cluster matches the BLT, the cluster is activated
   automatically.
4. User has an up-and-running active cluster and starts a few more nodes.
   They join the cluster, but no partitions are assigned to them. User
   recreates the BLT on the new cluster topology -> partitions are assigned
   to the new nodes.
5. User takes nodes out of the cluster (e.g. for maintenance): no
   rebalancing happens until the user recreates the BLT on the new cluster
   topology.
6. If some parameters reach critical levels (e.g. the number of backups for
   a partition is too low), the coordinator automatically recreates the BLT
   and thus triggers rebalancing.

I hope these use cases help to clarify the purpose of the proposed feature.

On Thu, Aug 3, 2017 at 4:08 PM, Alexey Goncharuk wrote:

> My understanding of Baseline Topology is the set of nodes which are
> *expected* to be in the cluster.
> Let me go a little bit further because BT (or whatever name we choose) may
> and will solve more issues than just auto-activation:
>
> 1) More graceful control over rebalancing than just rebalance delay. If a
> server is shut down for maintenance and there are enough backup nodes in
> the cluster, there is no need to rebalance.
> 2) Guarantee that there will be no conflicting key-value mappings due to
> incorrect cluster activation. For example, consider a scenario when there
> was a cluster of 10 nodes, then the cluster was shut down, started first 5
> nodes, activated, made some updates, shut down 5 nodes, start up other 5
> nodes, activate, make some updates, start up first 5 nodes. Currently,
> there is no way to determine that there was an incompatible topology change
> which leads to data inconsistency.
> 3) When a cluster is shutting down node-by-node, we must track a node which
> has 'seen' a partition last time and not activate the cluster until all
> nodes are present. Otherwise, again, we may activate too early and see
> outdated values.
>
> I do not want to add any 'faster' hacks here because they will only make
> the issue above appear more likely. Besides, BT should be available in 2.2
> anyway, so no need to rush with hacks.
>
> --AG
>
> 2017-08-03 15:09 GMT+03:00 Yakov Zhdanov:
>
> > >Obvious connotation of "minimal set" is a set that cannot be decreased.
> >
> > >But lets consider the following case: user has a cluster of 50 nodes and
> > >decides to switch off 3 nodes for maintenance for a while. Ok, user just
> > >does it and then recreates this "minimal node set" to only 47 nodes.
> >
> > >So initial minimal node set was decreased - something counter-intuitive
> > >to me and may cause confusion as well.
> >
> > That was my point. If I have 50 nodes and 3 backups I can restart on 48,
> > 49 and 50 without data loss. In case of 48 and 49 after cluster gets
> > activated missing backups are assigned and rebalancing starts.
> >
> > --Yakov
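
To make use cases 2 to 5 from Sergey's message concrete, here is a minimal
Python sketch. It is only an illustration under stated assumptions, not Ignite
code: the `Cluster` class and all of its methods are hypothetical names,
"matches" in use case 3 is read as "every BLT node is online", and partition
ownership is reduced to the set of online BLT nodes.

```python
from dataclasses import dataclass, field


@dataclass
class Cluster:
    """Toy model of a cluster governed by a baseline topology (BLT).

    Hypothetical illustration only; none of these names exist in Ignite.
    """
    baseline: frozenset = frozenset()            # node IDs expected in the cluster
    online: set = field(default_factory=set)     # node IDs currently joined
    active: bool = False

    def join(self, node_id):
        self.online.add(node_id)
        # Use case 3 (assumed reading of "matches"): auto-activate once
        # every node of a previously prepared BLT is online.
        if not self.active and self.baseline and self.baseline <= self.online:
            self.active = True

    def leave(self, node_id):
        # Use case 5: a node leaving does not change the BLT by itself.
        self.online.discard(node_id)

    def activate(self):
        # Use case 2: manual activation of a brand-new cluster creates the
        # BLT from the nodes present at that moment.
        if not self.baseline:
            self.baseline = frozenset(self.online)
        self.active = True

    def recreate_baseline(self):
        # Use cases 4 and 5: the user explicitly recreates the BLT on the
        # current topology, which is what would trigger rebalancing.
        self.baseline = frozenset(self.online)

    def partition_owners(self):
        # Only online BLT nodes hold partitions; extra nodes hold none.
        if not self.active:
            return frozenset()
        return self.baseline & self.online


# Usage: a prepared BLT of three nodes, then one extra node joins.
cluster = Cluster(baseline=frozenset({"a", "b", "c"}))
for node in ("a", "b"):
    cluster.join(node)
assert not cluster.active          # BLT not complete yet
cluster.join("c")
assert cluster.active              # use case 3: auto-activation
cluster.join("d")
assert "d" not in cluster.partition_owners()   # use case 4, before recreating BLT
cluster.recreate_baseline()
assert "d" in cluster.partition_owners()       # use case 4, after recreating BLT
```

The point of the sketch is that topology changes (`join`, `leave`) and
partition assignment are decoupled: only `activate` and `recreate_baseline`
change which nodes are expected to own data.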