From: Ilya Kasnacheev
Date: Thu, 17 May 2018 19:21:09 +0300
Subject: Re: Distribution of backups
To: user@ignite.apache.org

Hello
David!

Yes, I think that would be possible to implement. However, when a node fails, there would be a massive backlog of rebalancing concentrated on just two nodes, and that might cause a problem of its own. Random placement guarantees that rebalancing load is spread evenly across the cluster in case of node failure.

Regards,

--
Ilya Kasnacheev

2018-05-17 19:11 GMT+03:00 David Harvey <dharvey@jobcase.com>:

> We built a cluster with 8 nodes using Ignite persistence and 1 backup, and
> had two nodes fail at different times: the first because its storage did
> not get mounted and it ran out of space early, and the second because an
> SSD failed. There are some things that we could have done better, but this
> event brings up the question of how backups are distributed.
>
> There are two approaches that have substantially different behavior on
> double faults, and double faults are more likely at scale:
>
> 1) random placement of backup partitions relative to the primary
> 2) backup partitions have similar affinity to the primary partitions,
> where in the extreme nodes are paired so that primaries on one node of the
> pair have backups on the other node of the pair
>
> With a 64-node cluster, #2 would have 1/63rd the likelihood of data loss
> when 2 nodes fail vs. #1.
>
> I'm guessing that Ignite ships with #1, but we could provide our own
> affinity function to accomplish #2 if we chose?
>
> *Disclaimer*
>
> The information contained in this communication from the sender is
> confidential. It is intended solely for use by the recipient and others
> authorized to receive it. If you are not the recipient, you are hereby
> notified that any disclosure, copying, distribution or taking action in
> relation to the contents of this information is strictly prohibited and
> may be unlawful.
>
> This email has been scanned for viruses and malware, and may have been
> automatically archived by *Mimecast Ltd*, an innovator in Software as a
> Service (SaaS) for business. Providing a *safer* and *more useful* place
> for your human generated data.
> Specializing in: Security, archiving and compliance.
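[Editorial note: the paired-vs-random trade-off discussed above can be made concrete with a short simulation. The sketch below is plain Java with no Ignite APIs; the class name, the even/odd pairing scheme, and the partition count of 1024 (Ignite's documented default for its rendezvous affinity function) are illustrative assumptions, not anyone's production configuration. It estimates the probability that a random double node failure loses some partition under each placement scheme, assuming a single backup per partition.]

```java
import java.util.HashSet;
import java.util.Random;
import java.util.Set;

public class BackupPlacementSim {

    // Scheme #2: nodes are statically paired (0<->1, 2<->3, ...), so a
    // primary on node n keeps its single backup on its partner node.
    static int partner(int node) {
        return (node % 2 == 0) ? node + 1 : node - 1;
    }

    // With paired placement, two simultaneous failures lose data only if
    // the failed nodes are partners: of the 63 equally likely choices for
    // the second failed node, exactly 1 is the partner of the first.
    static double pairedLossProbability(int nodes) {
        return 1.0 / (nodes - 1);
    }

    // Scheme #1: estimate by Monte Carlo simulation. Each partition picks
    // a random primary and a random distinct backup node; a trial loses
    // data if some partition had both copies on the two failed nodes.
    static double randomLossProbability(int nodes, int parts, int trials, long seed) {
        Random rnd = new Random(seed);
        int losses = 0;
        for (int t = 0; t < trials; t++) {
            // Record each partition's (primary, backup) node pair,
            // encoded order-independently as a single long.
            Set<Long> placements = new HashSet<>();
            for (int p = 0; p < parts; p++) {
                int primary = rnd.nextInt(nodes);
                int backup;
                do { backup = rnd.nextInt(nodes); } while (backup == primary);
                placements.add((long) Math.min(primary, backup) * nodes
                               + Math.max(primary, backup));
            }
            // Fail two distinct random nodes; data is lost if any
            // partition kept both of its copies on exactly those two.
            int a = rnd.nextInt(nodes);
            int b;
            do { b = rnd.nextInt(nodes); } while (b == a);
            if (placements.contains((long) Math.min(a, b) * nodes + Math.max(a, b))) {
                losses++;
            }
        }
        return (double) losses / trials;
    }

    public static void main(String[] args) {
        int nodes = 64;   // cluster size discussed in the thread
        int parts = 1024; // assumed partition count
        System.out.printf("paired: %.4f%n", pairedLossProbability(nodes));
        System.out.printf("random: %.4f%n",
                randomLossProbability(nodes, parts, 10_000, 42L));
    }
}
```

On a 64-node cluster the paired scheme loses data on roughly 1.6% of double failures (1/63), while with 1024 randomly placed partitions most double failures hit some partition's primary-plus-backup pair, so the simulated random-placement loss rate is far higher. As for where such a policy would plug in: Ignite ships with a rendezvous-hashing affinity function by default, and the `AffinityFunction` interface is pluggable, so a paired scheme along these lines could plausibly be expressed as a custom implementation or via the backup-filter hooks on the built-in function; treat the specifics here as a starting point to verify against the Ignite version in use.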