Subject: Re: Cross DC SolrCloud anti-patterns in presentation shalinmangar/cross-datacenter-replication-in-apache-solr-6
To: solr-user@lucene.apache.org
From: Shawn Heisey
Date: Sat, 24 Jun 2017 09:14:28 -0600

On 6/24/2017 2:14 AM, Arcadius Ahouansou wrote:
> Interpretation 1:
>
> - On slides 6 and 7: Only 2 DCs are used, so the ZK quorum will not
>   survive and recover after a 1-DC failure.
>
> - On slide 8: We have 3 DCs, which is OK for ZK. But we have 6 ZK
>   nodes. This is a problem because ZK likes odd node counts: 3, 5, 7 ...

On both slide 6 and slide 7, Solr stays completely operational in DC1
if DC2 goes down. It all falls apart if DC1 goes down: for clients that
can still reach them, the remaining Solr servers are read-only in that
situation.

Slide 8 is very similar -- if DC1 goes down, Solr is read-only. If
either DC2 or DC3 goes down, everything is fine for clients that can
still get to Solr. One additional consideration: if both DC2 and DC3 go
down, then the remaining Solr servers in DC1 are read-only.

ZooKeeper doesn't *need* an odd number of servers, but there's no
benefit to an even number. Quorum is a strict majority -- floor(n/2)+1
servers -- so going from 5 servers to 6 raises the quorum from 3 to 4
without improving fault tolerance. If you have 5 servers, two can go
down. If you have 6 servers, you can still only lose two, so you might
as well just run 5. You'd have fewer possible points of failure, less
power usage, and less bandwidth usage.

The best minimum option is an odd number of data centers, at least
three, with one ZooKeeper in each location. For Solr, you want at least
two servers, split evenly between at least two of those datacenter
locations. If you're really stuck with only two datacenters, then you
can follow the advice in the presentation: set up a full cloud in each
datacenter and use CDCR between them.

> Interpretation 2:
>
> Any SolrCloud deployment with "remote SolrCloud nodes", i.e. SolrCloud
> not in the same DC as ZK, is deemed an anti-pattern (note that DCs can
> be just a couple of miles apart and could be connected by a high-speed
> network).

I'm not sure that this is actually true, but it does introduce latency
and more moving parts in the form of network connections between data
centers -- connections which might go down. I wouldn't do it, but I
also wouldn't automatically dismiss it as a viable setup, as long as it
meets ZooKeeper's requirements and there are two complete copies of the
Solr collections, each in a different data center.

Typical designs only stay viable if one datacenter goes down, but if
you were to use five datacenters and have enough Solr servers for three
complete copies of your collections, you could survive two datacenter
outages.

Thanks,
Shawn
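
P.S. To make the quorum arithmetic above concrete, here's a minimal
sketch (plain Python; the function name is mine, not anything from
ZooKeeper itself). ZooKeeper keeps serving as long as a strict majority
of the ensemble is up, so the number of survivable failures is n minus
that majority:

    def zk_survivable_failures(n):
        """How many of n ZooKeeper servers can fail before quorum is lost."""
        quorum = n // 2 + 1  # strict majority needed to keep serving
        return n - quorum

    for n in (3, 4, 5, 6, 7):
        print(n, "servers ->", zk_survivable_failures(n), "survivable failures")
    # 3 -> 1, 4 -> 1, 5 -> 2, 6 -> 2, 7 -> 3
    # Going from 5 to 6 buys nothing; the even count only raises the quorum.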
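
P.P.S. On the two-datacenter CDCR option: once the CDCR request handler
is configured on the source collection, replication is controlled over
HTTP. A rough illustration (the host and collection names here are
placeholders I made up; the START and QUEUES actions are from the Solr
6 CDCR API):

    import requests

    # Hypothetical source-cluster node and collection; adjust for your setup.
    SOURCE = "http://solr-dc1.example.com:8983/solr/collection1"

    # Start the CDCR replicator on the source cluster.
    resp = requests.get(SOURCE + "/cdcr", params={"action": "START"})
    resp.raise_for_status()

    # Inspect the update queues to confirm documents are flowing to the
    # target data center.
    print(requests.get(SOURCE + "/cdcr", params={"action": "QUEUES"}).json())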