zookeeper-user mailing list archives

From Jordan Zimmerman <jor...@jordanzimmerman.com>
Subject Re: Yet another "two datacenter" discussion
Date Fri, 26 May 2017 15:48:45 GMT
In ZK 3.4.x, if the configuration differs between your instances, you are susceptible
to a split brain. See this email thread, "Rolling Config Change Considered Harmful":

http://zookeeper-user.578899.n2.nabble.com/Rolling-config-change-considered-harmful-td7578761.html

In ZK 3.5.x I'm not even sure it would work. 

-JZ

> On May 26, 2017, at 5:43 PM, Shawn Heisey <apache@elyograg.org> wrote:
> 
> I feel fairly certain that this thread will be an annoyance.  I don't
> know enough about zookeeper to answer the questions that are being
> asked, so I apologize about needing to relay questions about ZK fault
> tolerance in two datacenters.
> 
> It seems that everyone wants to avoid the expense of a tie-breaker ZK VM
> in a third datacenter.
> 
> The scenario, which this list has seen over and over:
> 
> DC1 - three ZK servers, one or more Solr servers.
> DC2 - two ZK servers, one or more Solr servers.
> 
> I've already explained that if DC2 goes down, everything's fine, but if
> DC1 goes down, Solr goes read-only, and there's no way to prevent that.
> 
> The conversation went further, and I'm sure you guys have seen this
> before too:  "Is there any way we can get DC2 back to operational with
> manual intervention if DC1 goes down?"  I explained that any manual
> intervention would briefly take Solr down ... at which point the
> following proposal was mentioned:
> 
> Add an observer node to DC2, and in the event DC1 goes down, run a
> script that reconfigures all the ZK servers to change the observer to a
> voting member and does rolling restarts.
> 
> Will their proposal work?  What happens when DC1 comes back online?  As
> you know, DC1 will contain a partial ensemble that still has quorum,
> about to rejoin what it THINKS is a partial ensemble *without* quorum,
> which is not what it will find.  I'm guessing that ZK assumes the
> question of who has the "real" quorum shouldn't ever need to be
> negotiated, because the rules prevent multiple partitions from gaining
> quorum.
> 
> Solr currently ships with 3.4.6, but the next version of Solr (about to
> drop any day now) will have 3.4.10.  Once 3.5 is released and Solr is
> updated to use it, does the situation I've described above change in any
> meaningful way?
> 
> Thanks,
> Shawn
> 
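
The quorum arithmetic behind both messages can be sketched in a few lines of
Python. This is only an illustration, not ZooKeeper code: the server ids (1-3 in
DC1, 4-5 in DC2, 6 for the DC2 observer) and the has_quorum helper are made up
for the example, and it assumes the default majority quorum, where a group needs
strictly more than half of the voters listed in the configuration it is running
with. The "failover" configuration is one guess at what the proposed script
might write; the thread does not spell it out.

```python
def has_quorum(reachable, voters):
    """Return True if the reachable servers are a strict majority of the voters."""
    return len(reachable & voters) > len(voters) // 2


# Hypothetical server ids: 1-3 live in DC1, 4-5 in DC2, 6 is the DC2 observer.
DC1 = {1, 2, 3}
DC2 = {4, 5, 6}

# Original configuration: five voters, the observer does not vote.
original_voters = {1, 2, 3, 4, 5}
dc1_alone = has_quorum(DC1, original_voters)   # 3 of 5 voters reachable
dc2_alone = has_quorum(DC2, original_voters)   # 2 of 5 voters reachable

# Promoting the observer while keeping the DC1 servers listed does not help:
# DC2 would then hold 3 of 6 voters, which is not a strict majority.
promoted_voters = {1, 2, 3, 4, 5, 6}
dc2_promoted = has_quorum(DC2, promoted_voters)

# Assumed failover configuration: the script drops DC1 and lists only DC2.
failover_voters = {4, 5, 6}
dc2_failover = has_quorum(DC2, failover_voters)  # 3 of 3 voters reachable

# When DC1 returns still running the original configuration, each side counts
# a majority under its own voter list, and the two groups share no server.
split_brain = dc1_alone and dc2_failover and not (DC1 & DC2)

print("DC1 alone, original config:", dc1_alone)
print("DC2 alone, original config:", dc2_alone)
print("DC2 alone, observer promoted (6 voters):", dc2_promoted)
print("DC2 alone, failover config:", dc2_failover)
print("Both sides believe they have quorum:", split_brain)
```

The last check is the situation the "Rolling Config Change Considered Harmful"
thread warns about: once two groups run different voter lists, the majority
rule no longer prevents both from considering themselves the real ensemble.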

