zookeeper-user mailing list archives

From Isabel Muñoz Fernández <imu...@etsisi.upm.es>
Subject Re: Tracking down possible network partition
Date Fri, 25 Sep 2015 10:16:08 GMT
Sergio:

Take a look at how the ZooKeeper forum discusses ZK nodes on different continents and the
partition problems that come with them. There seems to be a construct (Curator observers)
that we will need to study.

> On 24 Sep 2015, at 21:32, Bob Sheehan <bsheehan@vmware.com> wrote:
> 
> We have similar issue:
> 
> 3-node ZK cluster in DC1 (e.g. Las Vegas), quorum of 2. Each node is on a VMware ESXi host in the same rack.
> 
> 2 observer ZK nodes in DC2 (e.g. Germany). Each node is on a VMware ESXi host in the same rack.
> 
> CentOS 6
> ZK version Cloudera cdh 3.4.5.
> 
> 
>  *   Leader election in DC1 appears to take a long time, ~15 minutes. At some point the TCP connection to one of the three nodes is lost. It eventually recovers.
> 
> 
>  *   Apparently, during leader election the connection to the observers is lost for ~15 minutes and is then restored. That leaves a 15-minute window in which neither observer (DC2) can communicate with the ZK cluster (DC1). Our DC2 clients communicate with the observers using the Apache Curator library, so our API fails during that window because it needs ZK data.
> 
> We ran netstat on the TCP ports and are seeing non-zero Send-Q sizes.
> 
> 
> Is there any known fix/patch for this? Suggestions welcome.
> 
> Thanks,
> 
> Bob
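
The netstat check described in the quoted message could be automated roughly as follows. This is only a minimal sketch, assuming Linux `netstat -tn` column order (Proto, Recv-Q, Send-Q, Local Address, Foreign Address, State) and the conventional ZooKeeper ports 2181, 2888 and 3888; the thread does not say which ports are actually in use. The function names and the sample addresses are made up for illustration.

```python
# Assumed ports: 2181 (client), 2888 (quorum), 3888 (election).
# These are conventional values, not taken from the thread.
ZK_PORTS = {"2181", "2888", "3888"}


def port_of(endpoint):
    """Return the port part of an 'address:port' endpoint string."""
    return endpoint.rsplit(":", 1)[-1]


def stuck_connections(netstat_text, ports=ZK_PORTS):
    """Return (send_q, local, remote, state) for TCP rows with Send-Q > 0.

    Only rows whose local or remote port is in `ports` are kept.
    """
    rows = []
    for line in netstat_text.splitlines():
        fields = line.split()
        # Skip headers and anything that is not a TCP connection row.
        if len(fields) < 6 or not fields[0].startswith("tcp"):
            continue
        try:
            send_q = int(fields[2])
        except ValueError:
            continue
        local, remote, state = fields[3], fields[4], fields[5]
        if send_q > 0 and (port_of(local) in ports or port_of(remote) in ports):
            rows.append((send_q, local, remote, state))
    return rows


# Hypothetical sample in `netstat -tn` layout (addresses are invented).
SAMPLE = """\
Active Internet connections (w/o servers)
Proto Recv-Q Send-Q Local Address           Foreign Address         State
tcp        0      0 10.0.0.1:2181           10.0.0.9:51000          ESTABLISHED
tcp        0  43520 10.0.0.1:2888           10.1.0.5:40112          ESTABLISHED
tcp        0    120 10.0.0.1:22             10.0.0.7:50022          ESTABLISHED
"""

for send_q, local, remote, state in stuck_connections(SAMPLE):
    print(f"{local} -> {remote} {state} Send-Q={send_q}")
```

On a live host the same function could be fed the text output of `netstat -tn`. A Send-Q that stays above zero across repeated samples suggests the peer is not acknowledging data, which would be consistent with the connection loss described above but does not by itself identify the cause.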

