zookeeper-user mailing list archives

From Jim Keeney <...@fitterweb.com>
Subject Re: Ensemble fails when one node looses connectivity
Date Fri, 02 Mar 2018 02:13:02 GMT
Thanks. Yes, I have about 2 MB stored in the configuration folders. I will
increase jute.maxbuffer and see if that helps.
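
For reference, a minimal sketch of what that change might look like in
conf/java.env, which zkEnv.sh sources at startup if the file exists. The
4 MB value and the JUTE_MAXBUFFER variable name are illustrative assumptions
only, not recommendations. As far as I understand, the same jute.maxbuffer
value has to be set on every server and on the clients (Solr, in this case)
as well.

```shell
# conf/java.env (sourced by zkEnv.sh when present)
# Assumption: 4194304 bytes (4 MB) is only an example; size it to the largest znode.
# JUTE_MAXBUFFER is a hypothetical helper variable, not something ZooKeeper reads.
JUTE_MAXBUFFER=4194304

# Prepend the system property to whatever JVM flags are already set.
export JVMFLAGS="-Djute.maxbuffer=${JUTE_MAXBUFFER} ${JVMFLAGS:-}"
```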

Jim K.
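
P.S. On the initLimit question: both limits are counted in ticks, so the
actual time windows follow from tickTime. A quick sketch of the arithmetic
using the values from the config quoted below (the *_timeout_s variable
names are just for illustration):

```shell
# Values copied from the zoo.cfg quoted in this thread.
tickTime=2000   # milliseconds per tick
initLimit=100   # ticks a follower may take to connect to and sync with the leader
syncLimit=5     # ticks a follower may fall behind the leader

# Convert tick counts to seconds.
init_timeout_s=$(( initLimit * tickTime / 1000 ))
sync_timeout_s=$(( syncLimit * tickTime / 1000 ))

echo "initLimit window: ${init_timeout_s}s"
echo "syncLimit window: ${sync_timeout_s}s"
```

That works out to 200 seconds for the initial sync and 10 seconds for
ongoing sync, so the init window does look generous for about 2 MB of data.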

On Thu, Mar 1, 2018 at 8:58 PM, Steph van Schalkwyk <svanschalkwyk@gmail.com> wrote:

> Does the log say anything about timing out on init?
> Your initLimit is already pretty big, but then we don't know anything about
> your setup.
> Are you storing more than 1 MB in a znode? If so, increase jute.maxbuffer
> (set it in java.env as -Djute.maxbuffer=xxxxxx).
> I've recently run into that with Fusion 3.1.
> Post more details, if you would.
> Good luck.
> Steph
>
>
> On Thu, Mar 1, 2018 at 7:43 PM, Jim Keeney <jim@fitterweb.com> wrote:
>
> > I'm using ZooKeeper with Solr to create a cluster and I have come across
> > what seems like unexpected behavior. The cluster is set up on AWS using
> > OpsWorks. I am using a 3-node ZooKeeper ensemble. The ZooKeeper config
> > on all three nodes is:
> >
> > clientPort=2181
> > dataDir=/var/opt/zookeeper/data
> > tickTime=2000
> > autopurge.purgeInterval=24
> > initLimit=100
> > syncLimit=5
> > server.1=172.31.86.130:2888:3888
> > server.2=172.31.16.234:2888:3888
> > server.3=172.31.73.122:2888:3888
> >
> >
> > Here is the issue:
> >
> > If one node in the ensemble fails or is shut down, the ensemble carries on.
> > However, when the node is restarted, its attempts to connect to the other
> > members of the cluster are rejected. The only way I have found to
> > restore the ensemble is to restart all of the nodes within a short time
> > span of each other.
> >
> > If I do that, they are able to discover each other, carry out a proper
> > leader election, and restore order.
> >
> > Once they are restored everything is fine, but if one of the nodes goes
> > down we are faced with the same problem.
> >
> > How do I ensure that if a node goes down, it can restart and rejoin the
> > ensemble without having to manually restart all the other nodes?
> >
> > Any help appreciated.
> >
> > Thanks.
> >
> > Jim K.
> >
> >
> >
> >
> > --
> > Jim Keeney
> > President, FitterWeb
> > E: jim@fitterweb.com
> > M: 703-568-5887
> >
> > *FitterWeb Consulting*
> > *Are you lean and agile enough? *
> >
>

