accumulo-user mailing list archives

From Drew Thornton <dthorn...@data-tactics.com>
Subject RE: Zookeeper Implementation
Date Tue, 16 Jul 2013 13:23:03 GMT
Thank you, but that is not the situation.

If one zookeeper node is shut down or fails for any reason while the rest of the ensemble
stays up, the tablet servers connected to that node as clients fail immediately. If one of
those clients happens to be the master, the cluster goes down.

Accumulo does not seem to fail over to the remaining zookeeper nodes, so I have to restart
the affected tablet servers individually.

The zookeeper ensemble is very stable and has plenty of bandwidth/memory/processing, so taking
one node down out of five doesn't crash the zookeepers, just the tablet servers...



Drew Thornton
Data Tactics Corporation
dthornton@data-tactics.com
571.297.2173 (w)
804.615.0771 (m)

-----Original Message-----
From: webmaster@webmaster.ms [mailto:webmaster@webmaster.ms] On Behalf Of Denis
Sent: Monday, July 15, 2013 3:56 PM
To: user@accumulo.apache.org
Subject: Re: Zookeeper Implementation

Hi

I have seen this behavior (with Accumulo 1.4.4, though) when one of the Zookeeper nodes was
restarted and then, after a delay of a few seconds, another node was restarted.

I did not investigate the issue, but it seems that if you want to change the Zookeeper
configuration and restart all nodes, you have to wait a few minutes between restarts.

On 7/15/13, Drew Thornton <dthornton@data-tactics.com> wrote:
> Yes, [ maxClientCnxns=100 ]. I've used full hostnames and ports as 
> well in Accumulo-site.
>
> I noticed the pattern of crashes when I was testing Zookeeper's JVM 
> garbage collector settings. I would take one node out at a time to 
> restart its JVM, and individual Tablet Servers (and eventually the 
> master) would crash depending on the Zookeeper node that I took down.
>
> Drew
>
> From: Eric Newton [mailto:eric.newton@gmail.com]
> Sent: Monday, July 15, 2013 2:31 PM
> To: user@accumulo.apache.org
> Subject: Re: Zookeeper Implementation
>
> You are giving the names of all the zookeeper nodes in 
> accumulo-site.xml, right?
>
>   <property>
>     <name>instance.zookeeper.host</name>
>     <value>zoo1,zoo2,zoo3,zoo4,zoo5</value>
>   </property>
>
> Have you increased maxClientCnxns as described in the accumulo README?
>
> -Eric
>
>
> On Mon, Jul 15, 2013 at 2:04 PM, Drew Thornton 
> <dthornton@data-tactics.com> wrote:
> Hello,
>
> I'm running a small cluster of 10 tablet servers and 5 zookeeper nodes 
> (CDH 4.3, Zookeeper 3.4.5, Accumulo 1.5.0).
>
> I have noticed that when a zookeeper node dies, the connected tablet 
> server clients also die instead of failing over to another zookeeper. 
> If the clients on the failed zookeeper are only tablet servers, 
> Accumulo reassigns the tablets. If the Accumulo Master is one of the 
> clients on the failed node, then the master goes down and the cluster with it.
>
> Anyone else have this problem or know of a workaround/solution to keep 
> the cluster up when zookeeper changes state?
>
> Thanks,
> Drew
>
>
>
>
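
For anyone wanting to rule out the configuration question raised earlier in the thread, here is a minimal sketch of a check that every ensemble member appears in instance.zookeeper.host. It only inspects the XML and does not explain the failover behavior itself. The host names zoo1 to zoo5 come from the example above; the function names, the sample XML, and the explicit :2181 ports are illustrative assumptions, not part of Accumulo.

```python
import xml.etree.ElementTree as ET
from typing import List


def zookeeper_hosts(site_xml: str) -> List[str]:
    """Return the hosts listed in instance.zookeeper.host, with ports stripped."""
    root = ET.fromstring(site_xml)
    for prop in root.iter("property"):
        if prop.findtext("name", "").strip() == "instance.zookeeper.host":
            value = prop.findtext("value", "")
            # The value is a comma-separated list of host or host:port entries.
            return [h.strip().split(":")[0] for h in value.split(",") if h.strip()]
    return []


def missing_hosts(site_xml: str, ensemble: List[str]) -> List[str]:
    """Return ensemble members that are absent from the configured host list."""
    listed = set(zookeeper_hosts(site_xml))
    return [h for h in ensemble if h not in listed]


# Hypothetical accumulo-site.xml content in which one ensemble member was left out.
SITE_XML = """<configuration>
  <property>
    <name>instance.zookeeper.host</name>
    <value>zoo1:2181,zoo2:2181,zoo3:2181,zoo4:2181</value>
  </property>
</configuration>"""

ENSEMBLE = ["zoo1", "zoo2", "zoo3", "zoo4", "zoo5"]

print(missing_hosts(SITE_XML, ENSEMBLE))  # prints ['zoo5']
```

An empty list means every ensemble member is listed, which is the situation described in this thread, so the cause would have to be looked for elsewhere.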