accumulo-user mailing list archives

From Eric Newton <eric.new...@gmail.com>
Subject Re: Zookeeper Implementation
Date Tue, 16 Jul 2013 13:39:13 GMT
Confirmed.  See ACCUMULO-1572 <https://issues.apache.org/jira/browse/ACCUMULO-1572>.

-Eric


On Tue, Jul 16, 2013 at 9:23 AM, Drew Thornton
<dthornton@data-tactics.com> wrote:

> Thank you, but that is not the situation.
>
> If one zookeeper node is shut down/fails/whatever and the rest of the
> ensemble stays up, the tablet servers attached as clients to the shutdown
> node immediately fail. If one of the clients happens to be the master, the
> cluster goes down.
>
> Accumulo does not seem to be failing over to the remaining zookeeper
> nodes, and this causes me to restart the individual tablet servers again.
>
> The zookeeper ensemble is very stable and has plenty of
> bandwidth/memory/processing, so taking one node down out of five doesn't
> crash the zookeepers, just the tablet servers...
>
>
>
> Drew Thornton
> Data Tactics Corporation
> dthornton@data-tactics.com
> 571.297.2173 (w)
> 804.615.0771 (m)
>
> -----Original Message-----
> From: webmaster@webmaster.ms [mailto:webmaster@webmaster.ms] On Behalf Of
> Denis
> Sent: Monday, July 15, 2013 3:56 PM
> To: user@accumulo.apache.org
> Subject: Re: Zookeeper Implementation
>
> Hi
>
> I have seen this behavior (with Accumulo 1.4.4, though) when one
> Zookeeper node is restarted and then, after a delay of a few seconds,
> another node is restarted.
>
> I did not investigate the issue, but it seems that if you want to change
> the Zookeeper configuration and restart all nodes, you have to wait a few
> minutes between restarts.
>
> On 7/15/13, Drew Thornton <dthornton@data-tactics.com> wrote:
> > Yes, [ maxClientCnxns=100 ]. I've used full hostnames and ports as
> > well in accumulo-site.xml.
> >
> > I noticed the pattern of crashes when I was testing Zookeeper's JVM
> > garbage collector settings. I would take one node out at a time to
> > restart its JVM, and individual Tablet Servers (and eventually the
> > master) would crash depending on the Zookeeper node that I took down.
> >
> > Drew
> >
> > From: Eric Newton [mailto:eric.newton@gmail.com]
> > Sent: Monday, July 15, 2013 2:31 PM
> > To: user@accumulo.apache.org
> > Subject: Re: Zookeeper Implementation
> >
> > You are giving the names of all the zookeeper nodes in
> > accumulo-site.xml, right?
> >
> >   <property>
> >     <name>instance.zookeeper.host</name>
> >     <value>zoo1,zoo2,zoo3,zoo4,zoo5</value>
> >   </property>
> >
> > Have you increased maxClientCnxns as described in the accumulo README?
> >
> > -Eric
> >
> >
> > On Mon, Jul 15, 2013 at 2:04 PM, Drew Thornton
> > <dthornton@data-tactics.com> wrote:
> > Hello,
> >
> > I'm running a small cluster of 10 tablet servers and 5 zookeeper nodes
> > (CDH 4.3, Zookeeper 3.4.5, Accumulo 1.5.0).
> >
> > I have noticed that when a zookeeper node dies, the connected tablet
> > server clients also die instead of failing-over to another zookeeper.
> > If the clients on the failed zookeeper are only tablet servers,
> > Accumulo reassigns the tablets. If the Accumulo Master is one of the
> > clients on the failed node, then the master goes down and the cluster
> > with it.
> >
> > Anyone else have this problem or know of a workaround/solution to keep
> > the cluster up when zookeeper changes state?
> >
> > Thanks,
> > Drew
> >
> >
> >
> >
>
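
The configuration check Eric asks about in the thread (are all ensemble members listed in instance.zookeeper.host?) can be sketched as a small script. This is only an illustration, not part of Accumulo: the helper names zookeeper_hosts and missing_hosts are hypothetical, and the <configuration> wrapper around the property is an assumption about the layout of accumulo-site.xml. The property name and host list are taken from the thread. Passing this check only rules out a misconfigured host list; it does not address the behavior tracked in ACCUMULO-1572.

```python
import xml.etree.ElementTree as ET

# Sample site file. The <configuration> root is an assumed layout; the
# property name and value are the ones quoted in the thread.
SITE_XML = """<configuration>
  <property>
    <name>instance.zookeeper.host</name>
    <value>zoo1,zoo2,zoo3,zoo4,zoo5</value>
  </property>
</configuration>"""


def zookeeper_hosts(site_xml):
    """Return the entries listed in instance.zookeeper.host, in order."""
    root = ET.fromstring(site_xml)
    for prop in root.iter("property"):
        if prop.findtext("name") == "instance.zookeeper.host":
            value = prop.findtext("value") or ""
            # Entries are comma-separated; drop blanks and stray whitespace.
            return [h.strip() for h in value.split(",") if h.strip()]
    return []


def missing_hosts(site_xml, ensemble):
    """Return ensemble members that the site file does not list (hypothetical helper)."""
    # Compare on host name only, so "zoo1:2181" matches "zoo1".
    listed = {h.split(":")[0] for h in zookeeper_hosts(site_xml)}
    return sorted(set(ensemble) - listed)


if __name__ == "__main__":
    ensemble = ["zoo1", "zoo2", "zoo3", "zoo4", "zoo5"]
    print("listed: ", zookeeper_hosts(SITE_XML))
    print("missing:", missing_hosts(SITE_XML, ensemble))
```

If missing_hosts returns a non-empty list, clients configured from that file have fewer servers to try when one ensemble member goes away.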
