accumulo-user mailing list archives

From "Ott, Charles H." <CHARLES.H....@saic.com>
Subject RE: Communication issue between zookeeper and accumulo
Date Tue, 06 Aug 2013 14:48:22 GMT
Last time I had errors like this, someone suggested disabling IPv6.  I
disabled IPv6 on all the slaves, the master, and zookeeper.  After that
everything worked fine.

 

When doing JUnit testing with zookeeper as a dependency, I noticed that
zookeeper was throwing errors in the console when my localhost was
mapped to ::1 (the IPv6 loopback address).  The error from zookeeper was
'address family not supported'.  I was able to get around it by
switching localhost to 127.0.0.1.
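
A quick way to see whether a machine is in the same situation is to ask
the resolver what localhost maps to.  The sketch below is only an
illustration: the helper names loopback_version and localhost_addresses
are made up for this example and are not part of zookeeper or accumulo.

```python
import ipaddress
import socket


def loopback_version(address):
    """Return 4 or 6 if address is a loopback IP literal, else None."""
    try:
        ip = ipaddress.ip_address(address)
    except ValueError:
        # Not a valid IP literal (for example ":::1" or a scoped address).
        return None
    return ip.version if ip.is_loopback else None


def localhost_addresses(host="localhost"):
    """Return the distinct addresses the system resolver reports for host."""
    infos = socket.getaddrinfo(host, None)
    return sorted({info[4][0] for info in infos})


if __name__ == "__main__":
    for addr in localhost_addresses():
        version = loopback_version(addr)
        label = "not a loopback literal" if version is None else "IPv%d loopback" % version
        print(addr, "->", label)
```

If ::1 shows up in the output, localhost resolves to IPv6 on that host,
which matches the case where I saw the 'address family not supported'
error.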

 

I would try disabling IPv6 for your cluster and see if that resolves the issue.

 

 

 

From: user-return-2830-CHARLES.H.OTT=saic.com@accumulo.apache.org
[mailto:user-return-2830-CHARLES.H.OTT=saic.com@accumulo.apache.org] On
Behalf Of Ray Pfaff
Sent: Tuesday, August 06, 2013 10:38 AM
To: user@accumulo.apache.org
Subject: Communication issue between zookeeper and accumulo

 

 

I'm running Accumulo 1.4.3 and zookeeper 3.3.5 and I seem to have
occasional communication errors between the tablet servers and
zookeeper.  Sometimes when I restart a tablet server, I get the
following error in my log:

INFO : Waiting for tablet server lock

(repeats numerous times)

INFO:Too many retries, exiting.

 

At this point the tserver process is still running, but it registers as
dead to the master.  I have to manually terminate the tserver and then
restart it.  Usually by the second or third try, I no longer get the
"exiting" error and the server will begin to do work.  I'm running 4
tservers per machine dedicated to the tablet servers, so this makes for
a pretty "manual" method of restarting them.

 

I've looked at the code and the process is executing ZooLock.tryLock
and failing.  It then sleeps and tries again, ultimately giving up on
the lock after 60 attempts.  I also note that Jira-954 looks almost
exactly the same as this error, if not identical.  However, it's
listed as having been fixed in 1.4.3.
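
The retry behavior described above amounts to a bounded try, sleep,
retry loop.  Here is a minimal sketch of that pattern.  It is not the
Accumulo source: the function name acquire_with_retries and the
one-second delay are assumptions for illustration, and only the limit of
60 attempts comes from what I saw in the code.

```python
import time


def acquire_with_retries(try_lock, max_attempts=60, delay_seconds=1.0,
                         sleep=time.sleep):
    """Call try_lock until it succeeds or max_attempts is reached.

    Returns the attempt number that succeeded, or None if every
    attempt failed.  delay_seconds is an assumed value.
    """
    for attempt in range(1, max_attempts + 1):
        if try_lock():
            return attempt
        if attempt < max_attempts:
            # Wait before the next attempt; no sleep after the last one.
            sleep(delay_seconds)
    return None
```

A caller would treat a None result as the "Too many retries" case and
shut the process down at that point.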

 

Is there some step in configuring either zookeeper or the tservers that
I've missed that will get rid of this?

