accumulo-user mailing list archives

From Arshak Navruzyan <arsh...@gmail.com>
Subject Re: slave tserver not responding
Date Wed, 01 Jan 2014 03:13:45 GMT
Josh,

Yes, ZooKeeper is running on the master, and I can connect to it using zkCli
from the slave.

/etc/hosts looks fine

127.0.0.1   localhost localhost.localdomain localhost4 localhost4.localdomain4
::1         localhost localhost.localdomain localhost6 localhost6.localdomain6
10.240.203.36 shoki.c.accumulo-test.internal shoki  # Added by Google

Hmm, completely baffled!

Arshak


On Tue, Dec 31, 2013 at 6:35 PM, Josh Elser <josh.elser@gmail.com> wrote:

> On 12/31/13, 6:37 PM, Arshak Navruzyan wrote:
>
>> Here is my route -n
>>
>> Kernel IP routing table
>> Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
>> 10.240.0.1      0.0.0.0         255.255.255.255 UH    0      0        0 eth0
>> 169.254.0.0     0.0.0.0         255.255.0.0     U     1002   0        0 eth0
>> 0.0.0.0         10.240.0.1      0.0.0.0         UG    0      0        0 eth0
>>
>>
>> "slave tserver" is another physical machine (well, a Google Compute Engine
>> instance). Yes, one GCE instance is running master (and slave) and the
>> other is running just slave.
>>
>> here is my config:
>>
>> masters:
>> 10.240.165.43
>>
>> slaves:
>> 10.240.165.43
>> 10.240.203.36
>>
>> BTW when I run bin/check-slaves conf/slaves
>> # WRITABLE value not configured, not checking partitions
>> 10.240.165.43
>> 10.240.203.36
>>
>> Is the master supposed to be listed in the slaves file too?
>>
>
> No, your configuration files look correct.
>
> I'm not sure why, but for whatever reason your slave (10.240.203.36) can't
> talk back to the master (10.240.165.43); at least that's where you want
> to start looking. You know that the master can talk to the slave
> (otherwise the slave tserver would never have started) and that the slave
> tserver can talk to ZooKeeper (it had and then lost a lock in ZK). Are
> you running ZooKeeper on the master? That would further isolate the
> problem.
>
> It may be worthwhile to double check your /etc/hosts entries just to be
> safe. Aside from that, I can't think of anything else at the moment.
>
>
>> On Tue, Dec 31, 2013 at 3:32 PM, Josh Elser <josh.elser@gmail.com> wrote:
>>
>>     Maybe check the output of `route -n` on the master? It might be
>>     something weird with DNS as well.
>>
>>     When you say "slave tserver", are you talking about a separate
>>     physical machine? You have one node running the Accumulo master and
>>     another running a tserver?
>>
>>
>>     On 12/31/13, 6:02 PM, Arshak Navruzyan wrote:
>>
>>         I configured a new instance with a master and a slave tserver. When I
>>         do start-all on the master, the slave doesn't come up. I am wondering
>>         if it's because I left the instance secret as the default. (I get an
>>         exception when I try to change that.)
>>
>>         This is what I see in the master's monitor regarding the slave
>>
>>              Non-Functioning Tablet Servers
>>              The following tablet servers reported a status other than Online
>>
>>         10.240.203.36:9997  UNRESPONSIVE
>>
>>
>>
>>         In the master log I see the following
>>
>>              2013-12-31 22:56:13,665 [master.Master] ERROR: unable to get tablet server status 10.240.203.36:9997[1434a79d34404a2]
>>              org.apache.thrift.transport.TTransportException: java.net.NoRouteToHostException: No route to host
>>              2013-12-31 22:56:13,712 [master.Master] ERROR: unable to get tablet server status 10.240.203.36:9997[1434a79d34404a2]
>>              org.apache.thrift.transport.TTransportException: java.net.NoRouteToHostException: No route to host
>>              2013-12-31 22:56:13,802 [balancer.TableLoadBalancer] INFO : Loaded class org.apache.accumulo.server.master.balancer.DefaultLoadBalancer for table !0
>>              2013-12-31 22:56:13,803 [master.Master] INFO : Assigning 1 tablets
>>              2013-12-31 22:56:13,812 [master.Master] ERROR: Error processing table state for store Root Tablet
>>              org.apache.thrift.transport.TTransportException: java.net.NoRouteToHostException: No route to host
>>                       at org.apache.accumulo.core.client.impl.ThriftTransportPool.createNewTransport(ThriftTransportPool.java:475)
>>                       at org.apache.accumulo.core.client.impl.ThriftTransportPool.getTransport(ThriftTransportPool.java:464)
>>                       at org.apache.accumulo.core.client.impl.ThriftTransportPool.getTransport(ThriftTransportPool.java:441)
>>                       at org.apache.accumulo.core.client.impl.ThriftTransportPool.getTransportWithDefaultTimeout(ThriftTransportPool.java:366)
>>
>>
>>
>>
>>         In the slave's tserver.log all I see is
>>
>>              2013-12-31 22:56:34,731 [tabletserver.TabletServer] FATAL: Lost tablet server lock (reason = LOCK_DELETED), exiting.
>>
>>
>>

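One way to narrow down which direction is failing, as discussed in the thread, is to attempt plain TCP connections from each machine to the other's service ports. The following is only a rough sketch, not anything from the Accumulo distribution: the `probe` and `report` helpers and the `TARGETS` list are made up for illustration. The IP addresses and the tserver port 9997 come from the messages above; 2181 is ZooKeeper's default client port; 9999 is assumed here to be the master's client port and should be checked against your own configuration.

```python
import errno
import socket

# Hypothetical target list. The IPs and port 9997 are taken from the thread;
# 2181 is the ZooKeeper default; 9999 is an ASSUMED master client port.
TARGETS = [
    ("10.240.165.43", 2181),  # ZooKeeper on the master
    ("10.240.165.43", 9999),  # Accumulo master (assumed port)
    ("10.240.203.36", 9997),  # slave tserver (port shown in the monitor)
]


def probe(host, port, timeout=3.0):
    """Attempt one TCP connection and return a short status string."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except socket.timeout:
        return "timeout"
    except ConnectionRefusedError:
        return "refused"
    except OSError as exc:
        # EHOSTUNREACH is the OS-level error that Java surfaces as
        # java.net.NoRouteToHostException.
        if exc.errno == errno.EHOSTUNREACH:
            return "no route to host"
        return f"error: {exc}"


def report(targets):
    """Print one line per (host, port) pair with the probe result."""
    for host, port in targets:
        print(f"{host}:{port} -> {probe(host, port)}")
```

Calling `report(TARGETS)` once on the master and once on the slave shows which direction and which port fails. A "no route to host" result for 10.240.203.36:9997 from the master would match the exception in the master log.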