accumulo-user mailing list archives

From Josh Elser <josh.el...@gmail.com>
Subject Re: slave tserver not responding
Date Wed, 01 Jan 2014 02:35:54 GMT
On 12/31/13, 6:37 PM, Arshak Navruzyan wrote:
> Here is my route -n
>
> Kernel IP routing table
> Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
> 10.240.0.1      0.0.0.0         255.255.255.255 UH    0      0        0 eth0
> 169.254.0.0     0.0.0.0         255.255.0.0     U     1002   0        0 eth0
> 0.0.0.0         10.240.0.1      0.0.0.0         UG    0      0        0 eth0
>
>
> "slave tserver" is another physical machine (well, a Google Compute Engine
> instance).  Yes, one GCE instance is running master (and slave) and the
> other is running just slave.
>
> here is my config:
>
> masters:
> 10.240.165.43
>
> slaves:
> 10.240.165.43
> 10.240.203.36
>
> BTW when I run bin/check-slaves conf/slaves
> # WRITABLE value not configured, not checking partitions
> 10.240.165.43
> 10.240.203.36
>
> Is the master supposed to be listed in the slaves files too?

No, your configuration files look correct.

I'm not sure why, but your slave (10.240.203.36) can't talk back to the 
master (10.240.165.43); at least, that's where I would start looking. You 
know that the master can talk to the slave (otherwise the slave tserver 
would never have started) and that the slave tserver can talk to 
ZooKeeper (it acquired and then lost a lock in ZK). Are you running 
ZooKeeper on the master? Knowing that would help isolate the problem 
further.

It may be worthwhile to double check your /etc/hosts entries just to be 
safe. Aside from that, I can't think of anything else at the moment.
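
If it helps, here is a rough sketch of a reachability check you could run 
from each machine to see which direction is blocked. It only attempts a 
plain TCP connect. The function names are made up for illustration, and 
the ports are assumptions: 9997 is the tserver port from your monitor 
output, while 9999 (master) and 2181 (ZooKeeper) are the usual defaults 
and may differ in your setup.

```python
import socket

# Hosts come from the thread; ports 9999 and 2181 are assumed defaults.
TARGETS = [
    ("10.240.165.43", 9999),  # Accumulo master (assumed default port)
    ("10.240.165.43", 2181),  # ZooKeeper (assumed default port, assumed host)
    ("10.240.203.36", 9997),  # slave tserver (port seen in the monitor)
]


def can_connect(host, port, timeout=3.0):
    """Try a TCP connect; return (True, "") or (False, reason)."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True, ""
    except OSError as exc:
        # Covers refused connections, timeouts, "No route to host",
        # and name-resolution failures.
        return False, str(exc)


def report(targets, timeout=3.0):
    """Print one line per target and return the results as a dict."""
    results = {}
    for host, port in targets:
        ok, reason = can_connect(host, port, timeout)
        results[(host, port)] = ok
        status = "OK" if ok else "FAILED (" + reason + ")"
        print(host + ":" + str(port) + " " + status)
    return results
```

Calling report(TARGETS) on the master and then on the slave should show 
whether the failure is one-way. A "No route to host" on only one side 
would point at a firewall rule or routing rather than at Accumulo itself.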

>
> On Tue, Dec 31, 2013 at 3:32 PM, Josh Elser <josh.elser@gmail.com> wrote:
>
>     Maybe check the output of `route -n` on the master? It might be
>     something weird with DNS as well.
>
>     When you say "slave tserver", are you talking about a separate
>     physical machine? You have one node running the Accumulo master and
>     another running a tserver?
>
>
>     On 12/31/13, 6:02 PM, Arshak Navruzyan wrote:
>
>         I configured a new instance with a master and a slave tserver.
>         When I do start-all on the master, the slave doesn't come up.  I am
>         wondering if it's because I left the instance secret as the default.
>         (I get an exception when I try to change that).
>
>         This is what I see in the master's monitor regarding the slave
>
>              Non-Functioning Tablet Servers
>              The following tablet servers reported a status other than Online
>
>              10.240.203.36:9997  UNRESPONSIVE
>
>
>
>         In the master log I see the following
>
>              2013-12-31 22:56:13,665 [master.Master] ERROR: unable to get tablet server status 10.240.203.36:9997[1434a79d34404a2]
>              org.apache.thrift.transport.TTransportException: java.net.NoRouteToHostException: No route to host
>              2013-12-31 22:56:13,712 [master.Master] ERROR: unable to get tablet server status 10.240.203.36:9997[1434a79d34404a2]
>              org.apache.thrift.transport.TTransportException: java.net.NoRouteToHostException: No route to host
>              2013-12-31 22:56:13,802 [balancer.TableLoadBalancer] INFO : Loaded class org.apache.accumulo.server.master.balancer.DefaultLoadBalancer for table !0
>              2013-12-31 22:56:13,803 [master.Master] INFO : Assigning 1 tablets
>              2013-12-31 22:56:13,812 [master.Master] ERROR: Error processing table state for store Root Tablet
>              org.apache.thrift.transport.TTransportException: java.net.NoRouteToHostException: No route to host
>                       at org.apache.accumulo.core.client.impl.ThriftTransportPool.createNewTransport(ThriftTransportPool.java:475)
>                       at org.apache.accumulo.core.client.impl.ThriftTransportPool.getTransport(ThriftTransportPool.java:464)
>                       at org.apache.accumulo.core.client.impl.ThriftTransportPool.getTransport(ThriftTransportPool.java:441)
>                       at org.apache.accumulo.core.client.impl.ThriftTransportPool.getTransportWithDefaultTimeout(ThriftTransportPool.java:366)
>
>
>
>         In the slave's tserver.log all I see is
>
>              2013-12-31 22:56:34,731 [tabletserver.TabletServer] FATAL: Lost
>              tablet server lock (reason = LOCK_DELETED), exiting.
>
>
