accumulo-user mailing list archives

From Sean Busbey <busbey...@clouderagovt.com>
Subject Re: slave tserver not responding
Date Wed, 01 Jan 2014 07:57:19 GMT
Well, I can tell you the proximal cause. The tserver log shows that it
starts normally, then exits because it's told to (via the zookeeper lock
being removed).

If you look at the master debug logs, this happens because the master fails
in three attempts to talk to the tserver, all with the same error:

2014-01-01 06:17:20,231 [master.Master] ERROR: unable to get tablet server status 10.240.203.36:9997[1434c70ed30001b]
org.apache.thrift.transport.TTransportException: java.net.NoRouteToHostException: No route to host

Unfortunately, this is the same error you noticed in your first email.
After 3 of those, the master deletes the zk lock so that the tserver will
shut down.
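
If you want to see how often the master hit that error, something like the
following rough sketch will count the matching lines. The function name is
mine, and the log path in the usage line is only a guess at where your master
debug log lives; adjust it for your install.

```shell
# Count the lines in a log file that mention NoRouteToHostException.
# $1 is the path to the log file (hypothetical; use your own path).
count_no_route() {
    grep -c 'NoRouteToHostException' "$1"
}
```

For example: count_no_route "$ACCUMULO_HOME/logs/master_$(hostname).debug.log"
(that file name is an assumption, not something taken from your logs).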

Could there be another firewall blocking access to port 9997 on the worker
machine from the master machine?

Check from the master (you'll need netcat):

$ nc -z 10.240.203.36 9997
$ echo $?
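
An exit status of 0 means the TCP connection succeeded; anything else means
the master can't reach that port. If you'd rather have something reusable,
here is a minimal sketch of a wrapper. The function names are mine, and it
assumes your nc supports -z and -w (some older ncat builds lack -z).

```shell
# Translate the exit status of `nc -z` into a readable result.
# 0 means the TCP connection succeeded; any non-zero status means it did not.
describe_nc_status() {
    if [ "$1" -eq 0 ]; then
        echo "open"
    else
        echo "unreachable"
    fi
}

# Probe host ($1) and port ($2). -z only checks for a listener and sends
# no data; -w 5 gives up after five seconds instead of hanging.
check_port() {
    nc -z -w 5 "$1" "$2" >/dev/null 2>&1
    describe_nc_status "$?"
}
```

Run it from the master as: check_port 10.240.203.36 9997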





On Wed, Jan 1, 2014 at 12:33 AM, Arshak Navruzyan <arshakn@gmail.com> wrote:

> I am probably missing something really basic so I posted both the master
> and the slave log files:
>
> https://www.dropbox.com/sh/liv1mzuohyiv6uu/X5kx7AZJ6i
>
> Thanks again to everyone for the help!
>
>
> On Tue, Dec 31, 2013 at 10:20 PM, Arshak Navruzyan <arshakn@gmail.com> wrote:
>
>> disabled selinux (iptables already off) on both master and slave but
>> didn't make a difference unfortunately.
>>
>>
>>
>> On Tue, Dec 31, 2013 at 9:25 PM, Kurt Christensen <hoodel@hoodel.com> wrote:
>>
>>>
>>> SELINUX disabled? IPTABLES configured? I have nothing else.
>>>
>>> Kurt
>>>
>>> ------
>>>
>>>
>>> On 12/31/13 6:02 PM, Arshak Navruzyan wrote:
>>>
>>>> I configured a new instance with a master and a slave tserver.  When I
>>>> do start-all on the master, the slave doesn't come up.  I am wondering if
>>>> it's because I left the instance secret as the default. (I get an exception
>>>> when I try to change that).
>>>>
>>>> This is what I see in the master's monitor regarding the slave
>>>>
>>>>     Non-Functioning Tablet Servers
>>>>     The following tablet servers reported a status other than Online
>>>>
>>>> 10.240.203.36:9997  UNRESPONSIVE
>>>>
>>>>
>>>>
>>>> In the master log I see the following
>>>>
>>>>     2013-12-31 22:56:13,665 [master.Master] ERROR: unable to get tablet server status 10.240.203.36:9997[1434a79d34404a2]
>>>>     org.apache.thrift.transport.TTransportException: java.net.NoRouteToHostException: No route to host
>>>>     2013-12-31 22:56:13,712 [master.Master] ERROR: unable to get tablet server status 10.240.203.36:9997[1434a79d34404a2]
>>>>     org.apache.thrift.transport.TTransportException: java.net.NoRouteToHostException: No route to host
>>>>     2013-12-31 22:56:13,802 [balancer.TableLoadBalancer] INFO : Loaded class org.apache.accumulo.server.master.balancer.DefaultLoadBalancer for table !0
>>>>     2013-12-31 22:56:13,803 [master.Master] INFO : Assigning 1 tablets
>>>>     2013-12-31 22:56:13,812 [master.Master] ERROR: Error processing table state for store Root Tablet
>>>>     org.apache.thrift.transport.TTransportException: java.net.NoRouteToHostException: No route to host
>>>>             at org.apache.accumulo.core.client.impl.ThriftTransportPool.createNewTransport(ThriftTransportPool.java:475)
>>>>             at org.apache.accumulo.core.client.impl.ThriftTransportPool.getTransport(ThriftTransportPool.java:464)
>>>>             at org.apache.accumulo.core.client.impl.ThriftTransportPool.getTransport(ThriftTransportPool.java:441)
>>>>             at org.apache.accumulo.core.client.impl.ThriftTransportPool.getTransportWithDefaultTimeout(ThriftTransportPool.java:366)
>>>>
>>>>
>>>>
>>>> In the slave's tserver.log all I see is
>>>>
>>>>     2013-12-31 22:56:34,731 [tabletserver.TabletServer] FATAL: Lost tablet server lock (reason = LOCK_DELETED), exiting.
>>>>
>>>>
>>> --
>>>
>>> Kurt Christensen
>>> P.O. Box 811
>>> Westminster, MD 21158-0811
>>>
>>> ------------------------------------------------------------------------
>>> If you can't explain it simply, you don't understand it well enough. --
>>> Albert Einstein
>>>
>>
>>
>


-- 
Sean
