accumulo-user mailing list archives

From Arshak Navruzyan <arsh...@gmail.com>
Subject Re: slave tserver not responding
Date Wed, 01 Jan 2014 15:58:15 GMT
Sean

Thanks for looking into the log files.

These are two Google Compute Engine instances under the same project, so
there shouldn't be any firewall between them.

For the brief moment that the slave runs during startup, I can nc into port
9997 from the master to the slave.  But after it crashes, I can't.  Seems
like somehow the problem is on the slave.
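
To pin down when the port stops answering, the manual nc check can be
repeated on a timer from the master. The following is only a rough sketch:
the helper names port_open and watch_port are made up for illustration, and
the only assumption is that a plain TCP connect to the tserver port is a
fair stand-in for what nc -z does.

```python
import socket
import time


def port_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and "No route to host".
        return False


def watch_port(host, port, checks=30, interval=1.0):
    """Probe host:port repeatedly; return a list of (timestamp, reachable)."""
    results = []
    for _ in range(checks):
        results.append((time.strftime("%H:%M:%S"), port_open(host, port)))
        time.sleep(interval)
    return results
```

Calling watch_port("10.240.203.36", 9997) right after start-all and printing
the returned pairs should show roughly when the port goes from reachable to
unreachable, which can then be lined up against the timestamps in
tserver.log and the master log.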

Arshak
On Dec 31, 2013 11:58 PM, "Sean Busbey" <busbey+ml@clouderagovt.com> wrote:

> Well, I can tell you the proximal cause.  The tserver log shows that it
> starts normally, then exits because it's told to (via the zookeeper lock
> being removed).
>
> If you look at the master debug logs, this happens because the master
> fails in three attempts to talk to the tserver, all with the same error:
>
> 2014-01-01 06:17:20,231 [master.Master] ERROR: unable to get tablet server
> status 10.240.203.36:9997[1434c70ed30001b]
> org.apache.thrift.transport.TTransportException:
> java.net.NoRouteToHostException: No route to host
>
> Unfortunately, this is the same error you noticed in your first email.
> After 3 of those, the master deletes the zk lock so that the tserver will
> shut down.
>
> Could there be another firewall blocking access to port 9997 on the worker
> machine from the master machine?
>
> Check from the master (you'll need netcat):
>
> $ nc -z 10.240.203.36 9997
> $ echo $?
>
> On Wed, Jan 1, 2014 at 12:33 AM, Arshak Navruzyan <arshakn@gmail.com> wrote:
>
>> I am probably missing something really basic so I posted both the master
>> and the slave log files:
>>
>> https://www.dropbox.com/sh/liv1mzuohyiv6uu/X5kx7AZJ6i
>>
>> Thanks again to everyone for the help!
>>
>>
>> On Tue, Dec 31, 2013 at 10:20 PM, Arshak Navruzyan <arshakn@gmail.com> wrote:
>>
>>> Disabled SELinux (iptables already off) on both master and slave, but
>>> unfortunately it didn't make a difference.
>>>
>>>
>>>
>>> On Tue, Dec 31, 2013 at 9:25 PM, Kurt Christensen <hoodel@hoodel.com> wrote:
>>>
>>>>
>>>> SELINUX disabled? IPTABLES configured? I have nothing else.
>>>>
>>>> Kurt
>>>>
>>>> ------
>>>>
>>>>
>>>> On 12/31/13 6:02 PM, Arshak Navruzyan wrote:
>>>>
>>>>> I configured a new instance with a master and a slave tserver.  When I
>>>>> do start-all on the master, the slave doesn't come up.  I am wondering if
>>>>> it's because I left the instance secret as the default. (I get an exception
>>>>> when I try to change that).
>>>>>
>>>>> This is what I see in the master's monitor regarding the slave
>>>>>
>>>>>     Non-Functioning Tablet Servers
>>>>>     The following tablet servers reported a status other than Online
>>>>>
>>>>>     10.240.203.36:9997  UNRESPONSIVE
>>>>>
>>>>>
>>>>>
>>>>> In the master log I see the following
>>>>>
>>>>>     2013-12-31 22:56:13,665 [master.Master] ERROR: unable to get
>>>>>     tablet server status 10.240.203.36:9997[1434a79d34404a2]
>>>>>     org.apache.thrift.transport.TTransportException:
>>>>>     java.net.NoRouteToHostException: No route to host
>>>>>     2013-12-31 22:56:13,712 [master.Master] ERROR: unable to get
>>>>>     tablet server status 10.240.203.36:9997[1434a79d34404a2]
>>>>>     org.apache.thrift.transport.TTransportException:
>>>>>     java.net.NoRouteToHostException: No route to host
>>>>>     2013-12-31 22:56:13,802 [balancer.TableLoadBalancer] INFO : Loaded
>>>>>     class
>>>>>     org.apache.accumulo.server.master.balancer.DefaultLoadBalancer for
>>>>>     table !0
>>>>>     2013-12-31 22:56:13,803 [master.Master] INFO : Assigning 1 tablets
>>>>>     2013-12-31 22:56:13,812 [master.Master] ERROR: Error processing
>>>>>     table state for store Root Tablet
>>>>>     org.apache.thrift.transport.TTransportException:
>>>>>     java.net.NoRouteToHostException: No route to host
>>>>>             at org.apache.accumulo.core.client.impl.ThriftTransportPool.createNewTransport(ThriftTransportPool.java:475)
>>>>>             at org.apache.accumulo.core.client.impl.ThriftTransportPool.getTransport(ThriftTransportPool.java:464)
>>>>>             at org.apache.accumulo.core.client.impl.ThriftTransportPool.getTransport(ThriftTransportPool.java:441)
>>>>>             at org.apache.accumulo.core.client.impl.ThriftTransportPool.getTransportWithDefaultTimeout(ThriftTransportPool.java:366)
>>>>>
>>>>>
>>>>>
>>>>> In the slave's tserver.log all I see is
>>>>>
>>>>>     2013-12-31 22:56:34,731 [tabletserver.TabletServer] FATAL: Lost
>>>>>     tablet server lock (reason = LOCK_DELETED), exiting.
>>>>>
>>>>>
>>>> --
>>>>
>>>> Kurt Christensen
>>>> P.O. Box 811
>>>> Westminster, MD 21158-0811
>>>>
>>>> ------------------------------------------------------------------------
>>>> If you can't explain it simply, you don't understand it well enough. --
>>>> Albert Einstein
>>>>
>>>
>>>
>>
>
>
> --
> Sean
>
