accumulo-user mailing list archives

From Arshak Navruzyan <arsh...@gmail.com>
Subject Re: slave tserver not responding
Date Wed, 01 Jan 2014 18:31:19 GMT
If anyone wants to look at my live environment please let me know (your
gmail) and I will add you to the Google Compute Engine.  Thanks!


On Wed, Jan 1, 2014 at 7:58 AM, Arshak Navruzyan <arshakn@gmail.com> wrote:

> Sean
>
> Thanks for looking into the log files.
>
> These are two Google Compute Engine instances under the same project so
> there shouldn't be any firewall between them.
>
> For the brief moment that the slave runs during startup, I can nc into
> port 9997 from the master to the slave.  But after it crashes, I can't.
> Seems like somehow the problem is on the slave.
>
> Arshak
> On Dec 31, 2013 11:58 PM, "Sean Busbey" <busbey+ml@clouderagovt.com>
> wrote:
>
>> Well, I can tell you the proximal cause.  The tserver log shows that it
>> starts normally, then exits because it's told to (via the zookeeper lock
>> being removed).
>>
>> If you look at the master debug logs, this happens because the master
>> fails in three attempts to talk to the tserver, all with the same error:
>>
>> 2014-01-01 06:17:20,231 [master.Master] ERROR: unable to get tablet
>> server status 10.240.203.36:9997[1434c70ed30001b]
>> org.apache.thrift.transport.TTransportException:
>> java.net.NoRouteToHostException: No route to host
>>
>> Unfortunately, this is the same error you noticed in your first email.
>> After 3 of those, the master deletes the zk lock so that the tserver will
>> shut down.
>>
>> Could there be another firewall blocking access to port 9997 on the
>> worker machine from the master machine?
>>
>> Check from the master (you'll need netcat):
>>
>> $ nc -z 10.240.203.36 9997
>> $ echo $?
>>
>>
>>
>>
>>
>> On Wed, Jan 1, 2014 at 12:33 AM, Arshak Navruzyan <arshakn@gmail.com> wrote:
>>
>>> I am probably missing something really basic so I posted both the master
>>> and the slave log files:
>>>
>>> https://www.dropbox.com/sh/liv1mzuohyiv6uu/X5kx7AZJ6i
>>>
>>> Thanks again to everyone for the help!
>>>
>>>
>>> On Tue, Dec 31, 2013 at 10:20 PM, Arshak Navruzyan <arshakn@gmail.com> wrote:
>>>
>>>> Disabled SELinux (iptables already off) on both master and slave, but it
>>>> didn't make a difference, unfortunately.
>>>>
>>>>
>>>>
>>>> On Tue, Dec 31, 2013 at 9:25 PM, Kurt Christensen <hoodel@hoodel.com> wrote:
>>>>
>>>>>
>>>>> SELINUX disabled? IPTABLES configured? I have nothing else.
>>>>>
>>>>> Kurt
>>>>>
>>>>> ------
>>>>>
>>>>>
>>>>> On 12/31/13 6:02 PM, Arshak Navruzyan wrote:
>>>>>
>>>>>> I configured a new instance with a master and a slave tserver.  When
>>>>>> I do start-all on the master, the slave doesn't come up.  I am wondering if
>>>>>> it's because I left the instance secret as the default. (I get an exception
>>>>>> when I try to change that).
>>>>>>
>>>>>> This is what I see in the master's monitor regarding the slave
>>>>>>
>>>>>>     Non-Functioning Tablet Servers
>>>>>>     The following tablet servers reported a status other than Online
>>>>>>
>>>>>>     10.240.203.36:9997  UNRESPONSIVE
>>>>>>
>>>>>>
>>>>>>
>>>>>> In the master log I see the following
>>>>>>
>>>>>>     2013-12-31 22:56:13,665 [master.Master] ERROR: unable to get
>>>>>>     tablet server status 10.240.203.36:9997[1434a79d34404a2]
>>>>>>     org.apache.thrift.transport.TTransportException:
>>>>>>     java.net.NoRouteToHostException: No route to host
>>>>>>     2013-12-31 22:56:13,712 [master.Master] ERROR: unable to get
>>>>>>     tablet server status 10.240.203.36:9997[1434a79d34404a2]
>>>>>>     org.apache.thrift.transport.TTransportException:
>>>>>>     java.net.NoRouteToHostException: No route to host
>>>>>>     2013-12-31 22:56:13,802 [balancer.TableLoadBalancer] INFO : Loaded
>>>>>>     class org.apache.accumulo.server.master.balancer.DefaultLoadBalancer
>>>>>>     for table !0
>>>>>>     2013-12-31 22:56:13,803 [master.Master] INFO : Assigning 1 tablets
>>>>>>     2013-12-31 22:56:13,812 [master.Master] ERROR: Error processing
>>>>>>     table state for store Root Tablet
>>>>>>     org.apache.thrift.transport.TTransportException:
>>>>>>     java.net.NoRouteToHostException: No route to host
>>>>>>             at org.apache.accumulo.core.client.impl.ThriftTransportPool.createNewTransport(ThriftTransportPool.java:475)
>>>>>>             at org.apache.accumulo.core.client.impl.ThriftTransportPool.getTransport(ThriftTransportPool.java:464)
>>>>>>             at org.apache.accumulo.core.client.impl.ThriftTransportPool.getTransport(ThriftTransportPool.java:441)
>>>>>>             at org.apache.accumulo.core.client.impl.ThriftTransportPool.getTransportWithDefaultTimeout(ThriftTransportPool.java:366)
>>>>>>
>>>>>>
>>>>>>
>>>>>> In the slave's tserver.log all I see is
>>>>>>
>>>>>>     2013-12-31 22:56:34,731 [tabletserver.TabletServer] FATAL: Lost
>>>>>>     tablet server lock (reason = LOCK_DELETED), exiting.
>>>>>>
>>>>>>
>>>>> --
>>>>>
>>>>> Kurt Christensen
>>>>> P.O. Box 811
>>>>> Westminster, MD 21158-0811
>>>>>
>>>>> ------------------------------------------------------------------------
>>>>> If you can't explain it simply, you don't understand it well enough.
>>>>> -- Albert Einstein
>>>>>
>>>>
>>>>
>>>
>>
>>
>> --
>> Sean
>>
>
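
The master-side failure Sean describes above (repeated "unable to get tablet
server status" errors before the lock is deleted) could be tallied per tserver
from the master debug log. This is only a minimal sketch: it assumes each error
sits on a single line in the real log file (the wrapping in the quoted excerpts
comes from the mail client), and the function name count_status_errors is
hypothetical, not part of Accumulo.

```shell
# Hypothetical helper, not an Accumulo tool: read a master log on stdin and
# print "<address:port> <count>" for every tablet server the master failed
# to get a status from.
count_status_errors() {
  # Keep only the status-failure lines, pull out the IPv4 address and port,
  # then count how often each address appears.
  grep 'unable to get tablet server status' |
    grep -oE '[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+:[0-9]+' |
    sort | uniq -c | awk '{ print $2, $1 }'
}
```

Feeding it the master log (path depends on the installation) would show at a
glance whether a single tserver address accounts for all of the failures.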

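The port check from the master suggested above can also be done without
netcat. A minimal sketch, assuming bash and the coreutils timeout command are
installed on the master; the function name check_port and the 3-second default
are hypothetical choices for illustration.

```shell
# Hypothetical helper: exit 0 if a TCP connection to host:port can be opened
# within the given number of seconds (default 3), non-zero otherwise.
check_port() {
  host=$1
  port=$2
  secs=${3:-3}
  # bash treats /dev/tcp/HOST/PORT in a redirection as "open a TCP socket";
  # the connection is closed again when the child shell exits.
  timeout "$secs" bash -c 'exec 3<>"/dev/tcp/$1/$2"' _ "$host" "$port" 2>/dev/null
}
```

Running `check_port 10.240.203.36 9997; echo $?` from the master, once while
the tserver is starting and again after it exits, should mirror the nc test
described in the thread: 0 means the port accepted a connection, anything else
means it did not.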