accumulo-user mailing list archives

From Josh Elser <josh.el...@gmail.com>
Subject Re: slave tserver not responding
Date Wed, 01 Jan 2014 19:28:15 GMT
Sure -- you have my address already.

Also, nc not working while the tabletserver is dead makes sense (that 
process is what's listening on that port). Once the process dies, 
there's nothing else listening.
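Josh's point can be checked directly on the slave. If nothing is bound to port 9997 there, every connection attempt fails regardless of how the network is set up. Below is a minimal sketch, assuming the slave has `ss` (or `netstat`) installed. The function name `listening_on_port` is hypothetical, and the function only filters the listing piped into it.

```shell
# Hypothetical helper: succeed (exit 0) if a listing from `ss -ltn` or
# `netstat -ltn`, read on stdin, shows a socket listening on TCP port $1.
# In both tools' output the 4th column is the local address:port, so the
# port must appear right after a ":" at the end of that column.
listening_on_port() {
  awk -v p="$1" '$4 ~ (":" p "$") { found = 1 } END { exit !found }'
}

# Usage on the slave, while the tserver is supposed to be running:
#   ss -ltn | listening_on_port 9997 && echo "listening" || echo "nothing on 9997"
```

Matching only the tail of the local-address column keeps a listener on, say, port 19997 from being mistaken for 9997.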

On 1/1/2014 1:31 PM, Arshak Navruzyan wrote:
> If anyone wants to look at my live environment, please send me your
> Gmail address and I will add you to my Google Compute Engine project.
> Thanks!
>
>
> On Wed, Jan 1, 2014 at 7:58 AM, Arshak Navruzyan <arshakn@gmail.com> wrote:
>
>     Sean
>
>     Thanks for looking into the log files.
>
>     These are two Google Compute Engine instances in the same project,
>     so there shouldn't be any firewall between them.
>
>     For the brief moment that the slave runs during startup, I can nc
>     into port 9997 from the master to the slave. But after it crashes,
>     I can't. It seems like the problem is somehow on the slave.
>
>     Arshak
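The observation that port 9997 accepts connections only briefly during startup can be turned into a timeline by probing the port repeatedly from the master while `start-all` runs. Below is a minimal sketch. It assumes `bash` (for its `/dev/tcp` redirection) and coreutils `timeout` are available. The names `port_state` and `watch_port` are hypothetical, and the default probe count and interval are arbitrary.

```shell
# Hypothetical helper: print "open" if a TCP connection to host $1, port $2
# succeeds within 2 seconds, otherwise "closed". The connect itself is done by
# bash's /dev/tcp redirection; `timeout` bounds the wait.
port_state() {
  if timeout 2 bash -c '</dev/tcp/"$1"/"$2"' _ "$1" "$2" 2>/dev/null; then
    echo open
  else
    echo closed
  fi
}

# Probe host:port $3 times (default 30), $4 seconds apart (default 1), printing
# one timestamped line per probe so the open -> closed transition is visible.
watch_port() {
  local host="$1" port="$2" count="${3:-30}" interval="${4:-1}" i=0
  while [ "$i" -lt "$count" ]; do
    printf '%s %s:%s %s\n' "$(date +%H:%M:%S)" "$host" "$port" "$(port_state "$host" "$port")"
    i=$((i + 1))
    sleep "$interval"
  done
}

# Usage from the master, started just before running start-all:
#   watch_port 10.240.203.36 9997 60 1
```

Lining up the time of the last "open" line with the tserver's LOCK_DELETED message would show whether the port closes only when the process exits.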
>
>     On Dec 31, 2013 11:58 PM, "Sean Busbey" <busbey+ml@clouderagovt.com> wrote:
>
>         Well, I can tell you the proximal cause. The tserver log shows
>         that it starts normally, then exits because it's told to (via
>         the ZooKeeper lock being removed).
>
>         If you look at the master debug logs, this happens because the
>         master fails in three attempts to talk to the tserver, all with
>         the same error:
>
>         2014-01-01 06:17:20,231 [master.Master] ERROR: unable to get
>         tablet server status 10.240.203.36:9997[1434c70ed30001b]
>         org.apache.thrift.transport.TTransportException:
>         java.net.NoRouteToHostException: No route to host
>
>         Unfortunately, this is the same error you noticed in your first
>         email. After three of those, the master deletes the ZooKeeper lock
>         so that the tserver will shut down.
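To confirm the sequence Sean describes, one option is to tally the failed status calls per tablet server in the master's debug log. Sean's sequence is repeated status failures followed by deletion of the lock. Below is a minimal sketch. It assumes that in the actual log file each failure sits on a single line containing `unable to get tablet server status <host>:<port>`, as in the excerpt above (the archive wraps it). The function name and the log path in the usage comment are hypothetical.

```shell
# Hypothetical helper: read a master log on stdin and print
# "<host>:<port> <count>" for every tablet server the master failed to query.
# Assumes the address follows "unable to get tablet server status" on the same line.
count_status_failures() {
  grep -o 'unable to get tablet server status [A-Za-z0-9._-]*:[0-9]*' \
    | sed 's/^unable to get tablet server status //' \
    | sort | uniq -c \
    | awk '{ print $2, $1 }'
}

# Usage on the master (the path is a placeholder; point it at your master debug log):
#   count_status_failures < /path/to/master.debug.log
```

A count of three or more for 10.240.203.36:9997 would match the pattern Sean describes just before the lock is deleted.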
>
>         Could there be another firewall blocking access to port 9997 on
>         the worker machine from the master machine?
>
>         Check from the master (you'll need netcat):
>
>         $ nc -z 10.240.203.36 9997
>         $ echo $?
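The `nc -z` check reports only success or failure, but the reason for a failure is also informative. "Connection refused" typically means the host answered but nothing was listening on the port, which is the situation Josh describes once the tserver has died. "No route to host" is the same error the master logged. Below is a minimal sketch that keeps the error text. It swaps netcat for bash's `/dev/tcp` redirection plus coreutils `timeout`, because their error messages are predictable. The names `classify_connect_error` and `probe_port` are hypothetical.

```shell
# Hypothetical helper: map the error text of a failed TCP connect to a short label.
classify_connect_error() {
  case "$1" in
    *"Connection refused"*) echo "refused: host reachable, nothing listening on the port" ;;
    *"No route to host"*)   echo "no-route: host unreachable or rejected (e.g. by a firewall)" ;;
    *)                      echo "other: $1" ;;
  esac
}

# Try host $1, port $2 once (3-second limit) and print "open" or the classified error.
probe_port() {
  local host="$1" port="$2" err status
  err=$(timeout 3 bash -c '</dev/tcp/"$1"/"$2"' _ "$host" "$port" 2>&1)
  status=$?
  if [ "$status" -eq 0 ]; then
    echo open
  elif [ "$status" -eq 124 ]; then
    # timeout(1) exits with 124 when the time limit is hit
    echo "timeout: no answer within 3 seconds"
  else
    classify_connect_error "$err"
  fi
}

# Usage from the master:
#   probe_port 10.240.203.36 9997
```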
>
>
>
>
>
>         On Wed, Jan 1, 2014 at 12:33 AM, Arshak Navruzyan <arshakn@gmail.com> wrote:
>
>             I am probably missing something really basic, so I posted
>             both the master and the slave log files:
>
>             https://www.dropbox.com/sh/liv1mzuohyiv6uu/X5kx7AZJ6i
>
>             Thanks again to everyone for the help!
>
>
>             On Tue, Dec 31, 2013 at 10:20 PM, Arshak Navruzyan <arshakn@gmail.com> wrote:
>
>                 I disabled SELinux (iptables was already off) on both the master
>                 and the slave, but unfortunately it didn't make a difference.
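A host-side firewall can also produce "No route to host": a REJECT rule that answers with an ICMP host-prohibited or host-unreachable message surfaces that way on the client. For that reason, it may be worth confirming on the slave that no such rule is loaded, even when the iptables service is reported as off. Below is a minimal sketch, assuming root access to run `iptables -S`. `find_reject_rules` is a hypothetical helper that only filters the listing piped into it.

```shell
# Hypothetical helper: read `iptables -S` output on stdin and print non-ACCEPT
# chain policies plus REJECT/DROP rules. REJECTs whose ICMP reply type would
# typically surface to a client as "No route to host" are tagged NO-ROUTE?.
find_reject_rules() {
  awk '
    $1 == "-P" && $3 != "ACCEPT" { print "POLICY    " $0; next }
    / -j REJECT/ {
      if ($0 ~ /icmp-host-(prohibited|unreachable)/) print "NO-ROUTE? " $0
      else print "REJECT    " $0
      next
    }
    / -j DROP/ { print "DROP      " $0 }
  '
}

# Usage on the slave (as root):
#   iptables -S | find_reject_rules
```

Empty output means the loaded rule set neither rejects nor drops anything on the host itself.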
>
>
>
>                 On Tue, Dec 31, 2013 at 9:25 PM, Kurt Christensen <hoodel@hoodel.com> wrote:
>
>
>                     SELinux disabled? iptables configured? I have
>                     nothing else.
>
>                     Kurt
>
>                     ------
>
>
>                     On 12/31/13 6:02 PM, Arshak Navruzyan wrote:
>
>                         I configured a new instance with a master and a
>                         slave tserver. When I run start-all on the
>                         master, the slave doesn't come up. I am
>                         wondering if it's because I left the instance
>                         secret as the default (I get an exception when
>                         I try to change it).
>
>                         This is what I see in the master's monitor
>                         regarding the slave:
>
>                              Non-Functioning Tablet Servers
>                              The following tablet servers reported a status other than Online
>
>                              10.240.203.36:9997  UNRESPONSIVE
>
>
>
>                         In the master log I see the following:
>
>                              2013-12-31 22:56:13,665 [master.Master] ERROR: unable to get
>                              tablet server status 10.240.203.36:9997[1434a79d34404a2]
>                              org.apache.thrift.transport.TTransportException: java.net.NoRouteToHostException: No route to host
>                              2013-12-31 22:56:13,712 [master.Master] ERROR: unable to get
>                              tablet server status 10.240.203.36:9997[1434a79d34404a2]
>                              org.apache.thrift.transport.TTransportException: java.net.NoRouteToHostException: No route to host
>                              2013-12-31 22:56:13,802 [balancer.TableLoadBalancer] INFO : Loaded
>                              class org.apache.accumulo.server.master.balancer.DefaultLoadBalancer for
>                              table !0
>                              2013-12-31 22:56:13,803 [master.Master] INFO : Assigning 1 tablets
>                              2013-12-31 22:56:13,812 [master.Master] ERROR: Error processing
>                              table state for store Root Tablet
>                              org.apache.thrift.transport.TTransportException: java.net.NoRouteToHostException: No route to host
>                                      at org.apache.accumulo.core.client.impl.ThriftTransportPool.createNewTransport(ThriftTransportPool.java:475)
>                                      at org.apache.accumulo.core.client.impl.ThriftTransportPool.getTransport(ThriftTransportPool.java:464)
>                                      at org.apache.accumulo.core.client.impl.ThriftTransportPool.getTransport(ThriftTransportPool.java:441)
>                                      at org.apache.accumulo.core.client.impl.ThriftTransportPool.getTransportWithDefaultTimeout(ThriftTransportPool.java:366)
>
>
>
>                         In the slave's tserver.log, all I see is:
>
>                              2013-12-31 22:56:34,731 [tabletserver.TabletServer] FATAL: Lost
>                              tablet server lock (reason = LOCK_DELETED), exiting.
>
>
>                     --
>
>                     Kurt Christensen
>                     P.O. Box 811
>                     Westminster, MD 21158-0811
>
>                     ------------------------------------------------------------------------
>                     If you can't explain it simply, you don't understand
>                     it well enough. -- Albert Einstein
>
>
>
>
>
>
>         --
>         Sean
>
>
