accumulo-user mailing list archives

From Josh Elser <josh.el...@gmail.com>
Subject Re: slave tserver not responding
Date Wed, 01 Jan 2014 22:41:21 GMT
The BAD_CREDENTIALS error is just the root password not matching the 
trace.token.property.password. By default, the configurations set the 
password for Accumulo's distributed trace mechanism to be "secret".

It's best to make a special user and password for tracing and configure 
it in accumulo-site.xml. An easy way to get rid of that error is to just 
set the aforementioned property equal to the root password (and chmod 
600 accumulo-site.xml) ;)
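
For reference, a minimal sketch of what that accumulo-site.xml entry might look like with a dedicated tracing user. The user name "tracer" and the password value are placeholders, not values from this thread, and trace.user is assumed here to be the companion property that names the tracing account. The account would also have to exist in Accumulo and be able to write to the trace table.

```xml
<!-- accumulo-site.xml (fragment): tracing credentials -->
<!-- "tracer" and "CHANGE_ME" are hypothetical placeholders -->
<property>
  <name>trace.user</name>
  <value>tracer</value>
</property>
<property>
  <name>trace.token.property.password</name>
  <!-- must match this user's actual Accumulo password -->
  <value>CHANGE_ME</value>
</property>
```

Because the file then holds a password in clear text, the chmod 600 mentioned above still applies.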

On 1/1/14, 3:46 PM, Michael Wall wrote:
> I don't know if it helps debugging, but I am seeing the following in
> tserver_shrine.log
>
> 2014-01-01 06:15:37,852 [hdfs.DFSClient] INFO : Exception in
> createBlockOutputStream 10.240.165.43:50010
> java.io.IOException: Bad connect ack with firstBadLink as
> 10.240.203.36:50010
> 2014-01-01 06:15:37,852 [hdfs.DFSClient] INFO : Abandoning block
> blk_-2756969025267118869_1348
> 2014-01-01 06:15:37,855 [hdfs.DFSClient] INFO : Excluding datanode
> 10.240.203.36:50010
> 2014-01-01 06:15:38,147 [hdfs.DFSClient] INFO : Exception in
> createBlockOutputStream 10.240.165.43:50010
> java.io.IOException: Bad connect ack with firstBadLink as
> 10.240.203.36:50010
> 2014-01-01 06:15:38,148 [hdfs.DFSClient] INFO : Abandoning block
> blk_2883724569463729419_1349
> 2014-01-01 06:15:38,149 [hdfs.DFSClient] INFO : Excluding datanode
> 10.240.203.36:50010
> 2014-01-01 06:15:38,554 [client.ClientServiceHandler] ERROR:
> ThriftSecurityException(user:root, code:BAD_CREDENTIALS)
> 2014-01-01 06:15:39,559 [client.ClientServiceHandler] ERROR:
> ThriftSecurityException(user:root, code:BAD_CREDENTIALS)
> 2014-01-01 06:15:40,565 [client.ClientServiceHandler] ERROR:
> ThriftSecurityException(user:root, code:BAD_CREDENTIALS)
> 2014-01-01 06:15:41,571 [client.ClientServiceHandler] ERROR:
> ThriftSecurityException(user:root, code:BAD_CREDENTIALS)
> 2014-01-01 06:15:42,578 [client.ClientServiceHandler] ERROR:
> ThriftSecurityException(user:root, code:BAD_CREDENTIALS)
> 2014-01-01 06:15:43,586 [client.ClientServiceHandler] ERROR:
> ThriftSecurityException(user:root, code:BAD_CREDENTIALS)
> 2014-01-01 06:15:44,594 [client.ClientServiceHandler] ERROR:
> ThriftSecurityException(user:root, code:BAD_CREDENTIALS)
>
>
>
> On Wed, Jan 1, 2014 at 2:28 PM, Josh Elser <josh.elser@gmail.com> wrote:
>
>     Sure -- you have my address already.
>
>     Also, nc not working while the tabletserver is dead makes sense
>     (that process is what's listening on that port). Once the process
>     dies, there's nothing else listening.
>
>
>     On 1/1/2014 1:31 PM, Arshak Navruzyan wrote:
>
>         If anyone wants to look at my live environment, please send me
>         your Gmail address and I will add you to the Google Compute
>         Engine project.  Thanks!
>
>
>         On Wed, Jan 1, 2014 at 7:58 AM, Arshak Navruzyan
>         <arshakn@gmail.com> wrote:
>
>              Sean
>
>              Thanks for looking into the log files.
>
>              These are two Google Compute Engine instances under the same
>              project, so there shouldn't be any firewall between them.
>
>              For the brief moment that the slave runs during startup, I
>              can nc into port 9997 from the master to the slave.  But
>              after it crashes, I can't.  Seems like somehow the problem is
>              on the slave.
>
>              Arshak
>
>              On Dec 31, 2013 11:58 PM, "Sean Busbey"
>              <busbey+ml@clouderagovt.com> wrote:
>
>                  Well, I can tell you the proximal cause.  The tserver
>                  log shows that it starts normally, then exits because
>                  it's told to (via the zookeeper lock being removed).
>
>                  If you look at the master debug logs, this happens
>                  because the master fails in three attempts to talk to
>                  the tserver, all with the same error:
>
>                  2014-01-01 06:17:20,231 [master.Master] ERROR: unable to get
>                  tablet server status 10.240.203.36:9997[1434c70ed30001b]
>                  org.apache.thrift.transport.TTransportException:
>                  java.net.NoRouteToHostException: No route to host
>
>                  Unfortunately, this is the same error you noticed in
>                  your first email. After 3 of those, the master deletes
>                  the zk lock so that the tserver will shut down.
>
>                  Could there be another firewall blocking access to port
>                  9997 on the worker machine from the master machine?
>
>                  Check from the master (you'll need netcat):
>
>                  $ nc -z 10.240.203.36 9997
>                  $ echo $?
>
>
>
>
>
>                  On Wed, Jan 1, 2014 at 12:33 AM, Arshak Navruzyan
>                  <arshakn@gmail.com> wrote:
>
>                      I am probably missing something really basic so I
>                      posted both the master and the slave log files:
>
>                      https://www.dropbox.com/sh/liv1mzuohyiv6uu/X5kx7AZJ6i
>
>                      Thanks again to everyone for the help!
>
>
>                      On Tue, Dec 31, 2013 at 10:20 PM, Arshak Navruzyan
>                      <arshakn@gmail.com> wrote:
>
>                          Disabled SELinux (iptables was already off) on
>                          both master and slave, but unfortunately it
>                          didn't make a difference.
>
>
>
>                          On Tue, Dec 31, 2013 at 9:25 PM, Kurt Christensen
>                          <hoodel@hoodel.com> wrote:
>
>
>                              SELINUX disabled? IPTABLES configured? I have
>                              nothing else.
>
>                              Kurt
>
>                              ------
>
>
>                              On 12/31/13 6:02 PM, Arshak Navruzyan wrote:
>
>                                  I configured a new instance with a master
>                                  and a slave tserver.  When I do start-all
>                                  on the master, the slave doesn't come up.
>                                  I am wondering if it's because I left the
>                                  instance secret as the default. (I get an
>                                  exception when I try to change that.)
>
>                                  This is what I see in the master's monitor
>                                  regarding the slave
>
>                                       Non-Functioning Tablet Servers
>                                       The following tablet servers reported
>                                       a status other than Online
>
>                                       10.240.203.36:9997  UNRESPONSIVE
>
>
>
>                                  In the master log I see the following
>
>                                       2013-12-31 22:56:13,665 [master.Master] ERROR: unable to get tablet server status 10.240.203.36:9997[1434a79d34404a2]
>                                       org.apache.thrift.transport.TTransportException: java.net.NoRouteToHostException: No route to host
>                                       2013-12-31 22:56:13,712 [master.Master] ERROR: unable to get tablet server status 10.240.203.36:9997[1434a79d34404a2]
>                                       org.apache.thrift.transport.TTransportException: java.net.NoRouteToHostException: No route to host
>                                       2013-12-31 22:56:13,802 [balancer.TableLoadBalancer] INFO : Loaded class org.apache.accumulo.server.master.balancer.DefaultLoadBalancer for table !0
>                                       2013-12-31 22:56:13,803 [master.Master] INFO : Assigning 1 tablets
>                                       2013-12-31 22:56:13,812 [master.Master] ERROR: Error processing table state for store Root Tablet
>                                       org.apache.thrift.transport.TTransportException: java.net.NoRouteToHostException: No route to host
>                                               at org.apache.accumulo.core.client.impl.ThriftTransportPool.createNewTransport(ThriftTransportPool.java:475)
>                                               at org.apache.accumulo.core.client.impl.ThriftTransportPool.getTransport(ThriftTransportPool.java:464)
>                                               at org.apache.accumulo.core.client.impl.ThriftTransportPool.getTransport(ThriftTransportPool.java:441)
>                                               at org.apache.accumulo.core.client.impl.ThriftTransportPool.getTransportWithDefaultTimeout(ThriftTransportPool.java:366)
>
>
>
>
>                                  In the slave's tserver.log all I see is
>
>                                       2013-12-31 22:56:34,731 [tabletserver.TabletServer] FATAL: Lost
>                                       tablet server lock (reason = LOCK_DELETED), exiting.
>
>
>                              --
>
>                              Kurt Christensen
>                              P.O. Box 811
>                              Westminster, MD 21158-0811
>
>
>                              ------------------------------------------------
>
>                              If you can't explain it simply, you don't
>                              understand it well enough. -- Albert Einstein
>
>
>
>
>
>
>                  --
>                  Sean
>
>
>
