hbase-user mailing list archives

From Wei Tan <w...@us.ibm.com>
Subject Re: Occasional GSSException that brings down region server
Date Tue, 11 Mar 2014 13:05:16 GMT
Thanks, Ted. Yes, our team looked at the doc you pointed out, and:

The key here is "every several hours", so we can rule out:
1) an invalid Kerberos ticket: klist shows a valid ticket;
2) the causes listed in [0]: our error message does not appear there, and
the wrong password / keytab / clock / realm errors on that page all seem to
describe "does not work at all" conditions, not "fails after a random, long
amount of time";
3) a "problematic combination of components": ours is not listed, but again,
that is a works / does-not-work dichotomy.


Thanks,
Wei

---------------------------------
Wei Tan, PhD
Research Staff Member
IBM T. J. Watson Research Center
http://researcher.ibm.com/person/us-wtan



From:   Ted Yu <yuzhihong@gmail.com>
To:     "user@hbase.apache.org" <user@hbase.apache.org>, 
Date:   03/10/2014 05:31 PM
Subject:        Re: Occasional GSSException that brings down region server



Have you looked at
http://hbase.apache.org/book.html#trouble.client.security.rpc ?


On Mon, Mar 10, 2014 at 2:26 PM, Wei Tan <wtan@us.ibm.com> wrote:

> Hi,
>
>   We are running an HBase cluster with Kerberos enabled, using these
> versions:
> HBase: 0.96.1.1
> Zookeeper: 3.4.5
> Hadoop: 1.1.1
>
>
> We constantly put data into HBase, and every several hours we get the
> error below on a random region server; when it occurs, the region server
> kills itself.
>
> ERROR:
> 2014-02-28 09:32:39,755 ERROR [hconnection-0x116987ad-shared--pool1378-t9]
> security.UserGroupInformation: PriviledgedActionException
> as:XXXXXXXX@DOMAIN cause:javax.security.sasl.SaslException: GSS initiate
> failed [Caused by GSSException: No valid credentials provided (Mechanism
> level: The ticket isn't for us (35) - BAD TGS SERVER NAME)]
>
>
>
> We also tried multiple versions of the KDC, all the way up to the latest
> (1.12.1), and still see this error. What is weird is that most puts are
> processed successfully until this error occurs and kills the RS.
>
> Thanks,
> Wei
> ---------------------------------
> Wei Tan, PhD
> Research Staff Member
> IBM T. J. Watson Research Center
> http://researcher.ibm.com/person/us-wtan
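Because the argument above rests on the failure recurring "every several hours", it can help to measure the actual gaps between occurrences in the region server logs. The following is a minimal sketch, not part of the original thread. It assumes each log entry sits on a single line in the raw log (the wrapping above comes from the email), that timestamps use the "2014-02-28 09:32:39,755" format shown in the error, and that matching on the substring "GSS initiate failed" is enough to identify the failure. The function names are hypothetical.

```python
import re
from datetime import datetime

# Timestamp prefix as it appears in the error line above, e.g.
# "2014-02-28 09:32:39,755 ERROR ...". Assumed to start each log entry.
TS_RE = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3})")
TS_FORMAT = "%Y-%m-%d %H:%M:%S,%f"

# Assumption: this substring identifies the SASL/GSS failure in the log.
MARKER = "GSS initiate failed"


def failure_times(lines):
    """Return the timestamps of log lines that contain the GSS failure marker."""
    times = []
    for line in lines:
        if MARKER not in line:
            continue
        match = TS_RE.match(line)
        if match:
            times.append(datetime.strptime(match.group(1), TS_FORMAT))
    return times


def gaps_in_hours(times):
    """Hours elapsed between consecutive failures, in chronological order."""
    ordered = sorted(times)
    return [(later - earlier).total_seconds() / 3600
            for earlier, later in zip(ordered, ordered[1:])]


if __name__ == "__main__":
    import sys
    # Usage (hypothetical): python gss_gaps.py regionserver.log
    with open(sys.argv[1]) as log:
        for gap in gaps_in_hours(failure_times(log)):
            print(f"{gap:.2f} h")
```

Fairly regular gaps would point toward something time-driven such as ticket lifetime or renewal, while widely varying gaps would fit the "randomly long amount of time" description in the reply above.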

