accumulo-notifications mailing list archives

From "Josh Elser (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (ACCUMULO-3880) Malformed Configuration Causes tservers To Shutdown
Date Tue, 02 Jun 2015 17:27:50 GMT

    [ https://issues.apache.org/jira/browse/ACCUMULO-3880?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14569462#comment-14569462 ]

Josh Elser commented on ACCUMULO-3880:
--------------------------------------

bq. There should be a better way to figure out "I'm not in the right cluster". Some basic
check of the cluster id, for example.

This is the sort of thing I meant. If we find gaps in how quickly we reject unwanted servers,
let's close them, as we _should_ have the means to identify the majority of cases.
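
As a rough sketch of what a basic cluster id check could look like: the class below compares the instance id a caller presents against the local one and refuses anything that does not match, without shutting anything down. Every name in it ({{ClusterIdCheck}}, {{Result}}, {{check}}) is hypothetical and not an existing Accumulo API, and it assumes the instance id is a UUID string.

```java
import java.util.Objects;
import java.util.UUID;

/**
 * Hypothetical sketch of a fail-fast "am I in the right cluster?" check.
 * None of these names are Accumulo classes; the instance id is assumed
 * to be a UUID string.
 */
final class ClusterIdCheck {

  /** Outcome of comparing a caller's instance id with the local one. */
  enum Result { SAME_INSTANCE, DIFFERENT_INSTANCE, UNKNOWN }

  private final UUID localInstanceId;

  ClusterIdCheck(UUID localInstanceId) {
    this.localInstanceId = Objects.requireNonNull(localInstanceId, "localInstanceId");
  }

  /**
   * Classifies the caller. A missing or unparseable id is UNKNOWN rather
   * than an error, so the server can reject the request and keep running.
   */
  Result check(String remoteInstanceId) {
    if (remoteInstanceId == null || remoteInstanceId.isEmpty()) {
      return Result.UNKNOWN;
    }
    final UUID remote;
    try {
      remote = UUID.fromString(remoteInstanceId);
    } catch (IllegalArgumentException e) {
      // Not a well-formed UUID: treat as unidentifiable, not as fatal.
      return Result.UNKNOWN;
    }
    return localInstanceId.equals(remote) ? Result.SAME_INSTANCE : Result.DIFFERENT_INSTANCE;
  }
}
```

The point of the three-valued result is that a caller from the wrong cluster (or one that cannot be identified) gets turned away early, while the decision to stop the local process stays a separate question.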

bq. But tablet server locks are held under the instance id, so we automatically have some
guarantee of "in the right instance."

A tabletserver can't obtain its lock without the correct instance.secret for the instance id,
right? That was something I thought about as another layer we have in place now.

> Malformed Configuration Causes tservers To Shutdown
> ---------------------------------------------------
>
>                 Key: ACCUMULO-3880
>                 URL: https://issues.apache.org/jira/browse/ACCUMULO-3880
>             Project: Accumulo
>          Issue Type: Bug
>          Components: tserver
>    Affects Versions: 1.6.0, 1.6.1, 1.6.2, 1.7.0
>         Environment: HDP 2.2.7.0 to HDP 2.3.0.0 Upgrade
>            Reporter: Jonathan Hurley
>            Assignee: Josh Elser
>            Priority: Critical
>             Fix For: 1.6.3, 1.7.0, 1.8.0
>
>          Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> During a rolling upgrade from HDP 2.2 to HDP 2.3, the Accumulo tracer fails to start because it is unable to find any tabletservers. The tabletservers were updated to HDP 2.3 earlier in the upgrade process and did come online briefly.
> The PID file still exists, but the tservers are definitely down:
> {noformat}
> [root@c6401 accumulo]# cat accumulo-accumulo-tserver.pid
> 6075
> [root@c6401 accumulo]# ps -a | grep 6075
> {noformat}
> It seems like the problem might be located in the following piece of code:
> {code}
>     private void checkPermission(TCredentials credentials, String lock, final String request) throws ThriftSecurityException {
>       boolean fatal = false;
>       try {
>         log.trace("Got " + request + " message from user: " + credentials.getPrincipal());
>         if (!security.canPerformSystemActions(credentials)) {
>           log.warn("Got " + request + " message from user: " + credentials.getPrincipal());
>           throw new ThriftSecurityException(credentials.getPrincipal(), SecurityErrorCode.PERMISSION_DENIED);
>         }
>       } catch (ThriftSecurityException e) {
>         log.warn("Got " + request + " message from unauthenticatable user: " + e.getUser());
>         if (getCredentials().getToken().getClass().getName().equals(credentials.getTokenClassName())) {
>           log.error("Got message from a service with a mismatched configuration. Please ensure a compatible configuration.", e);
>           fatal = true;
>         }
>         throw e;
>       } finally {
>         if (fatal) {
>           Halt.halt(1, new Runnable() {
>             @Override
>             public void run() {
>               gcLogger.logGCInfo(TabletServer.this.getConfiguration());
>             }
>           });
>         }
>       }
>     }
> {code}
> Where a malformed principal causes a {{Halt}}.
> From the tserver logs:
> {noformat}
> 2015-06-01 19:25:30,462 [rpc.TServerUtils] DEBUG: Instantiating default, unsecure custom half-async Thrift server
> 2015-06-01 19:25:30,468 [tserver.TabletServer] INFO : address = c6401.ambari.apache.org:9997
> 2015-06-01 19:25:30,510 [tserver.TabletServer] INFO : Waiting for tablet server lock
> {noformat}
> There is also no content in the *.out or *.err files for tserver.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
