hadoop-hdfs-issues mailing list archives

From "James Clampffer (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (HDFS-11908) libhdfs++: Authentication failure when first NN of kerberized HA cluster is standby
Date Mon, 10 Jul 2017 18:05:00 GMT

     [ https://issues.apache.org/jira/browse/HDFS-11908?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
James Clampffer updated HDFS-11908:
    Resolution: Fixed
        Status: Resolved  (was: Patch Available)

Committed this to HDFS-8707.  HDFS-12111 filed for CI testing.

The manual tests I did were pretty simple: set up a kerberized HA cluster and make sure dfs.namenodes
has the standby NN listed first.  When the client tries to fail over and connect to the active NN you'll
get warnings about simple auth not being supported; apply this patch and those go away.
Same thing with the first NN shut down.  I repeated the test with gdb attached to confirm
that AuthInfo was actually being default-initialized to simple auth in the failing case
and to SASL auth with the patch.
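For reference, the cluster setup above can be sketched as an hdfs-site.xml fragment. The property names are standard Hadoop HA configuration; the nameservice name (mycluster) and hostnames here are hypothetical placeholders, not values from the actual test cluster:

```xml
<!-- Sketch of an HA nameservice with the standby NN deliberately listed first. -->
<property>
  <name>dfs.nameservices</name>
  <value>mycluster</value>
</property>
<property>
  <name>dfs.ha.namenodes.mycluster</name>
  <!-- nn1 (listed first) points at the standby, forcing the client to fail over -->
  <value>nn1,nn2</value>
</property>
<property>
  <name>dfs.namenode.rpc-address.mycluster.nn1</name>
  <value>standby-host:8020</value>
</property>
<property>
  <name>dfs.namenode.rpc-address.mycluster.nn2</name>
  <value>active-host:8020</value>
</property>
```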

> libhdfs++: Authentication failure when first NN of kerberized HA cluster is standby
> -----------------------------------------------------------------------------------
>                 Key: HDFS-11908
>                 URL: https://issues.apache.org/jira/browse/HDFS-11908
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>          Components: hdfs-client
>            Reporter: James Clampffer
>            Assignee: James Clampffer
>         Attachments: HDFS-11908.HDFS-8707.000.patch
> The library won't properly authenticate to a kerberized HA cluster if the first namenode it
> tries to connect to is the standby.  RpcConnection ends up attempting to use simple auth.
> Control flow to connect to NN for the first time:
> # RpcConnection constructed with a pointer to the RpcEngine as the only argument
> # RpcConnection::Connect(server endpoints, auth_info, callback) called
> ** auth_info contains the SASL mechanism to use plus the delegation token if we already
> have one
> Control flow to connect to NN after failover:
> # RpcEngine::NewConnection called; allocates an RpcConnection exactly as in step 1 above
> # RpcEngine::InitializeConnection called; sets event hooks and a string for the cluster name
> # RpcConnection::PreEnqueueRequests called to re-add the RPC messages that didn't make it
> on the last call due to the standby exception
> # RpcConnection::ConnectAndFlush called to send the RPC packets.  This takes only server
> endpoints, no auth info
> To fix:
> RpcEngine::InitializeConnection just needs to set RpcConnection::auth_info_ from the
> existing RpcEngine::auth_info_; even better would be setting it in the constructor, so that if
> an RpcConnection exists it can be expected to be in a usable state.  I'll get a diff up once
> I sort out the CI build failures.
> We also really need CI test coverage for HA and kerberos, because this issue should
> not have been around for so long.
