hadoop-common-issues mailing list archives

From "Daryn Sharp (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HADOOP-9421) Convert SASL to use ProtoBuf and add lengths for non-blocking processing
Date Fri, 21 Jun 2013 04:55:20 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-9421?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13690034#comment-13690034

Daryn Sharp commented on HADOOP-9421:

You seem to be trying to tailor a design that only considers today's implementations of tokens
and kerberos.  It seems "easy" when you assume there are only two choices.  The optimization
becomes more and more complicated, and in many cases impossible, compared to simply doing what
the server tells you to do.

When pluggable auth support allows a world of heterogeneous security, such as Knox or Rhino,
requiring REINITIATE penalties becomes very expensive.

Sorry for the very long read, but these are topics I intended to address on the call that
unfortunately didn't happen today.

+IP failover+
Using distinct service principals with IP failover isn't "insane".  With a shared principal,
services can't be accessed directly because the host doesn't match the shared principal.  So a
different config with a hardcoded shared principal is needed.  Similarly, DNs won't be able to
heartbeat directly to the HA NNs.  I'm sure there are more problems than the ones we've already
discovered investigating that route.

The root issue is that the client must only use the hostname that appears in the kerberos service
principal.  Which means you can't access the service via all of its interfaces, hostnames, or
even pretty CNAMEs.

If the server advertises "this is who I am" via the NEGOTIATE, then the problem is solved.
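A rough sketch of the difference (hypothetical code and names, not Hadoop's actual RPC implementation): if the NEGOTIATE carries the server's principal, the client never has to derive it from whatever address it happened to dial.

```java
// Hypothetical sketch of Kerberos service principal selection; illustrative only.
public class PrincipalChoice {
    // advertisedPrincipal models a principal carried in the server's NEGOTIATE;
    // null models the status quo where the client must guess from the dialed host.
    static String choosePrincipal(String dialedHost, String advertisedPrincipal) {
        if (advertisedPrincipal != null) {
            // trust the server's self-identification: works through
            // VIPs, CNAMEs, NAT, and any network interface
            return advertisedPrincipal;
        }
        // pre-negotiation guess: breaks whenever the dialed host
        // doesn't match the host in the real principal
        return "nn/" + dialedHost + "@EXAMPLE.COM";
    }
}
```

With the advertised principal, dialing a failover VIP like `nn-vip.example.com` still yields the correct per-host principal; with the guess, it yields a principal that doesn't exist.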

+Token selection issues+
Selecting tokens pre-connection based on the service as a host:port or ip:port tuple is a problem.
 Let's take a few examples:

Using the IP precludes multi-interface host support, for instance if you want to have a fast/private
intra-cluster network and a separate public network.  Tokens will contain the public IP, but
clients using the private interface (different IP) can't find them.  This isn't contrived,
it's something Cloudera has wanted to do.
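The multi-interface failure mode can be sketched in a few lines (hypothetical code; the real token selection lives in Hadoop's token/SecurityUtil machinery, which this only caricatures):

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of ip:port-keyed token selection; illustrative only.
public class TokenByAddress {
    // token store keyed on the service address baked into the token
    static final Map<String, String> tokens = new HashMap<>();

    static String lookup(String service) {
        return tokens.get(service); // null means "no token found"
    }

    public static void main(String[] args) {
        // token was issued with the server's public IP
        tokens.put("203.0.113.10:8020", "token-bytes");
        // a client connecting over the private interface of the *same* host
        // looks up a different key and finds nothing
        System.out.println(lookup("10.0.0.10:8020"));
    }
}
```

The token exists and is valid for the service; the client simply can't find it because the lookup key encodes which interface was used.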

You also can't use the IP because changing a service's IP will break clients using tokens
with the old IP.  In comes the bane of my creation, use_ip=false, to use the given hostname.
 But you can't allow non-fully-qualified names because they will resolve differently depending
on the DNS search path.  There's a raft of reasons why the canonicalization isn't as straightforward
as you'd think, which led to a custom NetUtils resolver and complicated path normalization.

Likewise, any sort of public proxy or NAT-ing between an external client and a cluster service
creates an unusable token service within the grid.

HA token logic is unnecessarily convoluted: it clones tokens from a logical uri into multiple
tokens, one per failover target's service.

A clean solution to all these problems is for tokens to contain a server-generated opaque id.
The server's NEGOTIATE reports this id, and the client looks for a token with that id.  Now no
matter what interface/IP/hostname/proxy/NAT is used, the client will always find the token.
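The opaque-id approach reduces to a trivial lookup (again a hypothetical sketch with made-up names, not a proposed API):

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of opaque-id token selection; illustrative only.
public class TokenById {
    // token store keyed on the server-generated opaque id,
    // the same id the server would report in its NEGOTIATE
    static final Map<String, String> tokensByServerId = new HashMap<>();

    static String select(String negotiatedServerId) {
        // lookup is independent of how the client reached the server
        return tokensByServerId.get(negotiatedServerId);
    }
}
```

Whether the client dialed a VIP, a pretty CNAME, a private interface, or a NAT address, the server reports the same id, so the same token is found; all the address canonicalization machinery above becomes unnecessary.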

If you cut out the use of the NEGOTIATE, this ability is gone.

+Supporting new auth methods+
Other new auths in the future may need the protocol/serverId hints from the NEGOTIATE to locate
the required credentials.  Guessing may not be an option.

The RPC client shouldn't have to be modified to make a pre-connection guess for all the auth
methods it supports.  Because...

Why should the client attempt an auth method before it _even knows if the server can do it_?
 Let's look at some hairy examples:

The client tries to do kerberos, so it needs to generate the initial response to take advantage
of your "optimization".  But the server isn't kerberized.  So the client either fails because
it has no TGT (which it doesn't even need!) or fails trying to obtain a ticket for a non-existent
service principal.

What if the client decides to use an SSO service, but the server doesn't do SSO?  Take a REINITIATE
penalty every time?

+Supporting new mechanisms+
Let's say we add support for a new mechanism like SCRAM.  Just because the client can do it
doesn't mean all services across all clusters can do it.  The server's NEGOTIATE will tell
the client if it can do DIGEST-MD5, SCRAM, etc.
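Mechanism selection driven by the server's NEGOTIATE is a one-pass intersection (hypothetical sketch; mechanism names are examples):

```java
import java.util.List;

// Hypothetical sketch of choosing a SASL mechanism from the list the
// server advertises in its NEGOTIATE; illustrative only.
public class MechanismChoice {
    // pick the first server-offered mechanism the client also supports;
    // server order expresses the server's preference
    static String pick(List<String> serverOffers, List<String> clientSupports) {
        for (String mech : serverOffers) {
            if (clientSupports.contains(mech)) {
                return mech;
            }
        }
        return null; // no common mechanism: fail fast, no REINITIATE round trips
    }
}
```

A client that only knows DIGEST-MD5 still interoperates with a newer server offering both SCRAM and DIGEST-MD5, and an incompatible pair is detected immediately instead of by trial, failure, and REINITIATE.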

Inter-cluster compatibility and rolling upgrades will introduce scenarios where the required
mechanism differs, and penalizing the client to REINITIATE is not a valid option.


In all of these scenarios, there are no complex issues if the NEGOTIATE is used to choose an
appropriate auth type.  In a world of multiple auths and multiple mechanisms per auth,
requiring REINITIATE penalties is too expensive.

Ignoring all the issues I've cited, your optimization doesn't appear to have a positive impact
on performance.  Even if it did shave a few milliseconds or even 100ms, will it have a measurable
real-world impact?  Considering how many RPC requests are performed over a single connection,
will the negligible penalty from one extra packet make any difference?

I feel like we've spent weeks haggling over an ill-suited premature optimization that could
have been spent building upon this implementation. :(
> Convert SASL to use ProtoBuf and add lengths for non-blocking processing
> ------------------------------------------------------------------------
>                 Key: HADOOP-9421
>                 URL: https://issues.apache.org/jira/browse/HADOOP-9421
>             Project: Hadoop Common
>          Issue Type: Sub-task
>    Affects Versions: 2.0.3-alpha
>            Reporter: Sanjay Radia
>            Assignee: Daryn Sharp
>            Priority: Blocker
>         Attachments: HADOOP-9421.patch, HADOOP-9421.patch, HADOOP-9421.patch, HADOOP-9421.patch,
HADOOP-9421.patch, HADOOP-9421.patch, HADOOP-9421.patch, HADOOP-9421.patch, HADOOP-9421-v2-demo.patch

