accumulo-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Christopher Tubbs" <>
Subject Re: Review Request 29386: ACCUMULO-2815 Client authentication via Kerberos
Date Tue, 06 Jan 2015 20:08:48 GMT

> On Dec. 30, 2014, 5:46 p.m., Christopher Tubbs wrote:
> > server/base/src/main/java/org/apache/accumulo/server/security/handler/,
line 41
> > <>
> >
> >     I don't think this should be so tightly coupled with the ZKAuthenticator. It
doesn't seem that this coupling is necessary to provide authentication functionality.
> >     
> >     Have a different implementation of an Authenticator that hijacks the data storage
of another implementation, could be quite confusing to administrators, especially if they
need to swap out the implementation.
> >     
> >     See ACCUMULO-1300 and related issues for other options.
> Josh Elser wrote:
>     I understand the intentions for ACCUMULO-1300; however, that's not implemented. Being
able to leverage the existing ZKAuthorizor was wonderful so I really don't want to move away
from how this fundamentally works.
>     Shouldn't the Authenticator and Authorizor be fixed per instance (allowing reset
via `accumulo init --reset-security` or w/e)? What do you actually want changed aside from
documentation to make it clear to administrators (I haven't written any documentation yet).
> Christopher Tubbs wrote:
>     So, I pointed out in the conversation around ACCUMULO-259, that I would prefer a
pluggable security module that integrated all the security-related functions, because they
are so dependent on each other. However, they were implemented as 3 separately configurable
pluggable components. I would still like to see them merged (which doesn't preclude making
one from modular components themselves).
>     For now, I'm not really sure what it means for somebody to "create a user" when the
authenticator already validates them. That's why it's confusing. Are we essentially making
ZKAuthenticator Kerberos-aware, or are we providing a separate Kerberos Authenticator? That's
really what it boils down to for me. It seems like we're doing a hybrid... but I think we
should solidify on one or the other.
> Josh Elser wrote:
>     I view it as a Kerberos Authenticator. The implementation details of that authenticator
is that is happens to persist information into ZK to handle things like permissions and authorizations,
but a user doesn't have to know that to use it. We could, hypothetically, swap out ZK for
any other persistent store for that information and it would continue to work.
>     What would help alleviate your confusion? More javadocs? A different class name?
> Christopher Tubbs wrote:
>     Well, part is documentation. Part is behavior.
>     If it's a separate authenticator, we should make sure that it does not use the same
storage area in such a tightly coupled way. It should not, for instance, be setting things
that look like passwords, and "create local user", "list local users", and "delete local users"
should be unsupported client side operations, just as you've done with "change password" (since
users are managed within Kerberos). We already share storage between the ZKAuthenticator and
the corresponding permissions handler and authorizer, so it's probably okay to share the directory
identified with the principal. However, it can seamlessly create this directory as needed.
>     Mainly, I'm thinking the more we can separate the "local user" management functions
from the KerberosAuthenticator, the better, since users are managed in Kerberos. Some of my
concerns are with the current state of things, but I'm hoping not to exacerbate the problems
I see with the current implementations. I don't want to make it harder to make improvements
later. I'd rather move towards a better design, then simply add features to the existing ad-hoc
> Josh Elser wrote:
>     bq. It should not, for instance, be setting things that look like passwords, and
"create local user", "list local users", and "delete local users" should be unsupported client
side operations, just as you've done with "change password" (since users are managed within
>     Thanks for pointing that out. That helps me understand where your concerns are coming
from. Perhaps it would be cleaner to just keep a reference to the ZkAuthenticator as a delegate
(instead of extending it). This would give us a bit more obvious control over what's happening.
>     bq. I'm thinking the more we can separate the "local user" management functions from
the KerberosAuthenticator, the better, since users are managed in Kerberos. Some of my concerns
are with the current state of things, but I'm hoping not to exacerbate the problems I see
with the current implementations
>     Agreed, those are very.. awkward, presently. Is making sure that we throw exceptions
from KerberosAuthenticator when we call these non-sensical methods a sufficient change? Then,
we can move to a better design later on like you said?

What do you actually need the ZKAuthenticator for? Just to create the directory, so the ZKPermHandler
and ZKAuthorizor have a place to put stuff? It might be better to sever completely (vs. delegate
or subclass), and just use a common constant for the ZK Path to create the user directories
to make those other two components happy (assuming that's all that ZKAuthenticator was needed

Throwing an exception from KerberosAuthenticator might work... I'm more concerned about what
happens on the client side, and these may not map one-to-one. So, a closer inspection/testing
will be need to ensure that's sufficient.

The important bit about implementing additional Authenticators, is that when a user is deleted
in the external system, somebody needs to call the cleanup/dropuser methods on the PermHandler/Authorizor.
That will need to be documented. And, it's a bit weird, because the actual interfaces aren't
public API (it's one of the reasons why I prefer a single, comprehensive pluggable security
module, because it helps consolidate this kind of integration between components).

> On Dec. 30, 2014, 5:46 p.m., Christopher Tubbs wrote:
> > shell/src/main/java/org/apache/accumulo/shell/, line 210
> > <>
> >
> >     Why is username the "short" user name? Is that unique in Kerberos? If not, the
long version should be used everywhere instead. Otherwise, one user can appear to be another
in logs, etc.
> >     
> >     If "getShortUserName" is not unique, it should avoided everywhere.
> Josh Elser wrote:
>     Check out:
>     Kerberos principals are of the form: primary/instance@realm. Kerberos principals
are typically categorized as users and services. A user is not qualified to a single instance
(a host) and represent authentication across the realm. For example, elserj@EXAMPLE.COM means
that I can "roam". Conversely, a service is typically "fixed" to a specific host. For example,
accumulo/ means that there is a process, logged in as 'accumulo'
on the host ''. That service can't be run on any other host. Now, an important
note if someone actually creates a principal "accumulo@EXAMPLE.COM" this is unique with respect
to any other "accumulo/`host`@EXAMPLE.COM" principal. I'm not sure if we need to do anything
else other than convention of kerberos principals, or if we should be including the instance
in "our" username when present.
>     This kind of ties back into the SystemCredentials discussion again.
> Christopher Tubbs wrote:
>     Okay, so a smart configuration would make shortnames unique. However, UserGroupInformation
returns only the `primary` for the short name. This means that user names will have to be
unique across realms and instances. Right now, you are storing permissions using the short
name. So, any user with the same primary, will be able to masquerade as any other user with
the same primary from a different instance and/or realm, and be able to user their permissions
and authorizations. That's the problem with the shortname here. That's very unexpected.
> Josh Elser wrote:
>     Bingo. If you look at how HDFS does their configuration, this is the same convention.
The lack of documentation from me leaves something to be desired here, and I apologize for
>     To save you looking at HDFS (if you care not to look), you'll see that an HDFS process
uses a given principal with a special replacement string `_HOST`. The common convention is
to use something like `dn/_HOST@EXAMPLE.COM` (the realm is unimportant for this example).
This ensures that the same configuration files can be used across all hosts in the HDFS instance,
and Hadoop dynamically replaces `_HOST` with the FQDN of the host. Thus, there's an implicit
link that all `dn/*@EXAMPLE.COM` can act as datanodes and this is protected by the fact that
access to the KDC is restricted (you can't make your own user). The circle of trust is two-fold:
having a keytab with the correct principal and that Hadoop is requires that specific configuration
(which restricts the principal).
> Christopher Tubbs wrote:
>     My concerns here are more about the impact on users, than for the system credentials.
I don't know what HDFS is doing, but if they aren't (minimally) checking the realm when checking
permissions/access on an authenticated principal, then they are less secure than I think we
should be. Referencing HDFS also seems to imply that we're not so much doing Kerberos, as
we are implementing HDFS-specific Kerberos conventions (which are less secure, with respect
to data authorizations/permissions within Accumulo, than I'm comfortable with).
> Josh Elser wrote:
>     bq. if they aren't (minimally) checking the realm when checking permissions/access
on an authenticated principal
>     Do you mean the instance instead of the realm? In the case of a single realm, the
KDC is going to verify the correct realm. Assuming you meant the instance though (the optional
"/hostname"), it's typical that a user has the ability to use their credentials anywhere.
Thus, you typically see principals without instances for actual users. As far as I understand
it, that's what HDFS tends to follow and what I tried to as well. Accumulo doesn't care where
you come from, just what your name is and that you have valid credentials. I don't think we're
being substantially less secure by not including the instance in the Accumulo principal.

No, I mean the realm, to make it only necessary to guarantee uniqueness within a realm, vs.
across all known realms (more reasonable of a guarantee to make for a KDC user admin). We
could also include the instance (when specified), if we want to really be careful that users
aren't sharing permissions.

In my concerns, I'm assuming we authenticate users in any realm. If we are somehow restricted
to a single realm (either by a "permittedRealm" configuration item or by the nature of Kerberos
itself), then realm isn't that important, but we should discuss more about the instance. My
understanding is that Kerberos authenticates the user by the fully qualified Kerberos principal
(`primary/instance@realm`) in whatever realm they are, but it doesn't have to be a specific
realm (like the same one as the server), and then we are truncating their identity, essentially
binning people from different realms into the same bucket. It's like authenticating me as
`Christopher Tubbs`, and then assigning me to a bucket called `Christopher` where I share
permissions/authorizations with all other `Christopher`s.

- Christopher

This is an automatically generated e-mail. To reply, visit:

On Dec. 31, 2014, 4:24 p.m., Josh Elser wrote:
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> -----------------------------------------------------------
> (Updated Dec. 31, 2014, 4:24 p.m.)
> Review request for accumulo.
> Bugs: ACCUMULO-2815
> Repository: accumulo
> Description
> -------
> ACCUMULO-2815 Initial support for Kerberos client authentication.
> Leverage SASL transport provided by Thrift which can speak GSSAPI, which Kerberos implements.
> * An Accumulo KerberosToken which is an AuthenticationToken to validate users.
> * Custom thrift processor and invocation handler to ensure server RPCs have a valid KRB
identity and Accumulo authentication.
> * A KerberosAuthenticator which extends ZKAuthenticator to support Kerberos identities
> * New ClientConf variables to use SASL transport and pass Kerberos server principal
> * Updated ClientOpts and Shell opts to transparently use a KerberosToken when SASL is
enabled (no extra client work).
> I believe this is the "bare minimum" for Kerberos support. They are also grossly lacking
in unit and integration tests. I believe that I might have somehow broken the client address
string in the server (I saw log messages with client: null, but I'm not sure if it's due to
these changes or not). A necessary limitation in the Thrift server used is that, like the
SSL transport, the SASL transport cannot presently be used with the TFramedTransport, which
means none of the [half]async thrift servers will function with this -- we're stuck with the
> Performed some contrived benchmarks on my laptop (while still using it myself) to get
at big-picture view of the performance impact against "normal" operation and Kerberos alone.
Each "run" was the duration to ingest 100M records using continuous-ingest, timed with `time`,
using 'real'.
> THsHaServer (our default), 6 runs:
> Avg: 10m7.273s (607.273s)
> Min: 9m43.395s
> Max: 10m52.715s
> TThreadPoolServer (no SASL), 5 runs:
> Avg: 11m16.254s (676.254s)
> Min: 10m30.987s
> Max: 12m24.192s
> TThreadPoolServer+SASL/GSSAPI (these changes), 6 runs:
> Avg: 13m17.187s (797.187s)
> Min: 10m52.997s
> Max: 16m0.975s
> The general takeway is that there's about 15% performance degredation in its initial
state which is in the realm of what I expected (~10%).
> Diffs
> -----
>   core/src/main/java/org/apache/accumulo/core/cli/ f6ea934 
>   core/src/main/java/org/apache/accumulo/core/client/ 6fe61a5

>   core/src/main/java/org/apache/accumulo/core/client/impl/ e75bec6

>   core/src/main/java/org/apache/accumulo/core/client/impl/ f481cc3

>   core/src/main/java/org/apache/accumulo/core/client/impl/ 6dc846f

>   core/src/main/java/org/apache/accumulo/core/client/impl/ 5da803b

>   core/src/main/java/org/apache/accumulo/core/client/security/tokens/
>   core/src/main/java/org/apache/accumulo/core/conf/ e054a5f 
>   core/src/main/java/org/apache/accumulo/core/rpc/ PRE-CREATION 
>   core/src/main/java/org/apache/accumulo/core/rpc/ PRE-CREATION

>   core/src/main/java/org/apache/accumulo/core/rpc/ 6eace77 
>   core/src/main/java/org/apache/accumulo/core/rpc/ 09bd6c4 
>   core/src/main/java/org/apache/accumulo/core/rpc/ PRE-CREATION

>   core/src/main/java/org/apache/accumulo/core/rpc/ PRE-CREATION

>   core/src/main/java/org/apache/accumulo/core/security/ 525a958 
>   core/src/test/java/org/apache/accumulo/core/cli/ ff49bc0 
>   core/src/test/java/org/apache/accumulo/core/client/ PRE-CREATION

>   core/src/test/java/org/apache/accumulo/core/conf/ 40be70f

>   core/src/test/java/org/apache/accumulo/core/rpc/ PRE-CREATION

>   proxy/src/main/java/org/apache/accumulo/proxy/ 4b048eb 
>   server/base/src/main/java/org/apache/accumulo/server/ 09ae4f4

>   server/base/src/main/java/org/apache/accumulo/server/init/ 046cfb5 
>   server/base/src/main/java/org/apache/accumulo/server/rpc/
>   server/base/src/main/java/org/apache/accumulo/server/rpc/
>   server/base/src/main/java/org/apache/accumulo/server/rpc/ 641c0bf

>   server/base/src/main/java/org/apache/accumulo/server/rpc/ PRE-CREATION

>   server/base/src/main/java/org/apache/accumulo/server/security/
>   server/base/src/main/java/org/apache/accumulo/server/security/ 29e4939

>   server/base/src/main/java/org/apache/accumulo/server/security/
>   server/base/src/main/java/org/apache/accumulo/server/security/handler/
>   server/base/src/main/java/org/apache/accumulo/server/thrift/
>   server/base/src/test/java/org/apache/accumulo/server/
>   server/base/src/test/java/org/apache/accumulo/server/rpc/
>   server/base/src/test/java/org/apache/accumulo/server/security/
>   server/gc/src/main/java/org/apache/accumulo/gc/ 93a9a49

>   server/gc/src/test/java/org/apache/accumulo/gc/
>   server/gc/src/test/java/org/apache/accumulo/gc/ 99558b8

>   server/gc/src/test/java/org/apache/accumulo/gc/replication/
>   server/master/src/main/java/org/apache/accumulo/master/ 12195fa 
>   server/tracer/src/main/java/org/apache/accumulo/tracer/ 7e33300 
>   server/tserver/src/main/java/org/apache/accumulo/tserver/ d5c1d2f

>   shell/src/main/java/org/apache/accumulo/shell/ 58308ff 
>   shell/src/main/java/org/apache/accumulo/shell/ 8167ef8 
>   shell/src/test/java/org/apache/accumulo/shell/ 0e72c8c 
>   shell/src/test/java/org/apache/accumulo/shell/ PRE-CREATION

>   test/src/main/java/org/apache/accumulo/test/functional/ eb84533 
>   test/src/main/java/org/apache/accumulo/test/performance/thrift/ 2ebc2e3

>   test/src/test/java/org/apache/accumulo/server/security/ fb71f5f

> Diff:
> Testing
> -------
> Ensure existing unit tests still function. Accumulo is functional and ran continuous
ingest multiple times using a client with only a Kerberos identity (no user/password provided).
Used MIT Kerberos with Apache Hadoop 2.6.0 and Apache ZooKeeper 3.4.5.
> Thanks,
> Josh Elser

  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message