accumulo-dev mailing list archives

From "Christopher Tubbs" <>
Subject Re: Review Request 29386: ACCUMULO-2815 Client authentication via Kerberos
Date Tue, 06 Jan 2015 21:44:06 GMT

> On Dec. 30, 2014, 5:46 p.m., Christopher Tubbs wrote:
> > shell/src/main/java/org/apache/accumulo/shell/, line 210
> > <>
> >
> >     Why is username the "short" user name? Is that unique in Kerberos? If not, the
long version should be used everywhere instead. Otherwise, one user can appear to be another
in logs, etc.
> >     
> >     If "getShortUserName" is not unique, it should be avoided everywhere.
> Josh Elser wrote:
>     Check out:
>     Kerberos principals are of the form: primary/instance@realm. Kerberos principals
are typically categorized as users and services. A user is not qualified to a single instance
(a host) and represents authentication across the realm. For example, elserj@EXAMPLE.COM means
that I can "roam". Conversely, a service is typically "fixed" to a specific host. For example,
accumulo/ means that there is a process, logged in as 'accumulo'
on the host ''. That service can't be run on any other host. Now, an important
note: if someone actually creates a principal "accumulo@EXAMPLE.COM", this is unique with respect
to any other "accumulo/`host`@EXAMPLE.COM" principal. I'm not sure if we need to do anything
else other than convention of kerberos principals, or if we should be including the instance
in "our" username when present.
>     This kind of ties back into the SystemCredentials discussion again.
> Christopher Tubbs wrote:
>     Okay, so a smart configuration would make shortnames unique. However, UserGroupInformation
returns only the `primary` for the short name. This means that user names will have to be
unique across realms and instances. Right now, you are storing permissions using the short
name. So, any user with the same primary will be able to masquerade as any other user with
the same primary from a different instance and/or realm, and be able to use their permissions
and authorizations. That's the problem with the shortname here. That's very unexpected.
> Josh Elser wrote:
>     Bingo. If you look at how HDFS does their configuration, this is the same convention.
The lack of documentation from me leaves something to be desired here, and I apologize for
>     To save you looking at HDFS (if you care not to look), you'll see that an HDFS process
uses a given principal with a special replacement string `_HOST`. The common convention is
to use something like `dn/_HOST@EXAMPLE.COM` (the realm is unimportant for this example).
This ensures that the same configuration files can be used across all hosts in the HDFS instance,
and Hadoop dynamically replaces `_HOST` with the FQDN of the host. Thus, there's an implicit
link that all `dn/*@EXAMPLE.COM` can act as datanodes and this is protected by the fact that
access to the KDC is restricted (you can't make your own user). The circle of trust is two-fold:
having a keytab with the correct principal, and Hadoop requiring that specific configuration
(which restricts the principal).
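
The `_HOST` convention described above can be sketched as follows. This is only an illustration of the substitution idea, not Hadoop's implementation: the class `HostPrincipal`, the method `resolve`, and the host name used in the usage note are hypothetical, and lowercasing the FQDN is an assumption made here.

```java
import java.util.Locale;

/** Hypothetical helper illustrating _HOST substitution; not Hadoop or Accumulo code. */
final class HostPrincipal {
  static final String HOST_TOKEN = "_HOST";

  /**
   * Replaces an instance component of exactly "_HOST" with the given FQDN.
   * Principals without an instance, or with any other instance, are returned unchanged.
   */
  static String resolve(String configured, String fqdn) {
    int slash = configured.indexOf('/');
    if (slash < 0) {
      return configured; // no instance component, e.g. a roaming user principal
    }
    int at = configured.lastIndexOf('@');
    int end = at > slash ? at : configured.length();
    String instance = configured.substring(slash + 1, end);
    if (!HOST_TOKEN.equals(instance)) {
      return configured;
    }
    // Assumption: normalize the host name to lower case before substituting.
    return configured.substring(0, slash + 1)
        + fqdn.toLowerCase(Locale.ROOT)
        + configured.substring(end);
  }
}
```

With a made-up host, `HostPrincipal.resolve("dn/_HOST@EXAMPLE.COM", "node1.example.com")` yields `dn/node1.example.com@EXAMPLE.COM`, while `elserj@EXAMPLE.COM` passes through untouched.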
> Christopher Tubbs wrote:
>     My concerns here are more about the impact on users, than for the system credentials.
I don't know what HDFS is doing, but if they aren't (minimally) checking the realm when checking
permissions/access on an authenticated principal, then they are less secure than I think we
should be. Referencing HDFS also seems to imply that we're not so much doing Kerberos, as
we are implementing HDFS-specific Kerberos conventions (which are less secure, with respect
to data authorizations/permissions within Accumulo, than I'm comfortable with).
> Josh Elser wrote:
>     bq. if they aren't (minimally) checking the realm when checking permissions/access
on an authenticated principal
>     Do you mean the instance instead of the realm? In the case of a single realm, the
KDC is going to verify the correct realm. Assuming you meant the instance though (the optional
"/hostname"), it's typical that a user has the ability to use their credentials anywhere.
Thus, you typically see principals without instances for actual users. As far as I understand
it, that's what HDFS tends to follow and what I tried to as well. Accumulo doesn't care where
you come from, just what your name is and that you have valid credentials. I don't think we're
being substantially less secure by not including the instance in the Accumulo principal.
> Christopher Tubbs wrote:
>     No, I mean the realm, to make it only necessary to guarantee uniqueness within a
realm, vs. across all known realms (more reasonable of a guarantee to make for a KDC user
admin). We could also include the instance (when specified), if we want to really be careful
that users aren't sharing permissions.
>     In my concerns, I'm assuming we authenticate users in any realm. If we are somehow
restricted to a single realm (either by a "permittedRealm" configuration item or by the nature
of Kerberos itself), then realm isn't that important, but we should discuss more about the
instance. My understanding is that Kerberos authenticates the user by the fully qualified
Kerberos principal (`primary/instance@realm`) in whatever realm they are, but it doesn't have
to be a specific realm (like the same one as the server), and then we are truncating their
identity, essentially binning people from different realms into the same bucket. It's like
authenticating me as `Christopher Tubbs`, and then assigning me to a bucket called `Christopher`
where I share permissions/authorizations with all other `Christopher`s.
> Josh Elser wrote:
>     Oh, I apologize, I follow you now. Your concern wasn't clicking for me.
>     > My understanding is that Kerberos authenticates the user by the fully qualified
Kerberos principal (primary/instance@realm) in whatever realm they are, but it doesn't have
to be a specific realm (like the same one as the server), and then we are truncating their
identity, essentially binning people from different realms into the same bucket
>     Well, the KDC you're communicating with has to be set up for the realm being requested
(and if one isn't provided, it will delegate to another KDC or drop you into a default realm,
depending on krb5.conf). As I understand it, if you haven't defined a `default_realm` in `libdefaults`
in krb5.conf, and a user comes in with an incorrect hostname (instance) or realm specification,
the KDC won't authenticate you which keeps them out of Accumulo completely. I use `default_realm`
locally, since I just use a dummy realm instead of actually matching my laptop.
>     In all honesty though, I haven't thought past single-realm KDC setups. Is enforcing
that clients are a member of the same realm the Accumulo server principals reside in sufficient?
I'm worried about scope-creep of trying to do multi-realm configuration correct before single
realm is adequately polished.

bq. Is enforcing that clients are a member of the same realm the Accumulo server principals
reside in sufficient?

Perhaps. Where would we do this? In the site configuration?
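
Wherever it ends up being configured, one possible shape for such a check is to compare the realm on the authenticated client principal with the realm of the server's own principal. A minimal sketch, assuming both principals are available as plain strings; `RealmCheck` and its methods are hypothetical names, not existing Accumulo code, and the comparison is case-sensitive because Kerberos realm names are.

```java
/** Hypothetical single-realm check; not existing Accumulo code. */
final class RealmCheck {

  /** Returns the text after the last '@', or null if no realm is present. */
  static String realmOf(String principal) {
    int at = principal.lastIndexOf('@');
    if (at < 0 || at == principal.length() - 1) {
      return null;
    }
    return principal.substring(at + 1);
  }

  /** True only when both principals carry a realm and the realms match exactly. */
  static boolean sameRealm(String clientPrincipal, String serverPrincipal) {
    String clientRealm = realmOf(clientPrincipal);
    String serverRealm = realmOf(serverPrincipal);
    return clientRealm != null && clientRealm.equals(serverRealm);
  }
}
```

For example, `RealmCheck.sameRealm("elserj@EXAMPLE.COM", "accumulo/host1@EXAMPLE.COM")` is true (the host `host1` is made up), while a client from `OTHER.COM` would be rejected before any permission lookup.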

bq. I'm worried about scope-creep of trying to do multi-realm configuration correct before
single realm is adequately polished.

Understood, but I'm thinking about it from the other side. I don't want to make assumptions
which are valid in a narrow case, but which leave security holes in a more general case. I'm
also coming at this from the perspective of dealing with X.509 certificates, and understanding
the differences between a CN and a DN.

If we lock things down to a single realm (so we can safely omit it in our internal structures),
we'd still need to address the `instance` portion. For that, it sounded like you were saying
that `myPrimary/myInstance@myRealm` is distinct from `myPrimary@myRealm` and that both could be
valid users according to the KDC. If that's the case, I think it makes sense for the permissions
handler/authorizer to use the `primary/instance` for the principal and not just the `primary`
(which is what shortname does), because the two could have different permissions. If the user administrator
wishes to allow `myPrimary@myRealm`, then they should create such a user in the KDC (I hope
I'm understanding this correctly), so we would just use `myPrimary` as the user principal
in Accumulo, but we shouldn't strip the instance off if it is present.
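
To make the difference concrete, here is a rough sketch of the mapping I have in mind, assuming the realm has already been locked down to one. `PrincipalName`, `shortName`, and `accumuloUser` are hypothetical names for illustration only, and the parser ignores escaped `/` and `@` characters.

```java
import java.util.Objects;

/** Hypothetical parser for primary/instance@realm; not Accumulo or Hadoop code. */
final class PrincipalName {
  final String primary;
  final String instance; // null when the principal has no instance
  final String realm;    // null when the principal has no realm

  private PrincipalName(String primary, String instance, String realm) {
    this.primary = primary;
    this.instance = instance;
    this.realm = realm;
  }

  static PrincipalName parse(String principal) {
    Objects.requireNonNull(principal, "principal");
    String rest = principal;
    String realm = null;
    int at = rest.lastIndexOf('@');
    if (at >= 0) {
      realm = rest.substring(at + 1);
      rest = rest.substring(0, at);
    }
    String instance = null;
    int slash = rest.indexOf('/');
    if (slash >= 0) {
      instance = rest.substring(slash + 1);
      rest = rest.substring(0, slash);
    }
    if (rest.isEmpty()) {
      throw new IllegalArgumentException("missing primary: " + principal);
    }
    return new PrincipalName(rest, instance, realm);
  }

  /** Primary only, which is how the short name is described in this thread. */
  String shortName() {
    return primary;
  }

  /** Proposed Accumulo user: realm dropped, instance kept when present. */
  String accumuloUser() {
    return instance == null ? primary : primary + "/" + instance;
  }
}
```

With this, `myPrimary/myInstance@myRealm` and `myPrimary@myRealm` share the short name `myPrimary` but map to the distinct Accumulo users `myPrimary/myInstance` and `myPrimary`.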

- Christopher

This is an automatically generated e-mail. To reply, visit:

On Dec. 31, 2014, 4:24 p.m., Josh Elser wrote:
> (Updated Dec. 31, 2014, 4:24 p.m.)
> Review request for accumulo.
> Bugs: ACCUMULO-2815
> Repository: accumulo
> Description
> -------
> ACCUMULO-2815 Initial support for Kerberos client authentication.
> Leverage SASL transport provided by Thrift which can speak GSSAPI, which Kerberos implements.
> * An Accumulo KerberosToken which is an AuthenticationToken to validate users.
> * Custom thrift processor and invocation handler to ensure server RPCs have a valid KRB
identity and Accumulo authentication.
> * A KerberosAuthenticator which extends ZKAuthenticator to support Kerberos identities
> * New ClientConf variables to use SASL transport and pass Kerberos server principal
> * Updated ClientOpts and Shell opts to transparently use a KerberosToken when SASL is
enabled (no extra client work).
> I believe this is the "bare minimum" for Kerberos support. They are also grossly lacking
in unit and integration tests. I believe that I might have somehow broken the client address
string in the server (I saw log messages with client: null, but I'm not sure if it's due to
these changes or not). A necessary limitation in the Thrift server used is that, like the
SSL transport, the SASL transport cannot presently be used with the TFramedTransport, which
means none of the [half]async thrift servers will function with this -- we're stuck with the
> Performed some contrived benchmarks on my laptop (while still using it myself) to get
a big-picture view of the performance impact against "normal" operation and Kerberos alone.
Each "run" was the duration to ingest 100M records using continuous-ingest, timed with `time`,
using 'real'.
> THsHaServer (our default), 6 runs:
> Avg: 10m7.273s (607.273s)
> Min: 9m43.395s
> Max: 10m52.715s
> TThreadPoolServer (no SASL), 5 runs:
> Avg: 11m16.254s (676.254s)
> Min: 10m30.987s
> Max: 12m24.192s
> TThreadPoolServer+SASL/GSSAPI (these changes), 6 runs:
> Avg: 13m17.187s (797.187s)
> Min: 10m52.997s
> Max: 16m0.975s
> The general takeaway is that there's about 15% performance degradation in its initial
state which is in the realm of what I expected (~10%).
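
One reading of the ~15% figure is a drop in ingest throughput between the two TThreadPoolServer averages above, since the same 100M records take longer with SASL enabled. The arithmetic can be checked with a small sketch; the class and method names are hypothetical, and which comparison was actually intended is an assumption.

```java
/** Hypothetical helper for comparing two average run durations of the same workload. */
final class BenchmarkMath {

  /** Percent more wall-clock time than the baseline. */
  static double extraTimePct(double baselineSeconds, double seconds) {
    return (seconds - baselineSeconds) / baselineSeconds * 100.0;
  }

  /** Percent drop in throughput (records per second) for a fixed amount of work. */
  static double throughputLossPct(double baselineSeconds, double seconds) {
    return (1.0 - baselineSeconds / seconds) * 100.0;
  }

  public static void main(String[] args) {
    double noSasl = 676.254; // TThreadPoolServer (no SASL) average, in seconds
    double sasl = 797.187;   // TThreadPoolServer+SASL/GSSAPI average, in seconds
    System.out.printf("extra time: %.1f%%%n", extraTimePct(noSasl, sasl));
    System.out.printf("throughput loss: %.1f%%%n", throughputLossPct(noSasl, sasl));
  }
}
```

On those two averages this works out to roughly 17.9% more time, or about a 15.2% drop in throughput. Measured against the THsHaServer average (607.273s) the gap is larger, since that also includes the cost of switching server types.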
> Diffs
> -----
>   core/src/main/java/org/apache/accumulo/core/cli/ f6ea934
>   core/src/main/java/org/apache/accumulo/core/client/ 6fe61a5
>   core/src/main/java/org/apache/accumulo/core/client/impl/ e75bec6
>   core/src/main/java/org/apache/accumulo/core/client/impl/ f481cc3
>   core/src/main/java/org/apache/accumulo/core/client/impl/ 6dc846f
>   core/src/main/java/org/apache/accumulo/core/client/impl/ 5da803b
>   core/src/main/java/org/apache/accumulo/core/client/security/tokens/
>   core/src/main/java/org/apache/accumulo/core/conf/ e054a5f
>   core/src/main/java/org/apache/accumulo/core/rpc/ PRE-CREATION
>   core/src/main/java/org/apache/accumulo/core/rpc/ PRE-CREATION
>   core/src/main/java/org/apache/accumulo/core/rpc/ 6eace77
>   core/src/main/java/org/apache/accumulo/core/rpc/ 09bd6c4
>   core/src/main/java/org/apache/accumulo/core/rpc/ PRE-CREATION
>   core/src/main/java/org/apache/accumulo/core/rpc/ PRE-CREATION
>   core/src/main/java/org/apache/accumulo/core/security/ 525a958
>   core/src/test/java/org/apache/accumulo/core/cli/ ff49bc0
>   core/src/test/java/org/apache/accumulo/core/client/ PRE-CREATION
>   core/src/test/java/org/apache/accumulo/core/conf/ 40be70f
>   core/src/test/java/org/apache/accumulo/core/rpc/ PRE-CREATION
>   proxy/src/main/java/org/apache/accumulo/proxy/ 4b048eb
>   server/base/src/main/java/org/apache/accumulo/server/ 09ae4f4
>   server/base/src/main/java/org/apache/accumulo/server/init/ 046cfb5
>   server/base/src/main/java/org/apache/accumulo/server/rpc/
>   server/base/src/main/java/org/apache/accumulo/server/rpc/
>   server/base/src/main/java/org/apache/accumulo/server/rpc/ 641c0bf
>   server/base/src/main/java/org/apache/accumulo/server/rpc/ PRE-CREATION
>   server/base/src/main/java/org/apache/accumulo/server/security/
>   server/base/src/main/java/org/apache/accumulo/server/security/ 29e4939
>   server/base/src/main/java/org/apache/accumulo/server/security/
>   server/base/src/main/java/org/apache/accumulo/server/security/handler/
>   server/base/src/main/java/org/apache/accumulo/server/thrift/
>   server/base/src/test/java/org/apache/accumulo/server/
>   server/base/src/test/java/org/apache/accumulo/server/rpc/
>   server/base/src/test/java/org/apache/accumulo/server/security/
>   server/gc/src/main/java/org/apache/accumulo/gc/ 93a9a49
>   server/gc/src/test/java/org/apache/accumulo/gc/
>   server/gc/src/test/java/org/apache/accumulo/gc/ 99558b8
>   server/gc/src/test/java/org/apache/accumulo/gc/replication/
>   server/master/src/main/java/org/apache/accumulo/master/ 12195fa
>   server/tracer/src/main/java/org/apache/accumulo/tracer/ 7e33300
>   server/tserver/src/main/java/org/apache/accumulo/tserver/ d5c1d2f
>   shell/src/main/java/org/apache/accumulo/shell/ 58308ff
>   shell/src/main/java/org/apache/accumulo/shell/ 8167ef8
>   shell/src/test/java/org/apache/accumulo/shell/ 0e72c8c
>   shell/src/test/java/org/apache/accumulo/shell/ PRE-CREATION
>   test/src/main/java/org/apache/accumulo/test/functional/ eb84533
>   test/src/main/java/org/apache/accumulo/test/performance/thrift/ 2ebc2e3
>   test/src/test/java/org/apache/accumulo/server/security/ fb71f5f
> Diff:
> Testing
> -------
> Ensure existing unit tests still function. Accumulo is functional and ran continuous
ingest multiple times using a client with only a Kerberos identity (no user/password provided).
Used MIT Kerberos with Apache Hadoop 2.6.0 and Apache ZooKeeper 3.4.5.
> Thanks,
> Josh Elser
