accumulo-dev mailing list archives

From "Josh Elser" <josh.el...@gmail.com>
Subject Re: Review Request 29386: ACCUMULO-2815 Client authentication via Kerberos
Date Wed, 07 Jan 2015 01:50:52 GMT


> On Dec. 30, 2014, 10:46 p.m., Christopher Tubbs wrote:
> > shell/src/main/java/org/apache/accumulo/shell/ShellOptionsJC.java, line 210
> > <https://reviews.apache.org/r/29386/diff/4/?file=803175#file803175line210>
> >
> >     Why is username the "short" user name? Is that unique in Kerberos? If not, the
long version should be used everywhere instead. Otherwise, one user can appear to be another
in logs, etc.
> >     
> >     If "getShortUserName" is not unique, it should be avoided everywhere.
> 
> Josh Elser wrote:
>     Check out: http://web.mit.edu/kerberos/krb5-1.5/krb5-1.5.4/doc/krb5-user/What-is-a-Kerberos-Principal_003f.html
>     
>     Kerberos principals are of the form: primary/instance@realm. Kerberos principals
are typically categorized as users and services. A user is not tied to a single instance
(a host) and represents authentication across the realm. For example, elserj@EXAMPLE.COM means
that I can "roam". Conversely, a service is typically "fixed" to a specific host. For example,
accumulo/node1.example.com@EXAMPLE.COM means that there is a process, logged in as 'accumulo'
on the host 'node1.example.com'. That service can't be run on any other host. Now, an important
note: if someone actually creates a principal "accumulo@EXAMPLE.COM", it is distinct from
any "accumulo/`host`@EXAMPLE.COM" principal. I'm not sure if we need to do anything
beyond relying on the convention for Kerberos principals, or if we should be including the instance
in "our" username when present.
>     
>     This kind of ties back into the SystemCredentials discussion again.
> 
> Christopher Tubbs wrote:
>     Okay, so a smart configuration would make shortnames unique. However, UserGroupInformation
returns only the `primary` for the short name. This means that user names will have to be
unique across realms and instances. Right now, you are storing permissions using the short
name. So, any user with the same primary will be able to masquerade as any other user with
the same primary from a different instance and/or realm, and be able to use their permissions
and authorizations. That's the problem with the shortname here. That's very unexpected.
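
To make the collision concrete, here is a minimal Java sketch, not code from the patch. `KrbPrincipal` is a hypothetical helper that splits `primary/instance@realm` by hand (it ignores escaped characters) and treats the short name as just the primary, as described above for UserGroupInformation.

```java
import java.util.Objects;

/** Hypothetical value type for a principal of the form primary[/instance][@realm]. */
final class KrbPrincipal {
    private final String primary;
    private final String instance; // null when the principal has no instance
    private final String realm;    // null when no realm was given

    private KrbPrincipal(String primary, String instance, String realm) {
        this.primary = primary;
        this.instance = instance;
        this.realm = realm;
    }

    /** Splits on the last '@' and the first '/'; assumes no escaped separators. */
    static KrbPrincipal parse(String principal) {
        Objects.requireNonNull(principal, "principal");
        String rest = principal;
        String realm = null;
        int at = rest.lastIndexOf('@');
        if (at >= 0) {
            realm = rest.substring(at + 1);
            rest = rest.substring(0, at);
        }
        String instance = null;
        int slash = rest.indexOf('/');
        if (slash >= 0) {
            instance = rest.substring(slash + 1);
            rest = rest.substring(0, slash);
        }
        if (rest.isEmpty()) {
            throw new IllegalArgumentException("missing primary: " + principal);
        }
        return new KrbPrincipal(rest, instance, realm);
    }

    /** Short name as discussed in the thread: the primary only. */
    String shortName() {
        return primary;
    }

    /** Re-assembles the full principal, keeping instance and realm when present. */
    String fullName() {
        StringBuilder sb = new StringBuilder(primary);
        if (instance != null) {
            sb.append('/').append(instance);
        }
        if (realm != null) {
            sb.append('@').append(realm);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        KrbPrincipal a = parse("elserj@EXAMPLE.COM");
        KrbPrincipal b = parse("elserj/node1.example.com@OTHER.ORG");
        System.out.println(a.shortName().equals(b.shortName())); // true: both collapse to "elserj"
        System.out.println(a.fullName().equals(b.fullName()));   // false: the full principals differ
    }
}
```

Keying permissions on `shortName()` would put both principals in the same bucket, while keying on `fullName()` keeps them apart.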
> 
> Josh Elser wrote:
>     Bingo. If you look at how HDFS does their configuration, this is the same convention.
The lack of documentation from me leaves something to be desired here, and I apologize for
that.
>     
>     To save you looking at HDFS (if you care not to look), you'll see that an HDFS process
uses a given principal with a special replacement string `_HOST`. The common convention is
to use something like `dn/_HOST@EXAMPLE.COM` (the realm is unimportant for this example).
This ensures that the same configuration files can be used across all hosts in the HDFS instance,
and Hadoop dynamically replaces `_HOST` with the FQDN of the host. Thus, there's an implicit
link that all `dn/*@EXAMPLE.COM` can act as datanodes and this is protected by the fact that
access to the KDC is restricted (you can't make your own user). The circle of trust is two-fold:
having a keytab with the correct principal, and Hadoop requiring that specific configuration
(which restricts the principal).
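
As a rough illustration of the `_HOST` convention described above (a sketch, not Hadoop's implementation): `HostPrincipals.resolve` is a hypothetical helper, and lowercasing the hostname is an assumption of this sketch.

```java
import java.util.Locale;

/** Hypothetical helper that expands the _HOST placeholder in a configured principal. */
final class HostPrincipals {
    static final String HOST_TOKEN = "_HOST";

    private HostPrincipals() {}

    /**
     * Replaces the instance component with the given FQDN when, and only when,
     * the instance is exactly "_HOST". Other principals are returned unchanged.
     */
    static String resolve(String configured, String fqdn) {
        int slash = configured.indexOf('/');
        if (slash < 0) {
            return configured; // no instance component, nothing to expand
        }
        int at = configured.lastIndexOf('@');
        int end = at > slash ? at : configured.length();
        String instance = configured.substring(slash + 1, end);
        if (!HOST_TOKEN.equals(instance)) {
            return configured;
        }
        // Assumption of this sketch: hostnames are normalized to lower case.
        return configured.substring(0, slash + 1)
                + fqdn.toLowerCase(Locale.ROOT)
                + configured.substring(end);
    }

    public static void main(String[] args) {
        // The same configured value yields a per-host principal on each node.
        System.out.println(resolve("dn/_HOST@EXAMPLE.COM", "node1.example.com"));
        System.out.println(resolve("dn/_HOST@EXAMPLE.COM", "node2.example.com"));
    }
}
```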
> 
> Christopher Tubbs wrote:
>     My concerns here are more about the impact on users, than for the system credentials.
I don't know what HDFS is doing, but if they aren't (minimally) checking the realm when checking
permissions/access on an authenticated principal, then they are less secure than I think we
should be. Referencing HDFS also seems to imply that we're not so much doing Kerberos, as
we are implementing HDFS-specific Kerberos conventions (which are less secure, with respect
to data authorizations/permissions within Accumulo, than I'm comfortable with).
> 
> Josh Elser wrote:
>     bq. if they aren't (minimally) checking the realm when checking permissions/access
on an authenticated principal
>     
>     Do you mean the instance instead of the realm? In the case of a single realm, the
KDC is going to verify the correct realm. Assuming you meant the instance though (the optional
"/hostname"), it's typical that a user has the ability to use their credentials anywhere.
Thus, you typically see principals without instances for actual users. As far as I understand
it, that's what HDFS tends to follow and what I tried to follow as well. Accumulo doesn't care where
you come from, just what your name is and that you have valid credentials. I don't think we're
being substantially less secure by not including the instance in the Accumulo principal.
> 
> Christopher Tubbs wrote:
>     No, I mean the realm, to make it only necessary to guarantee uniqueness within a
realm, vs. across all known realms (more reasonable of a guarantee to make for a KDC user
admin). We could also include the instance (when specified), if we want to really be careful
that users aren't sharing permissions.
>     
>     In my concerns, I'm assuming we authenticate users in any realm. If we are somehow
restricted to a single realm (either by a "permittedRealm" configuration item or by the nature
of Kerberos itself), then realm isn't that important, but we should discuss more about the
instance. My understanding is that Kerberos authenticates the user by the fully qualified
Kerberos principal (`primary/instance@realm`) in whatever realm they are, but it doesn't have
to be a specific realm (like the same one as the server), and then we are truncating their
identity, essentially binning people from different realms into the same bucket. It's like
authenticating me as `Christopher Tubbs`, and then assigning me to a bucket called `Christopher`
where I share permissions/authorizations with all other `Christopher`s.
> 
> Josh Elser wrote:
>     Oh, I apologize, I follow you now. Your concern wasn't clicking for me.
>     
>     > My understanding is that Kerberos authenticates the user by the fully qualified
Kerberos principal (primary/instance@realm) in whatever realm they are, but it doesn't have
to be a specific realm (like the same one as the server), and then we are truncating their
identity, essentially binning people from different realms into the same bucket
>     
>     Well, the KDC you're communicating with has to be set up for the realm being requested
(and if one isn't provided, it will delegate to another KDC or drop you into a default realm,
depending on krb5.conf). As I understand it, if you haven't defined a `default_realm` in `libdefaults`
in krb5.conf, and a user comes in with an incorrect hostname (instance) or realm specification,
the KDC won't authenticate you which keeps them out of Accumulo completely. I use `default_realm`
locally, since I just use a dummy realm instead of actually matching my laptop.
>     
>     In all honesty though, I haven't thought past single-realm KDC setups. Is enforcing
that clients are a member of the same realm the Accumulo server principals reside in sufficient?
I'm worried about scope-creep of trying to do multi-realm configuration correct before single
realm is adequately polished.
> 
> Christopher Tubbs wrote:
>     bq. Is enforcing that clients are a member of the same realm the Accumulo server
principals reside in sufficient?
>     
>     Perhaps. Where would we do this? In the site configuration?
>     
>     bq. I'm worried about scope-creep of trying to do multi-realm configuration correct
before single realm is adequately polished.
>     
>     Understood, but I'm thinking about it from the other side. I don't want to make assumptions
which are valid in a narrow case, but which leave security holes in a more general case. I'm
also coming at this from the perspective of dealing with X.509 certificates, and understanding
the differences between a CN and a DN.
>     
>     If we lock things down to a single realm (so we can safely omit it in our internal
structures), we'd still need to address the `instance` portion. For that, it sounded like
you were saying that `myPrimary/myInstance@myRealm` is distinct from `myPrimary@myRealm` and
could both be valid users according to the KDC. If that's the case, I think it makes sense
for the permissions handler/authorizer to use the `primary/instance` for the principal and
not just the `primary` (which is what shortname does), because it could have different permissions.
If the user administrator wishes to allow `myPrimary@myRealm`, then they should create such
a user in the KDC (I hope I'm understanding this correctly.), so we would just use `myPrimary`
as the user principal in Accumulo, but we shouldn't strip the instance off if it is present.
> 
> Josh Elser wrote:
>     > > Is enforcing that clients are a member of the same realm the Accumulo server
principals reside in sufficient?
>     > Perhaps. Where would we do this? In the site configuration?
>     
>     Yeah. My thought was to just piggy-back on top of the realm provided in the kerberos
principal. That keeps us from having to introduce a new property for something that we know
might not be entirely sufficient.
>     
>     > we'd still need to address the instance portion. For that, it sounded like you
were saying that myPrimary/myInstance@myRealm is distinct from myPrimary@myRealm and could
both be valid users according to the KDC
>     
>     Yes, principals are valid (and distinct!) both with and without an instance. In our
case, I believe the instance being distinct is undesirable (and where I was going with the
reference to how Hadoop does things). Any server with a given principal (or matching a certain
principal) is considered the Accumulo "system" user (along with the `instance.*` check we
mentioned earlier). A simple way to do this (without getting into complicated regex's defining
who is actually considered the system user) is to just treat any instance also as that user.
It brings a bit of coordination required in how KRB principals are created, but it's the "common"
configuration/deployment at the cost of flexibility. I would envision leveraging something
similar to the `auth_to_local` RULEs (http://web.mit.edu/kerberos/krb5-devel/doc/admin/conf_files/krb5_conf.html)
like Hadoop does, but I don't *really* want to do that right now (mapping some set of principal
regexs to a "user"). This would let us say things like "accumulo/node1.example.com" is
"accumulo" as is "old_server/node2.example.com".
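
A minimal sketch of what such a mapping could look like, assuming an ordered list of regex rules where the first match wins. `PrincipalMapper` and its methods are hypothetical names, and this is far simpler than the real `auth_to_local` rule syntax.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

/** Hypothetical mapper from Kerberos principals to a local user name. */
final class PrincipalMapper {

    /** One rule: principals fully matching the pattern map to the given user. */
    private static final class Rule {
        final Pattern pattern;
        final String user;

        Rule(Pattern pattern, String user) {
            this.pattern = pattern;
            this.user = user;
        }
    }

    private final List<Rule> rules = new ArrayList<>();

    /** Appends a rule; rules are evaluated in insertion order. */
    PrincipalMapper addRule(String regex, String user) {
        rules.add(new Rule(Pattern.compile(regex), user));
        return this;
    }

    /** Returns the user of the first matching rule, or the principal itself if none match. */
    String map(String principal) {
        for (Rule rule : rules) {
            if (rule.pattern.matcher(principal).matches()) {
                return rule.user;
            }
        }
        return principal;
    }

    public static void main(String[] args) {
        PrincipalMapper mapper = new PrincipalMapper()
                .addRule("accumulo/.*@EXAMPLE\\.COM", "accumulo")
                .addRule("old_server/node2\\.example\\.com@EXAMPLE\\.COM", "accumulo");
        System.out.println(mapper.map("accumulo/node1.example.com@EXAMPLE.COM")); // accumulo
        System.out.println(mapper.map("elserj@EXAMPLE.COM")); // unchanged, no rule matches
    }
}
```

Falling back to the unmodified principal means an unmapped identity is never silently merged with another one.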
>     
>     For normal users, convention is that they aren't attached to an instance (and are
valid within the realm), and this implementation would be a limitation on us for edge cases
in KDC configurations.
>     
>     > If the user administrator wishes to allow myPrimary@myRealm, then they should
create such a user in the KDC (I hope I'm understanding this correctly.), so we would just
use myPrimary as the user principal in Accumulo, but we shouldn't strip the instance off if
it is present.
>     
>     Yes, you are correct. One thing I'm confused about is if there is ever a case that
a user would have an instance in their principal. Not understanding why this might actually
happen pushes me in the direction that truncating things is ok. That covers "human" users,
but "application" users would still be likely tied to a specific hostname, in which case perhaps
I can't punt on this for now. I really just want to avoid having N `accumulo/hostname` users
in our "database" which would be the sum of all Accumulo server processes. The regex matching
would be needed to avoid that.
>     
>     Maybe this is experimental until I do that as well? Maybe I shouldn't commit any
of this without that? I'm not completely decided yet, but I'm leaning toward the former presently.
> 
> Christopher Tubbs wrote:
>     bq. My thought was to just piggy-back on top of the realm provided in the kerberos
principal.
>     
>     You mean the server's own realm? That makes sense to me. We can document that they
should match, but we'd need to make sure we explicitly check that.
>     
>     
>     bq. For normal users, convention is that they aren't attached to an instance (and
are valid within the realm), and this implementation would be a limitation on us for edge
cases in KDC configurations.
>     
>     My concerns here are for normal users. The !SYSTEM user doesn't even have permissions
or authorizations stored in ZK (it shouldn't anyway). I had assumed the !SYSTEM user would
be treated specially after authentication at the transport layer. I don't think it should
rely on the Kerberos principal. This relates to our other discussion about the SystemToken.
>     
>     bq. One thing I'm confused about is if there is ever a case that a user would have
an instance in their principal.
>     
>     I can imagine use cases where a user has permission to access a table, but only from
a specific, vetted system. This is analogous to OpenStack and EC2 security group / firewall
rules which allow access only from specific sources. MySQL also has this concept in its permissions
model.
>     
>     bq. That covers "human" users, but "application" users would still be likely tied
to a specific hostname, in which case perhaps I can't punt on this for now.
>     
>     Agreed.
>     
>     bq. I really just want to avoid having N accumulo/hostname users in our "database"
which would be the sum of all Accumulo server processes. The regex matching would be needed to
avoid that.
>     
>     I don't think that's the case. The system user doesn't (shouldn't) write to the ZK
user database. Its permissions are evaluated separately, and it should never have any authorizations.
Rather than regex matching, our discussion around the SystemToken might help resolve this.
If the system credentials (!SYSTEM, SystemToken) are left as-is, then you can keep using those
internally after the transport layer is finished. I wouldn't use the server's Kerberos principal
for the server components. I'd keep using the existing !SYSTEM principal, but only after the
server component is verified at the transport layer to actually reflect a server component.
> 
> Josh Elser wrote:
>     > You mean the server's own realm? That makes sense to me. We can document that
they should match, but we'd need to make sure we explicitly check that.
>     
>     Precisely. I plan to add that check.
>     
>     > If the system credentials (!SYSTEM, SystemToken) are left as-is, then you can
keep using those internally after the transport layer is finished.
>     
>     That's a good point. I was thinking around this, instead of tackling it directly.
If we address the SYSTEM case specifically, is there anything else we have to do other than
switch the shortUserName to the full name (primary+instance)? I think that would address it.
> 
> Christopher Tubbs wrote:
>     If we're okay with the restriction that it must be in the same realm, then that seems
to be all. Just to be clear, are we really sure we want to have that restriction? It seems like
the only reason to restrict it is to avoid including the realm internally (like in the ZK
storage). And, it'll be problematic if we decide to permit multi-realm authentication in the
future.
>     
>     We could serialize the whole thing, to future proof, but keep the restriction to
one realm (for now, until we think through the implications of multi-realm), and conveniently
only display without the realm (for example, in the shell, in whoami, etc.). That way, if
we do add multi-realm support later (by releasing the restriction), we can keep the shorter
names for those in the same realm, and only include the realm when it is different than the
server.
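
The "store the full principal, enforce one realm, hide the realm on display" idea could be sketched roughly as follows. `RealmPolicy` and its method names are hypothetical, not part of Accumulo.

```java
/** Hypothetical helpers for a single-realm restriction with full-principal storage. */
final class RealmPolicy {

    private RealmPolicy() {}

    /** Extracts the realm after the last '@'; rejects principals without one. */
    static String realmOf(String principal) {
        int at = principal.lastIndexOf('@');
        if (at < 0 || at == principal.length() - 1) {
            throw new IllegalArgumentException("no realm in principal: " + principal);
        }
        return principal.substring(at + 1);
    }

    /** Enforces the (for now) single-realm restriction against the server's own realm. */
    static void requireSameRealm(String clientPrincipal, String serverPrincipal) {
        String clientRealm = realmOf(clientPrincipal);
        String serverRealm = realmOf(serverPrincipal);
        if (!clientRealm.equals(serverRealm)) {
            throw new SecurityException(
                    "client realm " + clientRealm + " does not match server realm " + serverRealm);
        }
    }

    /** Display form: drop the realm only when it equals the server's realm. */
    static String displayName(String storedPrincipal, String serverRealm) {
        int at = storedPrincipal.lastIndexOf('@');
        if (at >= 0 && storedPrincipal.substring(at + 1).equals(serverRealm)) {
            return storedPrincipal.substring(0, at);
        }
        return storedPrincipal;
    }

    public static void main(String[] args) {
        String server = "accumulo/node1.example.com@EXAMPLE.COM";
        requireSameRealm("elserj@EXAMPLE.COM", server); // passes silently
        System.out.println(displayName("elserj@EXAMPLE.COM", realmOf(server))); // elserj
        System.out.println(displayName("elserj@OTHER.ORG", realmOf(server)));   // elserj@OTHER.ORG
    }
}
```

Because the stored form always includes the realm, lifting the restriction later would not require rewriting existing entries.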

Hiding the realm is a possibility; I believe I need to think on this and/or look at what other
projects have done in this regard. Perhaps I'm just being obstinate in not wanting to include
the realm at all, and we should always have it. I'm not sure.

I believe that the above decision is also going to impact what we do about multiple
realms. If we store the full principal, the multi-realm problem goes away. Perhaps that's
my sign?


- Josh


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/29386/#review66382
-----------------------------------------------------------


On Jan. 6, 2015, 11:14 p.m., Josh Elser wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/29386/
> -----------------------------------------------------------
> 
> (Updated Jan. 6, 2015, 11:14 p.m.)
> 
> 
> Review request for accumulo.
> 
> 
> Bugs: ACCUMULO-2815
>     https://issues.apache.org/jira/browse/ACCUMULO-2815
> 
> 
> Repository: accumulo
> 
> 
> Description
> -------
> 
> ACCUMULO-2815 Initial support for Kerberos client authentication.
> 
> Leverage SASL transport provided by Thrift which can speak GSSAPI, which Kerberos implements.
Introduced...
> 
> * An Accumulo KerberosToken which is an AuthenticationToken to validate users.
> * Custom thrift processor and invocation handler to ensure server RPCs have a valid KRB
identity and Accumulo authentication.
> * A KerberosAuthenticator which extends ZKAuthenticator to support Kerberos identities
seamlessly.
> * New ClientConf variables to use SASL transport and pass Kerberos server principal
> * Updated ClientOpts and Shell opts to transparently use a KerberosToken when SASL is
enabled (no extra client work).
> 
> I believe this is the "bare minimum" for Kerberos support. These changes are also grossly lacking
in unit and integration tests. I believe that I might have somehow broken the client address
string in the server (I saw log messages with client: null, but I'm not sure if it's due to
these changes or not). A necessary limitation in the Thrift server used is that, like the
SSL transport, the SASL transport cannot presently be used with the TFramedTransport, which
means none of the [half]async thrift servers will function with this -- we're stuck with the
TThreadPoolServer.
> 
> Performed some contrived benchmarks on my laptop (while still using it myself) to get
a big-picture view of the performance impact against "normal" operation and Kerberos alone.
Each "run" was the duration to ingest 100M records using continuous-ingest, timed with `time`,
using 'real'.
> 
> THsHaServer (our default), 6 runs:
> 
> Avg: 10m7.273s (607.273s)
> Min: 9m43.395s
> Max: 10m52.715s
> 
> TThreadPoolServer (no SASL), 5 runs:
> 
> Avg: 11m16.254s (676.254s)
> Min: 10m30.987s
> Max: 12m24.192s
> 
> TThreadPoolServer+SASL/GSSAPI (these changes), 6 runs:
> 
> Avg: 13m17.187s (797.187s)
> Min: 10m52.997s
> Max: 16m0.975s
> 
> The general takeaway is that there's about 15% performance degradation in its initial
state which is in the realm of what I expected (~10%).
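
For reference, the relative slowdowns can be recomputed from the averages above. `Slowdown.percent` is a hypothetical helper, and the inputs are the three reported averages in seconds; which baseline the quoted figure refers to is not stated, so both are shown.

```java
/** Hypothetical helper for comparing average run times. */
final class Slowdown {

    private Slowdown() {}

    /** Relative slowdown of the candidate against the baseline, as a percentage. */
    static double percent(double baselineSeconds, double candidateSeconds) {
        return (candidateSeconds / baselineSeconds - 1.0) * 100.0;
    }

    public static void main(String[] args) {
        double thsHa = 607.273;          // THsHaServer average
        double threadPool = 676.254;     // TThreadPoolServer (no SASL) average
        double threadPoolSasl = 797.187; // TThreadPoolServer+SASL/GSSAPI average

        System.out.printf("SASL vs TThreadPoolServer: %.1f%%%n", percent(threadPool, threadPoolSasl));
        System.out.printf("SASL vs THsHaServer:       %.1f%%%n", percent(thsHa, threadPoolSasl));
        System.out.printf("TThreadPoolServer vs THsHaServer: %.1f%%%n", percent(thsHa, threadPool));
    }
}
```

With these averages, SASL/GSSAPI comes out roughly 18% slower than TThreadPoolServer without SASL and roughly 31% slower than THsHaServer.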
> 
> 
> Diffs
> -----
> 
>   core/src/main/java/org/apache/accumulo/core/cli/ClientOpts.java f6ea934
>   core/src/main/java/org/apache/accumulo/core/client/ClientConfiguration.java 6fe61a5
>   core/src/main/java/org/apache/accumulo/core/client/impl/ClientContext.java e75bec6
>   core/src/main/java/org/apache/accumulo/core/client/impl/ConnectorImpl.java f481cc3
>   core/src/main/java/org/apache/accumulo/core/client/impl/MasterClient.java a9ad8a1
>   core/src/main/java/org/apache/accumulo/core/client/impl/ThriftTransportKey.java 6dc846f
>   core/src/main/java/org/apache/accumulo/core/client/impl/ThriftTransportPool.java 5da803b
>   core/src/main/java/org/apache/accumulo/core/client/security/tokens/AbstractKerberosToken.java PRE-CREATION
>   core/src/main/java/org/apache/accumulo/core/client/security/tokens/KerberosToken.java PRE-CREATION
>   core/src/main/java/org/apache/accumulo/core/conf/Property.java e054a5f
>   core/src/main/java/org/apache/accumulo/core/rpc/FilterTransport.java PRE-CREATION
>   core/src/main/java/org/apache/accumulo/core/rpc/SaslConnectionParams.java PRE-CREATION
>   core/src/main/java/org/apache/accumulo/core/rpc/TTimeoutTransport.java 6eace77
>   core/src/main/java/org/apache/accumulo/core/rpc/ThriftUtil.java 09bd6c4
>   core/src/main/java/org/apache/accumulo/core/rpc/UGIAssumingTransport.java PRE-CREATION
>   core/src/main/java/org/apache/accumulo/core/rpc/UGIAssumingTransportFactory.java PRE-CREATION
>   core/src/main/java/org/apache/accumulo/core/security/Credentials.java 525a958
>   core/src/test/java/org/apache/accumulo/core/cli/TestClientOpts.java ff49bc0
>   core/src/test/java/org/apache/accumulo/core/client/ClientConfigurationTest.java PRE-CREATION
>   core/src/test/java/org/apache/accumulo/core/client/impl/ThriftTransportKeyTest.java PRE-CREATION
>   core/src/test/java/org/apache/accumulo/core/conf/ClientConfigurationTest.java 40be70f
>   core/src/test/java/org/apache/accumulo/core/rpc/SaslConnectionParamsTest.java PRE-CREATION
>   minicluster/src/main/java/org/apache/accumulo/minicluster/impl/MiniAccumuloClusterImpl.java 27d6b19
>   minicluster/src/main/java/org/apache/accumulo/minicluster/impl/MiniAccumuloConfigImpl.java 26c23ed
>   pom.xml ae188a0
>   proxy/src/main/java/org/apache/accumulo/proxy/Proxy.java 4b048eb
>   server/base/src/main/java/org/apache/accumulo/server/AccumuloServerContext.java 09ae4f4
>   server/base/src/main/java/org/apache/accumulo/server/init/Initialize.java 046cfb5
>   server/base/src/main/java/org/apache/accumulo/server/rpc/TCredentialsUpdatingInvocationHandler.java PRE-CREATION
>   server/base/src/main/java/org/apache/accumulo/server/rpc/TCredentialsUpdatingWrapper.java PRE-CREATION
>   server/base/src/main/java/org/apache/accumulo/server/rpc/TServerUtils.java 641c0bf
>   server/base/src/main/java/org/apache/accumulo/server/rpc/ThriftServerType.java PRE-CREATION
>   server/base/src/main/java/org/apache/accumulo/server/security/SecurityOperation.java 5e81018
>   server/base/src/main/java/org/apache/accumulo/server/security/SecurityUtil.java 29e4939
>   server/base/src/main/java/org/apache/accumulo/server/security/SystemCredentials.java a59d57c
>   server/base/src/main/java/org/apache/accumulo/server/security/handler/KerberosAuthenticator.java PRE-CREATION
>   server/base/src/main/java/org/apache/accumulo/server/thrift/UGIAssumingProcessor.java PRE-CREATION
>   server/base/src/test/java/org/apache/accumulo/server/AccumuloServerContextTest.java PRE-CREATION
>   server/base/src/test/java/org/apache/accumulo/server/rpc/TCredentialsUpdatingInvocationHandlerTest.java PRE-CREATION
>   server/base/src/test/java/org/apache/accumulo/server/security/SystemCredentialsTest.java 4202a7e
>   server/gc/src/main/java/org/apache/accumulo/gc/SimpleGarbageCollector.java 93a9a49
>   server/gc/src/test/java/org/apache/accumulo/gc/GarbageCollectWriteAheadLogsTest.java f98721f
>   server/gc/src/test/java/org/apache/accumulo/gc/SimpleGarbageCollectorTest.java 99558b8
>   server/gc/src/test/java/org/apache/accumulo/gc/replication/CloseWriteAheadLogReferencesTest.java cad1e01
>   server/master/src/main/java/org/apache/accumulo/master/Master.java 12195fa
>   server/tracer/src/main/java/org/apache/accumulo/tracer/TraceServer.java 7e33300
>   server/tserver/src/main/java/org/apache/accumulo/tserver/TabletServer.java d5c1d2f
>   shell/src/main/java/org/apache/accumulo/shell/Shell.java 58308ff
>   shell/src/main/java/org/apache/accumulo/shell/ShellOptionsJC.java 8167ef8
>   shell/src/test/java/org/apache/accumulo/shell/ShellConfigTest.java 0e72c8c
>   shell/src/test/java/org/apache/accumulo/shell/ShellOptionsJCTest.java PRE-CREATION
>   test/pom.xml b0a926f
>   test/src/main/java/org/apache/accumulo/test/functional/ZombieTServer.java eb84533
>   test/src/main/java/org/apache/accumulo/test/performance/thrift/NullTserver.java 2ebc2e3
>   test/src/test/java/org/apache/accumulo/harness/AccumuloClusterIT.java 8f7e1b7
>   test/src/test/java/org/apache/accumulo/harness/MiniClusterHarness.java abdb627
>   test/src/test/java/org/apache/accumulo/harness/MiniClusterKdc.java PRE-CREATION
>   test/src/test/java/org/apache/accumulo/harness/SharedMiniClusterIT.java 2380f66
>   test/src/test/java/org/apache/accumulo/harness/conf/AccumuloMiniClusterConfiguration.java 11b7530
>   test/src/test/java/org/apache/accumulo/server/security/SystemCredentialsIT.java fb71f5f
>   test/src/test/java/org/apache/accumulo/test/ArbitraryTablePropertiesIT.java aa5c164
>   test/src/test/java/org/apache/accumulo/test/CleanWalIT.java 1fcd5a4
>   test/src/test/java/org/apache/accumulo/test/functional/BatchScanSplitIT.java 221889b
>   test/src/test/java/org/apache/accumulo/test/functional/KerberosIT.java PRE-CREATION
>   test/src/test/resources/log4j.properties cb35840
> 
> Diff: https://reviews.apache.org/r/29386/diff/
> 
> 
> Testing
> -------
> 
> Ensured existing unit tests still function. Accumulo is functional, and I ran continuous
ingest multiple times using a client with only a Kerberos identity (no user/password provided).
Used MIT Kerberos with Apache Hadoop 2.6.0 and Apache ZooKeeper 3.4.5.
> 
> 
> Thanks,
> 
> Josh Elser
> 
>

