hadoop-mapreduce-user mailing list archives

From bigbibguy father <bigbib...@gmail.com>
Subject Re: Hadoop Security - TaskTracker and Active Directory
Date Sat, 01 Oct 2011 16:36:13 GMT
Thanks Devaraj for responding.

In our case, the LDAP server is the corporate Active Directory server,
which holds the user IDs and their attributes.

Cluster nodes contact the KDC to get a TGT and service tickets for the NN
and JT, and keep them until they expire (7 days). Cluster nodes contact the
LDAP server for each task. So if I understand correctly, the LDAP traffic
from the cluster nodes (around 1000) will be much higher than the
authentication traffic from the same nodes.
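
As a rough back-of-the-envelope check of that comparison, here is a minimal sketch. It takes the assumptions above at face value (one LDAP lookup per task, no caching, and three Kerberos requests per node per ticket lifetime: one TGT plus service tickets for the NN and JT). The function name and the figure of 100 tasks per node per day are hypothetical, chosen only for illustration.

```python
import math

def weekly_requests(nodes, tasks_per_node_per_day,
                    ticket_lifetime_days=7, tickets_per_node=3):
    """Estimate KDC and LDAP requests over one week.

    Assumptions (not measurements): every task triggers exactly one
    LDAP lookup, and each node requests `tickets_per_node` tickets
    once per ticket lifetime.
    """
    days = 7
    # Number of times a node has to re-acquire its tickets in a week.
    renewals = math.ceil(days / ticket_lifetime_days)
    kdc_requests = nodes * tickets_per_node * renewals
    ldap_requests = nodes * tasks_per_node_per_day * days
    return kdc_requests, ldap_requests

# Hypothetical workload: 1000 nodes, 100 tasks per node per day.
kdc, ldap = weekly_requests(nodes=1000, tasks_per_node_per_day=100)
print(f"KDC requests per week:  {kdc}")
print(f"LDAP requests per week: {ldap}")
```

Under these assumptions the LDAP volume exceeds the Kerberos volume by a couple of orders of magnitude. Any caching of user and group lookups on the nodes would narrow that gap.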

Why not use Active Directory as the KDC for authenticating the service
principals (cluster nodes) as well?

That way, we do not have to manage a separate KDC and worry about its
availability and health.

We also plan to have one Active Directory server in the same datacenter as
the cluster, but outside the cluster firewall, so that LDAP queries have a
higher SLA.

The benefits associated with the local KDC option are listed below, with my
analysis added after each benefit.

   - It requires less configuration with Active Directory.  - *But cluster
   nodes need to talk to Active Directory for the user details, so they need
   to be configured for Active Directory anyway.*
   - It is comparatively easy to script the creation of many principals and
   keytabs. A principal and keytab must be created for every daemon in the
   cluster, and in a large cluster this can be extremely onerous to do directly
   in Active Directory.  - *This is a one-time job, and we may be able to
   script it with AD as well.*
   - There is no need to involve central Active Directory administrators in
   order to get service principals created. - *We get to manage the OU
   containing the service principals.*
   - It allows for incremental configuration. The Hadoop administrator can
   completely configure and verify the functionality of the cluster
   independently of integrating with Active Directory. - *Good to have this
   benefit, and it is not available in the Active Directory only option.*
   - It can serve to shield the corporate Active Directory server(s) from
   the many machines in a Hadoop cluster all requesting Kerberos tickets
   simultaneously. During cluster start-up, Hadoop will effectively be acting
   as a distributed denial of service attack on the central Active Directory
   server, which could adversely affect the performance of the Active Directory
   server. - *The service principal authentication traffic is not that
   frequent and hence these spikes should not be much of a problem for our
   highly available Active Directory. *
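
To illustrate the point about scripting principal and keytab creation, here is a minimal sketch that only generates the command text for MIT Kerberos `kadmin` (`addprinc -randkey` and `ktadd -k`). It is for the local KDC case; doing the same in Active Directory would need AD's own tooling. The host names, realm, service names and keytab directory are hypothetical placeholders, not values from our cluster.

```python
def kadmin_commands(hosts, services=("hdfs", "mapred"),
                    realm="HADOOP.EXAMPLE.COM",
                    keytab_dir="/etc/hadoop/keytabs"):
    """Build kadmin commands for one principal per service per host.

    Nothing is executed here; the returned lines are meant to be
    reviewed and then fed to kadmin by an administrator.
    """
    commands = []
    for host in hosts:
        for service in services:
            principal = f"{service}/{host}@{realm}"
            keytab = f"{keytab_dir}/{service}.{host}.keytab"
            # Create the principal with a random key (no password).
            commands.append(f"addprinc -randkey {principal}")
            # Export that principal's key into a per-host keytab file.
            commands.append(f"ktadd -k {keytab} {principal}")
    return commands

# Hypothetical host names, for illustration only.
hosts = ["node1.example.com", "node2.example.com"]
for line in kadmin_commands(hosts):
    print(line)
```

With around 1000 nodes the host list would come from an inventory file rather than a literal, but the loop stays the same.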

But the drawback of the local KDC option is that we need to maintain that
KDC server and make sure it is highly available, with a backup server.

Thanks and Regards,

On Sat, Oct 1, 2011 at 8:14 AM, Devaraj Das <ddas@hortonworks.com> wrote:

> The cluster KDC should be set up to trust the Active Directory KDC
> (cross-realm trust in Kerberos lingo). This handles the cases of user
> authentication when a user talks to a server in the cluster directly (e.g.,
> user->namenode).
> The GID and other user attributes are usually stored in LDAP. The cluster
> nodes are set up to talk to the cluster-specific LDAP server.
> On Sep 30, 2011, at 7:19 PM, bigbibguy father wrote:
> We are planning to enable secure Hadoop using Kerberos.
> Our users reside in Active Directory. We read that there are two
> options for using Kerberos to secure Hadoop.
> 1) Run Kerberos on a machine local to the cluster and create the service
> principals there.
> 2) Use Active Directory itself as the Kerberos KDC and create the service
> principals in Active Directory as well.
> It seems Cloudera, and the industry in general, recommend option 1 of
> running a local KDC for authenticating service principals.
> https://ccp.cloudera.com/display/CDHDOC/Integrating+Hadoop+Security+with+Active+Directory
> I read that the TaskTrackers run tasks as the user who submitted the job.
> In that case, don't the TaskTracker nodes need to talk to Active
> Directory to get user details such as the GID?
> So does this mean that every node (TaskTrackers, JobTracker and NameNode)
> will be interacting with Active Directory anyway?
> If so, option 1 doesn't seem to be superior, since each node has to talk to
> two servers: the local Kerberos KDC for authenticating service principals,
> and Active Directory for the user details and group information.
> Please correct me if I am wrong in my assumptions.
> Thanks and Regards,
