hadoop-mapreduce-user mailing list archives

From Devaraj Das <d...@hortonworks.com>
Subject Re: Hadoop Security - TaskTracker and Active Directory
Date Mon, 03 Oct 2011 20:44:44 GMT
Doing everything in Active Directory should work as well. What I said earlier was based more on Yahoo's deployment of security. Let us know how it goes.

On Oct 1, 2011, at 9:36 AM, bigbibguy father wrote:

> Thanks Devaraj for responding.
> In our case, the LDAP server is the corporate Active Directory server, which holds the user IDs and their attributes.
> The cluster nodes contact the KDC to get a TGT and service tickets for the NN and JT, and keep them until they expire (7 days). The cluster nodes contact the LDAP server for each task. So, if I understand correctly, the LDAP traffic from the cluster nodes (around 1000) will be much higher than their authentication traffic.
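A rough back-of-envelope version of this comparison, as a Python sketch. Only the 1000 nodes, the 7-day ticket lifetime, and the three tickets per node (TGT plus NN and JT service tickets) come from the message; the 500,000 tasks per day and two directory lookups per task are hypothetical placeholders, and the sketch assumes no local caching of lookups on the nodes.

```python
# Back-of-envelope comparison of steady-state Kerberos vs. LDAP requests per day.
# NODES, TICKET_LIFETIME_DAYS and the 3 tickets per node come from the thread;
# the task volume and lookups per task are hypothetical placeholders.

NODES = 1000                # cluster size mentioned in the thread
TICKET_LIFETIME_DAYS = 7    # ticket lifetime mentioned in the thread


def kerberos_requests_per_day(nodes: int, tickets_per_node: int, lifetime_days: float) -> float:
    """Each node re-fetches its tickets (TGT + service tickets) once per lifetime."""
    return nodes * tickets_per_node / lifetime_days


def ldap_requests_per_day(tasks_per_day: int, lookups_per_task: int) -> int:
    """Assumes every task triggers uncached user/group lookups on its node."""
    return tasks_per_day * lookups_per_task


if __name__ == "__main__":
    krb = kerberos_requests_per_day(NODES, tickets_per_node=3, lifetime_days=TICKET_LIFETIME_DAYS)
    ldap = ldap_requests_per_day(tasks_per_day=500_000, lookups_per_task=2)
    print(f"Kerberos: {krb:.0f}/day, LDAP: {ldap}/day, ratio: {ldap / krb:.0f}x")
```

Under these placeholder numbers the LDAP side dominates by three orders of magnitude; local caching of lookups on the nodes would shrink that gap.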
> Why not also use Active Directory as the KDC for authenticating the service principals (cluster nodes)?
> That way, we do not have to manage a separate KDC and worry about its availability and health.
> We also plan to have one Active Directory server in the same datacenter as the cluster, but outside the cluster firewall, so that LDAP queries have a higher SLA.
> The benefits of the local KDC option are listed below, with my analysis added after each one.
> It requires less configuration with Active Directory. - But the cluster nodes need to talk to Active Directory for the user details, so it needs the configuration with Active Directory anyway.
> It is comparatively easy to script the creation of many principals and keytabs. A principal and keytab must be created for every daemon in the cluster, and in a large cluster this can be extremely onerous to do directly in Active Directory. - This is a one-time job, and we may be able to script it with AD as well.
> There is no need to involve central Active Directory administrators in order to get service principals created. - We get to manage the OU containing the service principals.
> It allows for incremental configuration. The Hadoop administrator can completely configure and verify the functionality of the cluster independently of integrating with Active Directory. - This is a good benefit to have, and it is not available with the Active Directory-only option.
> It can serve to shield the corporate Active Directory server(s) from the many machines in a Hadoop cluster all requesting Kerberos tickets simultaneously. During cluster start-up, Hadoop will effectively be acting as a distributed denial-of-service attack on the central Active Directory server, which could adversely affect its performance. - The service principal authentication traffic is infrequent, so these spikes should not be much of a problem for our highly available Active Directory.
> The drawback of the local KDC option, however, is that we need to maintain that KDC server and make sure it is highly available, with a backup server.
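The scripting point can be sketched in Python for the local-KDC case: generate MIT `kadmin` commands (`addprinc -randkey` to create each principal, `ktadd -k` to export its key to a keytab) for every daemon on every host. The host names, realm, keytab directory, and role-to-service mapping (`nn`, `jt`, `dn`, `tt`, `HTTP`) are illustrative assumptions, since Hadoop lets you configure principal names; the equivalent Active Directory tooling is not shown.

```python
# Sketch: generate MIT kadmin commands that create one principal and one keytab
# per Hadoop daemon per host in a local (cluster) KDC.
# Host names, realm, keytab directory and the role layout are hypothetical.

from typing import Dict, List

# Hypothetical mapping of host role -> service short names running on that host.
ROLE_SERVICES: Dict[str, List[str]] = {
    "master": ["nn", "jt", "HTTP"],
    "worker": ["dn", "tt", "HTTP"],
}


def kadmin_commands(hosts: Dict[str, str], realm: str, keytab_dir: str) -> List[str]:
    """Return 'addprinc -randkey' then 'ktadd -k' for each service principal."""
    cmds: List[str] = []
    for host, role in sorted(hosts.items()):
        for svc in ROLE_SERVICES[role]:
            principal = f"{svc}/{host}@{realm}"
            cmds.append(f"addprinc -randkey {principal}")
            cmds.append(f"ktadd -k {keytab_dir}/{host}/{svc}.keytab {principal}")
    return cmds


if __name__ == "__main__":
    hosts = {"master1.example.com": "master"}
    hosts.update({f"worker{i}.example.com": "worker" for i in range(1, 1001)})
    cmds = kadmin_commands(hosts, "HADOOP.EXAMPLE.COM", "/tmp/keytabs")
    print(len(cmds) // 2, "principals")  # one addprinc line per principal
```

With one master and 1000 workers this prints `3003 principals`. Since most of these daemons authenticate to the KDC from their keytabs when they start, the count also gives a rough sense of the size of the start-up burst mentioned above.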
> Thanks and Regards,
> On Sat, Oct 1, 2011 at 8:14 AM, Devaraj Das <ddas@hortonworks.com> wrote:
> The cluster KDC should be set up to trust the Active Directory KDC (a cross-realm trust, in Kerberos terminology). This handles user authentication when a user talks to a server in the cluster directly (e.g., user -> NameNode).
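For the cross-realm trust described here, what the two KDCs must share is a single cross-realm principal. A minimal Python sketch, assuming the hypothetical realm names `AD.EXAMPLE.COM` (Active Directory) and `HADOOP.EXAMPLE.COM` (cluster KDC): for AD users to get tickets for cluster services, the principal `krbtgt/<cluster realm>@<AD realm>` has to exist in both KDCs with the same password and compatible encryption types.

```python
# Sketch: name of the cross-realm principal for a one-way trust in which users
# of client_realm (e.g. the AD realm) obtain tickets for services in
# service_realm (e.g. the cluster realm). Realm names below are hypothetical.


def cross_realm_tgt(client_realm: str, service_realm: str) -> str:
    """krbtgt/SERVICE_REALM@CLIENT_REALM must exist in both KDCs with the same key."""
    return f"krbtgt/{service_realm}@{client_realm}"


if __name__ == "__main__":
    principal = cross_realm_tgt("AD.EXAMPLE.COM", "HADOOP.EXAMPLE.COM")
    print(principal)
    # On an MIT cluster KDC this principal would be created with kadmin's
    # addprinc, using the same password as the matching trust on the AD side.
    print(f"addprinc {principal}")
```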
> The GID and other user attributes are usually stored in LDAP. The cluster nodes are set up to talk to the cluster-specific LDAP server.
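To illustrate where the per-task directory traffic comes from: a task running as the submitting user needs that user's UID, primary GID, and groups, which the node resolves through its name service switch (NSS). This Python sketch uses the standard `pwd`, `grp`, and `os.getgrouplist` calls (Unix only). Whether these calls reach an LDAP server depends on the node's NSS configuration, which is an assumption here, and this is not the code Hadoop itself uses.

```python
# Sketch: resolve a job submitter's UID, primary GID and group names on a node.
# The calls go through the node's name service switch (NSS); if NSS is backed
# by LDAP (an assumption), each uncached lookup becomes an LDAP query.

import grp
import os
import pwd
from typing import List, Tuple


def user_identity(username: str) -> Tuple[int, int, List[str]]:
    entry = pwd.getpwnam(username)                   # passwd lookup (may hit LDAP)
    gids = os.getgrouplist(username, entry.pw_gid)   # primary + supplementary GIDs
    names = [grp.getgrgid(g).gr_name for g in gids]  # group lookups (may hit LDAP)
    return entry.pw_uid, entry.pw_gid, names


if __name__ == "__main__":
    print(user_identity("root"))
```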
> On Sep 30, 2011, at 7:19 PM, bigbibguy father wrote:
>> We are planning to enable secure Hadoop using Kerberos. 
>> Our users reside in Active Directory. We read that there are two options for using Kerberos to secure Hadoop:
>> 1) Run a Kerberos KDC on a machine local to the cluster and create the service principals there.
>> 2) Use Active Directory itself as the Kerberos KDC and also create the service principals in Active Directory.
>> It seems that Cloudera, and the industry in general, recommends option 1: running a local KDC for authenticating service principals.
>> https://ccp.cloudera.com/display/CDHDOC/Integrating+Hadoop+Security+with+Active+Directory
>> I read that the TaskTrackers run tasks as the user who submitted the job. In that case, don't the TaskTracker nodes need to talk to Active Directory to get user details such as the GID?
>> So does this mean that every node (TaskTrackers, JobTracker and NameNode) will be interacting with Active Directory anyway?
>> If so, option 1 doesn't seem superior, since each node has to talk to two servers: the local Kerberos KDC for authenticating service principals, and Active Directory for user details and group information.
>> Please correct me if I am wrong in my assumptions.
>> Thanks and Regards,
>> BBG
