hadoop-user mailing list archives

From Yongzhi Wang <wang.yongzhi2...@gmail.com>
Subject Re: Securing cluster from access
Date Fri, 28 Sep 2012 16:18:05 GMT
This document has a clear description, although I don't know whether it
applies to Hadoop 2.0.

http://hadoop.apache.org/docs/r1.0.3/hdfs_permissions_guide.html

I quote some text from this document below. Hopefully it helps.

Overview

The Hadoop Distributed File System (HDFS) implements a permissions
model for files and directories that shares much of the POSIX model.
Each file and directory is associated with an owner and a group. The
file or directory has separate permissions for the user that is the
owner, for other users that are members of the group, and for all
other users. For files, the r permission is required to read the file,
and the w permission is required to write or append to the file. For
directories, the r permission is required to list the contents of the
directory, the w permission is required to create or delete files or
directories, and the x permission is required to access a child of the
directory.

In contrast to the POSIX model, there are no sticky, setuid or setgid
bits for files as there is no notion of executable files. For
directories, there are no setuid or setgid bits, as a
simplification. Collectively, the permissions of a file or directory
are its mode. In general, Unix customs for representing and displaying
modes will be used, including the use of octal numbers in this
description. When a file or directory is created, its owner is the
user identity of the client process, and its group is the group of the
parent directory (the BSD rule).
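
To make the octal notation concrete, here is a small sketch of my own (it
is not taken from the guide or from the HDFS source) that renders the nine
permission bits of a mode in the usual rwx form. The function name
mode_to_string is hypothetical.

```python
def mode_to_string(mode: int) -> str:
    """Render the low nine permission bits of a mode as 'rwxrwxrwx'.

    Illustrative helper only (hypothetical name); it ignores any bits
    above the owner/group/other triplets.
    """
    flags = "rwxrwxrwx"
    # Bit 8 is the owner's r bit, bit 0 is the x bit for other users.
    return "".join(
        flag if mode & (1 << (8 - i)) else "-"
        for i, flag in enumerate(flags)
    )


if __name__ == "__main__":
    for mode in (0o755, 0o640, 0o700):
        print(oct(mode), mode_to_string(mode))
```

For example, a directory with mode 755 lets its owner list it, create or
delete entries, and access children, while everyone else can only list it
and access children.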

Each client process that accesses HDFS has a two-part identity
composed of a user name and a groups list. Whenever HDFS must do a
permissions check for a file or directory foo accessed by a client
process:

    If the user name matches the owner of foo, then the owner
    permissions are tested;
    Else if the group of foo matches any member of the groups list,
    then the group permissions are tested;
    Otherwise the other permissions of foo are tested.

If a permissions check fails, the client operation fails.
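
The three-way check quoted above can be sketched roughly as follows. This
is only my illustration of the rule as the guide states it, not the actual
NameNode code; the Inode class, the check_access function and the sample
users and groups are all hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable

# Permission bits within one rwx triplet.
READ, WRITE, EXECUTE = 4, 2, 1


@dataclass
class Inode:
    """Hypothetical stand-in for a file or directory's metadata."""
    owner: str
    group: str
    mode: int  # octal mode, e.g. 0o640


def check_access(inode: Inode, user: str, groups: Iterable[str],
                 requested: int) -> bool:
    """Return True if `requested` bits are granted to this identity.

    Exactly one triplet is tested: owner, else group, else other.
    """
    if user == inode.owner:
        bits = (inode.mode >> 6) & 0o7   # owner permissions
    elif inode.group in set(groups):
        bits = (inode.mode >> 3) & 0o7   # group permissions
    else:
        bits = inode.mode & 0o7          # other permissions
    return (bits & requested) == requested


if __name__ == "__main__":
    foo = Inode(owner="alice", group="analysts", mode=0o640)
    print(check_access(foo, "alice", ["staff"], WRITE))
    print(check_access(foo, "bob", ["analysts"], WRITE))
    print(check_access(foo, "carol", ["guests"], READ))
```

Note that only the first matching class is consulted: an owner whose own
triplet denies read access is refused even if the group or other bits
would allow it. Also keep in mind that the user name and groups come from
the client, so without strong authentication this check does not stop
someone who simply claims to be another user.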

Configuration Parameters

    dfs.permissions = true
    If true, use the permissions system as described here. If false,
    permission checking is turned off, but all other behavior is
    unchanged. Switching from one parameter value to the other does not
    change the mode, owner or group of files or directories.
    Regardless of whether permissions are on or off, chmod, chgrp and
    chown always check permissions. These functions are only useful in the
    permissions context, and so there is no backwards compatibility issue.
    Furthermore, this allows administrators to reliably set owners and
    permissions in advance of turning on regular permissions checking.

Best regards,
Yongzhi

On Fri, Sep 28, 2012 at 6:24 AM, Bertrand Dechoux <dechouxb@gmail.com> wrote:
> Harsh is right. It is important to know the difference between
> authorization and authentication.
> However, if you do not want anybody to write to your cluster from outside,
> then a firewall might be enough.
> You block everything but allow access to the web interfaces (without
> private actions enabled) from only a limited set of IPs.
>
> Regards
>
> Bertrand
>
>
> On Fri, Sep 28, 2012 at 12:00 PM, Harsh J <harsh@cloudera.com> wrote:
>>
>> ACLs are a good way to control roles of users, but in insecure mode
>> users can easily be impersonated, rendering ACLs useless as a 'secure'
>> measure.
>>
>> On Fri, Sep 28, 2012 at 3:15 PM, Shin Chan <hadoop@gmx.com> wrote:
>> > Hello Bertrand ,
>> >
>> > Thanks for your reply.
>> >
>> > Apologies if this confused you. Yes, iptables is one way to go, but my
>> > question is more whether there is a configuration within the Hadoop XML
>> > files to say that only a given user is allowed to see HDFS.
>> >
>> > I can see that we can do something for MapReduce jobs using ACL
>> > properties (old link for the 1.x version):
>> >
>> > http://hadoop.apache.org/docs/r1.0.3/service_level_auth.html
>> >
>> >
>> > But do similar properties exist on the HDFS side, where the NameNode can
>> > check that this client is allowed to connect to the cluster?
>> >
>> > Thanks
>> >
>> >
>> >
>> > ----- Original Message -----
>> >
>> > From: Bertrand Dechoux
>> >
>> > Sent: 09/28/12 07:34 PM
>> >
>> > To: user@hadoop.apache.org
>> >
>> > Subject: Re: Securing cluster from access
>> >
>> >
>> > What you are looking for is not related to Hadoop in the end. It is how
>> > to restrict requests in a network.
>> > 'Firewall' is a broad term. iptables can allow you to do so quickly. You
>> > drop everything and then accept only from a set of IPs.
>> > You may receive answers on this mailing list, but its purpose is not
>> > really to discuss firewall solutions and configurations.
>> >
>> > Regards
>> >
>> > Bertrand
>> >
>> >
>> >
>> > On Fri, Sep 28, 2012 at 11:23 AM, Shin Chan <hadoop@gmx.com> wrote:
>> >>
>> >> Hello,
>> >>
>> >> We have a 15-node cluster and right now we don't have Kerberos
>> >> implemented.
>> >>
>> >> But we urgently want to secure the cluster.
>> >>
>> >> Right now anyone who knows the IP of the NameNode can just download the
>> >> Hadoop jar, configure the XML files and run
>> >>
>> >> hadoop fs -ls /
>> >>
>> >> and see the data.
>> >>
>> >> How can we stop this?
>> >>
>> >> We are on Hadoop version 2.0.
>> >>
>> >> Are there any configuration settings we can change so that only a given
>> >> set of users or set of IPs is able to see HDFS?
>> >>
>> >> We don't have a firewall implemented outside the cluster yet, so that is
>> >> not an option.
>> >>
>> >> Thanks in advance for your help
>> >
>> >
>> >
>> >
>> > --
>> > Bertrand Dechoux
>> >
>> >
>> >
>> >
>> >
>> >
>> > Thanks and Regards ,
>>
>>
>>
>> --
>> Harsh J
>
>
>
>
> --
> Bertrand Dechoux
