hive-dev mailing list archives

From Erik fang <fme...@gmail.com>
Subject Re: Implement directory/table level access control in HDFS
Date Thu, 22 Aug 2013 16:41:41 GMT
HDFS-5126 <https://issues.apache.org/jira/browse/HDFS-5126> has been
created for HDFS user impersonation, and I will develop a prototype in the
next few weeks.

Thanks,
Erik.fang




On Tue, Aug 20, 2013 at 3:07 PM, Erik fang <fmerik@gmail.com> wrote:

> Hi folks,
>
>
> HDFS has a POSIX-like permission model that uses R, W, X permissions for
> owner, group, and other. It works well most of the time, except when:
>
> 1. Data needs to be shared among users
>
> Groups can be used for access control, but the users have to be in the same
> group as the data. A group here stands for a sharing relationship
> between users and data. If many sharing relationships exist, there are
> many groups, which is hard to manage.
>
> 2. Hive
>
> Hive uses a table-based access control model: a user can have SELECT,
> UPDATE, CREATE, and DROP privileges on a given table, which translate to R/W
> permissions in HDFS. However, Hive’s table-based authorization doesn’t match
> HDFS’s POSIX-like model. For Hive users accessing HDFS, group permissions
> can be used, but that introduces either many groups or big groups that
> contain many sharing relationships.
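
To make the group problem in both cases concrete, here is a minimal sketch. The table names and user names are made up for illustration; the only fact it relies on is that a path in the POSIX-like model carries exactly one group, so every distinct set of users sharing some data needs its own group.

```python
# Hypothetical sharing relationships: table -> users allowed to read it.
shares = {
    "sales": {"alice", "bob"},
    "orders": {"alice", "carol"},
    "clicks": {"bob", "carol"},
    "users": {"alice", "bob"},
}


def groups_needed(shares):
    """Return one group per distinct set of users, since a path has one group."""
    return {frozenset(users) for users in shares.values()}


def memberships(shares):
    """Map each user to the number of groups that user has to join."""
    counts = {}
    for group in groups_needed(shares):
        for user in group:
            counts[user] = counts.get(user, 0) + 1
    return counts


print(len(groups_needed(shares)))
print(memberships(shares))
```

Only tables shared with exactly the same set of users can reuse a group, so the number of groups grows with the number of distinct sharing relationships rather than with the number of users.
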
>
> Inspired by the way an RDBMS manages data, directory-level access control
> based on authorized user impersonation can be implemented as an extension to
> the POSIX-like permission model.
>
> It consists of:
>
> 1. ACLFileSystem
>
> 2. Authorization manager: holds the access control information and a shared
> secret with the namenode
>
> 3. Authenticator (embedded in the namenode)
>
> Take Hive as an example, where the owner of the data is user DW. The procedure is:
>
> 1. A user submits a Hive query or an HCatalog job to access DW’s data. From it we
> can get the tables/partitions being read and written and the
> corresponding HDFS paths. Then an RPC call is made to the authorization
> manager, sending:
>
> {user, tablename, table_path, w/r}
>
> 2. The authorization manager performs an authorization check to decide whether
> the access is allowed. If it is, it replies with an encrypted tablepath:
>
> {realuser, encrypted(tablepath+w/r)}
>
> Here realuser stands for the owner of the requested data.
>
> 3. ACLFileSystem extends FileSystem. When an open(path) call is invoked,
> it replaces the path with encrypted(tablepath+w/r) and invokes the namenode
> RPC call, such as:
>
> open(realuser, encrypted(tablepath+w/r), null)
>
> If the user is requesting a partition path, the RPC call can be invoked as:
>
> open(realuser, encrypted(tablepath+w/r), path_suffix)
>
> 4. The namenode picks up the RPC call and decrypts encrypted(tablepath+w/r) with
> the shared secret to verify that it is not forged. If it is genuine, the namenode
> checks the w/r operation, joins tablepath and path_suffix, and invokes the call as
> the owner of the HDFS path, for example user DW.
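
The four steps above can be sketched end to end as follows. This is only an illustration under stated assumptions, not the prototype: the names `GRANTS`, `OWNERS`, `seal`, `unseal`, `authorize`, and `namenode_open` are hypothetical, the grant table and paths are made up, and the sketch signs tablepath+w/r with an HMAC instead of encrypting it. The HMAC lets the namenode detect a forged token but, unlike encryption, does not hide the path. The check that path_suffix stays inside the table directory is an extra safeguard that the proposal does not mention.

```python
import base64
import hashlib
import hmac
import json
import posixpath

# Hypothetical secret shared by the authorization manager and the namenode.
SHARED_SECRET = b"demo-secret"
# Hypothetical grant table: (user, table_path) -> allowed modes.
GRANTS = {("alice", "/warehouse/dw/sales"): {"r"}}
# Hypothetical owner lookup: table_path -> realuser.
OWNERS = {"/warehouse/dw/sales": "DW"}


def seal(table_path, mode, secret=SHARED_SECRET):
    """Serialize tablepath + w/r and prepend an HMAC-SHA256 tag."""
    payload = json.dumps({"path": table_path, "mode": mode}).encode()
    tag = hmac.new(secret, payload, hashlib.sha256).digest()
    return base64.urlsafe_b64encode(tag + payload).decode()


def unseal(token, secret=SHARED_SECRET):
    """Return (table_path, mode); raise PermissionError if the tag is wrong."""
    raw = base64.urlsafe_b64decode(token.encode())
    tag, payload = raw[:32], raw[32:]
    expected = hmac.new(secret, payload, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):
        raise PermissionError("forged or corrupted token")
    data = json.loads(payload)
    return data["path"], data["mode"]


def authorize(user, table_path, mode):
    """Step 2: check the grant and reply with (realuser, sealed token)."""
    if mode not in GRANTS.get((user, table_path), set()):
        raise PermissionError(f"{user} may not {mode} {table_path}")
    return OWNERS[table_path], seal(table_path, mode)


def namenode_open(realuser, token, path_suffix=None, requested_mode="r"):
    """Step 4: verify the token, check w/r, join tablepath and path_suffix."""
    table_path, mode = unseal(token)
    if requested_mode != mode:
        raise PermissionError("operation not covered by the token")
    full_path = posixpath.normpath(table_path)
    if path_suffix:
        base = full_path
        full_path = posixpath.normpath(posixpath.join(base, path_suffix))
        # Assumed safeguard: reject suffixes that escape the table directory.
        if full_path != base and not full_path.startswith(base + "/"):
            raise PermissionError("path_suffix escapes the table directory")
    # A real namenode would now run the operation as realuser. As in the
    # proposal, realuser travels outside the token and is not verified here.
    return realuser, full_path


# Steps 1 and 3 from the client side: ask for a token, then open a partition.
realuser, token = authorize("alice", "/warehouse/dw/sales", "r")
print(namenode_open(realuser, token, "dt=2013-08-20"))
```

In this sketch a token issued for read access cannot be used for a write, and a token sealed with a different secret is rejected before the path is even parsed.
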
>
>
> A delegation token or something similar can be used as the shared secret, and
> the authorization manager can be integrated into the Hive metastore.
>
> In general, I propose an HDFS user impersonation mechanism and an
> authorization mechanism based on HDFS user impersonation.
>
> If the community is interested, I will file a JIRA for HDFS user
> impersonation and a JIRA for the authorization manager soon.
>
>
> Thoughts?
>
> Thanks a lot
> Erik.fang
>
>
