hive-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Udit Mehta <ume...@groupon.com>
Subject Best Hive Authorization Model for Shared data
Date Tue, 12 Apr 2016 22:42:21 GMT
Hi all,

I wanted to understand what authorization model is most suitable for a
production environment where most of the data is shared between multiple
teams and users.
I know this is would depend more on the use case but I cant seem to figure
out the best model for our use:

We have data that is owned by a certain process (R/W access for that user)
while other users only have Read access to that data. We have a lot of
instances when users would want to create external tables pointing to this
data. We tried the following 3 auth models:

1. *Default Authorization model*: This we think is less secure and any user
can grant himself access to create/modify tables and databases even where
they are not supposed to. We would want to have much tighter security than
this model provides.

2. *Storage Based Authorization*: While this helps us by preventing users
from modifying metadata by checking the HDFS permissions of the underlying
directories, it prevents our most important use case of letting users
create *external *tables on data they dont have write access to. I would
assume external tables wont actually delete the data when dropping
tables/partitions so this operation should be allowed. But because it is
not, even this authorization model does not meet our use case.

3. *Sql Standard Based Authorization: *This does give us fine-grained
control over which users can perform specific commands, but when it comes
to creating external tables, even this authorization scheme seems to use
the filesystem's permissions.

So overall all 3 models didnt seem to fulfill our requirement here which I
think would be a fairly common one. I want to know how other users manage
security on Hive or If i am missing something.

Thanks in advance,
Udit

Mime
View raw message