hive-dev mailing list archives

From "Sushanth Sowmyan (JIRA)" <>
Subject [jira] [Commented] (HIVE-3756) "LOAD DATA" does not honor permission inheritence
Date Thu, 11 Jul 2013 21:35:48 GMT


Sushanth Sowmyan commented on HIVE-3756:

I have a few more thoughts on this. Let's walk through an example:

Let's say parent directory d1 has permission/group combination A.
Let's say directory d2, inside d1, has permission/group combination B.

For non-partitioned tables, d1 is the database/warehouse directory and d2 is the table directory.
For partitioned tables, d1 is the table directory and d2 is the relevant partition directory.

If the flag to inherit permissions were off, then whatever data is loaded, be it
files added inside d2 (as during a load operation) or a replacement for d2 and everything in it (as during
an insert overwrite operation), would get yet another permission/group combination C, which
is a function of the user's current umask and the user's default group.
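To make combination C concrete, here is a minimal sketch of how a umask turns into permission bits. It assumes POSIX-style defaults (0o777 for new directories, 0o666 for new files) masked by the umask; the function and constant names are hypothetical, and the group half of C is not modeled.

```python
# Hypothetical model of permission combination "C": with inheritance off,
# a newly created path gets a default mode masked by the user's umask.
# Assumes POSIX-style defaults; this is an illustration, not Hive/HDFS code.
DIR_DEFAULT = 0o777
FILE_DEFAULT = 0o666


def umask_mode(umask: int, is_dir: bool) -> int:
    """Return the mode bits a new path would get under the given umask."""
    base = DIR_DEFAULT if is_dir else FILE_DEFAULT
    # Clear every bit set in the umask, keeping only the rwx bits.
    return base & ~umask & 0o777


if __name__ == "__main__":
    umask = 0o022
    print(oct(umask_mode(umask, is_dir=True)))   # 0o755
    print(oct(umask_mode(umask, is_dir=False)))  # 0o644
```

Two users with different umasks loading into the same table would therefore get different modes, which is why C depends on who runs the load rather than on where the data lands.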

The purpose of the subdir inherit-permissions flag (hive.warehouse.subdir.inherit.perms) is to eliminate this behaviour
and to use the parent dir's permissions/group where possible. So far, so good.

For the rest of this discussion, assume the flag to inherit permissions is on.

Now, if we load data into d2 without overwrite, the files inside d2 get permission B.
If we load data into d2 with overwrite, d2 itself is replaced, so the new d2 takes on d1's
permissions, and so do the files inside it. As a result, d2 and the files inside d2 both end up with
permission/group combination A.
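The cases so far can be summarized as a small decision function. This is a hypothetical model of the behaviour described above, not Hive's code; `Perm` and `current_result` are illustrative names, and the flag-off branch follows the earlier description of combination C.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Perm:
    """A permission/group combination, such as A, B or C in the example."""
    mode: int
    group: str


def current_result(parent: Perm, target_dir: Perm, overwrite: bool,
                   inherit: bool, umask_default: Perm) -> Tuple[Perm, Perm]:
    """Return (perm of d2, perm of files in d2) after loading into d2.

    Hypothetical model of the current behaviour; parent is d1's combination,
    target_dir is d2's, umask_default is combination C.
    """
    if not inherit:
        # Flag off: every newly created path gets combination C.
        if overwrite:
            return umask_default, umask_default  # d2 recreated, files new
        return target_dir, umask_default         # d2 kept, files new
    if overwrite:
        # Flag on, overwrite: d2 is replaced, so it inherits from d1.
        return parent, parent
    # Flag on, plain load: files are added inside d2 and inherit from it.
    return target_dir, target_dir


if __name__ == "__main__":
    A = Perm(0o775, "dbgroup")
    B = Perm(0o770, "tblgroup")
    C = Perm(0o755, "users")
    print(current_result(A, B, overwrite=False, inherit=True, umask_default=C))
    print(current_result(A, B, overwrite=True, inherit=True, umask_default=C))
```

The second call is the case discussed next: with the flag on, an overwrite silently swaps the table's combination B for the parent's A.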


While this behaviour is consistent, I find that it does not match what users expect. If a user creates
a table (say, unpartitioned), then chmods/chgrps it to B, and then loads data into
it using an insert overwrite, they expect to be overwriting only the data inside
the table dir, and they expect the table to still have permission/group combination
B. They don't want it replaced by "A", the parent db dir's permissions/group, and they
don't want "C", the umask/current-user-default-group.

Whether this requires a new flag that overrides "hive.warehouse.subdir.inherit.perms",
or whether "hive.warehouse.subdir.inherit.perms" itself should work this way, is still up
for discussion, but there is now a need for one additional requirement:

"If the directory being moved in already exists and will be deleted so that the new one can be put in its place,
then instead of using the parent's permissions, it should keep the previous directory's permissions."
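A minimal sketch of the proposed rule, assuming the inheritance flag is on and that files inside the directory keep inheriting from it. The names `Perm`, `perm_for_replaced_dir` and `overwrite_result` are hypothetical and do not come from Hive or the attached patches.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Perm:
    """A permission/group combination, such as A or B in the example."""
    mode: int
    group: str


def perm_for_replaced_dir(parent: Perm, previous: Optional[Perm]) -> Perm:
    """Pick the permission/group for a directory being moved into place.

    Proposed rule (hypothetical sketch): if a directory already existed at
    the destination and is deleted to make room, reuse its permissions;
    otherwise fall back to inheriting from the parent.
    """
    if previous is not None:
        return previous
    return parent


def overwrite_result(parent: Perm, previous: Optional[Perm]) -> Tuple[Perm, Perm]:
    """Return (perm of new dir, perm of files in it) under the proposed rule."""
    new_dir = perm_for_replaced_dir(parent, previous)
    # Files inherit from the (re)created directory, as with the flag today.
    return new_dir, new_dir


if __name__ == "__main__":
    A = Perm(0o775, "dbgroup")
    B = Perm(0o770, "tblgroup")
    print(overwrite_result(A, B))     # existing table dir: keeps B
    print(overwrite_result(A, None))  # brand-new dir: inherits A
```

In a real implementation this would presumably mean reading the old directory's `FileStatus` (via `FileSystem.getFileStatus`) before deleting it and reapplying it with `FileSystem.setPermission` and `FileSystem.setOwner` after the move; that mapping is an assumption, not what the attached patches do.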


This could be a separate JIRA if people feel it should be, but I think it's also a minor
modification of this current JIRA.
> "LOAD DATA" does not honor permission inheritence
> -------------------------------------------------
>                 Key: HIVE-3756
>                 URL:
>             Project: Hive
>          Issue Type: Bug
>          Components: Authorization, Security
>    Affects Versions: 0.9.0
>            Reporter: Johndee Burks
>            Assignee: Chaoyu Tang
>         Attachments: HIVE-3756_1.patch, HIVE-3756.patch
> When a "LOAD DATA" operation is performed, the resulting data in HDFS for the table does
> not maintain permission inheritance. This remains true even with "hive.warehouse.subdir.inherit.perms"
> set to true.
> The issue is easily reproducible by creating a table and loading some data into it. After
> the load is complete, just do a "dfs -ls -R" on the warehouse directory and you will see that
> the inheritance of permissions worked for the table directory but not for the data.

