hive-dev mailing list archives

From "Chaoyu Tang" <ctang...@gmail.com>
Subject Re: Review Request 12050: HIVE-3756 (LOAD DATA does not honor permission inheritance)
Date Fri, 19 Jul 2013 18:54:25 GMT

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/12050/
-----------------------------------------------------------

(Updated July 19, 2013, 6:54 p.m.)


Review request for hive.


Changes
-------

The further change from the previous version preserves the permission/group combination in
cases such as insert overwrite. The added change gets the permission/group information of
a dir/file before it is deleted in the replace, then applies it back to the replaced dir/file.
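
The capture-then-reapply idea can be sketched roughly as follows. This is not the code in
the patch: it is a local-filesystem analog that uses java.nio.file on a POSIX filesystem
instead of the Hadoop FileSystem API, and the class and method names (ReplacePreservingPerms,
replace, childrenFirst) are made up for illustration.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.PosixFileAttributeView;
import java.nio.file.attribute.PosixFileAttributes;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.stream.Stream;

// Hypothetical helper; assumes a POSIX filesystem (not HDFS, not the patch's code).
final class ReplacePreservingPerms {

    // Lists root and everything under it, children before their parents.
    static List<Path> childrenFirst(Path root) throws IOException {
        List<Path> paths = new ArrayList<>();
        try (Stream<Path> walk = Files.walk(root)) {
            walk.forEach(paths::add);
        }
        Collections.reverse(paths); // pre-order reversed: descendants come first
        return paths;
    }

    // Replaces dest with src, keeping the mode and group that dest had before.
    static void replace(Path src, Path dest) throws IOException {
        PosixFileAttributes old = null;
        if (Files.exists(dest)) {
            // Capture mode and group BEFORE the old dir/file is deleted.
            old = Files.readAttributes(dest, PosixFileAttributes.class);
            for (Path p : childrenFirst(dest)) {
                Files.delete(p);
            }
        }
        Files.move(src, dest); // the move itself keeps src's own mode and group
        if (old == null) {
            return; // nothing was replaced, so there is nothing to restore
        }
        // Apply the captured mode and group back to the replaced dir/file.
        for (Path p : childrenFirst(dest)) {
            PosixFileAttributeView view =
                Files.getFileAttributeView(p, PosixFileAttributeView.class);
            if (!view.readAttributes().group().equals(old.group())) {
                view.setGroup(old.group());
            }
            view.setPermissions(old.permissions());
        }
    }
}
```

In this sketch the files under the replaced directory get the same mode as the directory
itself, which matches the listing in step 4 of the further tests below.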

Further tests:
1. Create 2 tables under db tester1 with the permission/group as follows; the tables' dirs
inherit the permission/group combination of their parent db dir tester1.db:
drwxrwx---   - ctang staff        170 2013-07-19 14:22 /user/tester1/hive/tester1.db
drwxrwx---   - ctang staff         68 2013-07-19 14:21 /user/tester1/hive/tester1.db/tst1
drwxrwx---   - ctang staff         68 2013-07-19 14:21 /user/tester1/hive/tester1.db/tst2

2. change tst2: dfs -chmod 700 /user/tester1/hive/tester1.db/tst2;
dfs -ls -R /user/tester1/hive;
drwxrwx---   - ctang staff        170 2013-07-19 14:22 /user/tester1/hive/tester1.db
drwxrwx---   - ctang staff        136 2013-07-19 14:24 /user/tester1/hive/tester1.db/tst1
-rwxrwx---   1 ctang staff        168 2013-07-19 14:24 /user/tester1/hive/tester1.db/tst1/tst1.input
drwx------   - ctang staff         68 2013-07-19 14:21 /user/tester1/hive/tester1.db/tst2

3. insert overwrite table tst2 select * from tst1;
4. dfs -ls -R /user/tester1/hive;
drwxrwx---   - ctang staff        170 2013-07-19 14:25 /user/tester1/hive/tester1.db
drwxrwx---   - ctang staff        136 2013-07-19 14:24 /user/tester1/hive/tester1.db/tst1
-rwxrwx---   1 ctang staff        168 2013-07-19 14:24 /user/tester1/hive/tester1.db/tst1/tst1.input
drwx------   - ctang staff        136 2013-07-19 14:25 /user/tester1/hive/tester1.db/tst2
-rwx------   1 ctang staff        291 2013-07-19 14:25 /user/tester1/hive/tester1.db/tst2/000000_0
==
The permissions of /user/tester1/hive/tester1.db/tst2 and /user/tester1/hive/tester1.db/tst2/000000_0
are the same as those set in step 2, rather than inherited from /user/tester1/hive/tester1.db.


Bugs: HIVE- and HIVE-3756
    https://issues.apache.org/jira/browse/HIVE-
    https://issues.apache.org/jira/browse/HIVE-3756


Repository: hive-git


Description
-------

Problems:
1. When doing load data or insert overwrite to a table, the data files under the database/table
directory do not inherit their parent's permissions (i.e. group), as described in HIVE-3756.
2. Besides the group issue, the read/write permission mode is also not inherited.
3. The same problem affects partition files (see HIVE-3094).

Cause:
The task results (from load data or insert overwrite) are initially stored in scratchdir and
then loaded under warehouse table directory. FileSystem.rename is used in this step (e.g.
LoadTable/LoadPartition) to move the dirs/files but it preserves their permissions (including
group and mode) which are determined by scratchdir permission or umask. If the scratchdir
has different permissions from those of warehouse table directories, the problem occurs.
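
The effect can be reproduced outside Hive with a small sketch. It is a local-filesystem
analog using java.nio.file on a POSIX filesystem, not the Hadoop FileSystem API; the class
and method names and the temp-directory layout are invented for the demo.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.PosixFilePermission;
import java.nio.file.attribute.PosixFilePermissions;
import java.util.Set;

// Hypothetical demo class; assumes a POSIX filesystem.
final class RenameKeepsPermsDemo {

    // Creates a file in a "scratch" dir, moves it into a "warehouse" dir,
    // and returns the mode the file has after the move.
    static Set<PosixFilePermission> modeAfterMove(String scratchFileMode,
                                                  String warehouseDirMode) throws IOException {
        Path scratch = Files.createTempDirectory("scratch");
        Path warehouse = Files.createTempDirectory("warehouse");
        Files.setPosixFilePermissions(warehouse,
            PosixFilePermissions.fromString(warehouseDirMode));

        Path file = Files.createFile(scratch.resolve("000000_0"));
        Files.setPosixFilePermissions(file,
            PosixFilePermissions.fromString(scratchFileMode));

        // A plain move/rename: the file carries its own mode along and does not
        // pick up anything from the destination directory.
        Path moved = Files.move(file, warehouse.resolve("000000_0"));
        return Files.getPosixFilePermissions(moved);
    }
}
```

Calling modeAfterMove("rw-------", "rwxrwx---") returns the file's original mode, not one
derived from the destination directory, which is the mismatch described above.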

Solution:
After FileSystem.rename is called, all renamed (moved) files/dirs are changed to their destination
parents' permissions if needed (say, if hive.warehouse.subdir.inherit.perms is true). I introduced
a new method, renameFile, that does both the rename and the permission change. It replaces the
FileSystem.rename used in LoadTable/LoadPartition. I did not replace the rename used to move
files/dirs under the same scratchdir in the middle of task processing. That does not seem necessary
to me, since they are temp files and are also probably access-protected by the top scratchdir
mode 700 (HIVE-4487).
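
A rough sketch of the rename-then-inherit step is shown below. It is not the renameFile
method from the diff: it is a local-filesystem analog written against java.nio.file on a
POSIX filesystem rather than the Hadoop FileSystem API. The class name, the method name
renameAndInherit, and the boolean inheritPerms (standing in for the configuration check)
are assumptions made for illustration.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.PosixFileAttributeView;
import java.nio.file.attribute.PosixFileAttributes;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.stream.Stream;

// Hypothetical helper; assumes a POSIX filesystem (not HDFS, not the patch's code).
final class InheritingRename {

    // Moves src into destParent, then (optionally) gives everything that was
    // moved the mode and group of destParent.
    static Path renameAndInherit(Path src, Path destParent, boolean inheritPerms)
            throws IOException {
        Path dest = destParent.resolve(src.getFileName());
        Files.move(src, dest); // like a rename: keeps the source's mode and group
        if (!inheritPerms) {
            return dest;
        }
        PosixFileAttributes parent =
            Files.readAttributes(destParent, PosixFileAttributes.class);

        // Collect the moved tree, then process children before parents so a
        // directory's mode is changed only after its contents have been handled.
        List<Path> moved = new ArrayList<>();
        try (Stream<Path> walk = Files.walk(dest)) {
            walk.forEach(moved::add);
        }
        Collections.reverse(moved);

        for (Path p : moved) {
            PosixFileAttributeView view =
                Files.getFileAttributeView(p, PosixFileAttributeView.class);
            PosixFileAttributes attrs = view.readAttributes();
            // Change only what differs ("if needed").
            if (!attrs.group().equals(parent.group())) {
                view.setGroup(parent.group());
            }
            if (!attrs.permissions().equals(parent.permissions())) {
                view.setPermissions(parent.permissions());
            }
        }
        return dest;
    }
}
```

The sketch copies the parent's mode to files unchanged. Whether files should keep the
execute bits of the parent directory is a separate design choice that this sketch does not
settle.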


Diffs (updated)
-----

  ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java 87a584d 

Diff: https://reviews.apache.org/r/12050/diff/


Testing
-------

The following cases verify that all created subdirs/files inherit their parents' permission
mode and group: 1) create database; 2) create table; 3) load data; 4) insert overwrite;
5) partitions.
{code}
hive> dfs -ls -d /user/tester1/hive;
drwxrwx---   - tester1 testgroup123          0 2013-06-22 13:20 /user/tester1/hive

hive> create database tester1 COMMENT 'Database for user tester1' LOCATION '/user/tester1/hive/tester1.db';
hive> dfs -ls -R /user/tester1/hive;
drwxrwx---   - tester1 testgroup123          0 2013-06-22 13:21 /user/tester1/hive/tester1.db

hive> use tester1;
hive> create table tester1.tst1(col1 int, col2 string) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' STORED AS TEXTFILE;
hive> dfs -ls -R /user/tester1/hive;
drwxrwx---   - tester1 testgroup123          0 2013-06-22 13:22 /user/tester1/hive/tester1.db
drwxrwx---   - tester1 testgroup123          0 2013-06-22 13:22 /user/tester1/hive/tester1.db/tst1

hive> load data local inpath '/home/tester1/tst1.input' into table tst1;
hive> dfs -ls -R /user/tester1/hive;
drwxrwx---   - tester1 testgroup123          0 2013-06-22 13:22 /user/tester1/hive/tester1.db
drwxrwx---   - tester1 testgroup123          0 2013-06-22 13:23 /user/tester1/hive/tester1.db/tst1
-rw-rw----   3 tester1 testgroup123        168 2013-06-22 13:23 /user/tester1/hive/tester1.db/tst1/tst1.input

hive> create table tester1.tst2(col1 int, col2 string) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' STORED AS SEQUENCEFILE;
hive> dfs -ls -R /user/tester1/hive;
drwxrwx---   - tester1 testgroup123          0 2013-06-22 13:24 /user/tester1/hive/tester1.db
drwxrwx---   - tester1 testgroup123          0 2013-06-22 13:23 /user/tester1/hive/tester1.db/tst1
-rw-rw----   3 tester1 testgroup123        168 2013-06-22 13:23 /user/tester1/hive/tester1.db/tst1/tst1.input
drwxrwx---   - tester1 testgroup123          0 2013-06-22 13:24 /user/tester1/hive/tester1.db/tst2

hive> insert overwrite table tst2 select * from tst1;
hive> dfs -ls -R /user/tester1/hive;                 
drwxrwx---   - tester1 testgroup123          0 2013-06-22 13:25 /user/tester1/hive/tester1.db
drwxrwx---   - tester1 testgroup123          0 2013-06-22 13:23 /user/tester1/hive/tester1.db/tst1
-rw-rw----   3 tester1 testgroup123        168 2013-06-22 13:23 /user/tester1/hive/tester1.db/tst1/tst1.input
drwxrwx---   - tester1 testgroup123          0 2013-06-22 13:25 /user/tester1/hive/tester1.db/tst2
-rw-rw----   3 tester1 testgroup123        291 2013-06-22 13:25 /user/tester1/hive/tester1.db/tst2/000000_0

hive> create table tester1.tst3(col2 string) PARTITIONED BY (col1 int) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' STORED AS TEXTFILE;
hive> dfs -ls -R /user/tester1/hive;
drwxrwx---   - tester1 testgroup123          0 2013-06-22 13:27 /user/tester1/hive/tester1.db
drwxrwx---   - tester1 testgroup123          0 2013-06-22 13:23 /user/tester1/hive/tester1.db/tst1
-rw-rw----   3 tester1 testgroup123        168 2013-06-22 13:23 /user/tester1/hive/tester1.db/tst1/tst1.input
drwxrwx---   - tester1 testgroup123          0 2013-06-22 13:25 /user/tester1/hive/tester1.db/tst2
-rw-rw----   3 tester1 testgroup123        291 2013-06-22 13:25 /user/tester1/hive/tester1.db/tst2/000000_0
drwxrwx---   - tester1 testgroup123          0 2013-06-22 13:27 /user/tester1/hive/tester1.db/tst3

hive> set hive.exec.dynamic.partition.mode=nonstrict;
hive> insert overwrite table tester1.tst3 partition (col1) select t1.col2, t1.col1 from tst1 t1;
hive> dfs -ls -R /user/tester1/hive;
drwxrwx---   - tester1 testgroup123          0 2013-06-22 13:27 /user/tester1/hive/tester1.db
drwxrwx---   - tester1 testgroup123          0 2013-06-22 13:23 /user/tester1/hive/tester1.db/tst1
-rw-rw----   3 tester1 testgroup123        168 2013-06-22 13:23 /user/tester1/hive/tester1.db/tst1/tst1.input
drwxrwx---   - tester1 testgroup123          0 2013-06-22 13:25 /user/tester1/hive/tester1.db/tst2
-rw-rw----   3 tester1 testgroup123        291 2013-06-22 13:25 /user/tester1/hive/tester1.db/tst2/000000_0
drwxrwx---   - tester1 testgroup123          0 2013-06-22 13:28 /user/tester1/hive/tester1.db/tst3
drwxrwx---   - tester1 testgroup123          0 2013-06-22 13:28 /user/tester1/hive/tester1.db/tst3/col1=1111
-rw-rw----   3 tester1 testgroup123         51 2013-06-22 13:28 /user/tester1/hive/tester1.db/tst3/col1=1111/000000_0
drwxrwx---   - tester1 testgroup123          0 2013-06-22 13:28 /user/tester1/hive/tester1.db/tst3/col1=2222
-rw-rw----   3 tester1 testgroup123         51 2013-06-22 13:28 /user/tester1/hive/tester1.db/tst3/col1=2222/000000_0
drwxrwx---   - tester1 testgroup123          0 2013-06-22 13:28 /user/tester1/hive/tester1.db/tst3/col1=3333
-rw-rw----   3 tester1 testgroup123         51 2013-06-22 13:28 /user/tester1/hive/tester1.db/tst3/col1=3333/000000_0
{code}


Thanks,

Chaoyu Tang

