Mailing-List: contact dev-help@hive.apache.org; run by ezmlm
Reply-To: dev@hive.apache.org
Content-Type: multipart/alternative; boundary="===============6765174291603313799=="
MIME-Version: 1.0
Subject: Re: Review Request 12050: HIVE-3756 (LOAD DATA does not honor permission inheritance)
From: "Sushanth Sowmyan"
To: "Sushanth Sowmyan", "hive", "Chaoyu Tang"
Date: Mon, 01 Jul 2013 22:13:00 -0000
Message-ID: <20130701221300.26284.14900@reviews.apache.org>
X-ReviewBoard-URL: https://reviews.apache.org
Auto-Submitted: auto-generated
X-ReviewGroup: hive
X-ReviewRequest-URL: https://reviews.apache.org/r/12050/
References: <20130701182654.26284.89250@reviews.apache.org>
In-Reply-To: <20130701182654.26284.89250@reviews.apache.org>
Reply-To: "Sushanth Sowmyan"

--===============6765174291603313799==
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit

> On July 1, 2013, 6:26 p.m., Sushanth Sowmyan wrote:
> >
> > ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java, line 2128
> >
> > I understand from the javadoc that FileStatus.isDirectory() is supposed to be the new call and that .isDir() is deprecated, but I wanted to point out that hive trunk (with no hadoop revision args) does not even compile for me with isDirectory() here.
> >
> > Also, given that .isDir() is used all over the place in hive, for the sake of consistency, would it break functionality in any significant way if you used it here?

To add more details, this patch only works if you are compiling against Hadoop 2.x, and does not if you are compiling against 0.20.x or 1.x. If you need to use isDirectory() for some functional reason, you should add the appropriate shims for it. Without that, this should be written as isDir().


- Sushanth


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/12050/#review22617
-----------------------------------------------------------


On June 22, 2013, 8:43 p.m., Chaoyu Tang wrote:
>
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/12050/
> -----------------------------------------------------------
>
> (Updated June 22, 2013, 8:43 p.m.)
>
>
> Review request for hive.
>
>
> Bugs: HIVE- and HIVE-3756
>     https://issues.apache.org/jira/browse/HIVE-
>     https://issues.apache.org/jira/browse/HIVE-3756
>
>
> Repository: hive-git
>
>
> Description
> -------
>
> Problems:
> 1. When doing load data or insert overwrite to a table, the data files under the database/table directory do not inherit their parent's permissions (i.e. group), as described in HIVE-3756.
> 2. Besides the group issue, the read/write permission mode is also not inherited.
> 3.
>    The same problem affects the partition files (see HIVE-3094).
>
> Cause:
> The task results (from load data or insert overwrite) are initially stored in the scratchdir and then moved under the warehouse table directory. FileSystem.rename is used in this step (e.g. LoadTable/LoadPartition) to move the dirs/files, but it preserves their permissions (including group and mode), which are determined by the scratchdir permissions or the umask. If the scratchdir has different permissions from those of the warehouse table directories, the problem occurs.
>
> Solution:
> After FileSystem.rename is called, change all renamed (moved) files/dirs to their destination parents' permissions if needed (say, if hive.warehouse.subdir.inherit.perms is true). I introduced a new method, renameFile, that does both the rename and the permission change. It replaces the FileSystem.rename calls used in LoadTable/LoadPartition. I did not replace the rename used to move files/dirs within the same scratchdir in the middle of task processing. That does not seem necessary to me, since they are temp files and are probably also access-protected by the top scratchdir mode 700 (HIVE-4487).
>
>
> Diffs
> -----
>
>   ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java 17daaa1
>
> Diff: https://reviews.apache.org/r/12050/diff/
>
>
> Testing
> -------
>
> The following cases were tested to confirm that all created subdirs/files inherit their parents' permission mode and group: 1) create database; 2) create table; 3) load data; 4) insert overwrite; 5) partitions.
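
For illustration, the rename-then-inherit approach described under Solution can be sketched as follows. This is a minimal sketch on a local POSIX filesystem using java.nio.file, not Hadoop's FileSystem API and not the patch's renameFile. The class InheritPermsRename, the method renameAndInherit, and the inheritPerms flag (standing in for hive.warehouse.subdir.inherit.perms) are hypothetical names. Dropping the execute bits on plain files is also an assumption of the sketch, chosen to match the rwxrwx--- / rw-rw---- pairs in the test log.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.GroupPrincipal;
import java.nio.file.attribute.PosixFileAttributeView;
import java.nio.file.attribute.PosixFileAttributes;
import java.nio.file.attribute.PosixFilePermission;
import java.nio.file.attribute.PosixFilePermissions;
import java.util.Comparator;
import java.util.EnumSet;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class InheritPermsRename {

    // Hypothetical helper (not the patch's renameFile): move src to dst, then,
    // if inheritPerms is set, copy the mode and group of dst's parent directory
    // onto everything that was moved.
    static void renameAndInherit(Path src, Path dst, boolean inheritPerms) throws IOException {
        Files.move(src, dst); // like a rename, this keeps the source's mode and group
        if (!inheritPerms) {
            return;
        }
        Path parentDir = dst.toAbsolutePath().getParent();
        PosixFileAttributes parent = Files.readAttributes(parentDir, PosixFileAttributes.class);
        List<Path> moved;
        try (Stream<Path> walk = Files.walk(dst)) {
            // Reverse order visits children before the directories that contain them.
            moved = walk.sorted(Comparator.reverseOrder()).collect(Collectors.toList());
        }
        for (Path p : moved) {
            applyParentAttributes(p, parent);
        }
    }

    static void applyParentAttributes(Path p, PosixFileAttributes parent) throws IOException {
        Set<PosixFilePermission> perms = EnumSet.noneOf(PosixFilePermission.class);
        perms.addAll(parent.permissions());
        if (Files.isRegularFile(p)) {
            // Assumption of this sketch: plain files do not take the execute bits.
            perms.remove(PosixFilePermission.OWNER_EXECUTE);
            perms.remove(PosixFilePermission.GROUP_EXECUTE);
            perms.remove(PosixFilePermission.OTHERS_EXECUTE);
        }
        PosixFileAttributeView view = Files.getFileAttributeView(p, PosixFileAttributeView.class);
        view.setPermissions(perms);
        GroupPrincipal group = parent.group();
        if (!group.equals(view.readAttributes().group())) {
            view.setGroup(group); // may fail if the caller is not a member of that group
        }
    }

    public static void main(String[] args) throws IOException {
        Path root = Files.createTempDirectory("inherit-demo");
        Path scratch = Files.createDirectory(root.resolve("scratch"));
        Path table = Files.createDirectory(root.resolve("tst1"));
        Files.setPosixFilePermissions(table, PosixFilePermissions.fromString("rwxrwx---"));
        Path data = Files.createFile(scratch.resolve("tst1.input"));
        Files.setPosixFilePermissions(data, PosixFilePermissions.fromString("rw-------"));

        Path loaded = table.resolve("tst1.input");
        renameAndInherit(data, loaded, true);
        System.out.println(PosixFilePermissions.toString(Files.getPosixFilePermissions(loaded)));
    }
}
```

On HDFS the corresponding calls would presumably be FileSystem.rename followed by FileSystem.setPermission and FileSystem.setOwner, but the sketch does not exercise them.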
> {code}
> hive> dfs -ls -d /user/tester1/hive;
> drwxrwx---   - tester1 testgroup123   0 2013-06-22 13:20 /user/tester1/hive
>
> hive> create database tester1 COMMENT 'Database for user tester1' LOCATION '/user/tester1/hive/tester1.db';
> hive> dfs -ls -R /user/tester1/hive;
> drwxrwx---   - tester1 testgroup123   0 2013-06-22 13:21 /user/tester1/hive/tester1.db
>
> hive> use tester1;
> hive> create table tester1.tst1(col1 int, col2 string) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' STORED AS TEXTFILE;
> hive> dfs -ls -R /user/tester1/hive;
> drwxrwx---   - tester1 testgroup123   0 2013-06-22 13:22 /user/tester1/hive/tester1.db
> drwxrwx---   - tester1 testgroup123   0 2013-06-22 13:22 /user/tester1/hive/tester1.db/tst1
>
> hive> load data local inpath '/home/tester1/tst1.input' into table tst1;
> hive> dfs -ls -R /user/tester1/hive;
> drwxrwx---   - tester1 testgroup123   0 2013-06-22 13:22 /user/tester1/hive/tester1.db
> drwxrwx---   - tester1 testgroup123   0 2013-06-22 13:23 /user/tester1/hive/tester1.db/tst1
> -rw-rw----   3 tester1 testgroup123 168 2013-06-22 13:23 /user/tester1/hive/tester1.db/tst1/tst1.input
>
> hive> create table tester1.tst2(col1 int, col2 string) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' STORED AS SEQUENCEFILE;
> hive> dfs -ls -R /user/tester1/hive;
> drwxrwx---   - tester1 testgroup123   0 2013-06-22 13:24 /user/tester1/hive/tester1.db
> drwxrwx---   - tester1 testgroup123   0 2013-06-22 13:23 /user/tester1/hive/tester1.db/tst1
> -rw-rw----   3 tester1 testgroup123 168 2013-06-22 13:23 /user/tester1/hive/tester1.db/tst1/tst1.input
> drwxrwx---   - tester1 testgroup123   0 2013-06-22 13:24 /user/tester1/hive/tester1.db/tst2
>
> hive> insert overwrite table tst2 select * from tst1;
> hive> dfs -ls -R /user/tester1/hive;
> drwxrwx---   - tester1 testgroup123   0 2013-06-22 13:25 /user/tester1/hive/tester1.db
> drwxrwx---   - tester1 testgroup123   0 2013-06-22 13:23 /user/tester1/hive/tester1.db/tst1
> -rw-rw----   3 tester1 testgroup123 168 2013-06-22 13:23
>     /user/tester1/hive/tester1.db/tst1/tst1.input
> drwxrwx---   - tester1 testgroup123   0 2013-06-22 13:25 /user/tester1/hive/tester1.db/tst2
> -rw-rw----   3 tester1 testgroup123 291 2013-06-22 13:25 /user/tester1/hive/tester1.db/tst2/000000_0
>
> hive> create table tester1.tst3(col2 string) PARTITIONED BY (col1 int) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' STORED AS TEXTFILE;
> hive> dfs -ls -R /user/tester1/hive;
> drwxrwx---   - tester1 testgroup123   0 2013-06-22 13:27 /user/tester1/hive/tester1.db
> drwxrwx---   - tester1 testgroup123   0 2013-06-22 13:23 /user/tester1/hive/tester1.db/tst1
> -rw-rw----   3 tester1 testgroup123 168 2013-06-22 13:23 /user/tester1/hive/tester1.db/tst1/tst1.input
> drwxrwx---   - tester1 testgroup123   0 2013-06-22 13:25 /user/tester1/hive/tester1.db/tst2
> -rw-rw----   3 tester1 testgroup123 291 2013-06-22 13:25 /user/tester1/hive/tester1.db/tst2/000000_0
> drwxrwx---   - tester1 testgroup123   0 2013-06-22 13:27 /user/tester1/hive/tester1.db/tst3
>
> hive> set hive.exec.dynamic.partition.mode=nonstrict;
> hive> insert overwrite table tester1.tst3 partition (col1) select t1.col2, t1.col1 from tst1 t1;
> hive> dfs -ls -R /user/tester1/hive;
> drwxrwx---   - tester1 testgroup123   0 2013-06-22 13:27 /user/tester1/hive/tester1.db
> drwxrwx---   - tester1 testgroup123   0 2013-06-22 13:23 /user/tester1/hive/tester1.db/tst1
> -rw-rw----   3 tester1 testgroup123 168 2013-06-22 13:23 /user/tester1/hive/tester1.db/tst1/tst1.input
> drwxrwx---   - tester1 testgroup123   0 2013-06-22 13:25 /user/tester1/hive/tester1.db/tst2
> -rw-rw----   3 tester1 testgroup123 291 2013-06-22 13:25 /user/tester1/hive/tester1.db/tst2/000000_0
> drwxrwx---   - tester1 testgroup123   0 2013-06-22 13:28 /user/tester1/hive/tester1.db/tst3
> drwxrwx---   - tester1 testgroup123   0 2013-06-22 13:28 /user/tester1/hive/tester1.db/tst3/col1=1111
> -rw-rw----   3 tester1 testgroup123  51 2013-06-22 13:28 /user/tester1/hive/tester1.db/tst3/col1=1111/000000_0
> drwxrwx---   - tester1 testgroup123   0 2013-06-22 13:28
>     /user/tester1/hive/tester1.db/tst3/col1=2222
> -rw-rw----   3 tester1 testgroup123  51 2013-06-22 13:28 /user/tester1/hive/tester1.db/tst3/col1=2222/000000_0
> drwxrwx---   - tester1 testgroup123   0 2013-06-22 13:28 /user/tester1/hive/tester1.db/tst3/col1=3333
> -rw-rw----   3 tester1 testgroup123  51 2013-06-22 13:28 /user/tester1/hive/tester1.db/tst3/col1=3333/000000_0
> {code}
>
>
> Thanks,
>
> Chaoyu Tang
>
>

--===============6765174291603313799==--
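
To make the shim suggestion in the review comment at the top of this message more concrete, here is a minimal sketch of one way to choose between isDirectory() and isDir() at runtime via reflection, so that the calling code does not reference a method missing from the older API. It is self-contained and does not use Hadoop: OldStatus and NewStatus are hypothetical stand-ins for FileStatus as seen from Hadoop 1.x and 2.x, and DirCheckShim is an illustrative name, not Hive's actual shim layer.

```java
import java.lang.reflect.Method;

class DirCheckShim {

    // Hypothetical stand-in for a 1.x-style status object: only isDir() exists.
    public static class OldStatus {
        private final boolean dir;
        public OldStatus(boolean dir) { this.dir = dir; }
        public boolean isDir() { return dir; }
    }

    // Hypothetical stand-in for a 2.x-style status object: isDirectory() is the
    // preferred call and isDir() is kept but deprecated.
    public static class NewStatus {
        private final boolean dir;
        public NewStatus(boolean dir) { this.dir = dir; }
        public boolean isDirectory() { return dir; }
        @Deprecated
        public boolean isDir() { return dir; }
    }

    // Prefer isDirectory() when the runtime class has it, else fall back to isDir().
    static boolean isDirectory(Object status) {
        Method method;
        try {
            method = status.getClass().getMethod("isDirectory");
        } catch (NoSuchMethodException newApiMissing) {
            try {
                method = status.getClass().getMethod("isDir");
            } catch (NoSuchMethodException oldApiMissing) {
                throw new IllegalArgumentException(
                        "neither isDirectory() nor isDir() on " + status.getClass().getName());
            }
        }
        try {
            return (Boolean) method.invoke(status);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("directory check failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(isDirectory(new OldStatus(true)));
        System.out.println(isDirectory(new NewStatus(false)));
    }
}
```

Looking the method up on every call is slow, so a real shim would more likely resolve it once or select an implementation class per Hadoop version. Simply calling isDir(), as suggested in the review, avoids the problem altogether.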