Date: Mon, 17 Mar 2014 18:43:42 +0000 (UTC)
From: "Joe Rao (JIRA)"
To: hive-dev@hadoop.apache.org
Reply-To: dev@hive.apache.org
Subject: [jira] [Updated] (HIVE-6489) Data loaded with LOAD DATA LOCAL INPATH has incorrect group ownership

     [ https://issues.apache.org/jira/browse/HIVE-6489?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Joe Rao updated HIVE-6489:
--------------------------

    Description:
Data uploaded via the Hive client with the "LOAD DATA LOCAL INPATH" method will have the group ownership of the hdfs://tmp/hive- directory instead of the group ownership of the table directory. The group ownership of hdfs://tmp/hive- is, by default, the group of the user running the Hadoop daemons.
This means that, on a Hadoop system with default file permissions of 770, any data loaded into Hive via the LOAD DATA LOCAL INPATH method by one user cannot be seen by another user in the same group until the group ownership is manually changed in Hive's internal directory, or the group ownership is manually changed on hdfs://tmp/hive-. This problem is not present with the LOAD DATA INPATH method, or with regular HDFS loads.

Steps to reproduce the problem on a pseudodistributed Hadoop cluster:
- In hdfs-site.xml, modify the umask to 007 (meaning that default permissions on files are 770). The property changes names in Hadoop 2.0 but used to be called "dfs.umaskmode".
- Restart HDFS.
- Create a group called "testgroup".
- Create two users that have testgroup as their primary group. Call them "testuser1" and "testuser2".
- Create a test file containing "Hello World" and call it "test.txt". It should be stored on the local filesystem.
- Create a table called "testtable" in Hive using testuser1. Give it a single string column, textfile format, comma-delimited fields.
- Have testuser1 use the LOAD DATA LOCAL INPATH command to load "test.txt" into testtable.
- Attempt to read testtable using testuser2. The read will fail with a permissions error when it should not.
- Examine the contents of the hdfs://apps/hive/warehouse/testtable directory. The file will belong to the "hadoop" or "users" or analogous group, instead of the correct group "testgroup". It will have correct permissions of 770.
- Change the group ownership of the folder "hdfs://tmp/hive-testuser1" to "testgroup".
- Repeat the data load. testuser2 will now be able to correctly read the data, and the file will have the correct group ownership.

  was:
Data uploaded via the Hive client with the "LOAD DATA LOCAL INPATH" method will have the group ownership of the hdfs://tmp/hive- directory instead of the primary group that belongs to.
The group ownership of hdfs://tmp/hive- is, by default, the group of the user running the Hadoop daemons. This means that, on a Hadoop system with default file permissions of 770, any data loaded into Hive via the LOAD DATA LOCAL INPATH method by one user cannot be seen by another user in the same group until the group ownership is manually changed in Hive's internal directory, or the group ownership is manually changed on hdfs://tmp/hive-. This problem is not present with the LOAD DATA INPATH method, or with regular HDFS loads.

Steps to reproduce the problem on a pseudodistributed Hadoop cluster:
- In hdfs-site.xml, modify the umask to 007 (meaning that default permissions on files are 770). The property changes names in Hadoop 2.0 but used to be called "dfs.umaskmode".
- Restart HDFS.
- Create a group called "testgroup".
- Create two users that have testgroup as their primary group. Call them "testuser1" and "testuser2".
- Create a test file containing "Hello World" and call it "test.txt". It should be stored on the local filesystem.
- Create a table called "testtable" in Hive using testuser1. Give it a single string column, textfile format, comma-delimited fields.
- Have testuser1 use the LOAD DATA LOCAL INPATH command to load "test.txt" into testtable.
- Attempt to read testtable using testuser2. The read will fail with a permissions error when it should not.
- Examine the contents of the hdfs://apps/hive/warehouse/testtable directory. The file will belong to the "hadoop" or "users" or analogous group, instead of the correct group "testgroup". It will have correct permissions of 770.
- Change the group ownership of the folder "hdfs://tmp/hive-testuser1" to "testgroup".
- Repeat the data load. testuser2 will now be able to correctly read the data, and the file will have the correct group ownership.
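
The permission arithmetic and the failing read in the steps above can be illustrated with a small model. This is only a sketch under stated assumptions, not Hive or HDFS code: the names apply_umask, HdfsFile and can_read are hypothetical, the base mode of 777 is assumed so that a umask of 007 yields the 770 figure quoted in the report, "hadoop" stands in for whichever daemon group the cluster uses, and the access check is a simplified owner/group/other test.

```python
from dataclasses import dataclass


def apply_umask(base_mode: int, umask: int) -> int:
    """Clear the permission bits that are set in the umask."""
    return base_mode & ~umask


@dataclass(frozen=True)
class HdfsFile:
    """Hypothetical stand-in for the ownership metadata of one HDFS file."""
    owner: str
    group: str
    mode: int


def can_read(user: str, user_groups: set, f: HdfsFile) -> bool:
    """Simplified POSIX-style check: owner bits, then group bits, then other."""
    if user == f.owner:
        return bool(f.mode & 0o400)
    if f.group in user_groups:
        return bool(f.mode & 0o040)
    return bool(f.mode & 0o004)


# Assumed base mode 777; umask 007 as set in the reproduction steps.
mode = apply_umask(0o777, 0o007)

# File as it ends up after LOAD DATA LOCAL INPATH: group taken from the
# scratch directory ("hadoop" is an assumed example of the daemon group).
loaded = HdfsFile(owner="testuser1", group="hadoop", mode=mode)

# File as the report expects it: group "testgroup".
fixed = HdfsFile(owner="testuser1", group="testgroup", mode=mode)

print(oct(mode))                                      # 0o770
print(can_read("testuser2", {"testgroup"}, loaded))   # False
print(can_read("testuser2", {"testgroup"}, fixed))    # True
```

With mode 770 the "other" bits are all cleared, so testuser2 can only read the file through the group bits, which is why the group owner of the loaded file decides whether the read succeeds.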

> Data loaded with LOAD DATA LOCAL INPATH has incorrect group ownership
> ---------------------------------------------------------------------
>
>                 Key: HIVE-6489
>                 URL: https://issues.apache.org/jira/browse/HIVE-6489
>             Project: Hive
>          Issue Type: Bug
>          Components: Import/Export
>    Affects Versions: 0.9.0, 0.10.0, 0.11.0, 0.12.0
>         Environment: OS and hardware are irrelevant. Tested and reproduced on multiple configurations, including SLES, RHEL, VM, Teradata Hadoop Appliance, HDP 1.1, HDP 1.3.2, HDP 2.0.
>            Reporter: Joe Rao
>            Priority: Minor
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> Data uploaded via the Hive client with the "LOAD DATA LOCAL INPATH" method will have the group ownership of the hdfs://tmp/hive- directory instead of the group ownership of the table directory. The group ownership of hdfs://tmp/hive- is, by default, the group of the user running the Hadoop daemons. This means that, on a Hadoop system with default file permissions of 770, any data loaded into Hive via the LOAD DATA LOCAL INPATH method by one user cannot be seen by another user in the same group until the group ownership is manually changed in Hive's internal directory, or the group ownership is manually changed on hdfs://tmp/hive-. This problem is not present with the LOAD DATA INPATH method, or with regular HDFS loads.
> Steps to reproduce the problem on a pseudodistributed Hadoop cluster:
> - In hdfs-site.xml, modify the umask to 007 (meaning that default permissions on files are 770). The property changes names in Hadoop 2.0 but used to be called "dfs.umaskmode".
> - Restart HDFS.
> - Create a group called "testgroup".
> - Create two users that have testgroup as their primary group. Call them "testuser1" and "testuser2".
> - Create a test file containing "Hello World" and call it "test.txt". It should be stored on the local filesystem.
> - Create a table called "testtable" in Hive using testuser1.
>   Give it a single string column, textfile format, comma-delimited fields.
> - Have testuser1 use the LOAD DATA LOCAL INPATH command to load "test.txt" into testtable.
> - Attempt to read testtable using testuser2. The read will fail with a permissions error when it should not.
> - Examine the contents of the hdfs://apps/hive/warehouse/testtable directory. The file will belong to the "hadoop" or "users" or analogous group, instead of the correct group "testgroup". It will have correct permissions of 770.
> - Change the group ownership of the folder "hdfs://tmp/hive-testuser1" to "testgroup".
> - Repeat the data load. testuser2 will now be able to correctly read the data, and the file will have the correct group ownership.

--
This message was sent by Atlassian JIRA (v6.2#6252)