Date: Mon, 13 Aug 2012 12:21:00 -0400
Subject: group assignment on HDFS from Hadoop and Hive
From: Chen Song
To: user@hadoop.apache.org

I am wondering how Hadoop assigns groups when directories and files are created by a user. Below are some tests I have done. In my cluster, the group hadoop is configured as the supergroup.

> hadoop fs -ls /tmp
drwxrwxrwx   - abc hadoop               0 2012-08-10 23:02 /tmp/abc
drwxrwxrwx   - def other_group          0 2012-08-10 23:02 /tmp/def

> groups apache
apache: apache wheel

> sudo -u apache hadoop fs -put somefile /tmp/abc
> hadoop fs -ls /tmp/abc
-rw-rw-r--   3 apache hadoop          120962 2012-08-13 16:03 /tmp/abc/somefile

> sudo -u apache hadoop fs -put somefile /tmp/def
> hadoop fs -lsr /tmp/def
-rw-rw-r--   3 apache other_group     120962 2012-08-13 16:03 /tmp/def/somefile

*Based on the experiments above, it looks like a file pushed to HDFS always inherits its group from its parent directory. Is that always the case?*

A follow-up question about one finding in Hive: when executing a query that overwrites a table (or a partition within a table), the newly written directory always ends up belonging to HDFS's supergroup, regardless of:

1. The user who is executing the Hive query
2. The groups that user belongs to
3. The group of the parent table directory

*Is this always expected in Hive?*

For example, table A is stored at /path/A and is partitioned on column dh. /path/A has group *other_group*. After running

insert overwrite A partition (dh = "12") select column list from ... where ...

/path/A/12 always ends up with group *hadoop*. This contradicts the inheritance assumption I drew above. Any thoughts would be appreciated.

Thanks
Chen
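To make the comparison in the tests above repeatable, here is a minimal sketch that pulls the owner and group columns out of `hadoop fs -ls` style output and checks them against an expected group. The helper names `ls_owner_group` and `all_in_group` are hypothetical, not part of Hadoop. The sketch assumes the 8-column listing layout shown in this message and paths without spaces.

```shell
#!/bin/sh
# Hypothetical helpers (not part of Hadoop). They only parse listing text,
# so they can be tried without a cluster.

# Print "owner group path" for each entry in `hadoop fs -ls` style output on stdin.
# Lines with fewer than 8 fields (e.g. "Found N items") are skipped.
ls_owner_group() {
  awk 'NF >= 8 { print $3, $4, $8 }'
}

# Exit 0 only if every listed entry belongs to the expected group ($1).
all_in_group() {
  awk -v g="$1" 'NF >= 8 && $4 != g { bad = 1 } END { exit bad + 0 }'
}

# Example using a listing line from the first test in this message:
printf '%s\n' \
  '-rw-rw-r--   3 apache hadoop     120962 2012-08-13 16:03 /tmp/abc/somefile' \
  | ls_owner_group

# Against a live cluster (assumed invocation, not run here):
# hadoop fs -ls /path/A | all_in_group other_group || echo "group mismatch under /path/A"
```

Running the same check on the table directory before and after the Hive overwrite would show whether the partition directory's group changed. This only reports the mismatch; it does not explain or fix it.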