From: "Eli Collins (JIRA)"
To: common-issues@hadoop.apache.org
Date: Wed, 3 Oct 2012 03:53:09 +1100 (NCT)
Subject: [jira] [Comment Edited] (HADOOP-8845) When looking for parent paths info, globStatus must filter out non-directory elements to avoid an AccessControlException

    [ https://issues.apache.org/jira/browse/HADOOP-8845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13467853#comment-13467853 ]

Eli Collins edited comment on HADOOP-8845 at 10/3/12 3:52 AM:
--------------------------------------------------------------

Harsh,

Per the discussion, my earlier comment was incorrect: {{/tmp/testdir/\*/testfile}} should *not* match {{/tmp/testdir/testfile}}. Let's add a test for that if we don't have one.

bq. The parts I've changed this under, try to fetch "parents", which can't mean anything but directories AFAICT.

I took another look, and that appears to be true for FileSystem, but not for FileContext, which also needs to handle symlinks. Unfortunately it looks like this glob handling code was duplicated, so the equivalent change needs to be made to the same code in FileContext. Could you file a jira for sharing it across FileSystem and FileContext? We can do that in a separate change.
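For reference, a minimal sketch of such a test against {{FileSystem#globStatus}} (the class name and the use of the local filesystem are illustrative assumptions, not part of the attached patch; a real test would run against HDFS, e.g. via a MiniDFSCluster):

{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class GlobMatchSketch {
  public static void main(String[] args) throws Exception {
    // Local filesystem keeps the sketch self-contained.
    FileSystem fs = FileSystem.getLocal(new Configuration());
    Path base = new Path("/tmp/testdir");
    fs.mkdirs(new Path(base, "1"));
    fs.create(new Path(base, "1/testfile")).close();
    fs.create(new Path(base, "testfile")).close();

    // The '*' component should expand only to the directory "1", never to
    // the regular file "testfile", so exactly one path should match.
    FileStatus[] matches = fs.globStatus(new Path(base, "*/testfile"));
    if (matches != null) {
      for (FileStatus match : matches) {
        System.out.println(match.getPath()); // expect only /tmp/testdir/1/testfile
      }
    }
  }
}
{code}

The expectation is a single match, {{/tmp/testdir/1/testfile}}, with {{/tmp/testdir/testfile}} excluded.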
> When looking for parent paths info, globStatus must filter out non-directory elements to avoid an AccessControlException
> ------------------------------------------------------------------------------------------------------------------------
>
>                 Key: HADOOP-8845
>                 URL: https://issues.apache.org/jira/browse/HADOOP-8845
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: fs
>    Affects Versions: 2.0.0-alpha
>            Reporter: Harsh J
>            Assignee: Harsh J
>              Labels: glob
>         Attachments: HADOOP-8845.patch, HADOOP-8845.patch, HADOOP-8845.patch
>
>
> A brief description from my colleague Stephen Fritz who helped discover it:
> {code}
> [root@node1 ~]# su - hdfs
> -bash-4.1$ echo "My Test String">testfile <-- just a text file, for testing below
> -bash-4.1$ hadoop dfs -mkdir /tmp/testdir <-- create a directory
> -bash-4.1$ hadoop dfs -mkdir /tmp/testdir/1 <-- create a subdirectory
> -bash-4.1$ hadoop dfs -put testfile /tmp/testdir/1/testfile <-- put the test file in the subdirectory
> -bash-4.1$ hadoop dfs -put testfile /tmp/testdir/testfile <-- put the test file in the directory
> -bash-4.1$ hadoop dfs -lsr /tmp/testdir
> drwxr-xr-x   - hdfs hadoop          0 2012-09-25 06:52 /tmp/testdir/1
> -rw-r--r--   3 hdfs hadoop         15 2012-09-25 06:52 /tmp/testdir/1/testfile
> -rw-r--r--   3 hdfs hadoop         15 2012-09-25 06:52 /tmp/testdir/testfile
> All files are where we expect them... OK, let's try reading:
> -bash-4.1$ hadoop dfs -cat /tmp/testdir/testfile
> My Test String <-- success!
> -bash-4.1$ hadoop dfs -cat /tmp/testdir/1/testfile
> My Test String <-- success!
> -bash-4.1$ hadoop dfs -cat /tmp/testdir/*/testfile
> My Test String <-- success!
> Note that we used an '*' in the cat command, and it correctly found the subdirectory '/tmp/testdir/1' and ignored the regular file '/tmp/testdir/testfile'.
> -bash-4.1$ exit
> logout
> [root@node1 ~]# su - testuser <-- let's try it as a different user:
> [testuser@node1 ~]$ hadoop dfs -lsr /tmp/testdir
> drwxr-xr-x   - hdfs hadoop          0 2012-09-25 06:52 /tmp/testdir/1
> -rw-r--r--   3 hdfs hadoop         15 2012-09-25 06:52 /tmp/testdir/1/testfile
> -rw-r--r--   3 hdfs hadoop         15 2012-09-25 06:52 /tmp/testdir/testfile
> [testuser@node1 ~]$ hadoop dfs -cat /tmp/testdir/testfile
> My Test String <-- good
> [testuser@node1 ~]$ hadoop dfs -cat /tmp/testdir/1/testfile
> My Test String <-- so far so good
> [testuser@node1 ~]$ hadoop dfs -cat /tmp/testdir/*/testfile
> cat: org.apache.hadoop.security.AccessControlException: Permission denied: user=testuser, access=EXECUTE, inode="/tmp/testdir/testfile":hdfs:hadoop:-rw-r--r--
> {code}
> Essentially, we hit an AccessControlException with access=EXECUTE on the file /tmp/testdir/testfile because we tried to access /tmp/testdir/testfile/testfile as a path. This shouldn't happen, as testfile is a regular file, not a parent path to be looked up under.
> {code}
> 2012-09-25 07:24:27,406 INFO org.apache.hadoop.ipc.Server: IPC Server
> handler 2 on 8020, call getFileInfo(/tmp/testdir/testfile/testfile)
> {code}
> Surprisingly, the superuser avoids hitting the error as a result of bypassing permission checks; whether it is fine to leave it that way can be taken up in another JIRA.
> This JIRA targets a client-side fix that avoids such /path/file/dir or /path/file/file lookups.
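For illustration, the client-side fix described above amounts to filtering the expansion of each glob component down to directories before descending into the remaining path components. A minimal sketch, assuming a hypothetical helper (the name {{directoriesOnly}} is illustrative and not from the patch):

{code}
import java.util.ArrayList;
import java.util.List;
import org.apache.hadoop.fs.FileStatus;

class GlobParentFilter {
  // Keep only directories from the expansion of one glob component.
  // Regular files cannot have children, so descending into them can only
  // produce bogus lookups such as getFileInfo(/tmp/testdir/testfile/testfile),
  // which fail with access=EXECUTE for non-superusers.
  static List<FileStatus> directoriesOnly(FileStatus[] candidates) {
    List<FileStatus> parents = new ArrayList<FileStatus>();
    for (FileStatus candidate : candidates) {
      if (candidate.isDirectory()) {
        parents.add(candidate);
      }
    }
    return parents;
  }
}
{code}

Filtering at this point means a non-directory match like {{/tmp/testdir/testfile}} is never treated as a parent, so the {{getFileInfo(/tmp/testdir/testfile/testfile)}} call never reaches the NameNode.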