Date: Tue, 14 Jan 2014 21:11:20 +0000 (UTC)
From: "Yongjun Zhang (JIRA)"
To: hdfs-issues@hadoop.apache.org
Reply-To: hdfs-issues@hadoop.apache.org
Subject: [jira] [Updated] (HDFS-5767) Nfs implementation assumes userName userId mapping to be unique, which is not true sometimes

     [ https://issues.apache.org/jira/browse/HDFS-5767?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yongjun Zhang updated HDFS-5767:
--------------------------------
    Description:
I'm seeing that the nfs implementation assumes a unique (userName, userId) pair to be returned by the command "getent passwd". That is, for a given userName, there should be a single userId, and for a given userId, there should be a single userName.
The reason is explained in the following message:

  private static final String DUPLICATE_NAME_ID_DEBUG_INFO = "NFS gateway can't start with duplicate name or id on the host system.\n"
      + "This is because HDFS (non-kerberos cluster) uses name as the only way to identify a user or group.\n"
      + "The host system with duplicated user/group name or id might work fine most of the time by itself.\n"
      + "However when NFS gateway talks to HDFS, HDFS accepts only user and group name.\n"
      + "Therefore, same name means the same user or same group. To find the duplicated names/ids, one can do:\n"
      + " and on Linux systms,\n"
      + " and on MacOS.";

This requirement cannot always be met (e.g. because of the use of LDAP). Let's look at an example.

What exists in /etc/passwd:

  $ more /etc/passwd | grep ^bin
  bin:x:2:2:bin:/bin:/bin/sh
  $ more /etc/passwd | grep ^daemon
  daemon:x:1:1:daemon:/usr/sbin:/bin/sh

The above result says userName "bin" has userId "2", and "daemon" has userId "1".

What we can see with the "getent passwd" command due to LDAP:

  $ getent passwd | grep ^bin
  bin:x:2:2:bin:/bin:/bin/sh
  bin:x:1:1:bin:/bin:/sbin/nologin
  $ getent passwd | grep ^daemon
  daemon:x:1:1:daemon:/usr/sbin:/bin/sh
  daemon:x:2:2:daemon:/sbin:/sbin/nologin

We can see that there are multiple entries for the same userName with different userIds, and the same userId can be associated with different userNames.
So the assumption stated in the above DEBUG_INFO message cannot be met here. The DEBUG_INFO also states that HDFS uses name as the only way to identify a user/group. I'm filing this JIRA for a solution.

Hi [~brandonli], since you implemented most of the nfs feature, would you please comment?

Thanks.

  was:
I'm seeing that the nfs implementation assumes a unique (userName, userId) pair to be returned by the command "getent passwd". That is, for a given userName, there should be a single userId, and for a given userId, there should be a single userName.
The reason is explained in the following message:

  private static final String DUPLICATE_NAME_ID_DEBUG_INFO = "NFS gateway can't start with duplicate name or id on the host system.\n"
      + "This is because HDFS (non-kerberos cluster) uses name as the only way to identify a user or group.\n"
      + "The host system with duplicated user/group name or id might work fine most of the time by itself.\n"
      + "However when NFS gateway talks to HDFS, HDFS accepts only user and group name.\n"
      + "Therefore, same name means the same user or same group. To find the duplicated names/ids, one can do:\n"
      + " and on Linux systms,\n"
      + " and on MacOS.";

This requirement cannot always be met (e.g. because of the use of LDAP). Let's look at an example.

What exists in /etc/passwd:

  $ more /etc/passwd | grep ^bin
  bin:x:2:2:bin:/bin:/bin/sh
  $ more /etc/passwd | grep ^daemon
  daemon:x:1:1:daemon:/usr/sbin:/bin/sh

The above result says userName "bin" has userId "2", and "daemon" has userId "1".

What we can see with the "getent passwd" command due to LDAP:

  $ getent passwd | grep ^bin
  bin:x:2:2:bin:/bin:/bin/sh
  bin:x:1:1:bin:/bin:/sbin/nologin
  $ getent passwd | grep ^daemon
  daemon:x:1:1:daemon:/usr/sbin:/bin/sh
  daemon:x:2:2:daemon:/sbin:/sbin/nologin

We can see that there are multiple entries for the same userName with different userIds, and the same userId can be associated with different userNames.
So the assumption stated in the above DEBUG_INFO message cannot be met here. The DEBUG_INFO also states that HDFS uses name as the only way to identify a user/group. I'm filing this JIRA for a solution.

Hi Brandon, since you implemented most of the nfs feature, would you please comment?

Thanks.
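
To make the uniqueness check discussed above concrete, here is a minimal sketch. It is not the NFS gateway's actual code: the class name PasswdDuplicateFinder and its methods are hypothetical, and the input is assumed to be lines in the colon-separated format printed by "getent passwd". It groups the entries both ways (name to ids, id to names) and reports every key that maps to more than one value, which is exactly the situation the DEBUG_INFO message rejects.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical helper (not part of Hadoop): finds non-unique name/id mappings. */
public class PasswdDuplicateFinder {

  /**
   * Groups passwd-style lines ("name:x:uid:gid:...") by one field.
   * keyField/valueField are zero-based field indexes: 0 is the name, 2 is the uid.
   */
  public static Map<String, Set<String>> group(
      List<String> passwdLines, int keyField, int valueField) {
    Map<String, Set<String>> result = new LinkedHashMap<>();
    int needed = Math.max(keyField, valueField) + 1;
    for (String line : passwdLines) {
      String[] fields = line.split(":");
      if (fields.length < needed) {
        continue; // skip malformed or empty lines
      }
      result.computeIfAbsent(fields[keyField], k -> new LinkedHashSet<>())
          .add(fields[valueField]);
    }
    return result;
  }

  /** Returns the keys that are associated with more than one distinct value. */
  public static List<String> ambiguousKeys(Map<String, Set<String>> mapping) {
    List<String> keys = new ArrayList<>();
    for (Map.Entry<String, Set<String>> entry : mapping.entrySet()) {
      if (entry.getValue().size() > 1) {
        keys.add(entry.getKey());
      }
    }
    return keys;
  }

  public static void main(String[] args) {
    // The "getent passwd" entries quoted in this issue.
    List<String> lines = Arrays.asList(
        "bin:x:2:2:bin:/bin:/bin/sh",
        "bin:x:1:1:bin:/bin:/sbin/nologin",
        "daemon:x:1:1:daemon:/usr/sbin:/bin/sh",
        "daemon:x:2:2:daemon:/sbin:/sbin/nologin");
    System.out.println("names with several ids: " + ambiguousKeys(group(lines, 0, 2)));
    System.out.println("ids with several names: " + ambiguousKeys(group(lines, 2, 0)));
  }
}
```

With the four entries from this report, both "bin" and "daemon" are flagged as names with several ids, and both ids 1 and 2 are flagged as ids with several names. The two /etc/passwd entries alone produce no findings.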

> Nfs implementation assumes userName userId mapping to be unique, which is not true sometimes
> --------------------------------------------------------------------------------------------
>
>                 Key: HDFS-5767
>                 URL: https://issues.apache.org/jira/browse/HDFS-5767
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: nfs
>    Affects Versions: 2.3.0
>         Environment: With LDAP enabled
>            Reporter: Yongjun Zhang
>
> I'm seeing that the nfs implementation assumes a unique (userName, userId) pair to be returned by the command "getent passwd". That is, for a given userName, there should be a single userId, and for a given userId, there should be a single userName. The reason is explained in the following message:
>   private static final String DUPLICATE_NAME_ID_DEBUG_INFO = "NFS gateway can't start with duplicate name or id on the host system.\n"
>       + "This is because HDFS (non-kerberos cluster) uses name as the only way to identify a user or group.\n"
>       + "The host system with duplicated user/group name or id might work fine most of the time by itself.\n"
>       + "However when NFS gateway talks to HDFS, HDFS accepts only user and group name.\n"
>       + "Therefore, same name means the same user or same group. To find the duplicated names/ids, one can do:\n"
>       + " and on Linux systms,\n"
>       + " and on MacOS.";
> This requirement cannot always be met (e.g. because of the use of LDAP). Let's look at an example.
> What exists in /etc/passwd:
>   $ more /etc/passwd | grep ^bin
>   bin:x:2:2:bin:/bin:/bin/sh
>   $ more /etc/passwd | grep ^daemon
>   daemon:x:1:1:daemon:/usr/sbin:/bin/sh
> The above result says userName "bin" has userId "2", and "daemon" has userId "1".
>
> What we can see with the "getent passwd" command due to LDAP:
>   $ getent passwd | grep ^bin
>   bin:x:2:2:bin:/bin:/bin/sh
>   bin:x:1:1:bin:/bin:/sbin/nologin
>   $ getent passwd | grep ^daemon
>   daemon:x:1:1:daemon:/usr/sbin:/bin/sh
>   daemon:x:2:2:daemon:/sbin:/sbin/nologin
> We can see that there are multiple entries for the same userName with different userIds, and the same userId can be associated with different userNames.
> So the assumption stated in the above DEBUG_INFO message cannot be met here. The DEBUG_INFO also states that HDFS uses name as the only way to identify a user/group. I'm filing this JIRA for a solution.
> Hi [~brandonli], since you implemented most of the nfs feature, would you please comment?
> Thanks.

--
This message was sent by Atlassian JIRA
(v6.1.5#6160)