hadoop-hdfs-issues mailing list archives

From "Yongjun Zhang (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-5767) Nfs implementation assumes userName userId mapping to be unique, which is not true sometimes
Date Wed, 22 Jan 2014 00:08:23 GMT

    [ https://issues.apache.org/jira/browse/HDFS-5767?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13878052#comment-13878052 ]

Yongjun Zhang commented on HDFS-5767:

Thanks for giving this more thought.

Your proposed solution resolves case 1 nicely, so we could have a unique mapping. If "getent
passwd" returns the mappings from the different databases in the order in which those databases
are searched, then the result would be the same as "picking the first one" (true?); otherwise,
your proposed solution would be better.

But I'm a bit concerned about case 2, because I'm not sure whether it's a misconfiguration.
I didn't find enough info, but I guess that if case 1 is possible (a single name mapped to multiple
ids), case 2 is also possible, though I also hope it is a misconfiguration. My thinking is this:
assume each database has only one mapping for each user, but the mappings differ between
databases. Then, if we don't restrict the search order, we will get into trouble. The search order
that NSS provides relieves us of this trouble. But if you combine the two databases (as "getent
passwd" does), you will see both case 1 and case 2. On the other hand, if the two databases are
totally disjoint, then we wouldn't be having this discussion at all.
I guess more study is needed to confirm whether case 2 is a misconfiguration.
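
To make the search-order point concrete, here is a minimal sketch of "first database wins" resolution. It is only an illustration under assumptions: the class FirstMatchLookup and the method lookupUid are hypothetical names, not part of the NFS gateway or of NSS, and the two in-memory maps stand in for /etc/passwd and LDAP using the "bin" and "daemon" values from the issue description.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch; this is not the gateway's IdUserGroup code.
final class FirstMatchLookup {
    // Each map models one database (user name -> uid).
    // The order of the list models the NSS search order.
    static Integer lookupUid(List<Map<String, Integer>> databasesInSearchOrder, String name) {
        for (Map<String, Integer> db : databasesInSearchOrder) {
            Integer uid = db.get(name);
            if (uid != null) {
                return uid; // the first database that knows the name wins
            }
        }
        return null; // the name is unknown in every database
    }

    public static void main(String[] args) {
        // Values taken from the issue description: /etc/passwd vs. the extra entries seen via getent.
        Map<String, Integer> files = new LinkedHashMap<>();
        files.put("bin", 2);
        files.put("daemon", 1);
        Map<String, Integer> ldap = new LinkedHashMap<>();
        ldap.put("bin", 1);
        ldap.put("daemon", 2);

        // With a fixed search order each name resolves to exactly one uid.
        System.out.println(lookupUid(Arrays.asList(files, ldap), "bin"));
        // Reversing the order changes the answer, which is the trouble described above.
        System.out.println(lookupUid(Arrays.asList(ldap, files), "bin"));
    }
}
```

With a fixed order, every name has a single answer even though the combined listing contains two entries per name.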

I have another question. I noticed that the IdUserGroup class also provides an API to get the
user name (getUserName) for a given uid. I'm not sure whether this API will be called from
different machines with different uids for the same user. If it is, then we might get the wrong
user name back from this API. Say userA is mapped to 1 in /etc/passwd and to 2 in LDAP, and we
end up assigning the mapping <userA, 2>. Is it possible that someone will call this API with "1"
and expect userA?
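
The reverse-lookup concern can be sketched as follows. This is a hypothetical table that keeps exactly one <name, uid> pair per user; the class SingleMappingTable and its "last entry wins" rule are assumptions chosen only to reproduce the <userA, 2> outcome, and the method names merely mirror the API mentioned above rather than copy the real IdUserGroup implementation.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch; this is not the gateway's IdUserGroup code.
final class SingleMappingTable {
    private final Map<String, Integer> nameToUid = new HashMap<>();
    private final Map<Integer, String> uidToName = new HashMap<>();

    // Keeps the table one-to-one: a later pair replaces any earlier pair
    // that shares its name or its uid (assumed "last entry wins" rule).
    void put(String name, int uid) {
        Integer oldUid = nameToUid.put(name, uid);
        if (oldUid != null && oldUid != uid) {
            uidToName.remove(oldUid);
        }
        String oldName = uidToName.put(uid, name);
        if (oldName != null && !oldName.equals(name)) {
            nameToUid.remove(oldName);
        }
    }

    // Returns null when the uid has no mapping.
    String getUserName(int uid) {
        return uidToName.get(uid);
    }

    // Returns null when the name has no mapping.
    Integer getUid(String name) {
        return nameToUid.get(name);
    }

    public static void main(String[] args) {
        SingleMappingTable table = new SingleMappingTable();
        table.put("userA", 1); // as in /etc/passwd
        table.put("userA", 2); // as in LDAP; replaces the first pair
        System.out.println(table.getUserName(2));
        System.out.println(table.getUserName(1)); // no mapping left for uid 1
    }
}
```

A caller on a machine where userA has uid 1 would get no name back, which is the situation the question above asks about.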

BTW, when I initially observed this problem, I thought it was just that we were not
handling exact duplicate entries, and I had a quick solution
to ignore that kind of duplicate. Then I found that one user could be mapped to multiple userIds,
and the same userId could be mapped to multiple user names.
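
The difference between exact duplicates and real conflicts can be sketched like this. It is an illustration only: PasswdConflictScanner and its methods are hypothetical names, the input is assumed to be passwd-style lines (name:password:uid:...), and a non-numeric uid field would make Integer.parseInt throw.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch; this is not the gateway's IdUserGroup code.
final class PasswdConflictScanner {
    // Names that appear with more than one distinct uid.
    static Map<String, Set<Integer>> namesWithMultipleUids(List<String> passwdLines) {
        Map<String, Set<Integer>> uidsByName = new LinkedHashMap<>();
        for (String line : passwdLines) {
            String[] fields = line.split(":");
            if (fields.length < 3) {
                continue; // not a passwd-style line
            }
            // A set absorbs exact duplicates: the same <name, uid> pair counts once.
            uidsByName.computeIfAbsent(fields[0], k -> new LinkedHashSet<>())
                      .add(Integer.parseInt(fields[2]));
        }
        uidsByName.values().removeIf(uids -> uids.size() < 2);
        return uidsByName;
    }

    // Uids that appear with more than one distinct name.
    static Map<Integer, Set<String>> uidsWithMultipleNames(List<String> passwdLines) {
        Map<Integer, Set<String>> namesByUid = new LinkedHashMap<>();
        for (String line : passwdLines) {
            String[] fields = line.split(":");
            if (fields.length < 3) {
                continue; // not a passwd-style line
            }
            namesByUid.computeIfAbsent(Integer.parseInt(fields[2]), k -> new LinkedHashSet<>())
                      .add(fields[0]);
        }
        namesByUid.values().removeIf(names -> names.size() < 2);
        return namesByUid;
    }

    public static void main(String[] args) {
        // The "getent passwd" lines quoted in the issue description.
        List<String> lines = Arrays.asList(
            "bin:x:2:2:bin:/bin:/bin/sh",
            "bin:x:1:1:bin:/bin:/sbin/nologin",
            "daemon:x:1:1:daemon:/usr/sbin:/bin/sh",
            "daemon:x:2:2:daemon:/sbin:/sbin/nologin");
        System.out.println(namesWithMultipleUids(lines));
        System.out.println(uidsWithMultipleNames(lines));
    }
}
```

Exact duplicate lines produce no report here, while the entries from the issue description are reported in both directions.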


> Nfs implementation assumes userName userId mapping to be unique, which is not true sometimes
> --------------------------------------------------------------------------------------------
>                 Key: HDFS-5767
>                 URL: https://issues.apache.org/jira/browse/HDFS-5767
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: nfs
>    Affects Versions: 2.3.0
>         Environment: With LDAP enabled
>            Reporter: Yongjun Zhang
>            Assignee: Brandon Li
> I'm seeing that the nfs implementation assumes unique <userName, userId> pair to
be returned by command "getent passwd". That is, for a given userName, there should be a single
userId, and for a given userId, there should be a single userName.  The reason is explained
in the following message:
>  private static final String DUPLICATE_NAME_ID_DEBUG_INFO = "NFS gateway can't start
with duplicate name or id on the host system.\n"
>       + "This is because HDFS (non-kerberos cluster) uses name as the only way to identify
a user or group.\n"
>       + "The host system with duplicated user/group name or id might work fine most of
the time by itself.\n"
>       + "However when NFS gateway talks to HDFS, HDFS accepts only user and group name.\n"
>       + "Therefore, same name means the same user or same group. To find the duplicated
names/ids, one can do:\n"
>       + "<getent passwd | cut -d: -f1,3> and <getent group | cut -d: -f1,3>
on Linux systms,\n"
>       + "<dscl . -list /Users UniqueID> and <dscl . -list /Groups PrimaryGroupID>
on MacOS.";
> This requirement cannot always be met (e.g. because of the use of LDAP). Let's do
some examination:
> What exists in /etc/passwd:
> $ more /etc/passwd | grep ^bin
> bin:x:2:2:bin:/bin:/bin/sh
> $ more /etc/passwd | grep ^daemon
> daemon:x:1:1:daemon:/usr/sbin:/bin/sh
> The above result says userName  "bin" has userId "2", and "daemon" has userId "1".
> What we can see with "getent passwd" command due to LDAP:
> $ getent passwd | grep ^bin
> bin:x:2:2:bin:/bin:/bin/sh
> bin:x:1:1:bin:/bin:/sbin/nologin
> $ getent passwd | grep ^daemon
> daemon:x:1:1:daemon:/usr/sbin:/bin/sh
> daemon:x:2:2:daemon:/sbin:/sbin/nologin
> We can see that there are multiple entries for the same userName with different userIds,
and the same userId could be associated with different userNames.
> So the assumption stated in the above DEBUG_INFO message can not be met here. The DEBUG_INFO
also stated that HDFS uses name as the only way to identify user/group. I'm filing this JIRA
for a solution.
> Hi [~brandonli], since you implemented most of the nfs feature, would you please comment?

> Thanks.
