hadoop-hdfs-issues mailing list archives

From "Istvan Fajth (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-13322) fuse dfs - uid persists when switching between ticket caches
Date Fri, 26 Jul 2019 00:52:00 GMT

    [ https://issues.apache.org/jira/browse/HDFS-13322?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16893236#comment-16893236 ]

Istvan Fajth commented on HDFS-13322:
-------------------------------------

To document a new learning point regarding this change, I would like to add the following
information:

fuse-dfs determines the caller's environment from the caller's pid, which the kernel passes
to the FUSE code in the FuseContext structure. Using that pid, fuse-dfs reads the
/proc/(context->pid)/environ file to find the KRB5CCNAME environment variable.
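
As a rough illustration of that lookup from the command line (this is not the fuse-dfs C code, only a sketch of the same idea), the entries in /proc/<pid>/environ are NUL-separated NAME=value strings and can be searched as below. The helper name environ_lookup is hypothetical, and the sketch assumes Linux and values without embedded newlines:

```shell
#!/bin/sh
# Hypothetical helper (not part of fuse-dfs): print the value that variable $2
# has in the initial environment of process $1. Returns non-zero if the
# variable is absent or the environ file cannot be read.
environ_lookup() {
    # Entries in /proc/<pid>/environ are separated by NUL bytes.
    value=$(cat "/proc/$1/environ" 2>/dev/null | tr '\0' '\n' \
        | sed -n "s/^$2=//p" | head -n 1)
    [ -n "$value" ] && printf '%s\n' "$value"
}

# Usage: look up PATH in the initial environment of the current shell.
environ_lookup "$$" PATH || echo "PATH is not in the initial environment"
```

Only processes owned by the same user (or inspected by root) can normally be read this way, which is one more reason to treat this as a diagnostic aid rather than a description of the implementation.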

After the change, the connection cache keys consist of a (username, Kerberos ticket cache
path) pair if authentication is set to KERBEROS, so for every pair we hold a separate connection
built with that ticket cache path, and we reuse it later on. With SIMPLE authentication, the
ticket cache path in the pair is always the \0 character.

For this, in case of KERBEROS authentication the ticket cache path is read from
/proc/(context->pid)/environ on every access.

In the Linux [proc file system man page|http://man7.org/linux/man-pages/man5/proc.5.html] the
following is written for /proc/[pid]/environ:
{quote}This file contains the *initial* environment that was set when the currently executing
program was started via execve(2).{quote}

This can lead to odd behavior when the access does not happen in a new process but in the
same process that exported the KRB5CCNAME environment variable. For example, when the
following commands are executed in a shell, FUSE will not be able to read the KRB5CCNAME
variable from the /proc/(context->pid)/environ file:
{code:java}
$ export KRB5CCNAME=/tmp/myticketcache
$ echo "foo" > /mnt/hdfs/tmp/foo.txt{code}
This is because echo is a shell builtin, so the write happens in the shell process itself:
context->pid will be the shell's process id, and the /proc/(context->pid)/environ file will
not contain the KRB5CCNAME environment variable, as it is not part of the shell's initial
environment.
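
The difference can be observed without any HDFS mount. The sketch below assumes Linux and assumes the shell was not started with this exact KRB5CCNAME value already set; the helper name in_initial_env is hypothetical. It exports the variable, checks the running shell's own environ file, and then checks the environ file of a freshly started child shell:

```shell
#!/bin/sh
# Hypothetical helper: succeed if "$2=$3" is an entry in the initial
# environment of process $1.
in_initial_env() {
    tr '\0' '\n' < "/proc/$1/environ" | grep -qxF "$2=$3"
}

# Exported after the shell has started, as in the example above.
export KRB5CCNAME=/tmp/myticketcache

# The running shell: its environ file still shows the startup environment.
if in_initial_env "$$" KRB5CCNAME "$KRB5CCNAME"; then
    shell_view=visible
else
    shell_view=missing
fi

# A new child shell: the exported value is part of its initial environment.
child_view=$(sh -c 'tr "\0" "\n" < /proc/$$/environ | grep "^KRB5CCNAME="')

echo "shell: $shell_view"
echo "child: $child_view"
```

Under those assumptions the shell line should report the variable as missing while the child line shows it, which matches the behaviour described for echo and cp.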

By contrast, the following will work because cp runs as a new process, which inherits the
exported environment from the current shell as its initial environment:
{code:java}
$ export KRB5CCNAME=/tmp/myticketcache
$ echo "foo" > /tmp/foo.txt
$ cp /tmp/foo.txt /mnt/hdfs/tmp/foo.txt{code}
 

To work around this behaviour, the caller has to ensure that the initial environment of every
accessing process has the correct KRB5CCNAME set. For example, the echo example works correctly
the following way, where the write is issued from a new shell started after the export:
{code:java}
$ export KRB5CCNAME=/tmp/myticketcache
$ /bin/sh
$ echo "foo" > /mnt/hdfs/tmp/foo.txt
$ exit{code}
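
For scripted use, the same workaround can be wrapped so that each access runs in a fresh shell whose initial environment already carries the ticket cache path. This is only a sketch; the function name with_ticket_cache is hypothetical, and the runnable demonstration prints the child's view of the variable instead of touching a real mount:

```shell
#!/bin/sh
# Hypothetical wrapper: run the shell snippet $2 in a new /bin/sh process
# whose initial environment contains KRB5CCNAME=$1.
with_ticket_cache() {
    env KRB5CCNAME="$1" /bin/sh -c "$2"
}

# Demonstration without a mount: show what the new shell finds in its own
# /proc/<pid>/environ file.
with_ticket_cache /tmp/myticketcache \
    'tr "\0" "\n" < /proc/$$/environ | grep "^KRB5CCNAME="'

# With the mount from the example above, the call would look like this:
# with_ticket_cache /tmp/myticketcache 'echo "foo" > /mnt/hdfs/tmp/foo.txt'
```

Because the snippet runs in the new shell, builtins such as echo and their redirections are issued by a process that has KRB5CCNAME in its initial environment.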

> fuse dfs - uid persists when switching between ticket caches
> ------------------------------------------------------------
>
>                 Key: HDFS-13322
>                 URL: https://issues.apache.org/jira/browse/HDFS-13322
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: fuse-dfs
>    Affects Versions: 2.6.0
>         Environment: Linux xxxxxx.xx.xx.xxx 3.10.0-514.el7.x86_64 #1 SMP Wed Oct 19 11:24:13 EDT 2016 x86_64 x86_64 x86_64 GNU/Linux
>  
>            Reporter: Shoeb Sheyx
>            Assignee: Istvan Fajth
>            Priority: Minor
>             Fix For: 3.2.0
>
>         Attachments: HDFS-13322.001.patch, HDFS-13322.002.patch, HDFS-13322.003.patch, TestFuse.java, TestFuse2.java, catter.sh, catter2.sh, perftest_new_behaviour_10k_different_1KB.txt, perftest_new_behaviour_1B.txt, perftest_new_behaviour_1KB.txt, perftest_new_behaviour_1MB.txt, perftest_old_behaviour_10k_different_1KB.txt, perftest_old_behaviour_1B.txt, perftest_old_behaviour_1KB.txt, perftest_old_behaviour_1MB.txt, testHDFS-13322.sh, test_after_patch.out, test_before_patch.out
>
>
> The symptoms of this issue are the same as described in HDFS-3608 except the workaround that was applied (detect changes in UID ticket cache) doesn't resolve the issue when multiple ticket caches are in use by the same user.
> Our use case requires that a job scheduler running as a specific uid obtain separate kerberos sessions per job and that each of these sessions use a separate cache. When switching sessions this way, no change is made to the original ticket cache so the cached filesystem instance doesn't get regenerated.
>  
> {{$ export KRB5CCNAME=/tmp/krb5cc_session1}}
> {{$ kinit user_a@domain}}
> {{$ touch /fuse_mount/tmp/testfile1}}
> {{$ ls -l /fuse_mount/tmp/testfile1}}
> {{ *-rwxrwxr-x 1 user_a user_a 0 Mar 21 13:37 /fuse_mount/tmp/testfile1*}}
> {{$ export KRB5CCNAME=/tmp/krb5cc_session2}}
> {{$ kinit user_b@domain}}
> {{$ touch /fuse_mount/tmp/testfile2}}
> {{$ ls -l /fuse_mount/tmp/testfile2}}
> {{ *-rwxrwxr-x 1 user_a user_a 0 Mar 21 13:37 /fuse_mount/tmp/testfile2*}}
> {{   }}{color:#d04437}*{{** expected owner to be user_b **}}*{color}




