hadoop-common-issues mailing list archives

From "Greg Roelofs (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-7156) getpwuid_r is not thread-safe on RHEL6
Date Tue, 08 Mar 2011 22:23:59 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-7156?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13004258#comment-13004258 ]

Greg Roelofs commented on HADOOP-7156:

Doh!  FF crashed while I was replying, sigh.  Switching to e-mail:

bq. In my experience, we do a really bad job of keeping the wiki up to date. Greg, what do you think?

I agree--we're much better at keeping the code up to date (frequently in
parallel across multiple branches ;-) ) than at keeping the wiki current.

I think the XML config text is fine; you could optionally prefix it with
"As of March 2011, systems known to ..." as a hint to users or future versions
of us to recheck it if significant time has passed.  The comment in NativeIO.c
probably should be modified; perhaps "monitor used for working around a bug
in the sssd security daemon, which was observed in getpwuid_r() on RHEL 6.0,"
or words to that effect.  (Need not be that verbose, of course.)
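The C sketch below shows the kind of workaround that comment would describe. When a flag is set, every getpwuid_r() call goes through one process-wide lock, so the buggy sssd NSS module never runs on two threads at once.

A few assumptions apply:
- The names `work_around_non_threadsafe_calls`, `pw_lock`, and `locked_getpwuid_r` are hypothetical and are not the identifiers used in the patch.
- A pthread mutex stands in for the monitor mentioned in the NativeIO.c comment.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <pwd.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/types.h>

/* Hypothetical switch, e.g. set once at startup from configuration.
 * When true, passwd lookups are serialized to work around the sssd bug
 * observed in getpwuid_r() on RHEL 6.0 (sssd ticket 640). */
static bool work_around_non_threadsafe_calls = false;

/* Process-wide lock: at most one thread at a time inside getpwuid_r(). */
static pthread_mutex_t pw_lock = PTHREAD_MUTEX_INITIALIZER;

/* Same contract as getpwuid_r(): returns 0 on success or an error number,
 * and sets *result to NULL when no entry matches uid. */
int locked_getpwuid_r(uid_t uid, struct passwd *pwd, char *buf,
                      size_t buflen, struct passwd **result) {
  assert(pwd != NULL && buf != NULL && result != NULL);
  if (!work_around_non_threadsafe_calls) {
    /* Fast path: trust the libc/NSS stack to be thread-safe. */
    return getpwuid_r(uid, pwd, buf, buflen, result);
  }
  int err = pthread_mutex_lock(&pw_lock);
  if (err != 0) {
    *result = NULL;
    return err;
  }
  err = getpwuid_r(uid, pwd, buf, buflen, result);
  pthread_mutex_unlock(&pw_lock);
  return err;
}
```

Making the lock conditional means systems with a thread-safe NSS stack pay nothing. Only systems known to need the workaround opt in.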

I also agree with Eli that we can leave the workaround disabled for tests.
It might be worthwhile to log a message at the start saying "this test
may fail (crash) with an invalid free() on some systems; see HADOOP-7156
for details."  Again, feel free to word it however you wish.
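Here is a rough C sketch of what such a test might look like. It prints the suggested warning first. Then several threads call getpwuid_r() concurrently with no lock, which is the pattern that crashed with sssd.

The function name `stress_getpwuid_r`, the `MAX_THREADS` cap, and the thread and iteration counts are illustrative assumptions. None of them comes from the patch.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <pwd.h>
#include <stdio.h>
#include <sys/types.h>

enum { MAX_THREADS = 64 };

struct lookup_job {
  uid_t uid;       /* uid every thread resolves */
  int iterations;  /* lookups per thread */
  int failures;    /* lookups that returned an error or no entry */
};

static void *lookup_loop(void *arg) {
  struct lookup_job *job = arg;
  char buf[16384];
  struct passwd pwd;
  struct passwd *res;
  for (int i = 0; i < job->iterations; i++) {
    /* Deliberately unlocked: the workaround stays disabled in this test. */
    if (getpwuid_r(job->uid, &pwd, buf, sizeof buf, &res) != 0 || res == NULL)
      job->failures++;
  }
  return NULL;
}

/* Returns the total number of failed lookups across all threads, or -1 if
 * not every thread could be started. A buggy NSS module may crash instead. */
int stress_getpwuid_r(uid_t uid, int nthreads, int iterations) {
  assert(nthreads > 0 && nthreads <= MAX_THREADS);
  fprintf(stderr,
          "This test may fail (crash) with an invalid free() on some "
          "systems; see HADOOP-7156 for details.\n");
  pthread_t threads[MAX_THREADS];
  struct lookup_job jobs[MAX_THREADS]; /* one per thread: no shared counter */
  int started = 0;
  for (; started < nthreads; started++) {
    jobs[started] = (struct lookup_job){uid, iterations, 0};
    if (pthread_create(&threads[started], NULL, lookup_loop,
                       &jobs[started]) != 0)
      break;
  }
  int failures = 0;
  for (int i = 0; i < started; i++) {
    pthread_join(threads[i], NULL);
    failures += jobs[i].failures;
  }
  return started == nthreads ? failures : -1;
}
```

Giving each thread its own job struct keeps the failure counts free of data races. The only shared state being exercised is whatever the NSS module keeps internally.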

Trivial grammo:  "workaround" is a noun; the verb form is "work around"
(similar to layout, backup, setup, cleanup, checkin, cutoff, etc.).  The
various variable names would be more proper if they reflected this (e.g.,
workAroundNonThreadsafePasswdCalls, since you're using "threadsafe" as a
single word elsewhere), but I won't fuss if you leave them as is.

> getpwuid_r is not thread-safe on RHEL6
> --------------------------------------
>                 Key: HADOOP-7156
>                 URL: https://issues.apache.org/jira/browse/HADOOP-7156
>             Project: Hadoop Common
>          Issue Type: Bug
>    Affects Versions: 0.22.0
>         Environment: RHEL 6.0 "Santiago"
>            Reporter: Todd Lipcon
>            Assignee: Todd Lipcon
>            Priority: Critical
>             Fix For: 0.22.0
>         Attachments: hadoop-7156.txt, hadoop-7156.txt, hadoop-7156.txt
> Due to the following bug in SSSD, functions like getpwuid_r are not thread-safe in RHEL 6.0 if sssd is specified in /etc/nsswitch.conf (as it is by default):
> https://fedorahosted.org/sssd/ticket/640
> This causes many fetch failures in the case that the native libraries are available, since the SecureIO functions call getpwuid_r as part of fstat. By enabling -Xcheck:jni I get the following trace on JVM crash:
> *** glibc detected *** /mnt/toolchain/JDK6u20-64bit/bin/java: free(): invalid pointer: 0x0000003575741d23 ***
> ======= Backtrace: =========
> /lib64/libc.so.6[0x3575675676]
> /lib64/libnss_sss.so.2(_nss_sss_getpwuid_r+0x11b)[0x7fe716cb42cb]
> /lib64/libc.so.6(getpwuid_r+0xdd)[0x35756a5dfd]
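This rough C sketch shows why an fstat wrapper ends up in getpwuid_r(). fstat() yields only a numeric st_uid, so resolving the owner's user name requires an NSS lookup. On an affected RHEL 6.0 system, that lookup goes through libnss_sss, which matches the trace above.

Some details are assumptions:
- The helper `path_owner_name` is hypothetical and is not Hadoop's NativeIO code.
- The buffer is sized with sysconf(_SC_GETPW_R_SIZE_MAX), falling back to 16 KiB when that limit is indeterminate.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <pwd.h>
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

/* Opens path, fstat()s the descriptor, and copies the owner's user name
 * into out. Returns 0 on success or an errno value on failure. */
int path_owner_name(const char *path, char *out, size_t outlen) {
  assert(path != NULL && out != NULL && outlen > 0);
  int fd = open(path, O_RDONLY);
  if (fd < 0) return errno;

  struct stat st;
  int err = 0;
  if (fstat(fd, &st) != 0) err = errno; /* save errno before close() */
  close(fd);
  if (err != 0) return err;

  /* fstat() only gives a numeric uid; mapping it to a name is an NSS
   * lookup, which is where a non-thread-safe module such as sssd runs. */
  long size = sysconf(_SC_GETPW_R_SIZE_MAX);
  size_t buflen = size > 0 ? (size_t)size : 16384;
  char *buf = malloc(buflen);
  if (buf == NULL) return ENOMEM;

  struct passwd pwd;
  struct passwd *res = NULL;
  err = getpwuid_r(st.st_uid, &pwd, buf, buflen, &res);
  if (err == 0 && res == NULL) err = ENOENT; /* no passwd entry for uid */
  if (err == 0) {
    if (strlen(pwd.pw_name) >= outlen) err = ERANGE;
    else strcpy(out, pwd.pw_name);
  }
  free(buf);
  return err;
}
```

Only the getpwuid_r() call needs serializing on affected systems. The open() and fstat() steps are unaffected by the sssd bug.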

This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
