hadoop-common-issues mailing list archives

From "Kihwal Lee (Commented) (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HADOOP-8088) User-group mapping cache incorrectly does negative caching on transient failures
Date Fri, 16 Mar 2012 17:07:40 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-8088?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13231394#comment-13231394 ]

Kihwal Lee commented on HADOOP-8088:
------------------------------------

We use a refresh cycle of 4 hours. This becomes a serious issue when headless users associated
with tight-SLA jobs end up negatively cached. This is not hypothetical; we have seen it
in production.

I do agree with you on the problem of having too many configs, but we often end up going this
route out of a (sometimes justifiable) fear of breaking compatibility, even when the bug is
clearly there and in need of a fix. I would appreciate your further input on how we can address
the issue with minimal negative impact.
                
> User-group mapping cache incorrectly does negative caching on transient failures
> --------------------------------------------------------------------------------
>
>                 Key: HADOOP-8088
>                 URL: https://issues.apache.org/jira/browse/HADOOP-8088
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: security
>    Affects Versions: 0.20.205.0, 0.24.0, 0.23.1, 1.0.0, 1.1.0
>            Reporter: Kihwal Lee
>             Fix For: 0.24.0, 1.1.0, 0.23.2
>
>         Attachments: hadoop-8088-branch-1.patch, hadoop-8088-trunk.patch, hadoop-8088-trunk.patch
>
>
> We've seen a case where some getGroups() calls fail when the LDAP server or the network
> is having transient failures. Looking at the code, the shell-based and the JNI-based
> implementations swallow exceptions and return an empty or partial list. The caller,
> Groups#getGroups(), adds this likely empty list into the mapping cache for the user. This
> will function as negative caching until the cache expires. I don't think we want negative
> caching here, but even if we do, it should be intelligent enough to distinguish transient
> failures from ENOENT. The log message in the JNI-based impl also needs an improvement. It
> should print what exception it encountered instead of just saying one happened.
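
To make the behavior described above concrete, here is a minimal sketch of a cache that does
not negatively cache failed or empty lookups. It is not the attached patch and not the actual
Groups class: the names GroupCacheSketch, GroupLookup, and isCached are hypothetical, and it
assumes the backend reports a transient failure by throwing IOException rather than returning
an empty list. Cache expiry is left out for brevity.

```java
import java.io.IOException;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical sketch only; not the Hadoop Groups class or the attached patch. */
public class GroupCacheSketch {

    /** Hypothetical backend, standing in for a shell- or JNI-based resolver. */
    public interface GroupLookup {
        // Assumption: a transient failure surfaces as an IOException
        // instead of being swallowed and turned into an empty list.
        List<String> lookup(String user) throws IOException;
    }

    private final Map<String, List<String>> cache =
            new ConcurrentHashMap<String, List<String>>();
    private final GroupLookup backend;

    public GroupCacheSketch(GroupLookup backend) {
        this.backend = backend;
    }

    public List<String> getGroups(String user) throws IOException {
        List<String> cached = cache.get(user);
        if (cached != null) {
            return cached;
        }
        // If the backend throws, the exception propagates and nothing is cached,
        // so the next call retries the lookup.
        List<String> groups = backend.lookup(user);
        if (groups.isEmpty()) {
            // An empty result is reported to the caller but never stored,
            // so it cannot act as a negative cache entry.
            throw new IOException("No groups found for user " + user);
        }
        cache.put(user, groups);
        return groups;
    }

    /** Test helper: reports whether a positive entry exists for the user. */
    public boolean isCached(String user) {
        return cache.containsKey(user);
    }
}
```

With this shape, a lookup that fails during an LDAP or network outage is simply retried on
the next call, while successful results are still served from the cache.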
