hadoop-hdfs-issues mailing list archives

From "Hadoop QA (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-7798) Checkpointing failure caused by shared KerberosAuthenticator
Date Sun, 15 Feb 2015 03:52:12 GMT

    [ https://issues.apache.org/jira/browse/HDFS-7798?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14321801#comment-14321801
] 

Hadoop QA commented on HDFS-7798:
---------------------------------

{color:red}-1 overall{color}.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12698944/HDFS-7798.01.patch
  against trunk revision 3338f6d.

    {color:green}+1 @author{color}.  The patch does not contain any @author tags.

    {color:red}-1 tests included{color}.  The patch doesn't appear to include any new or modified
tests.
                        Please justify why no new tests are needed for this patch.
                        Also please list what manual steps were performed to verify this patch.

    {color:green}+1 javac{color}.  The applied patch does not increase the total number of
javac compiler warnings.

    {color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

    {color:red}-1 eclipse:eclipse{color}.  The patch failed to build with eclipse:eclipse.

    {color:green}+1 findbugs{color}.  The patch does not introduce any new Findbugs (version
2.0.3) warnings.

    {color:green}+1 release audit{color}.  The applied patch does not increase the total number
of release audit warnings.

    {color:green}+1 core tests{color}.  The patch passed unit tests in .

Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/9587//testReport/
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/9587//console

This message is automatically generated.

> Checkpointing failure caused by shared KerberosAuthenticator
> ------------------------------------------------------------
>
>                 Key: HDFS-7798
>                 URL: https://issues.apache.org/jira/browse/HDFS-7798
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: security
>            Reporter: Chengbing Liu
>            Assignee: Chengbing Liu
>            Priority: Critical
>         Attachments: HDFS-7798.01.patch
>
>
> We have observed occasional checkpointing failures in our real cluster: the standby NameNode
was not able to upload the image to the active NameNode.
> After some digging, the root cause appears to be a shared {{KerberosAuthenticator}} in
{{URLConnectionFactory}}. The authenticator is designed as a use-once instance and is not
stateless: it holds attributes such as {{HttpURLConnection}} and {{URL}}. When multiple threads
call {{URLConnectionFactory#openConnection(...)}} concurrently, they race on the shared
authenticator's state, and the image upload fails.
> Therefore, as a first step that does not break the current API, I propose we create a
new {{KerberosAuthenticator}} instance for each connection to make checkpointing work. Afterwards,
we may consider making the {{Authenticator}} design and implementation stateless, like
{{ConnectionConfigurator}}.
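
The proposal in the description can be illustrated with a minimal, self-contained sketch. It does not use the Hadoop classes: {{StatefulAuthenticator}}, {{ConnectionFactory}}, the token strings, and the host URLs are hypothetical stand-ins chosen for illustration, and this is not the code in HDFS-7798.01.patch. It only shows the general idea that a factory handing out a fresh authenticator per connection leaves no per-request state for threads to race on, whereas a single shared instance does.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

public class PerConnectionAuthenticatorSketch {

    /** Hypothetical stand-in for a stateful, use-once authenticator. */
    static final class StatefulAuthenticator {
        private String url; // per-request state; unsafe to share across threads

        String authenticate(String target) {
            this.url = target;   // another thread may overwrite this field...
            Thread.yield();      // ...before it is read back here
            return "token-for:" + this.url;
        }
    }

    /** Hypothetical factory: asks the supplier for an authenticator on every call. */
    static final class ConnectionFactory {
        private final Supplier<StatefulAuthenticator> authenticators;

        ConnectionFactory(Supplier<StatefulAuthenticator> authenticators) {
            this.authenticators = authenticators;
        }

        String openConnection(String url) {
            return authenticators.get().authenticate(url);
        }
    }

    /** Proposed pattern: a new authenticator instance for each connection. */
    static ConnectionFactory perConnectionFactory() {
        return new ConnectionFactory(StatefulAuthenticator::new);
    }

    /** Problematic pattern: every connection reuses one authenticator. */
    static ConnectionFactory sharedFactory() {
        StatefulAuthenticator shared = new StatefulAuthenticator();
        return new ConnectionFactory(() -> shared);
    }

    /** Counts calls whose result does not match the URL the calling thread passed in. */
    static int countMismatches(ConnectionFactory factory, int threads, int callsPerThread) {
        AtomicInteger mismatches = new AtomicInteger();
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            final String url = "http://host-" + t + "/upload"; // made-up URL
            workers[t] = new Thread(() -> {
                for (int i = 0; i < callsPerThread; i++) {
                    if (!factory.openConnection(url).equals("token-for:" + url)) {
                        mismatches.incrementAndGet();
                    }
                }
            });
            workers[t].start();
        }
        for (Thread w : workers) {
            try {
                w.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return mismatches.get();
    }

    public static void main(String[] args) {
        // The per-connection count is always 0; the shared count varies from run to run.
        System.out.println("per-connection mismatches: "
                + countMismatches(perConnectionFactory(), 4, 10_000));
        System.out.println("shared mismatches: "
                + countMismatches(sharedFactory(), 4, 10_000));
    }
}
```

Passing a {{Supplier}} keeps the factory's public method unchanged, which mirrors the stated goal of not breaking the current API; making the authenticator itself stateless would remove the need for per-connection instances altogether.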



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
