hadoop-hdfs-issues mailing list archives

From "Chris Nauroth (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-7739) ZKFC - transitionToActive is indefinitely waiting to complete fenceOldActive
Date Fri, 01 May 2015 03:14:06 GMT

    [ https://issues.apache.org/jira/browse/HDFS-7739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14522710#comment-14522710 ]

Chris Nauroth commented on HDFS-7739:
-------------------------------------

Hi [~brahmareddy].  From the stack trace, it looks like the process is blocked waiting to
read output from the ssh connection that runs fuser to stop the old active.  I can think of two
possible causes:

# Passwordless ssh is not configured, so the connection is hanging indefinitely prompting
for a password.  This would require configuration of {{dfs.ha.fencing.ssh.private-key-files}}
to specify the ssh key file.
# The ssh connection that runs fuser is hanging indefinitely.  Many kinds of failure at the
old active could make it unresponsive and cause this.  This can be mitigated
by configuring a timeout on the ssh connection ({{dfs.ha.fencing.ssh.connect-timeout}}).
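
For illustration, the two settings named above could appear in hdfs-site.xml roughly as follows. This is a sketch, not a recommended configuration: the key file path and the timeout value are placeholders, and it assumes the {{sshfence}} method is the one in use. The timeout is expressed in milliseconds.

```xml
<!-- hdfs-site.xml fragment; the key path and timeout value are placeholders -->
<property>
  <name>dfs.ha.fencing.methods</name>
  <value>sshfence</value>
</property>
<property>
  <!-- private key used for passwordless ssh to the old active -->
  <name>dfs.ha.fencing.ssh.private-key-files</name>
  <value>/home/exampleuser/.ssh/id_rsa</value>
</property>
<property>
  <!-- ssh timeout in milliseconds; fencing is treated as failed once it elapses -->
  <name>dfs.ha.fencing.ssh.connect-timeout</name>
  <value>30000</value>
</property>
```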

This documentation page has more details:

http://hadoop.apache.org/docs/r2.7.0/hadoop-project-dist/hadoop-hdfs/HDFSHighAvailabilityWithQJM.html#Configuration_details

> ZKFC - transitionToActive is indefinitely waiting to complete fenceOldActive
> ----------------------------------------------------------------------------
>
>                 Key: HDFS-7739
>                 URL: https://issues.apache.org/jira/browse/HDFS-7739
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: auto-failover
>    Affects Versions: 2.6.0
>            Reporter: Brahma Reddy Battula
>            Assignee: Brahma Reddy Battula
>            Priority: Critical
>         Attachments: zkfctd.out
>
>
>  *Scenario:* 
> One of the cluster disks became full while ZKFC was performing transitionToActive. To fence the old active
> node, ZKFC must execute the fencing command and wait for the result. Because the disk was full, the StreamPumper
> thread waits indefinitely (even after the disk is freed, it does not recover)...
>  *{color:blue}Please check the attached thread dump of ZKFC{color}* ..
>  *{color:green}Better to maintain the timeout for stream-pumper thread{color}* .
> {code}
> protected void pump() throws IOException {
>     InputStreamReader inputStreamReader = new InputStreamReader(stream);
>     BufferedReader br = new BufferedReader(inputStreamReader);
>     String line = null;
>     while ((line = br.readLine()) != null) {
>       if (type == StreamType.STDOUT) {
>         log.info(logPrefix + ": " + line);
>       } else {
>         log.warn(logPrefix + ": " + line);
>       }
>     }
> }
> {code}
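
One way to bound the wait that the description asks for is to run the line-reading loop on a worker thread and stop waiting after a deadline. The sketch below is not Hadoop's StreamPumper and is not the fix applied to this issue; the class name {{TimedPump}}, the method {{pumpWithTimeout}}, and the idea of collecting lines into a caller-supplied list are all assumptions made for illustration.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

/** Hypothetical helper: pumps lines from a stream but gives up after a deadline. */
public class TimedPump {

  /**
   * Reads lines from the stream into sink until end of stream.
   * Returns true if end of stream was reached within timeoutMs, false on timeout.
   * The sink should be thread-safe, because the reader thread may still be
   * appending to it when a timeout is reported.
   */
  static boolean pumpWithTimeout(InputStream stream, List<String> sink, long timeoutMs)
      throws IOException, InterruptedException {
    ExecutorService executor = Executors.newSingleThreadExecutor(r -> {
      Thread t = new Thread(r, "timed-pump");
      t.setDaemon(true); // a stuck reader should not keep the JVM alive
      return t;
    });
    // The same readLine() loop as in the quoted snippet, run as a Callable.
    Future<?> reader = executor.submit(() -> {
      BufferedReader br = new BufferedReader(
          new InputStreamReader(stream, StandardCharsets.UTF_8));
      String line;
      while ((line = br.readLine()) != null) {
        sink.add(line);
      }
      return null;
    });
    try {
      reader.get(timeoutMs, TimeUnit.MILLISECONDS); // bounded wait
      return true;
    } catch (TimeoutException e) {
      reader.cancel(true); // interrupts the reader thread; some blocking reads ignore this
      return false;
    } catch (ExecutionException e) {
      throw new IOException("pump failed", e.getCause());
    } finally {
      executor.shutdownNow();
    }
  }
}
```

A caller could pass the output stream of the fencing process together with a synchronized list and treat a {{false}} return as a fencing failure. Since a read blocked on a process stream may not respond to interruption, the caller would probably also need to destroy the process or close the stream after a timeout.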



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
