hadoop-hdfs-issues mailing list archives

From "Vinay (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (HDFS-1490) TransferFSImage should timeout
Date Tue, 04 Sep 2012 13:29:08 GMT

     [ https://issues.apache.org/jira/browse/HDFS-1490?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Vinay updated HDFS-1490:
------------------------

    Attachment: HDFS-1490.patch

Fixed typo.
Added @VisibleForTesting, since 'timeout' is used in the test.

We tested this in a cluster: when the active NameNode's network connection was broken, the GetImage call timed out.
                
> TransferFSImage should timeout
> ------------------------------
>
>                 Key: HDFS-1490
>                 URL: https://issues.apache.org/jira/browse/HDFS-1490
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: name-node
>            Reporter: Dmytro Molkov
>            Assignee: Dmytro Molkov
>            Priority: Minor
>         Attachments: HDFS-1490.patch, HDFS-1490.patch, HDFS-1490.patch, HDFS-1490.patch
>
>
> Sometimes, when the primary crashes during an image transfer, the secondary namenode hangs
> forever trying to read the image from the HTTP connection.
> It would be great to set timeouts on the connection so that, if something like that happens,
> there is no need to restart the secondary itself.
> In our case, restarting components is handled by a set of scripts, and since the Secondary
> process is still running, it would just stay hung until we get an alarm saying that
> checkpointing is not happening.
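
For illustration only, here is a minimal sketch of the kind of change the description asks for: setting connect and read timeouts on the HTTP connection used to fetch the image. This is not the attached patch. The class name, method name, and the 60-second default are assumptions made for the example; only the java.net calls are standard JDK API.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URL;

public class ImageTransferTimeoutSketch {
    // Hypothetical default (60 s). The value and any configuration key
    // used by the real patch are not stated in this message.
    static final int DEFAULT_TIMEOUT_MS = 60_000;

    /**
     * Opens an HTTP connection object with both timeouts set.
     * No network traffic happens here; the connection is only
     * established when the caller connects or asks for the stream.
     */
    static HttpURLConnection openWithTimeout(URL url, int timeoutMs)
            throws IOException {
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        // Fail if the TCP connection cannot be established in time.
        conn.setConnectTimeout(timeoutMs);
        // Fail if a read blocks longer than this, e.g. when the remote
        // side crashes in the middle of the transfer.
        conn.setReadTimeout(timeoutMs);
        return conn;
    }
}
```

With the read timeout set, a stalled read on the stream returned by conn.getInputStream() throws java.net.SocketTimeoutException instead of blocking forever, so the caller can abort the checkpoint attempt and try again later without restarting the process.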

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
