hadoop-hdfs-issues mailing list archives

From "Dmytro Molkov (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HDFS-1490) TransferFSImage should timeout
Date Thu, 11 Nov 2010 22:32:14 GMT

    [ https://issues.apache.org/jira/browse/HDFS-1490?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12931241#action_12931241 ]

Dmytro Molkov commented on HDFS-1490:

TransferFsImage.getFileClient is the single point of entry for the code that transfers
the image and edits, so it would be the natural place to set timeouts.
However, it is also used to tell the namenode that it is time to pick up the image from
the checkpoint. That call sits without a response until the namenode has fetched the image,
which can take a while.

We could either set a single, rather large timeout that gives the namenode enough time to
fetch the image, or we could set different timeouts for the two cases.
In the second scenario we could, in theory, rely on the fact that the File[] localPaths
passed into getFileClient is null for the notification call, and act accordingly.
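
To make the second option concrete, here is a minimal sketch of how the timeout could be
chosen from localPaths. It is not the actual TransferFsImage code: the class name, the
helper methods, and both timeout values are hypothetical and picked only for illustration.
The only real API used is java.net.URLConnection (setConnectTimeout and setReadTimeout).

```java
import java.io.File;
import java.io.IOException;
import java.net.URL;
import java.net.URLConnection;

// Hypothetical helper, not part of Hadoop. It only illustrates picking a
// read timeout based on whether localPaths is null.
final class TransferTimeoutSketch {
    // Assumed values for illustration; not Hadoop defaults.
    static final int TRANSFER_TIMEOUT_MS = 60 * 1000;               // image/edits download
    static final int CHECKPOINT_NOTIFY_TIMEOUT_MS = 30 * 60 * 1000; // namenode fetching the image

    // A null localPaths is treated as the "tell the namenode to pick up the
    // image" call, which legitimately blocks for longer.
    static int chooseReadTimeoutMs(File[] localPaths) {
        return (localPaths == null) ? CHECKPOINT_NOTIFY_TIMEOUT_MS : TRANSFER_TIMEOUT_MS;
    }

    // Opens a connection object and applies the timeouts before any I/O happens.
    static URLConnection openWithTimeouts(URL url, File[] localPaths) throws IOException {
        URLConnection connection = url.openConnection();
        connection.setConnectTimeout(TRANSFER_TIMEOUT_MS);
        connection.setReadTimeout(chooseReadTimeoutMs(localPaths));
        return connection;
    }
}
```

With a read timeout set, a stalled read throws java.net.SocketTimeoutException instead of
blocking forever, so the caller can fail the checkpoint attempt and retry later.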

Any thoughts?

> TransferFSImage should timeout
> ------------------------------
>                 Key: HDFS-1490
>                 URL: https://issues.apache.org/jira/browse/HDFS-1490
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: name-node
>            Reporter: Dmytro Molkov
>            Assignee: Dmytro Molkov
>            Priority: Minor
> Sometimes when the primary crashes during image transfer, the secondary namenode hangs
> forever trying to read the image from the HTTP connection.
> It would be great to set timeouts on the connection so that if something like that happens
> there is no need to restart the secondary itself.
> In our case restarting components is handled by a set of scripts, and since the Secondary
> process is still running, it just stays hung until we get an alarm saying that checkpointing
> isn't happening.

