hadoop-common-dev mailing list archives

From "Hairong Kuang (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-3998) Got an exception from ClientFinalizer when the JT is terminated
Date Wed, 04 Mar 2009 22:25:59 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-3998?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12678960#action_12678960 ]

Hairong Kuang commented on HADOOP-3998:
---------------------------------------

The new patch cannot detect the case in which the NN first initiates a block recovery and
the client then initiates a recovery of the same block using a different primary datanode.

This jira seems to have gotten into the details of detecting concurrent block recoveries,
which is probably my fault. The detection still seems to me to need more thought. I am also
concerned that such a big patch may introduce more bugs into the system.

As a quick fix in 0.20 for the infinite client retries, I propose a simple
fix as follows:
1. client does not retry if the primary datanode fails to recover the block;
2. client retries a different primary datanode if the chosen primary datanode is dead.
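
The two rules above can be sketched roughly as follows. This is only an illustration of the proposed client behavior, not code from the patch: the class BlockRecoveryRetrySketch, the Outcome enum, and the PrimaryDatanode interface are hypothetical names that do not exist in Hadoop. Each candidate primary is asked at most once, so the loop cannot retry forever.

```java
import java.util.Arrays;
import java.util.List;

/** Illustrative sketch only; none of these names exist in Hadoop. */
final class BlockRecoveryRetrySketch {

    /** Hypothetical result of asking one primary datanode to recover a block. */
    enum Outcome { RECOVERED, RECOVERY_FAILED, PRIMARY_DEAD }

    /** Hypothetical stand-in for the client's call to a primary datanode. */
    interface PrimaryDatanode {
        Outcome recoverBlock();
    }

    /**
     * Asks each candidate primary at most once, so the loop always terminates.
     * Returns true only if some primary reports a successful recovery.
     */
    static boolean recover(List<PrimaryDatanode> candidates) {
        for (PrimaryDatanode primary : candidates) {
            Outcome outcome = primary.recoverBlock();
            if (outcome == Outcome.RECOVERED) {
                return true;
            }
            if (outcome == Outcome.RECOVERY_FAILED) {
                // Rule 1: the primary was reachable but could not recover the block; give up.
                return false;
            }
            // Rule 2: the primary is dead; move on to the next candidate.
        }
        // Every candidate primary was dead.
        return false;
    }

    public static void main(String[] args) {
        // A dead primary followed by a live one that recovers the block.
        List<PrimaryDatanode> candidates = Arrays.<PrimaryDatanode>asList(
            () -> Outcome.PRIMARY_DEAD,
            () -> Outcome.RECOVERED);
        System.out.println(recover(candidates));
    }
}
```

In this sketch a failed recovery ends the attempt immediately, while a dead primary only costs one step through the candidate list.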

It seems to me that this fix does not degrade what is in the 0.20 branch. Setting the last
recovery time in INodeFileUnderConstruction makes any retry fail. We could do the simple
fix on HADOOP-5311 and continue the discussion of concurrent block recovery detection here,
or vice versa, whichever Dhruba prefers.

> Got an exception from ClientFinalizer when the JT is terminated
> ---------------------------------------------------------------
>
>                 Key: HADOOP-3998
>                 URL: https://issues.apache.org/jira/browse/HADOOP-3998
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: dfs
>    Affects Versions: 0.19.0
>            Reporter: Amar Kamat
>            Assignee: dhruba borthakur
>            Priority: Blocker
>             Fix For: 0.18.4, 0.19.2, 0.20.0
>
>         Attachments: closeAll.patch, closeAll.patch, closeAll.patch, closeAll2.patch
>
>
> This happens when we terminate the JT using _control-C_. It throws the following exception
> {noformat}
> Exception closing file my-file
> java.io.IOException: Filesystem closed
>         at org.apache.hadoop.hdfs.DFSClient.checkOpen(DFSClient.java:193)
>         at org.apache.hadoop.hdfs.DFSClient.access$700(DFSClient.java:64)
>         at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.closeInternal(DFSClient.java:2868)
>         at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.close(DFSClient.java:2837)
>         at org.apache.hadoop.hdfs.DFSClient$LeaseChecker.close(DFSClient.java:808)
>         at org.apache.hadoop.hdfs.DFSClient.close(DFSClient.java:205)
>         at org.apache.hadoop.hdfs.DistributedFileSystem.close(DistributedFileSystem.java:253)
>         at org.apache.hadoop.fs.FileSystem$Cache.closeAll(FileSystem.java:1367)
>         at org.apache.hadoop.fs.FileSystem.closeAll(FileSystem.java:234)
>         at org.apache.hadoop.fs.FileSystem$ClientFinalizer.run(FileSystem.java:219)
> {noformat}
> Note that _my-file_ is some file used by the JT.
> Also if there is some file renaming done, then the exception states that the earlier
> file does not exist. I am not sure if this is a MR issue or a DFS issue. Opening this issue
> for investigation.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

