hadoop-common-dev mailing list archives

From "Raghu Angadi (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-2647) dfs -put hangs
Date Fri, 01 Feb 2008 15:49:10 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-2647?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12564803#action_12564803 ]

Raghu Angadi commented on HADOOP-2647:

> My vote would be to do nothing on 0.16.

We can close this jira. The error message etc. could be changed later as part of some other jira.
I think there are no plans to fix this for 0.15.

> dfs -put hangs
> --------------
>                 Key: HADOOP-2647
>                 URL: https://issues.apache.org/jira/browse/HADOOP-2647
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: dfs
>    Affects Versions: 0.15.1
>         Environment: LINUX
>            Reporter: lohit vijayarenu
>            Assignee: Raghu Angadi
>             Fix For: 0.16.1
>         Attachments: HADOOP-2647.patch
> We saw a case where dfs -put hung for over 20 hours while copying a 2GB file.
> When we looked at the stack trace of the process, the main thread was waiting for confirmation from the namenode of the complete status.
> Only 4 blocks were copied, and the 5th block was missing when we ran fsck on the partially transferred file.
> There are 2 problems we saw here.
> 1. The DFS client hung without a timeout when there was no response from the namenode.
> 2. In IOUtils::copyBytes(InputStream in, OutputStream out, int buffSize, boolean close), if there is an exception during the copy, out.close() is called. The exception is not caught, which is why we see a close call in the stack trace.
> When we checked the namenode log for block IDs, the missing block had only one response to the namenode instead of three.
> This close state, coupled with the namenode not reporting the error back, might have caused the whole process to hang.
> Opening this JIRA to see if we could add checks for the two problems mentioned above.
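
For reference, problem 2 above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the Hadoop IOUtils source and not the attached patch: the class name CopySketch and the helper closeAndRecord are hypothetical. The sketch closes both streams on every path, but keeps the first exception as the one that is rethrown, so a failure inside close() does not hide the original copy error.

```java
import java.io.Closeable;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

// Hypothetical sketch, not the Hadoop IOUtils implementation.
public class CopySketch {

    /**
     * Copies everything from in to out. If close is true, both streams are
     * closed whether or not the copy failed, and the first failure is rethrown.
     */
    public static void copyBytes(InputStream in, OutputStream out,
                                 int buffSize, boolean close) throws IOException {
        IOException failure = null;
        try {
            byte[] buf = new byte[buffSize];
            int n;
            while ((n = in.read(buf)) >= 0) {
                out.write(buf, 0, n);
            }
        } catch (IOException e) {
            failure = e; // remember the copy error; still try to close below
        }
        if (close) {
            failure = closeAndRecord(out, failure);
            failure = closeAndRecord(in, failure);
        }
        if (failure != null) {
            throw failure;
        }
    }

    // Closes c. A close failure becomes the primary error only if there was
    // none before; otherwise it is attached as a suppressed exception.
    private static IOException closeAndRecord(Closeable c, IOException first) {
        try {
            c.close();
        } catch (IOException e) {
            if (first == null) {
                return e;
            }
            first.addSuppressed(e);
        }
        return first;
    }
}
```

Note that this only keeps the original exception visible. It does nothing about a close() call that blocks, which is problem 1.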
> <stack trace of main thread>
> "main" prio=10 tid=0x0805a000 nid=0x5b53 waiting on condition [0xf7e64000..0xf7e65288]
>   java.lang.Thread.State: TIMED_WAITING (sleeping)
>   at java.lang.Thread.sleep(Native Method)
>   at org.apache.hadoop.dfs.DFSClient$DFSOutputStream.close(DFSClient.java:1751)
>   - locked <0x77d593a0> (a org.apache.hadoop.dfs.DFSClient$DFSOutputStream)
>   at org.apache.hadoop.fs.FSDataOutputStream$PositionCache.close(FSDataOutputStream.java:49)
>   at org.apache.hadoop.fs.FSDataOutputStream.close(FSDataOutputStream.java:64)
>   at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:55)
>   at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:83)
>   at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:140)
>   at org.apache.hadoop.fs.FileSystem.copyFromLocalFile(FileSystem.java:826)
>   at org.apache.hadoop.fs.FsShell.copyFromLocal(FsShell.java:114)
>   at org.apache.hadoop.fs.FsShell.run(FsShell.java:1354)
>   at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
>   at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
>   at org.apache.hadoop.fs.FsShell.main(FsShell.java:1472)
> </stack trace of main thread>
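
The trace shows the main thread in Thread.sleep called from DFSOutputStream.close. Assuming that is a loop polling the namenode for completion (the trace alone does not confirm this), problem 1 could be addressed by putting a deadline on the loop. A minimal sketch follows; the class BoundedWaitSketch, the method waitForCompletion and its parameters are hypothetical names, not part of DFSClient.

```java
import java.io.IOException;
import java.util.function.BooleanSupplier;

// Hypothetical sketch of a polling wait with a deadline; not DFSClient code.
public class BoundedWaitSketch {

    /**
     * Polls isComplete until it returns true. If timeoutMillis elapses first,
     * throws IOException instead of sleeping indefinitely.
     */
    public static void waitForCompletion(BooleanSupplier isComplete,
                                         long timeoutMillis,
                                         long pollMillis) throws IOException {
        long deadline = System.nanoTime() + timeoutMillis * 1_000_000L;
        while (!isComplete.getAsBoolean()) {
            // Compare via subtraction so nanoTime wrap-around is handled.
            if (System.nanoTime() - deadline >= 0) {
                throw new IOException(
                    "No completion confirmation after " + timeoutMillis + " ms");
            }
            try {
                Thread.sleep(pollMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve interrupt status
                throw new IOException("Interrupted while waiting for completion", e);
            }
        }
    }
}
```

With something like this, a client that gets no answer would fail with an error after the timeout rather than hang for 20 hours. What timeout value is reasonable is a separate question.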

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
