hadoop-common-dev mailing list archives

From "dhruba borthakur (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-2757) Should DFS outputstream's close wait forever?
Date Wed, 29 Apr 2009 06:51:30 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-2757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12704010#action_12704010 ]

dhruba borthakur commented on HADOOP-2757:
------------------------------------------

You are referring to dfs.datanode.socket.write.timeout. Timeouts like this are configurable,
and I have already set them to an appropriate value, e.g. 20 seconds, because I want
near-real-time behaviour.

If all the datanodes in the pipeline die, the client detects an error and aborts. That
is the intended behaviour. If a datanode is not actually dead but hangs, the client
hangs too. This patch does not fix that problem.

The main motivation for this patch is to detect namenode failures early. If a client is writing
to a block, it might take a while for the block to fill up. That time depends on the rate
at which the client is writing data. If the client is trickling data into the block, it
will not hit the dfs.datanode.socket.write.timeout timeout for a while. In the existing
code in trunk, the lease recovery thread will detect the NN problem after a while, but it
does nothing to terminate the threads that were writing to the block. The patch does
this.
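
To illustrate the idea in the paragraph above (this is not the code in the attached patches), here is a minimal Java sketch of a monitor that aborts registered writer threads once lease renewal has been failing for longer than a limit. The class name LeaseWatchdog, its methods, and the limit are hypothetical; the renewal call is passed in as a BooleanSupplier and the clock is passed in explicitly so the behaviour is easy to follow.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.BooleanSupplier;

/** Hypothetical sketch: interrupt writer threads when the namenode stops answering. */
class LeaseWatchdog {
    private final Set<Thread> writers = ConcurrentHashMap.newKeySet();
    private final long limitMillis;   // how long renewals may fail before writers are aborted
    private long lastRenewalMillis;   // time of the last successful renewal

    LeaseWatchdog(long limitMillis, long nowMillis) {
        this.limitMillis = limitMillis;
        this.lastRenewalMillis = nowMillis;
    }

    /** Registers a thread that is currently writing to a block. */
    void register(Thread writer) {
        writers.add(writer);
    }

    /**
     * One monitoring step. renewLease stands in for the RPC to the namenode and
     * returns false when the call fails. Returns true if the writers were aborted.
     */
    synchronized boolean check(BooleanSupplier renewLease, long nowMillis) {
        if (renewLease.getAsBoolean()) {
            lastRenewalMillis = nowMillis;
            return false;
        }
        if (nowMillis - lastRenewalMillis < limitMillis) {
            return false; // still within the grace period
        }
        for (Thread t : writers) {
            t.interrupt(); // a writer blocked in sleep/wait sees InterruptedException
        }
        writers.clear();
        return true;
    }
}
```

Under these assumptions, a slow writer no longer has to wait for a socket timeout to find out that the namenode is unreachable; the monitor thread tells it directly.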

> Should DFS outputstream's close wait forever?
> ---------------------------------------------
>
>                 Key: HADOOP-2757
>                 URL: https://issues.apache.org/jira/browse/HADOOP-2757
>             Project: Hadoop Core
>          Issue Type: Improvement
>          Components: dfs
>            Reporter: Raghu Angadi
>            Assignee: dhruba borthakur
>         Attachments: softMount1.patch, softMount1.patch, softMount2.patch
>
>
> Currently {{DFSOutputStream.close()}} waits forever if the Namenode keeps throwing a {{NotYetReplicated}}
> exception, for whatever reason. It's pretty annoying for a user. Should the loop inside close
> have a timeout? If so, how long? It could probably be something like 10 minutes.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

