hadoop-common-dev mailing list archives

From "dhruba borthakur (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-2757) Should DFS outputstream's close wait forever?
Date Mon, 25 May 2009 06:30:45 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-2757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12712650#action_12712650 ]

dhruba borthakur commented on HADOOP-2757:
------------------------------------------

> 1. rpc timeout: the patch seems to implement a read timeout, not an rpc timeout.

As the administrator of a cluster, I find it easier to set a time limit after which an RPC connection
bails out if it has stopped receiving response data. I could change it to a true
rpcTimeout, but RPCs like "dfsadmin -report" could legitimately take a long time, because the amount
of data to be transferred might be huge depending on the size of the cluster. I am comfortable
configuring a cluster such that if an RPC client has been waiting for more data from the
RPC server for more than 30 seconds, the client can safely assume that the server is
non-responsive. This works even for RPCs that have to transfer large amounts of data. Do you
agree? 
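
To illustrate the distinction, here is a minimal sketch of an inactivity (read) timeout in plain Java; the class and method names are illustrative, not the actual org.apache.hadoop.ipc.Client code. A long transfer succeeds as long as bytes keep arriving, and the call fails only after 30 seconds of silence:

{code:java}
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.Socket;
import java.net.SocketTimeoutException;

public class InactivityTimeoutSketch {
  static final int INACTIVITY_TIMEOUT_MS = 30 * 1000;

  static byte[] readResponse(Socket socket) throws IOException {
    // The timeout applies to each read(), not to the call as a whole.
    socket.setSoTimeout(INACTIVITY_TIMEOUT_MS);
    InputStream in = socket.getInputStream();
    ByteArrayOutputStream buf = new ByteArrayOutputStream();
    byte[] chunk = new byte[8192];
    int n;
    try {
      while ((n = in.read(chunk)) != -1) { // each read may block up to 30s
        buf.write(chunk, 0, n);            // data arrived: the clock resets
      }
    } catch (SocketTimeoutException e) {
      // No bytes at all for 30 seconds: assume the server is non-responsive.
      throw new IOException("server sent no data for "
          + INACTIVITY_TIMEOUT_MS + " ms");
    }
    return buf.toByteArray();
  }
}
{code}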

> 2. if we have an rpc timeout, why do we still need the soft mount timeout in leaseChecker? 
I think these two settings need to be separate. Please see the answer to 3a below. 

> 3. I think the check "if (now > last + softMountTimeout)" could easily be true in
normal cases if renewFrequency is set to be the soft mount timeout. 
The code sets renewFrequency to softMountTimeout/3, so "renewFrequency set to be
the soft mount timeout" cannot happen. But I will modify this portion of the code to handle that
case better. 
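
For reference, the renewal bookkeeping I am describing looks roughly like this; the field and method names (lastRenewed, renew) are illustrative, not the actual LeaseChecker code:

{code:java}
class LeaseRenewalSketch {
  private final long softMountTimeout = 30 * 1000;          // ms
  private final long renewFrequency = softMountTimeout / 3; // renew 3x per timeout
  private volatile long lastRenewed = System.currentTimeMillis();

  void run() throws InterruptedException {
    while (true) {
      long now = System.currentTimeMillis();
      if (now > lastRenewed + softMountTimeout) {
        // Roughly three consecutive renewals have failed: give up on the namenode.
        throw new IllegalStateException("soft mount timeout expired");
      }
      try {
        renew();                                  // RPC to the namenode
        lastRenewed = System.currentTimeMillis();
      } catch (Exception e) {
        // Swallow and retry until the soft mount timeout expires.
      }
      Thread.sleep(renewFrequency);
    }
  }

  void renew() { /* namenode.renewLease(clientName) in the real client */ }
}
{code}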

> 3a. I feel that the meaning of soft mount timeout is maybe not clear. 
The NFS manual says something like this: "The soft mount timeout sets the time the NFS client
will wait for a request to complete." 
To make things clearer, this patch keeps two configuration values: 
 ipc.client.inactivity.timeout: the period of inactivity tolerated while a client is waiting
for a response 
 dfs.softmount.timeout: the maximum time a DFSClient will wait for a request to complete successfully

The ipc.client.inactivity.timeout applies to a single RPC call. The dfs.softmount.timeout
applies to whole FileSystem operations like DFSClient.close(). 
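
For concreteness, this is how a client might read the two knobs through Hadoop's Configuration API; the 30-second defaults shown here are illustrative, not values taken from the patch:

{code:java}
import org.apache.hadoop.conf.Configuration;

public class TimeoutConfigSketch {
  public static void main(String[] args) {
    Configuration conf = new Configuration();
    // Per-RPC knob: fail a call after this long with no response data at all.
    int inactivityMs = conf.getInt("ipc.client.inactivity.timeout", 30000);
    // Per-operation knob: total time DFSClient.close() may keep retrying.
    int softMountMs = conf.getInt("dfs.softmount.timeout", 30000);
    System.out.println("rpc inactivity = " + inactivityMs
        + " ms, soft mount = " + softMountMs + " ms");
  }
}
{code}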

> 4. In the file close case, would it be better just to limit the number of retries? 
In fact, I first deployed a version of the code on our cluster that capped the number of
retries at 5. But then, when I was explaining this behaviour to an app-writer building
an app on top of HDFS, it was difficult for me to explain what a retry count really means. I found it
easier to explain that "this call will not take more than 30 seconds". Also, specifying a
"time" is future-proof in the sense that an HDFS developer can change the frequency of close-retries
without affecting the semantics exposed to the user. If you feel strongly against this one,
I can change it; please do let me know. 
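
The deadline-based loop I have in mind looks roughly like this (a sketch only; deadline, retryIntervalMs, and completeFile are illustrative names, not the actual DFSOutputStream.close() code):

{code:java}
import java.io.IOException;

class DeadlineCloseSketch {
  private final long softMountTimeoutMs = 30 * 1000;

  void close() throws IOException {
    long deadline = System.currentTimeMillis() + softMountTimeoutMs;
    long retryIntervalMs = 400;   // a developer may tune this freely...
    while (!completeFile()) {     // namenode may keep answering NotYetReplicated
      if (System.currentTimeMillis() >= deadline) {
        // ...without changing the guarantee enforced here:
        // "this call will not take more than 30 seconds".
        throw new IOException("could not complete file within "
            + softMountTimeoutMs + " ms");
      }
      try {
        Thread.sleep(retryIntervalMs);
      } catch (InterruptedException e) {
        throw new IOException("interrupted while closing");
      }
    }
  }

  boolean completeFile() {
    return false; // the real client asks the namenode whether the file is complete
  }
}
{code}

The retry interval can change from release to release without touching the one guarantee the user sees: the call returns or fails within the soft mount timeout.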

Thanks for reviewing this one. 

> Should DFS outputstream's close wait forever?
> ---------------------------------------------
>
>                 Key: HADOOP-2757
>                 URL: https://issues.apache.org/jira/browse/HADOOP-2757
>             Project: Hadoop Core
>          Issue Type: Improvement
>          Components: dfs
>            Reporter: Raghu Angadi
>            Assignee: dhruba borthakur
>         Attachments: softMount1.patch, softMount1.patch, softMount2.patch, softMount3.patch
>
>
> Currently {{DFSOutputStream.close()}} waits forever if the Namenode keeps throwing a {{NotYetReplicated}}
exception, for whatever reason. It's pretty annoying for a user. Should the loop inside close
have a timeout? If so, how much? It could probably be something like 10 minutes.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

