hadoop-common-dev mailing list archives

From "Owen O'Malley (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-255) Client Calls are not cancelled after a call timeout
Date Thu, 05 Oct 2006 22:45:21 GMT
    [ http://issues.apache.org/jira/browse/HADOOP-255?page=comments#action_12440279 ] 
            
Owen O'Malley commented on HADOOP-255:
--------------------------------------

I'm going to hijack this bug. Clearly the original case was fixed by moving from the RPC
getMapOutput to a Jetty servlet. However, we are seeing cases where the DFS servers have trouble
keeping up with the RPC calls.

Therefore, I propose that we define a fraction of ipc.timeout as the maximum time an RPC
call may wait before it is given to a handler.
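
A minimal sketch of how such a limit might work on the server side. It assumes that a call which has waited in the queue longer than the limit is simply discarded instead of being handed to a handler; the comment above does not say what should happen to such calls. The names QueueAgeLimit, QueuedCall, enqueue and nextForHandler are hypothetical and are not taken from the Hadoop ipc code.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

class QueueAgeLimit {
    /** A queued call stamped with its arrival time (hypothetical type). */
    static final class QueuedCall {
        final int id;
        final long receivedMillis;

        QueuedCall(int id, long receivedMillis) {
            this.id = id;
            this.receivedMillis = receivedMillis;
        }
    }

    private final BlockingQueue<QueuedCall> queue = new LinkedBlockingQueue<>();
    private final long maxQueueMillis;

    /** The limit is a fraction of the ipc timeout, e.g. 0.5 of 60 seconds. */
    QueueAgeLimit(long ipcTimeoutMillis, double fraction) {
        this.maxQueueMillis = (long) (ipcTimeoutMillis * fraction);
    }

    void enqueue(int id, long nowMillis) {
        queue.add(new QueuedCall(id, nowMillis));
    }

    /**
     * Returns the next call that is still young enough to hand to a handler,
     * or null if the queue is empty. Calls that waited too long are skipped
     * (assumption: the client has given up on them, or soon will).
     */
    QueuedCall nextForHandler(long nowMillis) {
        QueuedCall call;
        while ((call = queue.poll()) != null) {
            if (nowMillis - call.receivedMillis <= maxQueueMillis) {
                return call;
            }
            // Stale call: drop it without doing any work.
        }
        return null;
    }

    public static void main(String[] args) {
        QueueAgeLimit q = new QueueAgeLimit(60_000, 0.5); // limit = 30 s
        q.enqueue(1, 0);       // will be 45 s old when polled: too old
        q.enqueue(2, 40_000);  // will be 5 s old when polled: fine
        System.out.println(q.nextForHandler(45_000).id); // prints 2
    }
}
```

Passing the current time in as a parameter keeps the sketch deterministic; a real server would presumably read the clock itself.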

> Client Calls are not cancelled after a call timeout
> ---------------------------------------------------
>
>                 Key: HADOOP-255
>                 URL: http://issues.apache.org/jira/browse/HADOOP-255
>             Project: Hadoop
>          Issue Type: Bug
>          Components: ipc
>    Affects Versions: 0.2.1
>         Environment: Tested on Linux 2.6
>            Reporter: Naveen Nalam
>         Assigned To: Owen O'Malley
>
> In ipc/Client.java, if a call times out, a SocketTimeoutException is thrown, but the Call
> object still exists on the queue.
> What I found was that when transferring very large amounts of data, it's common for queued
> up calls to time out. Yet even though the caller is no longer waiting, the request is still
> serviced on the server and the data is sent to the client. The client, after receiving the
> full response, calls callComplete(), which is a no-op since nobody is waiting.
> The problem is that the calls that time out will retry, and the system gets into a situation
> where data is being transferred around, but it's all data for timed-out requests and no progress
> is ever made.
> My quick solution to this was to add a "boolean timedout" to the Call object, which I
> set to true whenever the queued caller times out. Then, when the client starts to pull
> over the response data (in Connection::run), it first checks whether the Call is marked
> timedout and, if so, immediately closes the connection.
> I think a good fix for this is to queue requests on the client, and do a single sendParam
> only when there is no outstanding request. This will allow closing the connection when receiving
> a response for a request we no longer have pending, reopening the connection, and resending the
> next queued request. I can provide a patch for this, but I've seen a lot of recent activity
> in this area so I'd like to get some feedback first.
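
The reporter's quick fix in the quoted description (a timed-out flag on the Call object, checked before the response is read) could look roughly like the sketch below. This is an illustration only, not the actual ipc/Client.java code: the class TimedOutCallSketch and the methods await, complete, isTimedOut and shouldReadResponse are hypothetical names, and the real Connection::run logic is reduced to a single boolean decision.

```java
import java.net.SocketTimeoutException;

class TimedOutCallSketch {
    /** Simplified stand-in for the client-side Call object. */
    static final class Call {
        final int id;
        private Object value;
        private boolean done;
        private boolean timedOut; // the "boolean timedout" from the description

        Call(int id) {
            this.id = id;
        }

        /** Caller side: wait for the response, marking the call on timeout. */
        synchronized Object await(long timeoutMillis)
                throws SocketTimeoutException, InterruptedException {
            long deadline = System.currentTimeMillis() + timeoutMillis;
            while (!done) {
                long remaining = deadline - System.currentTimeMillis();
                if (remaining <= 0) {
                    timedOut = true; // nobody is waiting for this call any more
                    throw new SocketTimeoutException("call " + id + " timed out");
                }
                wait(remaining);
            }
            return value;
        }

        synchronized boolean isTimedOut() {
            return timedOut;
        }

        /** Receiver side: deliver the response and wake the caller. */
        synchronized void complete(Object v) {
            value = v;
            done = true;
            notifyAll();
        }
    }

    /**
     * Decision the receiving thread would make when a response for this call
     * starts to arrive: true means read the response, false means close the
     * connection instead of pulling over data nobody wants.
     */
    static boolean shouldReadResponse(Call call) {
        return !call.isTimedOut();
    }

    public static void main(String[] args) throws InterruptedException {
        Call call = new Call(7);
        try {
            call.await(10); // no response ever arrives in this demo
        } catch (SocketTimeoutException e) {
            System.out.println(e.getMessage()); // prints "call 7 timed out"
        }
        System.out.println(shouldReadResponse(call)); // prints false
    }
}
```

The flag is read and written under the Call's own lock, so the receiving thread sees the timeout as soon as the caller gives up.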


        
