hadoop-common-issues mailing list archives

From "Matt Foley (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HADOOP-6889) Make RPC to have an option to timeout
Date Fri, 29 Jul 2011 17:22:09 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-6889?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13072928#comment-13072928 ]

Matt Foley commented on HADOOP-6889:
------------------------------------

Unfortunately, this patch diverges a lot from the trunk patch (presumably because of the
0.20/0.23 code tree divergence), so I could not usefully diff the patches and had to review
this as a new patch.

In terms of code review, I found no problems.  But it's a large enough patch that we are dependent
on thorough unit testing to be confident in the patch.  So I have two questions:

1. I see a single new test case, TestIPC.testIpcTimeout(), that tests the lowest-level timeout
functionality, between a client and a TestServer server.  However, I do not see any test cases
that check whether the integration of that timeout functionality with, e.g., the InterDatanodeProtocol
works as expected. (The mod to TestInterDatanodeProtocol merely adapts to the change; it does
not test the change.)  Similarly, there is no test of the timeout in the context of DFSClient
with a MiniDFSCluster.  Granted, the original patch to trunk doesn't test these either.  But do
you feel confident in the patch without such additional tests, and why?

2. Are the variances between the trunk and v20 patches due only to code tree divergence, or
are there changes added to the v20 patch that are not in v23 and perhaps should be?  Thanks.


> Make RPC to have an option to timeout
> -------------------------------------
>
>                 Key: HADOOP-6889
>                 URL: https://issues.apache.org/jira/browse/HADOOP-6889
>             Project: Hadoop Common
>          Issue Type: New Feature
>          Components: ipc
>    Affects Versions: 0.22.0
>            Reporter: Hairong Kuang
>            Assignee: John George
>             Fix For: 0.20-append, 0.20.205.0, 0.22.0
>
>         Attachments: HADOOP-6889-for20.patch, HADOOP-6889.patch, ipcTimeout.patch, ipcTimeout1.patch, ipcTimeout2.patch
>
>
> Currently Hadoop RPC does not time out when the RPC server is alive. Instead, an RPC
> client sends a ping to the server whenever a socket timeout happens. If the server is
> still alive, the client continues to wait instead of throwing a SocketTimeoutException.
> This avoids having a client retry when a server is busy, which would make the server
> even busier. This works well when the RPC server is the NameNode.
>
> But Hadoop RPC is also used for some client-to-DataNode communication, for example,
> for getting a replica's length. When a client comes across a problematic DataNode, it
> gets stuck and cannot switch to a different DataNode. In this case, it would be better
> for the client to receive a timeout exception.
>
> I plan to add a new configuration property, ipc.client.max.pings, that specifies the
> maximum number of pings that a client may send. If a response is not received after the
> specified maximum number of pings, a SocketTimeoutException is thrown. If this property
> is not set, a client keeps the current semantics and waits forever.
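
For readers who want to see the proposed semantics concretely, here is a minimal sketch of a
bounded ping loop over a plain java.net.Socket. It is not the Hadoop ipc.Client code and not
the contents of any attached patch. The class BoundedPingReader, the helper
pingsBeforeTimeout, and the single placeholder ping byte are all hypothetical, and treating
maxPings <= 0 as "property not set" is an assumption made for this illustration.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.net.SocketTimeoutException;

/** Hypothetical illustration of the proposal; not Hadoop's ipc.Client. */
public class BoundedPingReader {
    // Stand-in for ipc.client.max.pings. Assumption: <= 0 means "not set", so wait forever.
    private final int maxPings;
    private int pingsSent = 0;

    public BoundedPingReader(int maxPings) {
        this.maxPings = maxPings;
    }

    public int getPingsSent() {
        return pingsSent;
    }

    /**
     * Waits for one byte of response. Each socket read timeout triggers a ping;
     * once maxPings pings have gone unanswered, the next timeout is rethrown.
     */
    public int readResponseByte(Socket socket, int pingIntervalMs) throws IOException {
        socket.setSoTimeout(pingIntervalMs);
        InputStream in = socket.getInputStream();
        OutputStream out = socket.getOutputStream();
        while (true) {
            try {
                return in.read();
            } catch (SocketTimeoutException e) {
                if (maxPings > 0 && pingsSent >= maxPings) {
                    throw e; // ping budget exhausted: surface the timeout to the caller
                }
                out.write(0xFF); // placeholder ping payload, not the real wire format
                out.flush();
                pingsSent++;
            }
        }
    }

    /**
     * Demo helper: connects to a loopback listener that never accepts or replies,
     * and returns how many pings were sent before the timeout surfaced.
     */
    public static int pingsBeforeTimeout(int maxPings, int pingIntervalMs) {
        try (ServerSocket server = new ServerSocket(0, 1, InetAddress.getLoopbackAddress());
             Socket client = new Socket(server.getInetAddress(), server.getLocalPort())) {
            BoundedPingReader reader = new BoundedPingReader(maxPings);
            try {
                reader.readResponseByte(client, pingIntervalMs);
            } catch (SocketTimeoutException expected) {
                return reader.getPingsSent();
            }
            throw new IllegalStateException("silent server unexpectedly produced a response");
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

With maxPings = 3 and a 50 ms ping interval, this sketch gives up after roughly four read
timeouts (three pings plus the final wait). A client stuck on an unresponsive DataNode would
therefore see the SocketTimeoutException after a bounded delay and could move on to another
replica.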

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
