hadoop-common-dev mailing list archives

From "Hairong Kuang (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-2188) RPC should send a ping rather than use client timeouts
Date Wed, 19 Mar 2008 18:32:24 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-2188?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12580492#action_12580492 ]

Hairong Kuang commented on HADOOP-2188:

> As you mentioned there is more synchronization involved. It is harder to check the correctness
with 'isClosed' etc. For example, it took some time to see what happens if sendParam() returns silently
when isClosed is true.

Since this patch removes SocketTimeoutException, it exposes quite a lot of incorrect synchronization
in the code. Previously applications received a SocketTimeoutException when a call was lost,
but now applications get stuck forever. It took me quite a lot of energy to debug and sort
out the synchronization part. Thank you for taking the time to check its correctness.
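
To illustrate the kind of hang described above, here is a minimal sketch, not the code in the patch. It assumes a caller that waits without a timeout and a connection that must wake every waiter when it closes. The names PendingCall and SketchConnection are hypothetical and do not appear in the Hadoop IPC source.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical call object; illustrative only, not the Hadoop IPC Call class. */
class PendingCall {
    private boolean done;
    private IOException error;

    /** Marks the call as successfully answered and wakes the waiter. */
    synchronized void complete() {
        done = true;
        notifyAll();
    }

    /** Marks the call as failed and wakes the waiter. */
    synchronized void fail(IOException e) {
        error = e;
        done = true;
        notifyAll();
    }

    /** Blocks with no timeout until the call completes or fails. */
    synchronized void await() throws IOException, InterruptedException {
        while (!done) {
            wait();
        }
        if (error != null) {
            throw error;
        }
    }
}

/** Hypothetical connection that tracks calls still waiting for a response. */
class SketchConnection {
    private final List<PendingCall> pending = new ArrayList<>();
    private boolean closed;

    /** Registers a call, or fails it at once if the connection is already closed. */
    synchronized void register(PendingCall call) {
        if (closed) {
            // Returning silently here would leave the caller waiting forever.
            call.fail(new IOException("connection closed"));
            return;
        }
        pending.add(call);
    }

    /** Closing fails every pending call so that no waiter is left blocked. */
    synchronized void close(IOException cause) {
        closed = true;
        for (PendingCall call : pending) {
            call.fail(cause);
        }
        pending.clear();
    }
}
```

With a timeout in place, a waiter that was never notified would eventually give up on its own. Without one, every path that closes the connection or drops a call has to notify the waiter explicitly, which is why the closed check in register() fails the call instead of returning quietly.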

> What should the server do if some clients just don't read from the sockets? I think purging
exists only to handle exceptional cases like (unintentionally) rogue clients. One actual case
that happened is that one user accidentally started thousands of clients from one machine
and these clients could not read.

I think we should assume that clients use the IPC Client to talk to the IPC server, so there is
no need to worry about their not reading from the sockets. In the case of 1000 clients per machine,
if they can all send requests, why could they not read?

> RPC should send a ping rather than use client timeouts
> ------------------------------------------------------
>                 Key: HADOOP-2188
>                 URL: https://issues.apache.org/jira/browse/HADOOP-2188
>             Project: Hadoop Core
>          Issue Type: Improvement
>          Components: dfs, ipc
>            Reporter: Owen O'Malley
>            Assignee: Hairong Kuang
>         Attachments: ipc-timeout.patch, ipc-timeout1.patch, ipc-timeout2.patch, ipc-timeout3.patch,
> Current RPC (really IPC) relies on client side timeouts to find "dead" sockets. I propose
that we have a thread that once a minute (if the connection has been idle) writes a "ping"
message to the socket. The client can detect a dead socket by the resulting error on the write,
so no client side timeout is required. Also note that the ipc server does not need to respond
to the ping, just discard it.
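
The proposal above could be sketched roughly as follows. This is an illustration under stated assumptions, not the attached patch: the class IdlePinger, the PING_MARKER value, and passing the clock in as a parameter are all hypothetical choices made to keep the example small and deterministic.

```java
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical idle-connection pinger; none of these names come from the patch. */
class IdlePinger {
    /** Assumed marker value that a server would read and discard without replying. */
    static final int PING_MARKER = -1;

    private final DataOutputStream out;
    private final long idleMillis;
    private final AtomicLong lastActivity = new AtomicLong();

    IdlePinger(OutputStream rawOut, long idleMillis, long nowMillis) {
        this.out = new DataOutputStream(rawOut);
        this.idleMillis = idleMillis;
        this.lastActivity.set(nowMillis);
    }

    /** Records that real request traffic was sent at the given time. */
    void touch(long nowMillis) {
        lastActivity.set(nowMillis);
    }

    /**
     * Writes a ping if the connection has been idle for at least idleMillis.
     * Returns true if a ping was written. An IOException from the write is
     * the signal that the caller should treat the connection as dead.
     */
    boolean pingIfIdle(long nowMillis) throws IOException {
        if (nowMillis - lastActivity.get() < idleMillis) {
            return false;
        }
        synchronized (out) {
            out.writeInt(PING_MARKER);
            out.flush();
        }
        lastActivity.set(nowMillis);
        return true;
    }
}
```

A background thread would call pingIfIdle(System.currentTimeMillis()) periodically, for example with idleMillis set to 60000 for the once-a-minute interval suggested above, and close the connection if the call throws. On a real TCP socket the first write after the peer disappears may still succeed locally, so the error can surface on a later ping rather than the first one.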

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
