hadoop-common-issues mailing list archives

From "Allen Wittenauer (JIRA)" <j...@apache.org>
Subject [jira] [Resolved] (HADOOP-2864) Improve the Scalability and Robustness of IPC
Date Thu, 17 Jul 2014 21:02:04 GMT

     [ https://issues.apache.org/jira/browse/HADOOP-2864?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel

Allen Wittenauer resolved HADOOP-2864.

    Resolution: Fixed

This has changed so much since this JIRA was filed that I'm just going to close this as stale.

> Improve the Scalability and Robustness of IPC
> ---------------------------------------------
>                 Key: HADOOP-2864
>                 URL: https://issues.apache.org/jira/browse/HADOOP-2864
>             Project: Hadoop Common
>          Issue Type: Improvement
>          Components: ipc
>    Affects Versions: 0.16.0
>            Reporter: Hairong Kuang
>            Assignee: Hairong Kuang
>         Attachments: RPCScalabilityDesignWeb.pdf
> This jira is intended to enhance IPC's scalability and robustness. 
> Currently an IPC server can easily hang due to a disk failure or garbage collection,
> during which it cannot respond to clients promptly. This has caused many dropped calls
> and delayed responses, so many running applications fail on timeout. On the other hand, if
> busy clients send a lot of requests to the server in a short period of time, or too many clients
> communicate with the server simultaneously, the server may be swamped by requests and become
> unresponsive.
> The proposed changes aim to:
> # Provide better client/server coordination.
> #* The server should be able to throttle clients during a burst of requests.
> #* A slow client should not prevent the server from serving other clients.
> #* A temporarily hung server should not cause catastrophic failures in clients.
> # Let the client and server detect remote-side failures. Examples of failures include: (1)
> the remote host has crashed; (2) the remote host has crashed and then rebooted; (3) the remote
> process has crashed or been shut down by an operator.
> # Ensure fairness. Each client should be able to make progress.
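
As a rough illustration of the throttling and fairness goals listed above, here is a minimal sketch in Python. It is not taken from the attached design document or from Hadoop's IPC code. The class name RoundRobinCallQueue, its methods, and the per-client limit are all hypothetical. The idea: each client gets its own bounded queue, so a bursty client is told to back off once it exceeds its limit, and calls are dispatched round-robin so that every client with pending work makes progress.

```python
from collections import OrderedDict, deque


class RoundRobinCallQueue:
    """Hypothetical per-client bounded call queue with round-robin dispatch."""

    def __init__(self, max_pending_per_client):
        if max_pending_per_client < 1:
            raise ValueError("max_pending_per_client must be at least 1")
        self.max_pending_per_client = max_pending_per_client
        # client id -> deque of pending calls; dict order is the service order.
        self._queues = OrderedDict()

    def offer(self, client_id, call):
        """Queue a call. Return False if this client is over its limit."""
        pending = self._queues.get(client_id)
        if pending is not None and len(pending) >= self.max_pending_per_client:
            # Throttle: the caller is expected to back off and retry later.
            return False
        if pending is None:
            pending = self._queues[client_id] = deque()
        pending.append(call)
        return True

    def poll(self):
        """Return (client_id, call) for the next client in turn, or None."""
        if not self._queues:
            return None
        # Take the client at the front of the service order.
        client_id, pending = self._queues.popitem(last=False)
        call = pending.popleft()
        if pending:
            # Re-queue the client at the back so other clients go first.
            self._queues[client_id] = pending
        return client_id, call


queue = RoundRobinCallQueue(max_pending_per_client=2)
accepted = [queue.offer("busy", name) for name in ("b1", "b2", "b3")]
queue.offer("quiet", "q1")
served = [queue.poll() for _ in range(4)]
```

In this usage, the third call from the "busy" client is rejected, and the "quiet" client's single call is served before the busy client's second call. A real server would also need timeouts and connection-level failure detection, which this sketch leaves out.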

This message was sent by Atlassian JIRA
