hadoop-common-issues mailing list archives

From "Daryn Sharp (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (HADOOP-14035) Reduce fair call queue backoff's impact on clients
Date Fri, 10 Feb 2017 18:40:42 GMT

     [ https://issues.apache.org/jira/browse/HADOOP-14035?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Daryn Sharp updated HADOOP-14035:
    Attachment: HADOOP-14035.patch

Wrapped the rpc server exception + retriable in a CallQueueOverflowException.  It's
an IllegalStateException to conform to the BlockingQueue api.

CallQueueManager now conforms to the BlockingQueue interface.  Backoff logic is pushed down
from the ipc server into CQM.  CQM's put decides whether to call the managed queue's put or
add based on whether backoff is enabled.
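The put-versus-add decision can be sketched roughly as below.  This is not the code in the patch: SketchCallQueueManager, its fields and its constructor are hypothetical names, and a plain ArrayBlockingQueue stands in for the managed queue.  It relies only on the standard BlockingQueue contract, where put blocks when the queue is full and add throws IllegalStateException.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Hypothetical stand-in for CallQueueManager; names are illustrative only.
class SketchCallQueueManager<E> {
    private final BlockingQueue<E> managedQueue;
    private final boolean backoffEnabled;

    SketchCallQueueManager(BlockingQueue<E> managedQueue, boolean backoffEnabled) {
        this.managedQueue = managedQueue;
        this.backoffEnabled = backoffEnabled;
    }

    // With backoff enabled, use add so a full queue fails fast with an
    // IllegalStateException; otherwise use put, which blocks until space frees up.
    void put(E call) throws InterruptedException {
        if (backoffEnabled) {
            managedQueue.add(call);
        } else {
            managedQueue.put(call);
        }
    }

    int size() {
        return managedQueue.size();
    }
}
```

With backoff enabled and a full queue, put in this sketch returns control to the caller immediately via the exception instead of stalling the calling thread.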

Server simply calls CQM.put.  It catches overflow exceptions, unwraps the RpcServerException/RetriableException,
and rethrows it to leverage prior changes to the ipc layer that selectively close connections.
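A minimal sketch of the wrap/unwrap pattern, under stated assumptions: SketchOverflowException, SketchServer and CallSink are hypothetical names, and a plain IOException stands in for the RpcServerException/RetriableException carried inside the overflow exception.  The point it illustrates is that an unchecked IllegalStateException can escape a BlockingQueue-style call and still deliver a checked exception to the server.

```java
import java.io.IOException;

// Hypothetical overflow exception: unchecked, so it can be thrown from a
// BlockingQueue-style add, while carrying the checked exception as its cause.
class SketchOverflowException extends IllegalStateException {
    private final IOException wrapped;

    SketchOverflowException(IOException wrapped) {
        super(wrapped);
        this.wrapped = wrapped;
    }

    IOException unwrap() {
        return wrapped;
    }
}

// Hypothetical server-side enqueue path.
class SketchServer {
    interface CallSink {
        void put(String call) throws InterruptedException;
    }

    // Calls put; on overflow, rethrows the wrapped exception so the layers
    // above can react to it (for example by deciding whether to close the connection).
    static void enqueue(CallSink queue, String call)
            throws IOException, InterruptedException {
        try {
            queue.put(call);
        } catch (SketchOverflowException e) {
            throw e.unwrap();
        }
    }
}
```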

FCQ put remains unchanged.  Add, which CQM calls if backoff is enabled, will offer to all
queues; upon overflow it throws an overflow exception.  For the lowest priority calls, the
overflow retriable closes the connection.  For non-lowest priority calls, the overflow retriable
leaves the connection open.
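The add behaviour described above might look roughly like the following sketch.  SketchFairQueue, its nested Overflow class and the closeConnection flag are hypothetical, not names from the patch.  The sketch also assumes that "offer to all queues" means trying the call's own priority level first and then each lower-priority level (a higher index means lower priority); the message does not spell out the order.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Hypothetical multi-level queue; index 0 is the highest priority.
class SketchFairQueue {
    // Hypothetical overflow signal; the flag says whether the connection
    // should be closed along with rejecting the call.
    static class Overflow extends IllegalStateException {
        final boolean closeConnection;

        Overflow(boolean closeConnection) {
            super(closeConnection ? "overflow: disconnect" : "overflow: keep connection open");
            this.closeConnection = closeConnection;
        }
    }

    private final List<BlockingQueue<String>> queues = new ArrayList<>();

    SketchFairQueue(int levels, int capacityPerLevel) {
        for (int i = 0; i < levels; i++) {
            queues.add(new ArrayBlockingQueue<String>(capacityPerLevel));
        }
    }

    // Offer to the call's own level, then to each lower-priority level
    // (assumed order).  If every offer fails, throw; only calls that were
    // already at the lowest priority ask for the connection to be closed.
    boolean add(int priority, String call) {
        for (int i = priority; i < queues.size(); i++) {
            if (queues.get(i).offer(call)) {
                return true;
            }
        }
        throw new Overflow(priority == queues.size() - 1);
    }
}
```

In this sketch a well-behaved, higher-priority caller that hits a full queue is told to retry but keeps its connection, while only the lowest-priority caller is disconnected.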

> Reduce fair call queue backoff's impact on clients
> --------------------------------------------------
>                 Key: HADOOP-14035
>                 URL: https://issues.apache.org/jira/browse/HADOOP-14035
>             Project: Hadoop Common
>          Issue Type: Sub-task
>          Components: ipc
>    Affects Versions: 2.7.0
>            Reporter: Daryn Sharp
>            Assignee: Daryn Sharp
>         Attachments: HADOOP-14035.patch
> When fcq backoff is enabled and an abusive client overflows the call queue, its connection
> is closed, as are subsequent good client connections.  Disconnects are very disruptive,
> esp. to multi-threaded clients with multiple outstanding requests, or clients w/o a retry
> proxy (e.g. datanodes).
> Until the abusive user is downgraded to a lower priority queue, disconnect/reconnect
> mayhem occurs, which significantly degrades performance.  Server metrics look good despite
> horrible client latency.
> The fcq should utilize selective ipc disconnects to avoid pushback disconnecting good

This message was sent by Atlassian JIRA

