hadoop-common-issues mailing list archives

From "Ming Ma (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HADOOP-10597) Evaluate if we can have RPC client back off when server is under heavy load
Date Thu, 13 Nov 2014 00:05:35 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-10597?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14208955#comment-14208955 ]

Ming Ma commented on HADOOP-10597:

Thanks, Chris.

The backoff retry policy is defined by the {{ClientBackoffPolicy}} interface. There are two implementations
of the interface, {{NullClientBackoffPolicy}} and {{LinearClientBackoffPolicy}}.
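As a rough illustration of that shape, here is a hypothetical sketch of the interface; the method name, parameters, and constants are assumptions for illustration, not the signatures from the actual patch:

```java
// Hypothetical sketch of ClientBackoffPolicy; names and signatures
// are illustrative assumptions, not the patch's actual API.
interface ClientBackoffPolicy {
    // Returns a suggested client backoff in milliseconds, or 0 to
    // leave the retry decision entirely to the client.
    long suggestedBackoffMs(int recentQueued, int recentDenied);
}

// Specifies no retry policy: the server only signals overload and
// leaves the retry decision to the client.
class NullClientBackoffPolicy implements ClientBackoffPolicy {
    public long suggestedBackoffMs(int recentQueued, int recentDenied) {
        return 0;
    }
}

// Derives a backoff hint from the numbers of succeeded and denied
// requests; the base delay of 10 ms is purely illustrative.
class LinearClientBackoffPolicy implements ClientBackoffPolicy {
    private static final long BASE_MS = 10;

    public long suggestedBackoffMs(int recentQueued, int recentDenied) {
        return BASE_MS * Math.max(0, recentDenied - recentQueued);
    }
}
```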

The experiment results are based on {{NullClientBackoffPolicy}}, which doesn't specify any
retry policy. Thus the RPC server will return an empty {{RetriableException}} and let the client
decide the retry policy. We can start with this policy when we enable the feature in production.
That will give us useful information and help us improve the feature and make any necessary
modifications to {{ClientBackoffPolicy}} and its implementations in later iterations.
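The server-side behavior under {{NullClientBackoffPolicy}} could be sketched as follows; the {{RetriableException}} here is a local stand-in for {{org.apache.hadoop.ipc.RetriableException}}, and the queue-depth check is illustrative, not the actual call-queue logic:

```java
// Stand-in for org.apache.hadoop.ipc.RetriableException; constructed
// "empty", i.e. with no retry policy attached.
class RetriableException extends Exception {
    RetriableException() { super(); }
}

// Illustrative model of an RPC call queue under heavy load.
class RpcCallQueue {
    private final int capacity;
    private int queued;

    RpcCallQueue(int capacity) { this.capacity = capacity; }

    // When the queue is full, refuse the call and signal the client
    // to retry with whatever policy the client chooses.
    void offer(Runnable call) throws RetriableException {
        if (queued >= capacity) {
            throw new RetriableException();
        }
        queued++;  // model the call occupying a queue slot
    }
}
```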

{{LinearClientBackoffPolicy}} specifies a retry policy based on the numbers of succeeded and
denied requests. The policy is then returned to the client, which is expected to honor it.
{{recentBackOffCount}} decreases with each successfully queued request. So in your case, if a
client is denied first and then terminates before it retries, {{recentBackOffCount}} will still
drop to zero as long as enough requests from other clients are queued successfully.
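The {{recentBackOffCount}} bookkeeping can be modeled like this (the class and method names are hypothetical; only the increment/decrement behavior comes from the description above):

```java
// Illustrative model of recentBackOffCount: denials raise it,
// successfully queued requests lower it, so backoff pressure fades
// even if the denied client never comes back to retry.
class LinearBackoffState {
    private int recentBackOffCount;

    void onDenied() {
        recentBackOffCount++;
    }

    void onQueued() {
        if (recentBackOffCount > 0) {
            recentBackOffCount--;  // any client's success counts
        }
    }

    int count() {
        return recentBackOffCount;
    }
}
```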

There shouldn't be a case where the element is queued correctly but the client still gets a
retry. The warn message is there to catch bad implementations of {{ClientBackoffPolicy}}. We
can remove it, as it doesn't seem to be necessary.

Yes, it is better to rename {{oldValue}} to something else.

I will rebase and provide an updated patch to address your comments.

> Evaluate if we can have RPC client back off when server is under heavy load
> ---------------------------------------------------------------------------
>                 Key: HADOOP-10597
>                 URL: https://issues.apache.org/jira/browse/HADOOP-10597
>             Project: Hadoop Common
>          Issue Type: Sub-task
>            Reporter: Ming Ma
>            Assignee: Ming Ma
>         Attachments: HADOOP-10597-2.patch, HADOOP-10597.patch, MoreRPCClientBackoffEvaluation.pdf,
> Currently if an application hits the NN too hard, RPC requests will be in a blocking state, assuming
> the OS doesn't run out of connections. Alternatively, RPC or the NN can throw some well-defined exception
> back to the client, based on certain policies, when it is under heavy load; the client will understand
> such an exception and do exponential backoff, as another implementation of RetryInvocationHandler.
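The client side of that proposal might look like the following sketch; the exception class is a local stand-in for {{org.apache.hadoop.ipc.RetriableException}}, the delay constants are illustrative, and this is only in the spirit of {{RetryInvocationHandler}}, not its actual implementation:

```java
import java.util.concurrent.Callable;

// Local stand-in for the server's retriable-overload exception.
class BackoffRetriableException extends Exception {}

// Sketch of client-side exponential backoff on overload signals.
class ExponentialBackoffClient {
    static <T> T callWithBackoff(Callable<T> call, int maxRetries)
            throws Exception {
        long delayMs = 100;  // illustrative base delay
        for (int attempt = 0; ; attempt++) {
            try {
                return call.call();
            } catch (BackoffRetriableException e) {
                if (attempt >= maxRetries) {
                    throw e;  // give up after maxRetries retries
                }
                Thread.sleep(delayMs);
                delayMs *= 2;  // exponential growth between attempts
            }
        }
    }
}
```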

This message was sent by Atlassian JIRA
