hadoop-hdfs-issues mailing list archives

From "Yiqun Lin (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (HDFS-13119) RBF: Manage unavailable clusters
Date Thu, 08 Feb 2018 08:06:04 GMT

    [ https://issues.apache.org/jira/browse/HDFS-13119?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16356614#comment-16356614
] 

Yiqun Lin edited comment on HDFS-13119 at 2/8/18 8:05 AM:
----------------------------------------------------------

Just looked into this.
{quote}When a federated cluster has one of the subcluster down, operations that run in every
subcluster (RouterRpcClient#invokeAll()) may take all the RPC connections.
{quote}
Looking at the related code, I didn't see logic that triggers RPC requests to every
subcluster once one subcluster is down. I only looked at the method {{RouterRpcClient#invoke}}
invoked in {{RouterRpcClient#invokeMethod}}. Correct me if I am wrong.

{quote}
Better control of the number of RPC clients
{quote}
I'm not so clear on this; do you mean we could have a maximum RPC queue size on the Router
RPC server side?

I have a proposal for "No need to try so many times if we "know" the subcluster is down":
when a failure happens, query the {{ActiveNamenodeResolver}} to check whether the subcluster
is down; if yes, skip the retry. In addition, the current default retry count (10 times)
could be decreased a lot.
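A minimal sketch of the fail-fast idea. The {{SubclusterResolver}} interface below is a stand-in for illustration only; the real {{ActiveNamenodeResolver}} API in HDFS RBF differs, and the method and constant names here are hypothetical:

```java
import java.util.concurrent.atomic.AtomicInteger;

public class RetrySketch {

    // Hypothetical stand-in for ActiveNamenodeResolver's availability check.
    interface SubclusterResolver {
        boolean isSubclusterDown(String nsId);
    }

    // Illustrative reduced retry count (the comment suggests lowering the
    // current default of 10).
    static final int MAX_RETRIES = 3;

    static String invokeWithRetry(String nsId, SubclusterResolver resolver,
                                  AtomicInteger rpcAttempts) {
        for (int i = 0; i < MAX_RETRIES; i++) {
            if (resolver.isSubclusterDown(nsId)) {
                // Subcluster is already known to be down: fail fast instead
                // of burning retries (and RPC connections) on it.
                return "FAIL_FAST";
            }
            rpcAttempts.incrementAndGet();
            // ... issue the actual RPC here; loop again on transient failure ...
        }
        return "RETRIES_EXHAUSTED";
    }

    public static void main(String[] args) {
        AtomicInteger attempts = new AtomicInteger();

        // Resolver reports the subcluster as down: no RPC attempts are made.
        System.out.println(invokeWithRetry("ns1", ns -> true, attempts)
            + " attempts=" + attempts.get());

        // Resolver reports the subcluster as up: retries proceed normally.
        attempts.set(0);
        System.out.println(invokeWithRetry("ns1", ns -> false, attempts)
            + " attempts=" + attempts.get());
    }
}
```

The point of the sketch is only the control flow: a single cheap availability check before each attempt turns a 10-retry timeout storm into an immediate failure for subclusters the resolver already knows are down.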


was (Author: linyiqun):
Just looked into this.
{quote}When a federated cluster has one of the subcluster down, operations that run in every
subcluster (RouterRpcClient#invokeAll()) may take all the RPC connections.
{quote}
Looking at the related code, I didn't see logic that triggers RPC requests to every
subcluster once one subcluster is down. I only looked at the method {{RouterRpcClient#invoke}}
invoked in {{RouterRpcClient#invokeMethod}}. Correct me if I am wrong.

I'm not so clear on this; would you describe it more?
{quote}
Better control of the number of RPC clients
{quote}

I have a proposal for "No need to try so many times if we "know" the subcluster is down":
when a failure happens, query the {{ActiveNamenodeResolver}} to check whether the subcluster
is down; if yes, skip the retry. In addition, the current default retry count (10 times)
could be decreased a lot.

> RBF: Manage unavailable clusters
> --------------------------------
>
>                 Key: HDFS-13119
>                 URL: https://issues.apache.org/jira/browse/HDFS-13119
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>            Reporter: Íñigo Goiri
>            Assignee: Yiqun Lin
>            Priority: Major
>
> When a federated cluster has one of the subcluster down, operations that run in every
subcluster ({{RouterRpcClient#invokeAll()}}) may take all the RPC connections.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: hdfs-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-help@hadoop.apache.org

