hadoop-hdfs-issues mailing list archives

From Íñigo Goiri (Jira) <j...@apache.org>
Subject [jira] [Commented] (HDFS-14090) RBF: Improved isolation for downstream name nodes. {Static}
Date Tue, 27 Aug 2019 19:29:00 GMT

    [ https://issues.apache.org/jira/browse/HDFS-14090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16917061#comment-16917061 ]

Íñigo Goiri commented on HDFS-14090:

Minor comments:
* Complete the javadocs for FairnessManager (e.g., {{grantPermission()}}).
* FairnessPolicyController#66 should fit in one line (I kind of see the readability advantage
* I think we can make PermitAllocationException more specific (right now it just takes whatever
string). I think it would be nice to have the messages in PermitAllocationException itself,
so we would just pass the number of handlers, the min, and the nsId as parameters. This is
already done nicely in PermitLimitExceededException.
* StaticFairnessPolicyController#184 can fit in one line. Actually, this might be better
as a different exception (which can be a subclass of PermitAllocationException).
* Too many lines in TestRouterFairnessManager#162.
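
To illustrate the PermitAllocationException suggestion above, here is a minimal sketch of an exception that builds its own message from structured values. It is not the code in the patch: the constructor parameters, the message wording, the choice of IOException as the parent, and the subclass name InvalidHandlerCountException are all assumptions made for illustration.

```java
import java.io.IOException;

// Hypothetical sketch only: parameter names, message text, and the
// IOException parent are assumptions, not the code in the HDFS-14090 patch.
class PermitAllocationException extends IOException {
  private static final long serialVersionUID = 1L;

  // The message is assembled here, so callers pass values instead of strings.
  PermitAllocationException(String nsId, int handlerCount, int minHandlers) {
    this(String.format(
        "Cannot allocate permits for nameservice %s: "
            + "%d handlers configured, minimum required is %d",
        nsId, handlerCount, minHandlers));
  }

  // Kept for subclasses that need a different message.
  protected PermitAllocationException(String message) {
    super(message);
  }
}

// Hypothetical subclass for a more specific failure, as the last
// StaticFairnessPolicyController comment suggests.
class InvalidHandlerCountException extends PermitAllocationException {
  private static final long serialVersionUID = 1L;

  InvalidHandlerCountException(int handlerCount, int nameserviceCount) {
    super(String.format(
        "Handler count %d is too small for %d nameservices",
        handlerCount, nameserviceCount));
  }
}
```

With this shape, a call site would reduce to something like `throw new PermitAllocationException(nsId, handlerCount, min);`, which also helps such statements fit on one line.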

> RBF: Improved isolation for downstream name nodes. {Static}
> -----------------------------------------------------------
>                 Key: HDFS-14090
>                 URL: https://issues.apache.org/jira/browse/HDFS-14090
>             Project: Hadoop HDFS
>          Issue Type: New Feature
>            Reporter: CR Hota
>            Assignee: CR Hota
>            Priority: Major
>         Attachments: HDFS-14090-HDFS-13891.001.patch, HDFS-14090-HDFS-13891.002.patch,
HDFS-14090-HDFS-13891.003.patch, HDFS-14090-HDFS-13891.004.patch, HDFS-14090-HDFS-13891.005.patch,
HDFS-14090.006.patch, HDFS-14090.007.patch, HDFS-14090.008.patch, HDFS-14090.009.patch, HDFS-14090.010.patch,
RBF_ Isolation design.pdf
> Router is a gateway to underlying name nodes. Gateway architectures should help isolate
clients connecting to healthy clusters from the impact of unhealthy clusters.
> For example, if there are 2 name nodes downstream and one of them is heavily loaded
with calls, spiking RPC queue times, back pressure will make the same start reflecting on
the router. As a result, clients connecting to healthy/faster name nodes will also
slow down, because the same RPC queue is maintained for all calls at the router layer. Essentially,
the same IPC thread pool is used by the router to connect to all name nodes.
> Currently the router uses one single RPC queue for all calls. Let's discuss how we can change
the architecture and add some throttling logic for unhealthy/slow/overloaded name nodes.
> One way could be to read from the current call queue, immediately identify the downstream name
node, and maintain a separate queue for each underlying name node. Another, simpler way is to
maintain some sort of rate limiter configured for each name node and let routers drop or reject
requests, or return errors, after a certain threshold.
> This won’t be a simple change, as the router’s ‘Server’ layer would need redesign
and implementation. Currently this layer is the same as in the name node.
> Opening this ticket to discuss, design and implement this feature.
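
The simpler option in the description (a limiter configured per name node, with requests rejected past a threshold) could be sketched roughly as follows. This is only an illustration under stated assumptions: the class name NameservicePermits and its methods are hypothetical, and it caps in-flight calls per nameservice with java.util.concurrent.Semaphore, which is a concurrency limit rather than a true rate limit and is not necessarily how the patch does it.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Semaphore;

// Hypothetical sketch: caps concurrent calls per nameservice so that one
// slow name node cannot occupy every handler in the router.
class NameservicePermits {
  private final Map<String, Semaphore> permits = new ConcurrentHashMap<>();

  // permitsPerNameservice maps an nsId to its maximum number of in-flight calls.
  NameservicePermits(Map<String, Integer> permitsPerNameservice) {
    permitsPerNameservice.forEach(
        (nsId, count) -> permits.put(nsId, new Semaphore(count)));
  }

  // Returns false immediately when the nameservice is unknown or is at its
  // limit; the caller would then reject the request instead of queueing it.
  boolean tryAcquire(String nsId) {
    Semaphore semaphore = permits.get(nsId);
    return semaphore != null && semaphore.tryAcquire();
  }

  // Must be called once for each successful tryAcquire, when the call ends.
  void release(String nsId) {
    Semaphore semaphore = permits.get(nsId);
    if (semaphore != null) {
      semaphore.release();
    }
  }
}
```

A handler would call `tryAcquire(nsId)` before forwarding a call, return an error to the client if it fails, and call `release(nsId)` in a `finally` block otherwise, so calls to a healthy nameservice are not blocked by permits held for an overloaded one.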

This message was sent by Atlassian Jira

