From: "Erik Krogen (JIRA)"
To: hdfs-issues@hadoop.apache.org
Date: Mon, 3 Jun 2019 23:00:00 +0000 (UTC)
Subject: [jira] [Commented] (HDFS-14090) RBF: Improved isolation for downstream name nodes.

    [ https://issues.apache.org/jira/browse/HDFS-14090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16855129#comment-16855129 ]

Erik Krogen commented on HDFS-14090:
------------------------------------

[~crh], I took a look at the design document and think your approach is very sensible. One issue I considered was that if many clients start posting requests to subcluster A, the call queue on the router may fill up with A requests, degrading service to subcluster B. However, the queue should continue to drain quickly, as there will still be subcluster B handlers available to read those requests and throw {{StandbyException}} to them. So it seems this should not be an issue.

One thing I would prefer to see is an exception other than {{StandbyException}}; though in practice it accomplishes the correct purpose, it is semantically incorrect. A backoff exception is really closer to the correct semantics.

> RBF: Improved isolation for downstream name nodes.
> --------------------------------------------------
>
>                 Key: HDFS-14090
>                 URL: https://issues.apache.org/jira/browse/HDFS-14090
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>            Reporter: CR Hota
>            Assignee: CR Hota
>            Priority: Major
>         Attachments: HDFS-14090-HDFS-13891.001.patch, RBF_ Isolation design.pdf
>
>
> Router is a gateway to underlying name nodes. Gateway architectures should help minimize the impact that unhealthy clusters have on clients connecting to healthy clusters.
> For example, if there are 2 name nodes downstream and one of them is heavily loaded, with calls spiking RPC queue times, then due to back pressure the same will start reflecting on the router.
> As a result, clients connecting to healthy/faster name nodes will also slow down, as the same RPC queue is maintained for all calls at the router layer. Essentially, the same IPC thread pool is used by the router to connect to all name nodes.
> Currently the router uses one single RPC queue for all calls. Let’s discuss how we can change the architecture and add some throttling logic for unhealthy/slow/overloaded name nodes.
> One way could be to read from the current call queue, immediately identify the downstream name node, and maintain a separate queue for each underlying name node. Another, simpler way is to maintain some sort of rate limiter configured for each name node and let routers drop or reject requests, or send errors, after a certain threshold.
> This won’t be a simple change, as the router’s ‘Server’ layer would need redesign and implementation. Currently this layer is the same as the name node’s.
> Opening this ticket to discuss, design and implement this feature.
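
To make the per-name-node limiter idea and the backoff-exception suggestion above concrete, here is a minimal Java sketch. It is an illustration under assumptions, not the attached patch and not necessarily the mechanism in the design document: the class names SubclusterPermits and SubclusterOverloadedException, the nsId parameter, and the fixed permit count are all hypothetical. It also uses a plain concurrency cap (a semaphore per subcluster) as a stand-in for the rate limiter mentioned in the description. When a subcluster's permits are exhausted, the call fails fast with a backoff-style exception rather than {{StandbyException}}, so the handler is freed immediately and calls to other subclusters are unaffected.

```java
import java.io.IOException;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Semaphore;

/** Hypothetical exception meaning "overloaded, back off and retry" (not "standby"). */
class SubclusterOverloadedException extends IOException {
  SubclusterOverloadedException(String nsId) {
    super("Subcluster " + nsId + " is overloaded; back off and retry");
  }
}

/** Hypothetical per-subcluster permit gate; an assumption, not the HDFS-14090 patch. */
class SubclusterPermits {
  private final int permitsPerSubcluster;
  private final Map<String, Semaphore> permits = new ConcurrentHashMap<>();

  SubclusterPermits(int permitsPerSubcluster) {
    this.permitsPerSubcluster = permitsPerSubcluster;
  }

  /** Takes a permit for nsId, or fails fast without blocking the calling handler. */
  void acquire(String nsId) throws SubclusterOverloadedException {
    Semaphore s =
        permits.computeIfAbsent(nsId, k -> new Semaphore(permitsPerSubcluster));
    if (!s.tryAcquire()) {
      throw new SubclusterOverloadedException(nsId);
    }
  }

  /** Returns a permit; callers must only release what they successfully acquired. */
  void release(String nsId) {
    Semaphore s = permits.get(nsId);
    if (s != null) {
      s.release();
    }
  }
}
```

In this sketch a router handler would call acquire(nsId) before forwarding a call downstream and release(nsId) in a finally block once the call returns. A slow subcluster A can then hold at most its own permits, while requests for subcluster B keep being served.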