From common-issues-return-180952-archive-asf-public=cust-asf.ponee.io@hadoop.apache.org  Thu Aug  1 10:23:02 2019
Return-Path: <common-issues-return-180952-archive-asf-public=cust-asf.ponee.io@hadoop.apache.org>
X-Original-To: archive-asf-public@cust-asf.ponee.io
Delivered-To: archive-asf-public@cust-asf.ponee.io
Received: from mail.apache.org (hermes.apache.org [207.244.88.153])
	by mx-eu-01.ponee.io (Postfix) with SMTP id 7C8E1180651
	for <archive-asf-public@cust-asf.ponee.io>; Thu,  1 Aug 2019 12:23:02 +0200 (CEST)
Received: (qmail 19956 invoked by uid 500); 1 Aug 2019 10:23:01 -0000
Mailing-List: contact common-issues-help@hadoop.apache.org; run by ezmlm
Precedence: bulk
List-Help: <mailto:common-issues-help@hadoop.apache.org>
List-Unsubscribe: <mailto:common-issues-unsubscribe@hadoop.apache.org>
List-Post: <mailto:common-issues@hadoop.apache.org>
List-Id: <common-issues.hadoop.apache.org>
Delivered-To: mailing list common-issues@hadoop.apache.org
Received: (qmail 19845 invoked by uid 99); 1 Aug 2019 10:23:01 -0000
Received: from mailrelay1-us-west.apache.org (HELO mailrelay1-us-west.apache.org) (209.188.14.139)
    by apache.org (qpsmtpd/0.29) with ESMTP; Thu, 01 Aug 2019 10:23:01 +0000
Received: from jira-lw-us.apache.org (unknown [207.244.88.139])
	by mailrelay1-us-west.apache.org (ASF Mail Server at mailrelay1-us-west.apache.org) with ESMTP id 5DCA8E0E2E
	for <common-issues@hadoop.apache.org>; Thu,  1 Aug 2019 10:23:00 +0000 (UTC)
Received: from jira-lw-us.apache.org (localhost [127.0.0.1])
	by jira-lw-us.apache.org (ASF Mail Server at jira-lw-us.apache.org) with ESMTP id 197A526636
	for <common-issues@hadoop.apache.org>; Thu,  1 Aug 2019 10:23:00 +0000 (UTC)
Date: Thu, 1 Aug 2019 10:23:00 +0000 (UTC)
From: "Jinglun (JIRA)" <jira@apache.org>
To: common-issues@hadoop.apache.org
Message-ID: <JIRA.13242434.1561968068000.88392.1564654980101@Atlassian.JIRA>
In-Reply-To: <JIRA.13242434.1561968068000@Atlassian.JIRA>
References: <JIRA.13242434.1561968068000@Atlassian.JIRA> <JIRA.13242434.1561968068950@jira-lw-us.apache.org>
Subject: [jira] [Commented] (HADOOP-16403) Start a new statistical rpc queue
 and make the Reader's pendingConnection queue runtime-replaceable
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: quoted-printable
X-JIRA-FingerPrint: 30527f35849b9dde25b450d4833f0394


    [ https://issues.apache.org/jira/browse/HADOOP-16403?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16897945#comment-16897945 ]

Jinglun commented on HADOOP-16403:
----------------------------------

About the shadedclient error, I searched [patch-shadedclient.txt|https://builds.apache.org/job/PreCommit-HADOOP-Build/16437/artifact/out/patch-shadedclient.txt] and found this:
{quote}[ERROR] Found artifact with unexpected contents: '/testptch/hadoop/hadoop-client-modules/hadoop-client-api/target/hadoop-client-api-3.3.0-SNAPSHOT.jar'
 Please check the following and either correct the build or update
 the allowed list with reasoning.

core-default.xml.orig
{quote}
There is a jar check in *_./hadoop-client-modules/hadoop-client-check-invariants/src/test/resources/ensure-jars-have-correct-contents.sh_*, and it seems core-default.xml.orig was packaged into hadoop-client-api-3.3.0-SNAPSHOT.jar.
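The kind of check that flagged the jar can be approximated in Java. This is a hypothetical sketch, not a port of ensure-jars-have-correct-contents.sh: the class name `JarContentsCheck` and the allow-list contents are assumptions. In this sketch, allowed top-level files are matched by exact name, which shows why a stray `core-default.xml.orig` is reported even though its name starts with an allowed one.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Enumeration;
import java.util.List;
import java.util.jar.JarEntry;
import java.util.jar.JarFile;

/** Hypothetical sketch of an allow-list check on a jar's entries. */
class JarContentsCheck {
  // Hypothetical allow-list: directory prefixes match anything beneath them...
  static final List<String> ALLOWED_PREFIXES = Arrays.asList("META-INF/", "org/apache/hadoop/");
  // ...while top-level files must match exactly, so "core-default.xml.orig" is rejected.
  static final List<String> ALLOWED_FILES = Arrays.asList("core-default.xml");

  static boolean isAllowed(String entryName) {
    if (ALLOWED_FILES.contains(entryName)) {
      return true;
    }
    for (String prefix : ALLOWED_PREFIXES) {
      if (entryName.startsWith(prefix)) {
        return true;
      }
    }
    return false;
  }

  /** Returns the names of all file entries that match neither allow-list. */
  static List<String> unexpectedEntries(String jarPath) throws IOException {
    List<String> unexpected = new ArrayList<>();
    try (JarFile jar = new JarFile(jarPath)) {
      Enumeration<JarEntry> entries = jar.entries();
      while (entries.hasMoreElements()) {
        JarEntry entry = entries.nextElement();
        // Directory entries such as "org/" carry no content, so skip them.
        if (!entry.isDirectory() && !isAllowed(entry.getName())) {
          unexpected.add(entry.getName());
        }
      }
    }
    return unexpected;
  }

  public static void main(String[] args) throws IOException {
    List<String> unexpected = unexpectedEntries(args[0]);
    unexpected.forEach(name -> System.out.println("unexpected entry: " + name));
    System.exit(unexpected.isEmpty() ? 0 : 1);
  }
}
```

With Java 11+ it can be run as `java JarContentsCheck.java path/to/some.jar`; it prints each unexpected entry and exits with status 1 if any are found.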

I'm not sure how this happened. I made a new patch from the latest trunk and fixed the checkstyle issues. I'm uploading patch-005 to see whether the shadedclient error still occurs.

> Start a new statistical rpc queue and make the Reader's pendingConnection queue runtime-replaceable
> ---------------------------------------------------------------------------------------------------
>
>                 Key: HADOOP-16403
>                 URL: https://issues.apache.org/jira/browse/HADOOP-16403
>             Project: Hadoop Common
>          Issue Type: Improvement
>            Reporter: Jinglun
>            Assignee: Jinglun
>            Priority: Major
>         Attachments: HADOOP-16403-How_MetricLinkedBlockingQueue_Works.pdf, HADOOP-16403.001.patch, HADOOP-16403.002.patch, HADOOP-16403.003.patch, HADOOP-16403.004.patch, MetricLinkedBlockingQueueTest.pdf
>
>
> I have an HA cluster with 2 NameNodes. The NameNode's metadata is quite big, so after the active NameNode died, it took the standby more than 40s to become active. Many requests (TCP connect requests and RPC requests) from DataNodes, clients and ZKFC timed out and started retrying. The sudden flood of requests lasted for the next 2 minutes, until all requests were either handled or ran out of retries.
> Adjusting the RPC-related settings might strengthen the NameNode and solve this problem, but the key point is finding the bottleneck. The RPC server can be described as below:
> {noformat}
> Listener -> Readers' queues -> Readers -> callQueue -> Handlers{noformat}
> By sampling some failed clients, I found that many of them got ConnectTimeoutException, caused by a TCP connect request that went unanswered for 20s. I think the reader queue may be full and block the listener from handling new connections. Both slow handlers and slow readers can block the whole processing pipeline, and I need to know which one is responsible. I think *a queue that computes the qps, writes a log when the queue is full and can be replaced easily* will help.
> I found the nice work in HADOOP-10302, which implements a runtime-swappable queue. Using it for the Reader's queue makes the reader queue runtime-swappable automatically. The qps computation could be done by a subclass of LinkedBlockingQueue that does the computing while put/take/... happens. The qps data will show up on JMX.
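The description's reasoning that a full reader queue blocks the listener can be illustrated with a toy model. This is a minimal sketch, not Hadoop's Listener/Reader code: `ReaderQueueStall` and `handOff` are hypothetical names, the reader is assumed to drain nothing at all, and a timed `offer()` stands in for a blocking hand-off so the stall becomes observable instead of hanging.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.TimeUnit;

/** Toy model of a listener handing connections to a reader that has stopped draining. */
class ReaderQueueStall {
  /**
   * Offers {@code incoming} connections to a bounded pending queue that nobody drains.
   * Returns how many were handed off before the listener stalled.
   */
  static int handOff(int queueCapacity, int incoming, long waitMillis)
      throws InterruptedException {
    BlockingQueue<Integer> pending = new ArrayBlockingQueue<>(queueCapacity);
    int handedOff = 0;
    for (int conn = 0; conn < incoming; conn++) {
      // A blocking put() would hang here indefinitely; a timed offer() makes the stall visible.
      if (!pending.offer(conn, waitMillis, TimeUnit.MILLISECONDS)) {
        System.out.println("listener stalled on connection " + conn
            + " (pending queue " + pending.size() + "/" + queueCapacity + ")");
        break; // while stalled, the listener accepts nothing else
      }
      handedOff++;
    }
    return handedOff;
  }

  public static void main(String[] args) throws InterruptedException {
    System.out.println("handed off: " + handOff(2, 5, 100));
  }
}
```

Running `main` prints `listener stalled on connection 2 (pending queue 2/2)` followed by `handed off: 2`. While the listener is stuck like this it cannot accept further connections, which is consistent with the ConnectTimeoutExceptions observed on the clients.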

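The qps-computing LinkedBlockingQueue subclass described in the issue could look roughly like the following minimal sketch. It is not the attached MetricLinkedBlockingQueue patch: the class name `MetricQueueSketch`, its methods, and the use of `java.util.logging` are assumptions for illustration, and publishing the numbers on JMX is left out. It counts successful inserts and removals in the overridden put/offer/take/poll methods, logs when an insert finds the queue full, and derives per-second rates from counter deltas between samples.

```java
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;
import java.util.logging.Logger;

/** Hypothetical sketch; drainTo(), remove(Object) and iterator removal are not counted. */
class MetricQueueSketch<E> extends LinkedBlockingQueue<E> {
  private static final Logger LOG = Logger.getLogger(MetricQueueSketch.class.getName());
  private final AtomicLong enqueued = new AtomicLong();
  private final AtomicLong dequeued = new AtomicLong();
  private final AtomicLong fullEvents = new AtomicLong();
  // Sampling state, only touched inside the synchronized sampleQps().
  private long lastEnqueued, lastDequeued;
  private long lastNanos = System.nanoTime();

  MetricQueueSketch(int capacity) {
    super(capacity);
  }

  /** Events per second for a count delta over a time delta in nanoseconds. */
  static double rate(long deltaCount, long deltaNanos) {
    return deltaNanos <= 0 ? 0.0 : deltaCount * 1e9 / deltaNanos;
  }

  // Best-effort check before each insert; under contention it can miss or over-report.
  private void noteIfFull() {
    if (remainingCapacity() == 0) {
      fullEvents.incrementAndGet();
      LOG.warning("queue full (" + size() + " elements); producer will block or be refused");
    }
  }

  @Override
  public void put(E e) throws InterruptedException {
    noteIfFull();
    super.put(e);
    enqueued.incrementAndGet();
  }
  @Override
  public boolean offer(E e) { // AbstractQueue.add() delegates here
    noteIfFull();
    boolean added = super.offer(e);
    if (added) enqueued.incrementAndGet();
    return added;
  }
  @Override
  public boolean offer(E e, long timeout, TimeUnit unit) throws InterruptedException {
    noteIfFull();
    boolean added = super.offer(e, timeout, unit);
    if (added) enqueued.incrementAndGet();
    return added;
  }
  @Override
  public E take() throws InterruptedException {
    E e = super.take();
    dequeued.incrementAndGet();
    return e;
  }
  @Override
  public E poll() { // AbstractQueue.remove() delegates here
    E e = super.poll();
    if (e != null) dequeued.incrementAndGet();
    return e;
  }
  @Override
  public E poll(long timeout, TimeUnit unit) throws InterruptedException {
    E e = super.poll(timeout, unit);
    if (e != null) dequeued.incrementAndGet();
    return e;
  }

  long enqueuedCount() { return enqueued.get(); }
  long dequeuedCount() { return dequeued.get(); }
  long fullEventCount() { return fullEvents.get(); }

  /** Returns {enqueue qps, dequeue qps} measured since the previous call (or construction). */
  synchronized double[] sampleQps() {
    long now = System.nanoTime();
    long enq = enqueued.get(), deq = dequeued.get();
    double[] qps = {
        rate(enq - lastEnqueued, now - lastNanos), rate(deq - lastDequeued, now - lastNanos)};
    lastEnqueued = enq;
    lastDequeued = deq;
    lastNanos = now;
    return qps;
  }

  public static void main(String[] args) throws InterruptedException {
    MetricQueueSketch<String> q = new MetricQueueSketch<>(2);
    q.put("conn-1");
    q.put("conn-2");
    boolean accepted = q.offer("conn-3"); // full: logged and refused
    q.take();
    double[] qps = q.sampleQps();
    System.out.printf("accepted=%b enq=%d deq=%d full=%d enqQps=%.1f deqQps=%.1f%n",
        accepted, q.enqueuedCount(), q.dequeuedCount(), q.fullEventCount(), qps[0], qps[1]);
  }
}
```

A production version would probably rate-limit the full-queue warning, since a saturated queue would otherwise log on every insert.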

--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

---------------------------------------------------------------------
To unsubscribe, e-mail: common-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: common-issues-help@hadoop.apache.org