Mailing-List: contact hdfs-issues-help@hadoop.apache.org; run by ezmlm
Precedence: bulk
Reply-To: hdfs-issues@hadoop.apache.org
Date: Thu, 22 Oct 2015 01:23:27 +0000 (UTC)
From: "Uma Maheswara Rao G (JIRA)" <jira@apache.org>
To: hdfs-issues@hadoop.apache.org
Message-ID: <JIRA.12902458.1444065925000.24486.1445477007986@Atlassian.JIRA>
In-Reply-To: <JIRA.12902458.1444065925000@Atlassian.JIRA>
References: <JIRA.12902458.1444065925000@Atlassian.JIRA>
 <JIRA.12902458.1444065925486@arcas>
Subject: [jira] [Comment Edited] (HDFS-9198) Coalesce IBR processing in the
 NN
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: quoted-printable


    [ https://issues.apache.org/jira/browse/HDFS-9198?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14968317#comment-14968317 ]

Uma Maheswara Rao G edited comment on HDFS-9198 at 10/22/15 1:22 AM:
---------------------------------------------------------------------

Thanks, Daryn, for the nice work here. This is interesting to me.
I have just reviewed the patch. My comments follow:

# runBlockOp: how about naming it runBlockReportOp?
# nit: {code}
while (namesystem.isRunning()) {
+        NameNodeMetrics metrics = NameNode.getNameNodeMetrics();
{code}
Maybe we can fetch the metrics once outside the loop and reuse them?
# I think we need to handle Throwable in this BR processing thread. In case of any unexpected error, the thread should not die silently, as it is one of the important processing threads… We may have to terminate the system in such cases.
Minor suggestion: the method names in BM could be something like runBlockReportOpSync and runBlockReportAsync?
# Code formatting is off for these lines:
{code}
metrics.setBlockOpsQueued(queue.size()+1);
metrics.addBlockOpsBatched(processed-1);
{code}
# Currently the DN sets a flag to trigger sendImmediateIBR when IBR processing fails. But now exceptions are handled in the NN itself and cannot be passed back to the DN, because the processing is async. So sendImmediateIBR now happens only for IPC-level exceptions. Have you thought about this? Wouldn't such missed info have to wait until the next BR?
# The tests look great to me. Minor suggestion: could you please add javadoc for the tests?
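
To illustrate the comment above about handling Throwable in the processing thread, here is a minimal sketch. It is not taken from the patch: GuardedProcessor, enqueue and fatalHandler are hypothetical names, and the handler only stands in for whatever termination mechanism the NN would actually use.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.function.Consumer;

/** Hypothetical stand-in for the block report processing thread. */
class GuardedProcessor extends Thread {
  private final BlockingQueue<Runnable> queue = new LinkedBlockingQueue<>();
  // Hypothetical hook; a real system might terminate the process here.
  private final Consumer<Throwable> fatalHandler;

  GuardedProcessor(Consumer<Throwable> fatalHandler) {
    super("guarded-processor");
    this.fatalHandler = fatalHandler;
    setDaemon(true);
  }

  /** Queues an operation for the processing thread. */
  void enqueue(Runnable op) {
    queue.add(op);
  }

  @Override
  public void run() {
    try {
      while (true) {
        queue.take().run();
      }
    } catch (InterruptedException ie) {
      // Interruption is treated as a normal shutdown request.
      Thread.currentThread().interrupt();
    } catch (Throwable t) {
      // Unexpected failure: report it instead of letting the thread die silently.
      fatalHandler.accept(t);
    }
  }
}
```

With this shape, any unexpected error from a queued operation reaches the handler exactly once, after which the thread exits.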




> Coalesce IBR processing in the NN
> ---------------------------------
>
>                 Key: HDFS-9198
>                 URL: https://issues.apache.org/jira/browse/HDFS-9198
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: namenode
>    Affects Versions: 2.0.0-alpha
>            Reporter: Daryn Sharp
>            Assignee: Daryn Sharp
>         Attachments: HDFS-9198-branch2.patch, HDFS-9198-trunk.patch, HDFS-9198-trunk.patch, HDFS-9198-trunk.patch
>
>
> IBRs from thousands of DNs under load will degrade NN performance due to excessive write-lock contention from multiple IPC handler threads.  The IBR processing is quick, so the lock contention may be reduced by coalescing multiple IBRs into a single write-lock transaction.  The handlers will also be freed up faster for other operations.
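
The coalescing idea in the description can be sketched roughly as follows. This is an illustration under assumptions, not the code in the attached patches: CoalescingQueue, enqueue, processBatch and maxBatch are hypothetical names. Handler threads enqueue work without taking the lock, and a single consumer runs everything already queued under one write-lock acquisition.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.locks.ReentrantReadWriteLock;

/** Illustrative only: batches queued operations under one write-lock hold. */
class CoalescingQueue {
  private final BlockingQueue<Runnable> queue = new LinkedBlockingQueue<>();
  private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
  // Hypothetical cap so a single batch does not hold the lock for too long.
  private final int maxBatch;

  CoalescingQueue(int maxBatch) {
    this.maxBatch = maxBatch;
  }

  /** Called by handler threads; returns without acquiring the lock. */
  void enqueue(Runnable op) {
    queue.add(op);
  }

  /**
   * Waits for one operation, then runs it together with any operations
   * already queued (up to maxBatch) in a single write-lock transaction.
   * Returns the number of operations processed.
   */
  int processBatch() throws InterruptedException {
    List<Runnable> batch = new ArrayList<>();
    batch.add(queue.take());
    queue.drainTo(batch, maxBatch - 1);
    lock.writeLock().lock();
    try {
      for (Runnable op : batch) {
        op.run();
      }
    } finally {
      lock.writeLock().unlock();
    }
    return batch.size();
  }
}
```

A consumer thread would call processBatch() in a loop. Under light load each batch holds one operation; under heavy load several operations share one lock acquisition.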


--
This message was sent by Atlassian JIRA
(v6.3.4#6332)