Date: Thu, 27 Aug 2015 07:30:46 +0000 (UTC)
From: "Walter Su (JIRA)"
To: hdfs-issues@hadoop.apache.org
Reply-To: hdfs-issues@hadoop.apache.org
Subject: [jira] [Updated] (HDFS-8763) After file closed, a race condition between IBR of 3rd replica of lastBlock and ReplicationMonitor

     [ https://issues.apache.org/jira/browse/HDFS-8763?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Walter Su updated HDFS-8763:
----------------------------
    Description: 
-For our cluster the NameNode is always very busy, so the lock contention of every incremental block report is heavy.-
-The logic of an incremental block report is as follows: the client sends the block to dn1, dn1 mirrors it to dn2, and dn2 mirrors it to dn3. After finishing the block, every datanode reports the newly received block to the NameNode. On the NameNode side, all reports go through the method processIncrementalBlockReport in the BlockManager class. But the status of the block reported from dn2 and dn3 is RECEIVING_BLOCK, while for dn1 it is RECEIVED_BLOCK. It is fine if dn2 and dn3 report before dn1 (the common case), but in a busy environment it is easy for dn1 to report before dn2 or dn3. Let's assume dn2 reports first, dn1 second, and dn3 third.-
-So dn1's addStoredBlock will find that the replica count of this block has not reached the expected number (which is 3), the block will be added to the neededReplications queue, and soon some node in the pipeline (dn1 or dn2) will be asked to replicate it to dn4. After some time dn4 and dn3 both report this block, and then one node is chosen to be invalidated.-
Here is one log I found in our cluster:
{noformat}
2015-07-08 01:05:34,675 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* allocateBlock: /logs/***_bigdata_spam/logs/application_1435099124107_470749/xx.xx.4.62_45454.tmp. BP-1386326728-xx.xx.2.131-1382089338395 blk_3194502674_2121080184{blockUCState=UNDER_CONSTRUCTION, primaryNodeIndex=-1, replicas=[ReplicaUnderConstruction[[DISK]DS-a7c0f8f6-2399-4980-9479-efa08487b7b3:NORMAL|RBW], ReplicaUnderConstruction[[DISK]DS-c75145a0-ed63-4180-87ee-d48ccaa647c5:NORMAL|RBW], ReplicaUnderConstruction[[DISK]DS-15a4dc8e-5b7d-449f-a941-6dced45e6f07:NORMAL|RBW]]}
2015-07-08 01:05:34,689 INFO BlockStateChange: BLOCK* addStoredBlock: blockMap updated: xx.xx.7.75:50010 is added to blk_3194502674_2121080184{blockUCState=UNDER_CONSTRUCTION, primaryNodeIndex=-1, replicas=[ReplicaUnderConstruction[[DISK]DS-15a4dc8e-5b7d-449f-a941-6dced45e6f07:NORMAL|RBW], ReplicaUnderConstruction[[DISK]DS-74ed264f-da43-4cc3-9fa9-164ba99f752a:NORMAL|RBW], ReplicaUnderConstruction[[DISK]DS-56121ce1-8991-45b3-95bc-2a5357991512:NORMAL|RBW]]} size 0
2015-07-08 01:05:34,689 INFO BlockStateChange: BLOCK* addStoredBlock: blockMap updated: xx.xx.4.62:50010 is added to blk_3194502674_2121080184{blockUCState=UNDER_CONSTRUCTION, primaryNodeIndex=-1, replicas=[ReplicaUnderConstruction[[DISK]DS-15a4dc8e-5b7d-449f-a941-6dced45e6f07:NORMAL|RBW], ReplicaUnderConstruction[[DISK]DS-74ed264f-da43-4cc3-9fa9-164ba99f752a:NORMAL|RBW], ReplicaUnderConstruction[[DISK]DS-56121ce1-8991-45b3-95bc-2a5357991512:NORMAL|RBW]]} size 0
2015-07-08 01:05:35,003 INFO BlockStateChange: BLOCK* ask xx.xx.4.62:50010 to replicate blk_3194502674_2121080184 to datanode(s) xx.xx.4.65:50010
2015-07-08 01:05:35,403 INFO BlockStateChange: BLOCK* addStoredBlock: blockMap updated: xx.xx.7.73:50010 is added to blk_3194502674_2121080184 size 67750
2015-07-08 01:05:35,833 INFO BlockStateChange: BLOCK* addStoredBlock: blockMap updated: xx.xx.4.65:50010 is added to blk_3194502674_2121080184 size 67750
2015-07-08 01:05:35,833 INFO BlockStateChange: BLOCK* InvalidateBlocks: add blk_3194502674_2121080184 to xx.xx.7.75:50010
2015-07-08 01:05:35,833 INFO BlockStateChange: BLOCK* chooseExcessReplicates: (xx.xx.7.75:50010, blk_3194502674_2121080184) is added to invalidated blocks set
2015-07-08 01:05:35,852 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* InvalidateBlocks: ask xx.xx.7.75:50010 to delete [blk_3194502674_2121080184, blk_3194497594_2121075104]
{noformat}
On some days the number of occurrences of this situation can reach 400,000, which is bad for performance and wastes network bandwidth.
Our base version is Hadoop 2.4; I have also checked Hadoop 2.7.1 and didn't find any difference.

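The race described above can be sketched in miniature. The following is a hypothetical, heavily simplified Java model (the class, fields, and method names are invented for illustration and are not the actual BlockManager code): the NameNode only counts finalized (RECEIVED_BLOCK) replicas, so when dn1's RECEIVED_BLOCK IBR arrives while dn2 and dn3 are still in RECEIVING state, the block briefly looks under-replicated, the monitor schedules an extra copy to dn4, and once all replicas have reported one of them is invalidated as excess.

```java
import java.util.LinkedHashSet;
import java.util.Set;

// Toy model of the IBR/ReplicationMonitor race (names are hypothetical).
class IbrRaceSketch {
    static final int EXPECTED_REPLICATION = 3;

    final Set<String> finalizedReplicas = new LinkedHashSet<>();
    final Set<String> neededReplications = new LinkedHashSet<>();
    final Set<String> excessReplicas = new LinkedHashSet<>();

    // Models handling of a RECEIVED_BLOCK incremental block report:
    // only finalized replicas count toward the replication target.
    void reportReceived(String dn) {
        finalizedReplicas.add(dn);
        if (finalizedReplicas.size() < EXPECTED_REPLICATION) {
            neededReplications.add("blk_x");     // block looks under-replicated
        } else {
            neededReplications.remove("blk_x");  // target reached, stop tracking
        }
    }

    // Models the ReplicationMonitor draining the queue: if the block is still
    // queued when the monitor runs, an extra replication is scheduled.
    String replicationMonitorTick(String targetDn) {
        return neededReplications.remove("blk_x") ? targetDn : null;
    }

    // Models chooseExcessReplicates: with more replicas than needed,
    // one is picked and moved to the invalidation set.
    void invalidateExcess() {
        while (finalizedReplicas.size() > EXPECTED_REPLICATION) {
            String victim = finalizedReplicas.iterator().next();
            finalizedReplicas.remove(victim);
            excessReplicas.add(victim);
        }
    }

    public static void main(String[] args) {
        IbrRaceSketch nn = new IbrRaceSketch();
        nn.reportReceived("dn2");                        // dn2 finishes first
        nn.reportReceived("dn1");                        // dn1 second: only 2 of 3 finalized
        String extra = nn.replicationMonitorTick("dn4"); // monitor fires before dn3 reports
        nn.reportReceived("dn3");                        // dn3 finally reports
        nn.reportReceived(extra);                        // dn4's scheduled copy completes
        nn.invalidateExcess();                           // now 4 replicas: one is invalidated
        System.out.println("replicas=" + nn.finalizedReplicas.size()
            + " invalidated=" + nn.excessReplicas.size());
    }
}
```

Under this model, if dn3's report had arrived before the monitor tick, the block would have been removed from the queue in time and no extra copy (and no later invalidation) would occur, which matches the "common case" ordering described in the report.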
> After file closed, a race condition between IBR of 3rd replica of lastBlock and ReplicationMonitor
> --------------------------------------------------------------------------------------------------
>
>                 Key: HDFS-8763
>                 URL: https://issues.apache.org/jira/browse/HDFS-8763
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: HDFS
>    Affects Versions: 2.4.0
>            Reporter: jiangyu
>            Assignee: Walter Su
>            Priority: Minor
>         Attachments: HDFS-8763.01.patch
>

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)