From hdfs-issues-return-234439-archive-asf-public=cust-asf.ponee.io@hadoop.apache.org Tue Sep 18 09:54:05 2018
Date: Tue, 18 Sep 2018 07:54:00 +0000 (UTC)
From: "Hadoop QA (JIRA)"
To: hdfs-issues@hadoop.apache.org
Subject: [jira] [Commented] (HDFS-13768) Adding replicas to volume map makes DataNode start slowly
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit
X-JIRA-FingerPrint: 30527f35849b9dde25b450d4833f0394

    [ https://issues.apache.org/jira/browse/HDFS-13768?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16618628#comment-16618628 ]

Hadoop QA commented on HDFS-13768:
----------------------------------

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s{color} | {color:blue} Docker mode activated. {color} |
| {color:red}-1{color} | {color:red} patch {color} | {color:red} 0m 5s{color} | {color:red} HDFS-13768 does not apply to trunk. Rebase required? Wrong Branch? See https://wiki.apache.org/hadoop/HowToContribute for help. {color} |
\\
\\
|| Subsystem || Report/Notes ||
| JIRA Issue | HDFS-13768 |
| Console output | https://builds.apache.org/job/PreCommit-HDFS-Build/25090/console |
| Powered by | Apache Yetus 0.8.0 http://yetus.apache.org |


This message was automatically generated.


> Adding replicas to volume map makes DataNode start slowly
> ----------------------------------------------------------
>
>                 Key: HDFS-13768
>                 URL: https://issues.apache.org/jira/browse/HDFS-13768
>             Project: Hadoop HDFS
>          Issue Type: Bug
>    Affects Versions: 3.1.0
>            Reporter: Yiqun Lin
>            Assignee: Surendra Singh Lilhore
>            Priority: Major
>         Attachments: HDFS-13768.01.patch, HDFS-13768.02.patch, HDFS-13768.patch, screenshot-1.png
>
>
> We found DNs starting very slowly during a rolling upgrade of our cluster. When we restart DNs, they start very slowly and do not register with the NN immediately, which causes a lot of the following errors:
> {noformat}
> DataXceiver error processing WRITE_BLOCK operation src: /xx.xx.xx.xx:64360 dst: /xx.xx.xx.xx:50010
> java.io.IOException: Not ready to serve the block pool, BP-1508644862-xx.xx.xx.xx-1493781183457.
>         at org.apache.hadoop.hdfs.server.datanode.DataXceiver.checkAndWaitForBP(DataXceiver.java:1290)
>         at org.apache.hadoop.hdfs.server.datanode.DataXceiver.checkAccess(DataXceiver.java:1298)
>         at org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:630)
>         at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opWriteBlock(Receiver.java:169)
>         at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:106)
>         at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:246)
>         at java.lang.Thread.run(Thread.java:745)
> {noformat}
> Looking into the DN startup logic, the DN performs the initial block pool operation before registration, and during block pool initialization we found that adding replicas to the volume map is the most expensive operation. Related log:
> {noformat}
> 2018-07-26 10:46:23,771 INFO [Thread-105] org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Time to add replicas to map for block pool BP-1508644862-xx.xx.xx.xx-1493781183457 on volume /home/hard_disk/1/dfs/dn/current: 242722ms
> 2018-07-26 10:46:26,231 INFO [Thread-109] org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Time to add replicas to map for block pool BP-1508644862-xx.xx.xx.xx-1493781183457 on volume /home/hard_disk/5/dfs/dn/current: 245182ms
> 2018-07-26 10:46:32,146 INFO [Thread-112] org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Time to add replicas to map for block pool BP-1508644862-xx.xx.xx.xx-1493781183457 on volume /home/hard_disk/8/dfs/dn/current: 251097ms
> 2018-07-26 10:47:08,283 INFO [Thread-106] org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Time to add replicas to map for block pool BP-1508644862-xx.xx.xx.xx-1493781183457 on volume /home/hard_disk/2/dfs/dn/current: 287235ms
> {noformat}
> Currently the DN uses an independent thread per volume to scan and add replicas, but startup still has to wait for the slowest thread to finish its work. So the main question is whether we can make these threads run faster.
> The jstack we captured while the DN was blocked adding replicas:
> {noformat}
> "Thread-113" #419 daemon prio=5 os_prio=0 tid=0x00007f40879ff000 nid=0x145da runnable [0x00007f4043a38000]
>    java.lang.Thread.State: RUNNABLE
>         at java.io.UnixFileSystem.list(Native Method)
>         at java.io.File.list(File.java:1122)
>         at java.io.File.listFiles(File.java:1207)
>         at org.apache.hadoop.fs.FileUtil.listFiles(FileUtil.java:1165)
>         at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.BlockPoolSlice.addToReplicasMap(BlockPoolSlice.java:445)
>         at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.BlockPoolSlice.addToReplicasMap(BlockPoolSlice.java:448)
>         at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.BlockPoolSlice.addToReplicasMap(BlockPoolSlice.java:448)
>         at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.BlockPoolSlice.getVolumeMap(BlockPoolSlice.java:342)
>         at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsVolumeImpl.getVolumeMap(FsVolumeImpl.java:864)
>         at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsVolumeList$1.run(FsVolumeList.java:191)
> {noformat}
> One possible improvement is to use a ForkJoinPool for this recursive task instead of the current synchronous walk; this could be a significant win because it can greatly speed up the recovery process (a rough sketch of the idea follows at the end of this message).



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: hdfs-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-help@hadoop.apache.org
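
A minimal sketch of what the ForkJoinPool idea mentioned above could look like. This is not the HDFS-13768 patch: the class name ReplicaScanTask, the block-file name check, and the block-counting logic are illustrative assumptions, and a real change would add each replica to the volume's ReplicaMap inside BlockPoolSlice instead of just counting files. It only shows how the recursive directory walk seen in the jstack can be parallelized with fork/join.

{code:java}
import java.io.File;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

/**
 * Hypothetical sketch: scan a block-pool directory tree with a ForkJoinPool
 * instead of a single synchronous walk per volume. Returns the number of
 * block files found; a real implementation would populate the ReplicaMap.
 */
public class ReplicaScanTask extends RecursiveTask<Long> {

    private final File dir;

    public ReplicaScanTask(File dir) {
        this.dir = dir;
    }

    @Override
    protected Long compute() {
        File[] children = dir.listFiles();
        if (children == null) {
            return 0L; // not a directory, or an I/O error while listing
        }
        long count = 0;
        List<ReplicaScanTask> subTasks = new ArrayList<>();
        for (File child : children) {
            if (child.isDirectory()) {
                // Fork a subtask per subdirectory so sibling directories
                // are scanned in parallel by the pool's worker threads.
                ReplicaScanTask task = new ReplicaScanTask(child);
                task.fork();
                subTasks.add(task);
            } else if (child.getName().startsWith("blk_")
                    && !child.getName().endsWith(".meta")) {
                count++; // a block data file (simplified check)
            }
        }
        // Join the forked subtasks and accumulate their results.
        for (ReplicaScanTask task : subTasks) {
            count += task.join();
        }
        return count;
    }

    public static void main(String[] args) {
        // Example: scan one volume's block pool directory given on the command line.
        File bpDir = new File(args[0]);
        ForkJoinPool pool = new ForkJoinPool(); // parallelism defaults to #CPUs
        long blocks = pool.invoke(new ReplicaScanTask(bpDir));
        System.out.println("Found " + blocks + " block files under " + bpDir);
    }
}
{code}

With this structure, one slow subtree no longer serializes the whole scan of a volume; work stealing keeps all workers busy, which is the property the comment above is after.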