hadoop-hdfs-issues mailing list archives

From "Surendra Singh Lilhore (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-13768) Adding replicas to volume map makes DataNode start slowly
Date Mon, 17 Sep 2018 06:49:00 GMT

    [ https://issues.apache.org/jira/browse/HDFS-13768?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16617143#comment-16617143
] 

Surendra Singh Lilhore commented on HDFS-13768:
-----------------------------------------------

Thanks [~linyiqun] for the review.

bq. This comment seems not fully addressed. I mean we can also make AddReplicaProcessor run in
an asynchronous mode.
Sorry, I missed it.

bq. Could you please add the UT for this improvement?
Yes, I will address both comments in the next patch.
 

 

>  Adding replicas to volume map makes DataNode start slowly 
> -----------------------------------------------------------
>
>                 Key: HDFS-13768
>                 URL: https://issues.apache.org/jira/browse/HDFS-13768
>             Project: Hadoop HDFS
>          Issue Type: Bug
>    Affects Versions: 3.1.0
>            Reporter: Yiqun Lin
>            Assignee: Surendra Singh Lilhore
>            Priority: Major
>         Attachments: HDFS-13768.01.patch, HDFS-13768.02.patch, HDFS-13768.patch
>
>
> We found that DNs start very slowly during a rolling upgrade of our cluster. After a restart,
the DNs take a long time to come up and do not register with the NN immediately. This causes many of the
following errors:
> {noformat}
> DataXceiver error processing WRITE_BLOCK operation  src: /xx.xx.xx.xx:64360 dst: /xx.xx.xx.xx:50010
> java.io.IOException: Not ready to serve the block pool, BP-1508644862-xx.xx.xx.xx-1493781183457.
>         at org.apache.hadoop.hdfs.server.datanode.DataXceiver.checkAndWaitForBP(DataXceiver.java:1290)
>         at org.apache.hadoop.hdfs.server.datanode.DataXceiver.checkAccess(DataXceiver.java:1298)
>         at org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:630)
>         at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opWriteBlock(Receiver.java:169)
>         at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:106)
>         at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:246)
>         at java.lang.Thread.run(Thread.java:745)
> {noformat}
> Looking into the DN startup logic, the DN initializes the block pool before
it registers with the NN. During block pool initialization, we found that adding replicas
to the volume map is the most expensive operation. Related log:
> {noformat}
> 2018-07-26 10:46:23,771 INFO [Thread-105] org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl:
Time to add replicas to map for block pool BP-1508644862-xx.xx.xx.xx-1493781183457 on volume
/home/hard_disk/1/dfs/dn/current: 242722ms
> 2018-07-26 10:46:26,231 INFO [Thread-109] org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl:
Time to add replicas to map for block pool BP-1508644862-xx.xx.xx.xx-1493781183457 on volume
/home/hard_disk/5/dfs/dn/current: 245182ms
> 2018-07-26 10:46:32,146 INFO [Thread-112] org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl:
Time to add replicas to map for block pool BP-1508644862-xx.xx.xx.xx-1493781183457 on volume
/home/hard_disk/8/dfs/dn/current: 251097ms
> 2018-07-26 10:47:08,283 INFO [Thread-106] org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl:
Time to add replicas to map for block pool BP-1508644862-xx.xx.xx.xx-1493781183457 on volume
/home/hard_disk/2/dfs/dn/current: 287235ms
> {noformat}
> Currently the DN uses an independent thread to scan and add replicas for each volume, but we
still have to wait for the slowest thread to finish its work. So the main question here is how
to make each of these threads run faster.
> The jstack output captured while the DN was blocked adding replicas:
> {noformat}
> "Thread-113" #419 daemon prio=5 os_prio=0 tid=0x00007f40879ff000 nid=0x145da runnable
[0x00007f4043a38000]
>    java.lang.Thread.State: RUNNABLE
> 	at java.io.UnixFileSystem.list(Native Method)
> 	at java.io.File.list(File.java:1122)
> 	at java.io.File.listFiles(File.java:1207)
> 	at org.apache.hadoop.fs.FileUtil.listFiles(FileUtil.java:1165)
> 	at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.BlockPoolSlice.addToReplicasMap(BlockPoolSlice.java:445)
> 	at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.BlockPoolSlice.addToReplicasMap(BlockPoolSlice.java:448)
> 	at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.BlockPoolSlice.addToReplicasMap(BlockPoolSlice.java:448)
> 	at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.BlockPoolSlice.getVolumeMap(BlockPoolSlice.java:342)
> 	at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsVolumeImpl.getVolumeMap(FsVolumeImpl.java:864)
> 	at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsVolumeList$1.run(FsVolumeList.java:191)
> {noformat}
> One possible improvement is to use a ForkJoinPool to run this recursive task in parallel, rather than
synchronously. This could be a significant improvement because it can greatly speed up the recovery process.
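
A minimal sketch of the ForkJoinPool idea described above, not the actual patch. The class and method names (ParallelReplicaScan, ScanTask, scan) are hypothetical, and the replica map is simplified to a file-name-to-File map; the real BlockPoolSlice code builds replica objects and handles more cases. The only HDFS-specific assumption is that block data files are named blk_<id> and their metadata files end in .meta.

```java
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveAction;

// Hypothetical illustration only; not code from HDFS-13768.
class ParallelReplicaScan {

  /** Scans one directory and forks one subtask per subdirectory. */
  static class ScanTask extends RecursiveAction {
    private final File dir;
    private final Map<String, File> replicas;

    ScanTask(File dir, Map<String, File> replicas) {
      this.dir = dir;
      this.replicas = replicas;
    }

    @Override
    protected void compute() {
      File[] entries = dir.listFiles();
      if (entries == null) {
        return; // not a directory, or an I/O error occurred
      }
      List<ScanTask> subTasks = new ArrayList<>();
      for (File entry : entries) {
        String name = entry.getName();
        if (entry.isDirectory()) {
          // Subdirectories are scanned by separate tasks instead of
          // by a synchronous recursive call.
          subTasks.add(new ScanTask(entry, replicas));
        } else if (name.startsWith("blk_") && !name.endsWith(".meta")) {
          // Simplification: record block data files by name only.
          replicas.put(name, entry);
        }
      }
      // Runs all subtasks in the pool and waits for them to complete.
      invokeAll(subTasks);
    }
  }

  /** Scans the tree under root using a pool with the given parallelism. */
  static Map<String, File> scan(File root, int parallelism) {
    Map<String, File> replicas = new ConcurrentHashMap<>();
    ForkJoinPool pool = new ForkJoinPool(parallelism);
    try {
      pool.invoke(new ScanTask(root, replicas));
    } finally {
      pool.shutdown();
    }
    return replicas;
  }

  public static void main(String[] args) throws IOException {
    // Build a tiny directory tree in a temp location and scan it.
    File root = Files.createTempDirectory("volume").toFile();
    File sub = new File(root, "subdir0/subdir1");
    sub.mkdirs();
    new File(sub, "blk_1001").createNewFile();
    new File(sub, "blk_1001_1.meta").createNewFile();
    System.out.println("Replicas found: " + scan(root, 4).size());
  }
}
```

The concurrent map lets tasks for different subdirectories add entries without a shared lock, and invokeAll lets idle worker threads steal pending subdirectory tasks, so a single large volume is no longer limited to one scanning thread. How much this helps in practice depends on the disk, since the jstack above shows the time going to directory listing.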



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


