hadoop-hdfs-issues mailing list archives

From "Yiqun Lin (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-13768) Adding replicas to volume map makes DataNode start slowly
Date Fri, 05 Oct 2018 08:10:00 GMT

    [ https://issues.apache.org/jira/browse/HDFS-13768?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16639451#comment-16639451
] 

Yiqun Lin commented on HDFS-13768:
----------------------------------

Thanks [~surendrasingh] for attaching the patch for branch-2. While reviewing this patch,
I found some differences between the two patches, and also caught some minor issues I had missed
in the trunk patch.

For the branch-2 patch, some minor comments:

*ReplicaMap#addAndGet*
 For consistency of the logic, I prefer the following way to get the old replica info instead
of {{m.get(replicaInfo);}}
{code:java}
ReplicaInfo oldReplicaInfo = m.get(new Block(replicaInfo.getBlockId()));
{code}
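The point of this lookup is that the map is effectively keyed by block ID, so constructing a fresh key from {{replicaInfo.getBlockId()}} resolves to the same entry as the replica object itself, matching how the other accessors do it. A minimal sketch of the idea; the simplified {{Block}} class below is an assumption for illustration, not the actual HDFS implementation:

```java
import java.util.HashMap;
import java.util.Map;

// Simplified stand-in for org.apache.hadoop.hdfs.protocol.Block: equality is
// based on the block ID only, so any object carrying the same ID is an equal key.
class Block {
  private final long blockId;
  Block(long blockId) { this.blockId = blockId; }
  long getBlockId() { return blockId; }
  @Override public boolean equals(Object o) {
    return o instanceof Block && ((Block) o).blockId == blockId;
  }
  @Override public int hashCode() { return Long.hashCode(blockId); }
}

public class ReplicaMapSketch {
  public static void main(String[] args) {
    Map<Block, String> m = new HashMap<>();
    Block replicaInfo = new Block(42L);
    m.put(replicaInfo, "replica-42");
    // Looking up with a fresh key built from the block ID finds the same
    // entry, keeping addAndGet consistent with the other ReplicaMap accessors.
    String old = m.get(new Block(replicaInfo.getBlockId()));
    System.out.println(old);  // prints "replica-42"
  }
}
```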

*BlockPoolSlice*
 Can we add an additional null check of {{addReplicaThreadPool}} before invoking initializeAddReplicaPool?
That avoids acquiring the synchronized lock when the pool is already initialized.
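What I mean is the standard double-checked initialization pattern: an unsynchronized fast-path check, with a re-check inside the synchronized method. A minimal sketch; the holder class here is hypothetical and only the field/method names follow the patch, this is not the actual BlockPoolSlice code:

```java
import java.util.concurrent.ForkJoinPool;

// Hypothetical holder class illustrating the suggested pre-check; the field
// and method names (addReplicaThreadPool, initializeAddReplicaPool) follow
// the patch's naming.
public class AddReplicaPoolHolder {
  // volatile so the fast-path read sees a fully constructed pool.
  private static volatile ForkJoinPool addReplicaThreadPool = null;

  public static ForkJoinPool getPool(int parallelism) {
    // Fast path: skip lock acquisition when the pool is already initialized.
    if (addReplicaThreadPool == null) {
      initializeAddReplicaPool(parallelism);
    }
    return addReplicaThreadPool;
  }

  private static synchronized void initializeAddReplicaPool(int parallelism) {
    // Re-check under the lock in case another thread initialized it first.
    if (addReplicaThreadPool == null) {
      addReplicaThreadPool = new ForkJoinPool(parallelism);
    }
  }
}
```

The volatile field is what makes the unsynchronized first check safe under the Java memory model.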

*FsDatasetImplTestUtils#getStoredGenerationStamp*
We should also use {{FILE_COMPARATOR}} to sort the listed files.

*TestFsVolumeList.java*
The following line looks misaligned; can you format it?
{code}
+    RamDiskReplicaTracker ramDiskReplicaMap = RamDiskReplicaTracker
+        .getInstance(conf, fsDataset);
+    FsVolumeImpl vol = (FsVolumeImpl) fsDataset.getFsVolumeReferences().get(0);
+   String bpid = cluster.getNamesystem().getBlockPoolId();
+    // It will create BlockPoolSlice.AddReplicaProcessor task's and lunch in
+    // ForkJoinPool recursively
{code}

I noticed the patch name for branch-2 is wrong; the right naming is HDFS-13768-branch-2.xx.patch.
I am attaching the same patch again. I plan to file another JIRA to fix some remaining places
that still need improvement for trunk.

>  Adding replicas to volume map makes DataNode start slowly 
> -----------------------------------------------------------
>
>                 Key: HDFS-13768
>                 URL: https://issues.apache.org/jira/browse/HDFS-13768
>             Project: Hadoop HDFS
>          Issue Type: Bug
>    Affects Versions: 3.1.0
>            Reporter: Yiqun Lin
>            Assignee: Surendra Singh Lilhore
>            Priority: Major
>             Fix For: 3.2.0, 3.1.2
>
>         Attachments: HDFS-13768.01-branch-2.patch, HDFS-13768.01.patch, HDFS-13768.02.patch,
HDFS-13768.03.patch, HDFS-13768.04.patch, HDFS-13768.05.patch, HDFS-13768.06.patch, HDFS-13768.07.patch,
HDFS-13768.patch, screenshot-1.png
>
>
> We found DNs starting very slowly when rolling upgrading our cluster. When we restart DNs,
they start very slowly and do not register to the NN immediately. This causes a lot of the following
errors:
> {noformat}
> DataXceiver error processing WRITE_BLOCK operation  src: /xx.xx.xx.xx:64360 dst: /xx.xx.xx.xx:50010
> java.io.IOException: Not ready to serve the block pool, BP-1508644862-xx.xx.xx.xx-1493781183457.
>         at org.apache.hadoop.hdfs.server.datanode.DataXceiver.checkAndWaitForBP(DataXceiver.java:1290)
>         at org.apache.hadoop.hdfs.server.datanode.DataXceiver.checkAccess(DataXceiver.java:1298)
>         at org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:630)
>         at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opWriteBlock(Receiver.java:169)
>         at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:106)
>         at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:246)
>         at java.lang.Thread.run(Thread.java:745)
> {noformat}
> Looking into the DN startup logic, it does the initial block pool operations before
registration. During block pool initialization, we found that adding replicas
to the volume map is the most expensive operation. Related log:
> {noformat}
> 2018-07-26 10:46:23,771 INFO [Thread-105] org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl:
Time to add replicas to map for block pool BP-1508644862-xx.xx.xx.xx-1493781183457 on volume
/home/hard_disk/1/dfs/dn/current: 242722ms
> 2018-07-26 10:46:26,231 INFO [Thread-109] org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl:
Time to add replicas to map for block pool BP-1508644862-xx.xx.xx.xx-1493781183457 on volume
/home/hard_disk/5/dfs/dn/current: 245182ms
> 2018-07-26 10:46:32,146 INFO [Thread-112] org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl:
Time to add replicas to map for block pool BP-1508644862-xx.xx.xx.xx-1493781183457 on volume
/home/hard_disk/8/dfs/dn/current: 251097ms
> 2018-07-26 10:47:08,283 INFO [Thread-106] org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl:
Time to add replicas to map for block pool BP-1508644862-xx.xx.xx.xx-1493781183457 on volume
/home/hard_disk/2/dfs/dn/current: 287235ms
> {noformat}
> Currently the DN uses an independent thread to scan and add replicas for each volume, but we
still need to wait for the slowest thread to finish its work. So the main problem here is how
we can make these threads run faster.
> The jstack we captured while the DN was blocked adding replicas:
> {noformat}
> "Thread-113" #419 daemon prio=5 os_prio=0 tid=0x00007f40879ff000 nid=0x145da runnable
[0x00007f4043a38000]
>    java.lang.Thread.State: RUNNABLE
> 	at java.io.UnixFileSystem.list(Native Method)
> 	at java.io.File.list(File.java:1122)
> 	at java.io.File.listFiles(File.java:1207)
> 	at org.apache.hadoop.fs.FileUtil.listFiles(FileUtil.java:1165)
> 	at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.BlockPoolSlice.addToReplicasMap(BlockPoolSlice.java:445)
> 	at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.BlockPoolSlice.addToReplicasMap(BlockPoolSlice.java:448)
> 	at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.BlockPoolSlice.addToReplicasMap(BlockPoolSlice.java:448)
> 	at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.BlockPoolSlice.getVolumeMap(BlockPoolSlice.java:342)
> 	at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsVolumeImpl.getVolumeMap(FsVolumeImpl.java:864)
> 	at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsVolumeList$1.run(FsVolumeList.java:191)
> {noformat}
> One improvement may be to use a ForkJoinPool to do this recursive task, rather than
the current synchronous way. This would be a great improvement because it can greatly speed up the recovery process.
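To illustrate the proposed direction: the recursive directory walk can be expressed as a ForkJoinPool RecursiveAction so subdirectories are scanned in parallel instead of depth-first in one thread. This is only a hedged sketch of the idea; the class names and the map of file name to file are placeholders, not the actual BlockPoolSlice.AddReplicaProcessor code:

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveAction;

// Hypothetical sketch: walk a volume's directory tree and record regular files
// into a shared concurrent map, forking a subtask per subdirectory.
class AddReplicaTask extends RecursiveAction {
  private final File dir;
  private final ConcurrentHashMap<String, File> volumeMap;

  AddReplicaTask(File dir, ConcurrentHashMap<String, File> volumeMap) {
    this.dir = dir;
    this.volumeMap = volumeMap;
  }

  @Override
  protected void compute() {
    File[] children = dir.listFiles();
    if (children == null) {
      return;  // not a directory, or an I/O error listing it
    }
    List<AddReplicaTask> subTasks = new ArrayList<>();
    for (File child : children) {
      if (child.isDirectory()) {
        // Fork a subtask instead of recursing in the same thread.
        AddReplicaTask task = new AddReplicaTask(child, volumeMap);
        task.fork();
        subTasks.add(task);
      } else {
        volumeMap.put(child.getName(), child);
      }
    }
    for (AddReplicaTask task : subTasks) {
      task.join();  // wait for forked subdirectory scans
    }
  }
}

public class ForkJoinScanSketch {
  public static void main(String[] args) {
    ConcurrentHashMap<String, File> volumeMap = new ConcurrentHashMap<>();
    ForkJoinPool pool = new ForkJoinPool();
    pool.invoke(new AddReplicaTask(new File(args.length > 0 ? args[0] : "."), volumeMap));
    System.out.println("files found: " + volumeMap.size());
  }
}
```

With fork/join, an idle worker can steal a pending subdirectory task, so one deep or large directory no longer serializes the whole volume scan.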



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

