hadoop-hdfs-issues mailing list archives

From "Jiandan Yang (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-7060) Avoid taking locks when sending heartbeats from the DataNode
Date Fri, 20 Oct 2017 08:12:00 GMT

    [ https://issues.apache.org/jira/browse/HDFS-7060?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16212333#comment-16212333 ]

Jiandan Yang  commented on HDFS-7060:
-------------------------------------

[~xinwei] [~brahmareddy] [~jojochuang] We encountered the same problem on branch-2.8.2. BPServiceActor#offerService
was blocked because sendHeartBeat was waiting for the FSDataset lock, so blockReceivedAndDeleted was delayed
by about 60s. Eventually the client could not close the file and threw the exception "Unable to close
file because the last blockxxx does not have enough number of replicas".

I think HDFS-7060 would solve our problem very well. Is there any problem with this patch? Why
hasn't it been merged into trunk?

> Avoid taking locks when sending heartbeats from the DataNode
> ------------------------------------------------------------
>
>                 Key: HDFS-7060
>                 URL: https://issues.apache.org/jira/browse/HDFS-7060
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>            Reporter: Haohui Mai
>            Assignee: Xinwei Qin 
>              Labels: BB2015-05-TBR
>         Attachments: HDFS-7060-002.patch, HDFS-7060.000.patch, HDFS-7060.001.patch
>
>
> We're seeing the heartbeat is blocked by the monitor of {{FsDatasetImpl}} when the DN is under heavy load of writes:
> {noformat}
>    java.lang.Thread.State: BLOCKED (on object monitor)
>         at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsVolumeImpl.getDfsUsed(FsVolumeImpl.java:115)
>         - waiting to lock <0x0000000780304fb8> (a org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl)
>         at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.getStorageReports(FsDatasetImpl.java:91)
>         - locked <0x0000000780612fd8> (a java.lang.Object)
>         at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.sendHeartBeat(BPServiceActor.java:563)
>         at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.offerService(BPServiceActor.java:668)
>         at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:827)
>         at java.lang.Thread.run(Thread.java:744)
>    java.lang.Thread.State: BLOCKED (on object monitor)
>         at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.createRbw(FsDatasetImpl.java:743)
>         - waiting to lock <0x0000000780304fb8> (a org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl)
>         at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.createRbw(FsDatasetImpl.java:60)
>         at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.<init>(BlockReceiver.java:169)
>         at org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:621)
>         at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opWriteBlock(Receiver.java:124)
>         at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:71)
>         at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:232)
>         at java.lang.Thread.run(Thread.java:744)
>    java.lang.Thread.State: RUNNABLE
>         at java.io.UnixFileSystem.createFileExclusively(Native Method)
>         at java.io.File.createNewFile(File.java:1006)
>         at org.apache.hadoop.hdfs.server.datanode.DatanodeUtil.createTmpFile(DatanodeUtil.java:59)
>         at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.BlockPoolSlice.createRbwFile(BlockPoolSlice.java:244)
>         at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsVolumeImpl.createRbwFile(FsVolumeImpl.java:195)
>         at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.createRbw(FsDatasetImpl.java:753)
>         - locked <0x0000000780304fb8> (a org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl)
>         at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.createRbw(FsDatasetImpl.java:60)
>         at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.<init>(BlockReceiver.java:169)
>         at org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:621)
>         at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opWriteBlock(Receiver.java:124)
>         at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:71)
>         at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:232)
>         at java.lang.Thread.run(Thread.java:744)
> {noformat}
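
The traces above show the heartbeat thread waiting for the FsDatasetImpl monitor while a writer holds it across slow file creation. Below is a minimal sketch of the general idea of building storage reports without that monitor. It is not the HDFS-7060 patch and not the real FsDatasetImpl: the class LockFreeHeartbeatSketch, its Volume type, the storage ID "DS-1", and the string report format are all hypothetical, and the atomic counter plus copy-on-write volume list are assumptions chosen for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical stand-in for the DataNode dataset; NOT the real FsDatasetImpl. */
class LockFreeHeartbeatSketch {

  /** Hypothetical volume whose usage counter can be read without any monitor. */
  static final class Volume {
    final String storageId;
    private final AtomicLong dfsUsed = new AtomicLong();

    Volume(String storageId) { this.storageId = storageId; }

    void addUsed(long bytes) { dfsUsed.addAndGet(bytes); }

    long getDfsUsed() { return dfsUsed.get(); }
  }

  // Copy-on-write list: iterating reads a snapshot and takes no lock.
  private final List<Volume> volumes = new CopyOnWriteArrayList<>();

  void addVolume(Volume v) { volumes.add(v); }

  /** Write path: holds the dataset monitor across (simulated) slow disk I/O. */
  synchronized void createRbw(Volume v, long bytes, Runnable slowIo) {
    slowIo.run();
    v.addUsed(bytes);
  }

  /** Heartbeat path: intentionally not synchronized on the dataset. */
  List<String> getStorageReports() {
    List<String> reports = new ArrayList<>();
    for (Volume v : volumes) {
      reports.add(v.storageId + ":" + v.getDfsUsed());
    }
    return reports;
  }

  /** Builds one report while a writer holds the monitor and one after it finishes. */
  static List<List<String>> demo() throws InterruptedException {
    LockFreeHeartbeatSketch dataset = new LockFreeHeartbeatSketch();
    Volume vol = new Volume("DS-1");
    dataset.addVolume(vol);

    CountDownLatch writerHoldsLock = new CountDownLatch(1);
    CountDownLatch heartbeatSent = new CountDownLatch(1);

    Thread writer = new Thread(() -> dataset.createRbw(vol, 128, () -> {
      writerHoldsLock.countDown();
      try {
        heartbeatSent.await(); // the "I/O" ends only after the heartbeat was built
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
      }
    }));
    writer.start();

    writerHoldsLock.await();
    // If getStorageReports() were synchronized, this call would never return,
    // because the writer only releases the monitor after the heartbeat is built.
    List<String> during = dataset.getStorageReports();
    heartbeatSent.countDown();
    writer.join();
    List<String> after = dataset.getStorageReports();

    List<List<String>> result = new ArrayList<>();
    result.add(during);
    result.add(after);
    return result;
  }

  public static void main(String[] args) throws InterruptedException {
    System.out.println(demo()); // prints [[DS-1:0], [DS-1:128]]
  }
}
```

The trade-off in this sketch is that a heartbeat may report a slightly stale usage figure (0 bytes while the write is still in flight), which the next heartbeat corrects. Whether that is acceptable for the real storage reports is a question for the patch review, not something this example settles.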



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

---------------------------------------------------------------------
To unsubscribe, e-mail: hdfs-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-help@hadoop.apache.org

