hadoop-hdfs-issues mailing list archives

From "Hadoop QA (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-8496) Calling stopWriter() with FSDatasetImpl lock held may block other threads
Date Fri, 29 May 2015 10:31:17 GMT

    [ https://issues.apache.org/jira/browse/HDFS-8496?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14564543#comment-14564543 ]

Hadoop QA commented on HDFS-8496:
---------------------------------

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | pre-patch |  17m 34s | Pre-patch trunk compilation is healthy. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any @author tags. |
| {color:red}-1{color} | tests included |   0m  0s | The patch doesn't appear to include any new or modified tests.  Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. |
| {color:green}+1{color} | javac |   7m 26s | There were no new javac warning messages. |
| {color:green}+1{color} | javadoc |   9m 36s | There were no new javadoc warning messages. |
| {color:green}+1{color} | release audit |   0m 23s | The applied patch does not increase the total number of release audit warnings. |
| {color:red}-1{color} | checkstyle |   2m 14s | The applied patch generated  1 new checkstyle issues (total was 124, now 120). |
| {color:red}-1{color} | whitespace |   0m  0s | The patch has 3  line(s) that end in whitespace. Use git apply --whitespace=fix. |
| {color:green}+1{color} | install |   1m 32s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 33s | The patch built with eclipse:eclipse. |
| {color:green}+1{color} | findbugs |   3m 13s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | native |   3m 12s | Pre-build of native portion |
| {color:green}+1{color} | hdfs tests | 162m 57s | Tests passed in hadoop-hdfs. |
| | | 208m 44s | |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | http://issues.apache.org/jira/secure/attachment/12736065/HDFS-8496-001.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / d725dd8 |
| checkstyle |  https://builds.apache.org/job/PreCommit-HDFS-Build/11162/artifact/patchprocess/diffcheckstylehadoop-hdfs.txt |
| whitespace | https://builds.apache.org/job/PreCommit-HDFS-Build/11162/artifact/patchprocess/whitespace.txt |
| hadoop-hdfs test log | https://builds.apache.org/job/PreCommit-HDFS-Build/11162/artifact/patchprocess/testrun_hadoop-hdfs.txt |
| Test Results | https://builds.apache.org/job/PreCommit-HDFS-Build/11162/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf906.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | https://builds.apache.org/job/PreCommit-HDFS-Build/11162/console |


This message was automatically generated.

> Calling stopWriter() with FSDatasetImpl lock held may block other threads
> --------------------------------------------------------------------------
>
>                 Key: HDFS-8496
>                 URL: https://issues.apache.org/jira/browse/HDFS-8496
>             Project: Hadoop HDFS
>          Issue Type: Bug
>    Affects Versions: 2.6.0
>            Reporter: zhouyingchao
>            Assignee: zhouyingchao
>         Attachments: HDFS-8496-001.patch
>
>
> On a DN of an HDFS 2.6 cluster, we noticed that some DataXceiver threads and heartbeat threads were blocked for quite a while on the FSDatasetImpl lock. Looking at the stacks, we found that calling stopWriter() with the FSDatasetImpl lock held blocked everything else.
> The following heartbeat stack shows, as an example, how threads are blocked by the FSDatasetImpl lock:
> {code}
>    java.lang.Thread.State: BLOCKED (on object monitor)
>         at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsVolumeImpl.getDfsUsed(FsVolumeImpl.java:152)
>         - waiting to lock <0x00000007701badc0> (a org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl)
>         at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsVolumeImpl.getAvailable(FsVolumeImpl.java:191)
>         at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.getStorageReports(FsDatasetImpl.java:144)
>         - locked <0x0000000770465dc0> (a java.lang.Object)
>         at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.sendHeartBeat(BPServiceActor.java:575)
>         at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.offerService(BPServiceActor.java:680)
>         at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:850)
>         at java.lang.Thread.run(Thread.java:662)
> {code}
> The thread holding the FSDatasetImpl lock is just sleeping in stopWriter(), waiting for another thread to exit. Its stack is:
> {code}
>    java.lang.Thread.State: TIMED_WAITING (on object monitor)
>         at java.lang.Object.wait(Native Method)
>         at java.lang.Thread.join(Thread.java:1194)
>         - locked <0x00000007636953b8> (a org.apache.hadoop.util.Daemon)
>         at org.apache.hadoop.hdfs.server.datanode.ReplicaInPipeline.stopWriter(ReplicaInPipeline.java:183)
>         at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.recoverCheck(FsDatasetImpl.java:982)
>         at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.recoverClose(FsDatasetImpl.java:1026)
>         - locked <0x00000007701badc0> (a org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl)
>         at org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:624)
>         at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opWriteBlock(Receiver.java:137)
>         at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:74)
>         at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:235)
>         at java.lang.Thread.run(Thread.java:662)
> {code}
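> The two stacks boil down to a plain Java monitor problem: a potentially slow Thread.join() runs inside a block synchronized on the dataset object, so every other synchronized caller, such as the heartbeat's getDfsUsed() path, has to wait for the writer to exit. A minimal standalone sketch of the pattern (hypothetical class and method names, not HDFS source):
> {code}
> // Standalone sketch of the pattern in the two stacks above: a slow join() done while
> // holding a monitor stalls every other thread that needs the same monitor.
> public class LockHeldJoinDemo {
>     private final Object datasetLock = new Object();   // stands in for the FsDatasetImpl monitor
>
>     // Analogue of recoverClose() -> recoverCheck() -> stopWriter(): joins the writer
>     // thread while still holding the dataset monitor.
>     void recoverCloseLike(Thread writer) throws InterruptedException {
>         synchronized (datasetLock) {
>             writer.join();              // can take a long time; the monitor stays held throughout
>         }
>     }
>
>     // Analogue of the heartbeat path (FsVolumeImpl.getDfsUsed()): a quick call that still
>     // needs the same monitor, so it waits for the whole join() above.
>     long getDfsUsedLike() {
>         synchronized (datasetLock) {
>             return 42L;                 // placeholder for the real disk-usage query
>         }
>     }
>
>     public static void main(String[] args) throws Exception {
>         LockHeldJoinDemo demo = new LockHeldJoinDemo();
>         // "Writer" that takes a while to exit, e.g. because the disk is busy.
>         Thread writer = new Thread(() -> {
>             try { Thread.sleep(5000); } catch (InterruptedException ignored) { }
>         });
>         writer.start();
>
>         new Thread(() -> {
>             try { demo.recoverCloseLike(writer); } catch (InterruptedException ignored) { }
>         }).start();
>         Thread.sleep(100);              // let recoverCloseLike() grab the monitor first
>
>         long t0 = System.nanoTime();
>         demo.getDfsUsedLike();          // blocks here until the writer exits
>         System.out.println("heartbeat-like call waited "
>             + (System.nanoTime() - t0) / 1000000 + " ms");
>     }
> }
> {code}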
> In this case, we had deployed quite a lot of other workloads on the DN, and the local file system and disks were quite busy. We guess this is why stopWriter() took so long.
> In any case, it is not reasonable to call stopWriter() with the FSDatasetImpl lock held. In HDFS-7999, createTemporary() was changed to call stopWriter() without the FSDatasetImpl lock; we should probably do the same in the other three methods: recoverClose(), recoverAppend() and recoverRbw().
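> A rough illustration of that restructuring (again hypothetical names, a minimal sketch rather than the attached patch): look up the writer under the lock, stop it with no lock held, then re-take the lock and re-check that nothing changed in the meantime, retrying if it did:
> {code}
> public class StopWriterOutsideLockSketch {
>     private final Object datasetLock = new Object();   // stands in for the FsDatasetImpl monitor
>     private Thread currentWriter;                       // stands in for the replica's writer thread
>
>     void recoverCloseLike() throws InterruptedException {
>         while (true) {
>             Thread writer;
>             synchronized (datasetLock) {                // short critical section: look up state only
>                 writer = currentWriter;
>             }
>             if (writer != null) {
>                 writer.interrupt();
>                 writer.join();                          // the slow part now runs with no monitor held
>             }
>             synchronized (datasetLock) {                // re-validate under the lock
>                 if (currentWriter != writer) {
>                     continue;                           // another thread changed it; retry
>                 }
>                 currentWriter = null;
>                 // ... the rest of the original recoverCheck()/recoverClose() work would go here ...
>                 return;
>             }
>         }
>     }
> }
> {code}
> The retry loop matters because another writer can attach to the replica while the lock is dropped.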
> I'll try to finish a patch for this today. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
