hadoop-hdfs-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Hadoop QA (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-8429) Death of watcherThread making other local read blocked
Date Wed, 20 May 2015 11:20:59 GMT

    [ https://issues.apache.org/jira/browse/HDFS-8429?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14552147#comment-14552147
] 

Hadoop QA commented on HDFS-8429:
---------------------------------

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | pre-patch |  14m 39s | Pre-patch trunk compilation is healthy. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any @author tags.
|
| {color:red}-1{color} | tests included |   0m  0s | The patch doesn't appear to include any
new or modified tests.  Please justify why no new tests are needed for this patch. Also please
list what manual steps were performed to verify this patch. |
| {color:green}+1{color} | javac |   7m 31s | There were no new javac warning messages. |
| {color:green}+1{color} | javadoc |   9m 42s | There were no new javadoc warning messages.
|
| {color:green}+1{color} | release audit |   0m 23s | The applied patch does not increase
the total number of release audit warnings. |
| {color:red}-1{color} | checkstyle |   1m  5s | The applied patch generated  2 new checkstyle
issues (total was 19, now 21). |
| {color:green}+1{color} | whitespace |   0m  0s | The patch has no lines that end in whitespace.
|
| {color:green}+1{color} | install |   1m 34s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 32s | The patch built with eclipse:eclipse.
|
| {color:green}+1{color} | findbugs |   1m 40s | The patch does not introduce any new Findbugs
(version 3.0.0) warnings. |
| {color:green}+1{color} | common tests |  22m 53s | Tests passed in hadoop-common. |
| | |  60m  2s | |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | http://issues.apache.org/jira/secure/attachment/12734105/HDFS-8429-001.patch
|
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / ce53c8e |
| checkstyle |  https://builds.apache.org/job/PreCommit-HDFS-Build/11061/artifact/patchprocess/diffcheckstylehadoop-common.txt
|
| hadoop-common test log | https://builds.apache.org/job/PreCommit-HDFS-Build/11061/artifact/patchprocess/testrun_hadoop-common.txt
|
| Test Results | https://builds.apache.org/job/PreCommit-HDFS-Build/11061/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf907.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep
3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | https://builds.apache.org/job/PreCommit-HDFS-Build/11061/console |


This message was automatically generated.

> Death of watcherThread making other local read blocked
> ------------------------------------------------------
>
>                 Key: HDFS-8429
>                 URL: https://issues.apache.org/jira/browse/HDFS-8429
>             Project: Hadoop HDFS
>          Issue Type: Bug
>    Affects Versions: 2.6.0
>            Reporter: zhouyingchao
>            Assignee: zhouyingchao
>         Attachments: HDFS-8429-001.patch
>
>
> In our cluster, an application is hung when doing a short circuit read of local hdfs
block. By looking into the log, we found the DataNode's DomainSocketWatcher.watcherThread
has exited with following log:
> {code}
> ERROR org.apache.hadoop.net.unix.DomainSocketWatcher: Thread[Thread-25,5,main] terminating
on unexpected exception
> java.lang.NullPointerException
>         at org.apache.hadoop.net.unix.DomainSocketWatcher$2.run(DomainSocketWatcher.java:463)
>         at java.lang.Thread.run(Thread.java:662)
> {code}
> The line 463 is following code snippet:
> {code}
>          try {
>             for (int fd : fdSet.getAndClearReadableFds()) {
>               sendCallbackAndRemove("getAndClearReadableFds", entries, fdSet,
>                 fd);
>             }
> {code}
> getAndClearReadableFds is a native method which will malloc an int array. Since our memory
is very tight, it looks like the malloc failed and a NULL pointer is returned.
> The bad thing is that other threads then blocked in stack like this:
> {code}
> "DataXceiver for client unix:/home/work/app/hdfs/c3prc-micloud/datanode/dn_socket [Waiting
for operation #1]" daemon prio=10 tid=0x00007f0c9c086d90 nid=0x8fc3 waiting on condition [0x00007f09b9856000]
>    java.lang.Thread.State: WAITING (parking)
>         at sun.misc.Unsafe.park(Native Method)
>         - parking to wait for  <0x00000007b0174808> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
>         at java.util.concurrent.locks.LockSupport.park(LockSupport.java:156)
>         at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:1987)
>         at org.apache.hadoop.net.unix.DomainSocketWatcher.add(DomainSocketWatcher.java:323)
>         at org.apache.hadoop.hdfs.server.datanode.ShortCircuitRegistry.createNewMemorySegment(ShortCircuitRegistry.java:322)
>         at org.apache.hadoop.hdfs.server.datanode.DataXceiver.requestShortCircuitShm(DataXceiver.java:403)
>         at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opRequestShortCircuitShm(Receiver.java:214)
>         at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:95)
>         at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:235)
>         at java.lang.Thread.run(Thread.java:662)
> {code}
> IMO, we should exit the DN so that the users can know that something go  wrong  and fix
it.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Mime
View raw message