hadoop-hdfs-issues mailing list archives

From "Kai Zheng (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-12606) JVM crashes when running NNBench on EC enabled.
Date Fri, 06 Oct 2017 00:52:01 GMT

    [ https://issues.apache.org/jira/browse/HDFS-12606?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16193998#comment-16193998 ]

Kai Zheng commented on HDFS-12606:
----------------------------------

Thanks for the ping, Eddy. By design, we can have multiple coder instances for concurrent coding
tasks, and no global static variable should prevent this, barring bugs. We guard the ISA-L calls in
Java rather than relying on ISA-L's own thread model. We can investigate when we are back in the
office next Monday.
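
To illustrate the guarding described above, here is a minimal sketch, not the actual Hadoop coder code. The class name GuardedCoder, the simulated native handle, and the XOR parity that stands in for the native encode call are all assumptions made for this example. The idea is that each coding task gets its own instance, encode calls share a read lock, and release takes the write lock so the native state is freed at most once and never while an encode is in flight.

```java
import java.util.concurrent.locks.ReentrantReadWriteLock;

/** Hypothetical sketch of a Java-side guard around native coder state. */
public class GuardedCoder {
  private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
  // Stand-in for a pointer to natively allocated coder state; 0 means released.
  private long nativeHandle = 1L;
  // Counts how often the simulated native free ran; it must never exceed 1.
  private int freeCount = 0;

  /** Computes XOR parity over the input, standing in for a native encode call. */
  public byte encode(byte[] data) {
    lock.readLock().lock(); // many encodes may run concurrently
    try {
      if (nativeHandle == 0) {
        throw new IllegalStateException("coder already released");
      }
      byte parity = 0;
      for (byte b : data) {
        parity ^= b;
      }
      return parity;
    } finally {
      lock.readLock().unlock();
    }
  }

  /** Frees the simulated native state; repeated calls are no-ops. */
  public void release() {
    lock.writeLock().lock(); // excludes in-flight encodes and other releases
    try {
      if (nativeHandle != 0) {
        nativeHandle = 0;
        freeCount++;
      }
    } finally {
      lock.writeLock().unlock();
    }
  }

  /** Returns how many times the simulated native free actually ran. */
  public int getFreeCount() {
    lock.readLock().lock();
    try {
      return freeCount;
    } finally {
      lock.readLock().unlock();
    }
  }

  public static void main(String[] args) {
    GuardedCoder coder = new GuardedCoder(); // one instance per coding task
    System.out.println("parity = " + coder.encode(new byte[] {1, 2, 4}));
    coder.release();
    coder.release(); // second call does not free again
    System.out.println("frees = " + coder.getFreeCount());
  }
}
```

With a guard of this shape, a double free would need a path that bypasses the lock, such as state shared between instances on the native side, which is the kind of bug worth checking for here.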

> JVM crashes when running NNBench on EC enabled. 
> ------------------------------------------------
>
>                 Key: HDFS-12606
>                 URL: https://issues.apache.org/jira/browse/HDFS-12606
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: erasure-coding
>    Affects Versions: 3.0.0-beta1
>            Reporter: Lei (Eddy) Xu
>            Priority: Critical
>
> When running NNBench on an RS(6,3) directory, the JVM crashes with a "double free or corruption" error:
> {code}
> 08:16:29 Running NNBENCH.
> 08:16:29 WARNING: Use "yarn jar" to launch YARN applications.
> 08:16:31 NameNode Benchmark 0.4
> 08:16:31 17/10/04 08:16:31 INFO hdfs.NNBench: Test Inputs: 
> 08:16:31 17/10/04 08:16:31 INFO hdfs.NNBench: Test Operation: create_write
> 08:16:31 17/10/04 08:16:31 INFO hdfs.NNBench: Start time: 2017-10-04 08:18:31,16
> :
> :
> 08:18:54 *** Error in `/usr/java/jdk1.8.0_144/bin/java': double free or corruption (out): 0x00007ffb55dbfab0 ***
> 08:18:54 ======= Backtrace: =========
> 08:18:54 /lib64/libc.so.6(+0x7c619)[0x7ffb5b85f619]
> 08:18:54 [0x7ffb45017774]
> 08:18:54 ======= Memory map: ========
> 08:18:54 00400000-00401000 r-xp 00000000 ca:01 276832134 /usr/java/jdk1.8.0_144/bin/java
> 08:18:54 00600000-00601000 rw-p 00000000 ca:01 276832134 /usr/java/jdk1.8.0_144/bin/java
> 08:18:54 0173e000-01f91000 rw-p 00000000 00:00 0 [heap]
> 08:18:54 603600000-614700000 rw-p 00000000 00:00 0 
> 08:18:54 614700000-72bd00000 ---p 00000000 00:00 0 
> 08:18:54 72bd00000-73a500000 rw-p 00000000 00:00 0 
> 08:18:54 73a500000-7c0000000 ---p 00000000 00:00 0 
> 08:18:54 7c0000000-7c0400000 rw-p 00000000 00:00 0 
> 08:18:54 7c0400000-800000000 ---p 00000000 00:00 0 
> 08:18:54 7ffb20174000-7ffb208ab000 rw-p 00000000 00:00 0 
> 08:18:54 7ffb208ab000-7ffb20975000 ---p 00000000 00:00 0 
> 08:18:54 7ffb20975000-7ffb20b75000 rw-p 00000000 00:00 0 
> 08:18:54 7ffb20b75000-7ffb20d75000 rw-p 00000000 00:00 0 
> 08:18:54 7ffb20d75000-7ffb20d8a000 r-xp 00000000 ca:01 209866 /usr/lib64/libgcc_s-4.8.5-20150702.so.1
> 08:18:54 7ffb20d8a000-7ffb20f89000 ---p 00015000 ca:01 209866 /usr/lib64/libgcc_s-4.8.5-20150702.so.1
> 08:18:54 7ffb20f89000-7ffb20f8a000 r--p 00014000 ca:01 209866 /usr/lib64/libgcc_s-4.8.5-20150702.so.1
> 08:18:54 7ffb20f8a000-7ffb20f8b000 rw-p 00015000 ca:01 209866 /usr/lib64/libgcc_s-4.8.5-20150702.so.1
> 08:18:54 7ffb20f8b000-7ffb20fbd000 r-xp 00000000 ca:01 553654092 /usr/java/jdk1.8.0_144/jre/lib/amd64/libsunec.so
> 08:18:54 7ffb20fbd000-7ffb211bc000 ---p 00032000 ca:01 553654092 /usr/java/jdk1.8.0_144/jre/lib/amd64/libsunec.so
> 08:18:54 7ffb211bc000-7ffb211c2000 rw-p 00031000 ca:01 553654092 /usr/java/jdk1.8.0_144/jre/lib/amd64/libsunec.so
> :
> :
> 08:18:54 7ffb5c3fb000-7ffb5c3fc000 r--p 00000000 00:00 0 
> 08:18:54 7ffb5c3fc000-7ffb5c3fd000 rw-p 00000000 00:00 0 
> 08:18:54 7ffb5c3fd000-7ffb5c3fe000 r--p 00021000 ca:01 637266 /usr/lib64/ld-2.17.so
> 08:18:54 7ffb5c3fe000-7ffb5c3ff000 rw-p 00022000 ca:01 637266 /usr/lib64/ld-2.17.so
> 08:18:54 7ffb5c3ff000-7ffb5c400000 rw-p 00000000 00:00 0 
> 08:18:54 7ffdf8767000-7ffdf8788000 rw-p 00000000 00:00 0 [stack]
> 08:18:54 7ffdf878b000-7ffdf878d000 r-xp 00000000 00:00 0 [vdso]
> 08:18:54 ffffffffff600000-ffffffffff601000 r-xp 00000000 00:00 0 [vsyscall]
> {code}
> It happens on both {{jdk1.8.0_144}} and {{jdk1.8.0_121}} in our environments. 
> The native code used in erasure coding is the most likely suspect, i.e., ISA-L is
> not thread safe [https://01.org/sites/default/files/documentation/isa-l_open_src_2.10.pdf]



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

---------------------------------------------------------------------
To unsubscribe, e-mail: hdfs-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-help@hadoop.apache.org

