hadoop-hdfs-issues mailing list archives

From "John Zhuge (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (HDFS-11851) getGlobalJNIEnv() may deadlock if exception is thrown
Date Tue, 19 Sep 2017 05:22:00 GMT

    [ https://issues.apache.org/jira/browse/HDFS-11851?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16171134#comment-16171134 ]

John Zhuge edited comment on HDFS-11851 at 9/19/17 5:21 AM:
------------------------------------------------------------

You are right: the classpath wildcard "/opt/cloudera/parcels/CDH-5.12.1-1.cdh5.12.1.p2550.2807/lib/hadoop/libexec/../../hadoop/.//*"
should pick up the hadoop-common jar:
{noformat}
# ls /opt/cloudera/parcels/CDH-5.12.1-1.cdh5.12.1.p2550.2807/lib/hadoop/libexec/../../hadoop/.//hadoop-common*jar
/opt/cloudera/parcels/CDH-5.12.1-1.cdh5.12.1.p2550.2807/lib/hadoop/libexec/../../hadoop/.//hadoop-common-2.6.0-cdh5.12.1.jar
/opt/cloudera/parcels/CDH-5.12.1-1.cdh5.12.1.p2550.2807/lib/hadoop/libexec/../../hadoop/.//hadoop-common-2.6.0-cdh5.12.1-tests.jar
/opt/cloudera/parcels/CDH-5.12.1-1.cdh5.12.1.p2550.2807/lib/hadoop/libexec/../../hadoop/.//hadoop-common.jar
/opt/cloudera/parcels/CDH-5.12.1-1.cdh5.12.1.p2550.2807/lib/hadoop/libexec/../../hadoop/.//hadoop-common-tests.jar
{noformat}

How did you hit the exception? Please set "ulimit -c unlimited" before reproducing the issue
in order to generate a core dump. Upload the core dump or run "gdb <exec> <core>"
and then "bt" to get the stack trace.
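
If changing the shell limit is not convenient (for example, when the process is started by a service), the soft core-file limit can also be raised from inside the process before the failure is triggered. The following is only a minimal sketch using the POSIX getrlimit()/setrlimit() calls; the helper name enable_core_dumps() is made up for illustration and is not part of libhdfs.

```c
#include <sys/resource.h>

/*
 * Hypothetical helper (not part of libhdfs): raise the soft limit for
 * core file size up to the hard limit, roughly what "ulimit -c unlimited"
 * does when the hard limit is already unlimited.
 * Returns 0 on success, -1 on failure (errno is set by the failing call).
 */
static int enable_core_dumps(void)
{
    struct rlimit rl;

    if (getrlimit(RLIMIT_CORE, &rl) != 0) {
        return -1;
    }
    /* An unprivileged process may raise its soft limit up to the hard limit. */
    rl.rlim_cur = rl.rlim_max;
    return setrlimit(RLIMIT_CORE, &rl);
}
```

Call it once early in main(). Whether and where a core file is actually written still depends on system settings outside the process.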


was (Author: jzhuge):
You are right because "/opt/cloudera/parcels/CDH-5.12.1-1.cdh5.12.1.p2550.2807/lib/hadoop/libexec/../../hadoop/.//*"
should point to hadoop-common jar:
{noformat}
# ls /opt/cloudera/parcels/CDH-5.12.1-1.cdh5.12.1.p2550.2807/lib/hadoop/libexec/../../hadoop/.//hadoop-common*jar
/opt/cloudera/parcels/CDH-5.12.1-1.cdh5.12.1.p2550.2807/lib/hadoop/libexec/../../hadoop/.//hadoop-common-2.6.0-cdh5.12.1.jar
/opt/cloudera/parcels/CDH-5.12.1-1.cdh5.12.1.p2550.2807/lib/hadoop/libexec/../../hadoop/.//hadoop-common-2.6.0-cdh5.12.1-tests.jar
/opt/cloudera/parcels/CDH-5.12.1-1.cdh5.12.1.p2550.2807/lib/hadoop/libexec/../../hadoop/.//hadoop-common.jar
/opt/cloudera/parcels/CDH-5.12.1-1.cdh5.12.1.p2550.2807/lib/hadoop/libexec/../../hadoop/.//hadoop-common-tests.jar
{noformat}

How did you hit the exception? Please set "ulimit -c unlimited" before reproducing the issue
with a core dump. Upload the core dump or run "gdb <exec> <core>" and then "bt"
to get the stack trace.

> getGlobalJNIEnv() may deadlock if exception is thrown
> -----------------------------------------------------
>
>                 Key: HDFS-11851
>                 URL: https://issues.apache.org/jira/browse/HDFS-11851
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: libhdfs
>    Affects Versions: 3.0.0-alpha4
>            Reporter: Henry Robinson
>            Assignee: Sailesh Mukil
>            Priority: Blocker
>             Fix For: 3.0.0-alpha4
>
>         Attachments: HDFS-11851.000.patch, HDFS-11851.001.patch, HDFS-11851.002.patch, HDFS-11851.003.patch, HDFS-11851.004.patch, HDFS-11851.005.patch
>
>
> HDFS-11529 introduced a deadlock in {{getGlobalJNIEnv()}} that occurs when an exception is thrown. {{getGlobalJNIEnv()}} holds {{jvmMutex}}, but {{printExceptionAndFree()}} will eventually try to acquire that same lock in {{setTLSExceptionStrings()}}.
> The exception might get caught from {{loadFileSystems}}:
> {code}
> jthr = invokeMethod(env, NULL, STATIC, NULL,
>                     "org/apache/hadoop/fs/FileSystem",
>                     "loadFileSystems", "()V");
> if (jthr) {
>     printExceptionAndFree(env, jthr, PRINT_EXC_ALL, "loadFileSystems");
> }
> {code}
> And here are the relevant parts of the stack trace from where I call this API in Impala, which uses {{libhdfs}}:
> {code}
> #0  __lll_lock_wait () at ../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:135
> #1  0x00007ffff4a8d657 in _L_lock_909 () from /lib/x86_64-linux-gnu/libpthread.so.0
> #2  0x00007ffff4a8d480 in __GI___pthread_mutex_lock (mutex=0x47ce960 <jvmMutex>) at ../nptl/pthread_mutex_lock.c:79
> #3  0x0000000002f06056 in mutexLock (m=<optimized out>) at /data/2/jenkins/workspace/impala-hadoop-dependency/hadoop/hadoop-hdfs-project/hadoop-hdfs/src/main/native/libhdfs/os/posix/mutexes.c:28
> #4  0x0000000002efe817 in setTLSExceptionStrings (rootCause=0x0, stackTrace=0x0) at /data/2/jenkins/workspace/impala-hadoop-dependency/hadoop/hadoop-hdfs-project/hadoop-hdfs/src/main/native/libhdfs/jni_helper.c:581
> #5  0x0000000002f065d7 in printExceptionAndFreeV (env=0x513c1e8, exc=0x508a8c0, noPrintFlags=<optimized out>, fmt=0x34349cf "loadFileSystems", ap=0x7fffffffb660)
>     at /data/2/jenkins/workspace/impala-hadoop-dependency/hadoop/hadoop-hdfs-project/hadoop-hdfs/src/main/native/libhdfs/exception.c:183
> #6  0x0000000002f0683d in printExceptionAndFree (env=<optimized out>, exc=<optimized out>, noPrintFlags=<optimized out>, fmt=<optimized out>)
>     at /data/2/jenkins/workspace/impala-hadoop-dependency/hadoop/hadoop-hdfs-project/hadoop-hdfs/src/main/native/libhdfs/exception.c:213
> #7  0x0000000002eff60f in getGlobalJNIEnv () at /data/2/jenkins/workspace/impala-hadoop-dependency/hadoop/hadoop-hdfs-project/hadoop-hdfs/src/main/native/libhdfs/jni_helper.c:463
> {code}
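
The locking pattern in the description above (a function reports an error while still holding a mutex that the reporting path also takes) can be reproduced in isolation. The sketch below is not libhdfs code: demo_mutex, record_exception(), init_buggy() and init_fixed() are hypothetical stand-ins for jvmMutex, setTLSExceptionStrings() and getGlobalJNIEnv(). It uses an error-checking pthread mutex so that the second lock attempt returns EDEADLK instead of hanging, which a default mutex may do. Releasing the lock before reporting is just one way to avoid the problem and is not necessarily what the attached patches do.

```c
#define _XOPEN_SOURCE 700  /* for PTHREAD_MUTEX_ERRORCHECK under strict -std modes */
#include <assert.h>
#include <errno.h>
#include <pthread.h>

/* Hypothetical stand-in for libhdfs's jvmMutex. */
static pthread_mutex_t demo_mutex;

/* Create the mutex as error-checking so a relock is reported, not blocked. */
static void demo_init(void)
{
    pthread_mutexattr_t attr;
    int rc;

    rc = pthread_mutexattr_init(&attr);
    assert(rc == 0);
    rc = pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_ERRORCHECK);
    assert(rc == 0);
    rc = pthread_mutex_init(&demo_mutex, &attr);
    assert(rc == 0);
    pthread_mutexattr_destroy(&attr);
}

/* Stand-in for setTLSExceptionStrings(): takes the lock on its own. */
static int record_exception(void)
{
    int rc = pthread_mutex_lock(&demo_mutex);
    if (rc != 0) {
        return rc;  /* EDEADLK when the calling thread already holds it */
    }
    /* ... the exception strings would be stored here ... */
    pthread_mutex_unlock(&demo_mutex);
    return 0;
}

/* Buggy shape: the error is reported while the lock is still held. */
static int init_buggy(void)
{
    int rc;

    pthread_mutex_lock(&demo_mutex);
    rc = record_exception();  /* tries to relock demo_mutex */
    pthread_mutex_unlock(&demo_mutex);
    return rc;
}

/* One possible fix: remember the failure, unlock, then report it. */
static int init_fixed(void)
{
    int failed;

    pthread_mutex_lock(&demo_mutex);
    failed = 1;  /* pretend the guarded step threw an exception */
    pthread_mutex_unlock(&demo_mutex);
    return failed ? record_exception() : 0;
}
```

After demo_init(), init_buggy() returns EDEADLK while init_fixed() returns 0. Build with "-pthread" on platforms that require it.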



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

---------------------------------------------------------------------
To unsubscribe, e-mail: hdfs-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-help@hadoop.apache.org