hadoop-yarn-issues mailing list archives

From "Rohith (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-3585) NodeManager cannot exit on SHUTDOWN event triggered and NM recovery is enabled
Date Mon, 01 Jun 2015 06:55:19 GMT

    [ https://issues.apache.org/jira/browse/YARN-3585?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14566993#comment-14566993 ]

Rohith commented on YARN-3585:
------------------------------

This is a race condition between NodeManager shutdown and container launch. By the time the
container has launched and control returns to ContainerImpl, the NodeManager has already closed
the DB connection, which results in {{org.iq80.leveldb.DBException: Closed}}.
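
The ordering described above can be illustrated with a minimal sketch. It assumes a hypothetical {{FakeStateStore}} standing in for the leveldb-backed NM state store; none of the names below come from Hadoop, and this is not necessarily how the attached patch handles the problem. It only shows why a write that arrives after shutdown has closed the store fails, and one possible guard.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration only; not NodeManager code.
class StateStoreRaceSketch {

    /** Stand-in for a store that rejects writes once it has been closed. */
    static class FakeStateStore {
        private final Map<String, String> db = new HashMap<>();
        private boolean closed;

        /** Unguarded write: fails if shutdown has already closed the store. */
        synchronized void put(String key, String value) {
            if (closed) {
                throw new IllegalStateException("Closed");
            }
            db.put(key, value);
        }

        /** Guarded write: checks the closed flag under the same lock as close(). */
        synchronized boolean putIfOpen(String key, String value) {
            if (closed) {
                return false; // shutdown won the race; skip the write
            }
            db.put(key, value);
            return true;
        }

        synchronized void close() {
            closed = true;
        }
    }

    /** Shutdown closes the store first, then the launch callback tries to write. */
    static String unguardedLaunchAfterShutdown() {
        FakeStateStore store = new FakeStateStore();
        store.close();
        try {
            store.put("container_1", "LAUNCHED");
            return "stored";
        } catch (IllegalStateException e) {
            return e.getMessage();
        }
    }

    /** Same ordering, but the write is skipped instead of throwing. */
    static boolean guardedLaunchAfterShutdown() {
        FakeStateStore store = new FakeStateStore();
        store.close();
        return store.putIfOpen("container_1", "LAUNCHED");
    }

    public static void main(String[] args) {
        System.out.println(unguardedLaunchAfterShutdown()); // prints "Closed"
        System.out.println(guardedLaunchAfterShutdown());   // prints "false"
    }
}
```

In this sketch the guard works because the closed check and the write happen under the same monitor as {{close()}}, so a late launch event cannot observe a half-closed store.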

> NodeManager cannot exit on SHUTDOWN event triggered and NM recovery is enabled
> ------------------------------------------------------------------------------
>
>                 Key: YARN-3585
>                 URL: https://issues.apache.org/jira/browse/YARN-3585
>             Project: Hadoop YARN
>          Issue Type: Bug
>    Affects Versions: 2.6.0
>            Reporter: Peng Zhang
>            Assignee: Rohith
>            Priority: Critical
>         Attachments: YARN-3585.patch
>
>
> With NM recovery enabled, after decommission the NodeManager log shows that it has stopped, but
> the process does not exit.
> Non-daemon threads:
> {noformat}
> "DestroyJavaVM" prio=10 tid=0x00007f3460011800 nid=0x29ec waiting on condition [0x0000000000000000]
> "leveldb" prio=10 tid=0x00007f3354001800 nid=0x2a97 runnable [0x0000000000000000]
> "VM Thread" prio=10 tid=0x00007f3460167000 nid=0x29f8 runnable
> "Gang worker#0 (Parallel GC Threads)" prio=10 tid=0x00007f3460020000 nid=0x29ed runnable
> "Gang worker#1 (Parallel GC Threads)" prio=10 tid=0x00007f3460022000 nid=0x29ee runnable
> "Gang worker#2 (Parallel GC Threads)" prio=10 tid=0x00007f3460024000 nid=0x29ef runnable
> "Gang worker#3 (Parallel GC Threads)" prio=10 tid=0x00007f3460025800 nid=0x29f0 runnable
> "Gang worker#4 (Parallel GC Threads)" prio=10 tid=0x00007f3460027800 nid=0x29f1 runnable
> "Gang worker#5 (Parallel GC Threads)" prio=10 tid=0x00007f3460029000 nid=0x29f2 runnable
> "Gang worker#6 (Parallel GC Threads)" prio=10 tid=0x00007f346002b000 nid=0x29f3 runnable
> "Gang worker#7 (Parallel GC Threads)" prio=10 tid=0x00007f346002d000 nid=0x29f4 runnable
> "Concurrent Mark-Sweep GC Thread" prio=10 tid=0x00007f3460120800 nid=0x29f7 runnable
> "Gang worker#0 (Parallel CMS Threads)" prio=10 tid=0x00007f346011c800 nid=0x29f5 runnable
> "Gang worker#1 (Parallel CMS Threads)" prio=10 tid=0x00007f346011e800 nid=0x29f6 runnable
> "VM Periodic Task Thread" prio=10 tid=0x00007f346019f800 nid=0x2a01 waiting on condition
> {noformat}
> and the JNI leveldb thread stack:
> {noformat}
> Thread 12 (Thread 0x7f33dd842700 (LWP 10903)):
> #0  0x0000003d8340b43c in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
> #1  0x00007f33dfce2a3b in leveldb::(anonymous namespace)::PosixEnv::BGThreadWrapper(void*) () from /tmp/libleveldbjni-64-1-6922178968300745716.8
> #2  0x0000003d83407851 in start_thread () from /lib64/libpthread.so.0
> #3  0x0000003d830e811d in clone () from /lib64/libc.so.6
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
