hadoop-yarn-issues mailing list archives

From "Miklos Szegedi (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-7796) Container-executor fails with segfault on certain OS configurations
Date Sat, 27 Jan 2018 04:34:00 GMT

    [ https://issues.apache.org/jira/browse/YARN-7796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16341948#comment-16341948 ]

Miklos Szegedi commented on YARN-7796:
--------------------------------------

Now the question is: how does a 128K allocation exhaust a stack whose limit is normally 8 MB
(ulimit -s typically reports 8192, in kilobytes)? If this buffer is what triggers the crash,
there must be another large allocation on the stack as well. Do you have the ulimit -s value
from a system that reproduces this?
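For reference, ulimit -s prints the soft RLIMIT_STACK limit in kilobytes. A minimal sketch of reading the same value from C, for example to log it from container-executor on an affected host, could look like this. The helper name soft_stack_limit_kb is hypothetical and is not part of container-executor.

```c
#include <stdio.h>
#include <sys/resource.h>

/* Hypothetical helper: returns the soft stack-size limit (RLIMIT_STACK) of
 * the calling process in kilobytes, the unit `ulimit -s` prints.
 * Returns 0 if the limit is unlimited and -1 if getrlimit() fails. */
long long soft_stack_limit_kb(void) {
  struct rlimit rl;
  if (getrlimit(RLIMIT_STACK, &rl) != 0) {
    perror("getrlimit(RLIMIT_STACK)");
    return -1;
  }
  if (rl.rlim_cur == RLIM_INFINITY) {
    return 0; /* the case `ulimit -s` shows as "unlimited" */
  }
  return (long long) (rl.rlim_cur / 1024); /* rlim_cur is in bytes */
}
```

Comparing the returned value with the 128K buffer size shows how much stack headroom container-executor has on a given host.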

> Container-executor fails with segfault on certain OS configurations
> -------------------------------------------------------------------
>
>                 Key: YARN-7796
>                 URL: https://issues.apache.org/jira/browse/YARN-7796
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: nodemanager
>    Affects Versions: 3.0.0
>            Reporter: Gergo Repas
>            Assignee: Gergo Repas
>            Priority: Major
>             Fix For: 3.1.0, 3.0.1
>
>         Attachments: YARN-7796.000.patch, YARN-7796.001.patch, YARN-7796.002.patch
>
>
> There is a relatively big (128K) buffer allocated on the stack in container-executor.c
> for copying files. As the gdb stack trace below shows, this allocation can fail with
> SIGSEGV. This happens only on certain OS configurations; I can reproduce the issue
> on RHEL 6.9:
> {code:java}
> [Thread debugging using libthread_db enabled]
> main : command provided 0
> main : run as user is ***
> main : requested yarn user is ***
> Program received signal SIGSEGV, Segmentation fault.
> 0x00000000004069bc in copy_file (input=7, in_filename=0x7ffd669fd2d6 "/yarn/nm/nmPrivate/container_1516711246952_0001_02_000001.tokens", out_filename=0x932930 "/yarn/nm/usercache/systest/appcache/application_1516711246952_0001/container_1516711246952_0001_02_000001.tokens", perm=384)
>     at /root/hadoop/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/native/container-executor/impl/container-executor.c:966
> 966	  char buffer[buffer_size];
> (gdb) bt
> #0  0x00000000004069bc in copy_file (input=7, in_filename=0x7ffd669fd2d6 "/yarn/nm/nmPrivate/container_1516711246952_0001_02_000001.tokens", out_filename=0x932930 "/yarn/nm/usercache/systest/appcache/application_1516711246952_0001/container_1516711246952_0001_02_000001.tokens", perm=384)
>     at /root/hadoop/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/native/container-executor/impl/container-executor.c:966
> #1  0x0000000000409a81 in initialize_app (user=<value optimized out>, app_id=0x7ffd669fd2b7 "application_1516711246952_0001", nmPrivate_credentials_file=0x7ffd669fd2d6 "/yarn/nm/nmPrivate/container_1516711246952_0001_02_000001.tokens", local_dirs=0x9331c8, log_roots=<value optimized out>, args=0x7ffd669fb168)
>     at /root/hadoop/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/native/container-executor/impl/container-executor.c:1122
> #2  0x0000000000403f90 in main (argc=<value optimized out>, argv=<value optimized out>) at /root/hadoop/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/native/container-executor/impl/main.c:558
> {code}
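A minimal sketch of one way to avoid the large stack frame: allocate the 128K copy buffer with malloc() instead of declaring it as the variable-length array char buffer[buffer_size]. The heap buffer's size does not count against the stack limit, and an allocation failure is reported as an error instead of surfacing as SIGSEGV. This is not the committed YARN-7796 patch: the function name copy_fd_to_file, the open() flags, and the error messages are assumptions, loosely modeled on the copy_file(input, in_filename, out_filename, perm) frame in the trace.

```c
/* O_NOFOLLOW needs POSIX.1-2008 when compiling with strict -std flags. */
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif

#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Same size as the stack buffer in the report (128K). */
#define COPY_BUFFER_SIZE (128 * 1024)

/* Hypothetical sketch: copy the rest of the open descriptor `input` into a
 * newly created `out_filename` with mode `perm`, keeping the buffer on the
 * heap instead of in the stack frame. Returns 0 on success, -1 on failure. */
int copy_fd_to_file(int input, const char *out_filename, mode_t perm) {
  char *buffer = malloc(COPY_BUFFER_SIZE);
  if (buffer == NULL) {
    fprintf(stderr, "Failed to allocate %d byte copy buffer\n", COPY_BUFFER_SIZE);
    return -1;
  }
  /* Assumed flags: refuse to overwrite an existing file or follow a symlink. */
  int out_fd = open(out_filename, O_WRONLY | O_CREAT | O_EXCL | O_NOFOLLOW, perm);
  if (out_fd == -1) {
    fprintf(stderr, "Can't open %s for output - %s\n", out_filename, strerror(errno));
    free(buffer);
    return -1;
  }
  int rc = 0;
  ssize_t len;
  while ((len = read(input, buffer, COPY_BUFFER_SIZE)) > 0) {
    ssize_t pos = 0;
    /* write() may write fewer bytes than requested, so loop until done. */
    while (pos < len) {
      ssize_t written = write(out_fd, buffer + pos, (size_t) (len - pos));
      if (written <= 0) {
        fprintf(stderr, "Failed to write %s\n", out_filename);
        rc = -1;
        break;
      }
      pos += written;
    }
    if (rc != 0) {
      break;
    }
  }
  if (len < 0) {
    fprintf(stderr, "Failed to read input - %s\n", strerror(errno));
    rc = -1;
  }
  if (close(out_fd) != 0) {
    rc = -1;
  }
  free(buffer);
  return rc;
}
```

In the trace, perm=384 is 0600 in octal, so the copied .tokens file is readable and writable only by its owner; a caller of this sketch would pass the same mode.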



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


