hadoop-yarn-issues mailing list archives

From "Miklos Szegedi (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-7857) -fstack-check compilation flag causes binary incompatibility for container-executor between RHEL 6 and RHEL 7
Date Wed, 31 Jan 2018 15:13:00 GMT

    [ https://issues.apache.org/jira/browse/YARN-7857?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16346985#comment-16346985

Miklos Szegedi commented on YARN-7857:

Thank you, [~Jim_Brennan] for raising this. Indeed, you are right that it is not a simple
stack overflow that causes YARN-7796. However, I looked into it and it might be the right

The article you mentioned above is fairly high-level and does not go into much detail.
I reproduced your issue with the exact RHEL versions you mentioned. At the time of the crash
we have the following values:
RDX=(SIZE + 2*15)/16*16=0x20010
RAX=RSP-(4K-8)-n*4K=0x7ffffffdd328=RSP-0x20FF8 << crashes here while writing 0
RCX=RSP-((SIZE + 2*15)/16*16+3*4K-8)=0x7ffffffdb318=RSP-0x23008
BUFFER=(RSP - SIZE + 15)/16*16=0x7FFFFFFDE310
The stack check code writes a 0 to one location in every page, starting at RSP-(4K-8) and
moving down until it reaches RCX, which is RSP-0x23008, using RAX as the iterator. At the
time of the crash RAX is RSP-0x20FF8. The eventual location of the buffer is a bit above the
crash address, but not by much.
 However, RSP is just 2 pages above the bottom of the stack and we only probe a few pages
below the eventual buffer location, so the write should succeed. In fact, when I try to
reproduce the same issue (rh68-built binary on rh74) with a 110K buffer instead of 128K,
it works.
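
As a sanity check, the arithmetic above can be replayed in a few lines of C. This is only
a sketch: the helper names are made up for illustration, the 4K page size and the 128K SIZE
come from the text, and the "3K" term in the RCX formula is read as three 4K pages because
that is the reading that reproduces the reported RSP-0x23008.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE 0x1000ULL   /* 4K page, as used in the comment */
#define SIZE 0x20000ULL  /* 128K stack buffer */

/* x/16*16 from the comment: round down to a multiple of 16. */
static uint64_t align16_down(uint64_t x) { return x / 16 * 16; }

/* RDX: the allocation size after alignment padding. */
static uint64_t alloc_size(uint64_t size) { return align16_down(size + 2 * 15); }

/* RSP recovered from the crashing RAX, using RAX = RSP - 0x20FF8. */
static uint64_t rsp_from_crash(uint64_t rax) { return rax + 0x20FF8ULL; }

/* RCX: the lowest address the probe loop is meant to reach.
   Assumption: the "3K" in the comment means three 4K pages. */
static uint64_t probe_limit(uint64_t rsp, uint64_t size) {
  return rsp - (alloc_size(size) + 3 * PAGE - 8);
}

/* Number of probes that come before the probe at addr, when probing
   starts at RSP-(4K-8) and steps down one page at a time. */
static unsigned probes_before(uint64_t rsp, uint64_t addr) {
  uint64_t p = rsp - (PAGE - 8);
  unsigned n = 0;
  assert(addr <= p);
  for (; p > addr; p -= PAGE) n++;
  return n;
}
```

With the reported RAX this gives RSP=0x7fffffffe320, and under this model 32 probes precede
the crashing one. The reported buffer address 0x7FFFFFFDE310 is 0xFE8 bytes above the
crashing address, i.e. a little less than one page.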
 In conclusion, the stack check code seems to be legitimate. However, the code might address
the same memory later and end up with the same crash even without stack checking. The RHEL 7.4
code performs an OR of each location with 0 (leaving its contents unchanged). Since the stack
check code is similar to what Meltdown does, I wonder whether we ran into some kernel
protection. Moving the buffer to the heap removes any risk of running into this protection.
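
To illustrate what moving the buffer to the heap could look like, here is a minimal sketch.
It is not the container-executor code or the actual patch: copy_stream_heap is a hypothetical
helper, and the use of stdio streams is an assumption made to keep the example short. The
point is only that the large buffer comes from malloc, so no large stack frame needs probing.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: copy everything from in to out through a
   heap-allocated buffer of buf_size bytes.
   Returns the number of bytes copied, or -1 on error. */
static long copy_stream_heap(FILE *in, FILE *out, size_t buf_size) {
  char *buf;
  long total = 0;
  size_t n;

  assert(in != NULL && out != NULL && buf_size > 0);
  buf = malloc(buf_size);  /* heap instead of a 128K stack array */
  if (buf == NULL) return -1;

  while ((n = fread(buf, 1, buf_size, in)) > 0) {
    if (fwrite(buf, 1, n, out) != n) {  /* short write */
      total = -1;
      break;
    }
    total += (long)n;
  }
  if (total >= 0 && ferror(in)) total = -1;  /* read error */

  free(buf);
  return total;
}
```

A call such as copy_stream_heap(src, dst, 128 * 1024) keeps the 128K buffer size from the
discussion while leaving the stack frame small.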

> -fstack-check compilation flag causes binary incompatibility for container-executor between RHEL 6 and RHEL 7
> -------------------------------------------------------------------------------------------------------------
>                 Key: YARN-7857
>                 URL: https://issues.apache.org/jira/browse/YARN-7857
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: nodemanager
>    Affects Versions: 3.0.0
>            Reporter: Jim Brennan
>            Assignee: Jim Brennan
>            Priority: Major
> The segmentation fault in container-executor reported in [YARN-7796] appears to be due
> to a binary compatibility issue with the {{-fstack-check}} flag that was added in [YARN-6721].
> Based on my testing, a container-executor (without the patch from [YARN-7796]) compiled
> on RHEL 6 with the -fstack-check flag always hits this segmentation fault when run on RHEL
> 7. But if you compile without this flag, the container-executor runs on RHEL 7 with no problems.
> I also verified this with a simple program that just does the copy_file.
> I think we need to either remove this flag, or find a suitable alternative.

