Date: Thu, 26 Jan 2017 02:13:26 +0000 (UTC)
From: "Gilbert Song (JIRA)"
To: issues@mesos.apache.org
Reply-To: dev@mesos.apache.org
Subject: [jira] [Commented] (MESOS-6989) Docker executor segfaults in ~MesosExecutorDriver()


    [ https://issues.apache.org/jira/browse/MESOS-6989?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15839044#comment-15839044 ]

Gilbert Song commented on MESOS-6989:
-------------------------------------

Hmm..I cannot reproduce with my local branch (~1 week behind `6c63a3fc7aba4d4cfc2f004362e4a6e3a384bd55`), but the segfault happened with the master branch. Should be something introduced recently. Will take a look later tonight.

> Docker executor segfaults in ~MesosExecutorDriver()
> ---------------------------------------------------
>
>                 Key: MESOS-6989
>                 URL: https://issues.apache.org/jira/browse/MESOS-6989
>             Project: Mesos
>          Issue Type: Bug
>          Components: docker
>            Reporter: Jan-Philip Gehrcke
>
> With the current Mesos master state (commit 42e515bc5c175a318e914d34473016feda4db6ff), the Docker executor segfaults during shutdown.
> Steps to reproduce:
> 1) Start master:
> {code}
> $ ./bin/mesos-master.sh --ip=127.0.0.1 --work_dir=/tmp/jp/mesos
> WARNING: Logging before InitGoogleLogging() is written to STDERR
> I0125 13:41:15.963775 14744 main.cpp:278] Build: 2017-01-25 13:37:42 by jp
> I0125 13:41:15.963868 14744 main.cpp:279] Version: 1.2.0
> I0125 13:41:15.963877 14744 main.cpp:286] Git SHA: 42e515bc5c175a318e914d34473016feda4db6ff
> {code}
> (note that building it at 13:37 is not part of the repro)
> 2) Start agent:
> {code}
> $ ./bin/mesos-slave.sh --containerizers=mesos,docker --master=127.0.0.1:5050 --work_dir=/tmp/jp/mesos
> {code}
> 3) Run {{mesos-execute}} with the Docker containerizer:
> {code}
> $ ./src/mesos-execute --master=127.0.0.1:5050 --name=testcommand --containerizer=docker --docker_image=debian --command=env
> I0125 13:43:59.704973 14951 scheduler.cpp:184] Version: 1.2.0
> I0125 13:43:59.706425 14952 scheduler.cpp:470] New master detected at master@127.0.0.1:5050
> Subscribed with ID 57596743-06f4-45f1-a975-348cf70589b1-0000
> Submitted task 'testcommand' to agent '57596743-06f4-45f1-a975-348cf70589b1-S0'
> Received status update TASK_RUNNING for task 'testcommand'
>   source: SOURCE_EXECUTOR
> Received status update TASK_FINISHED for task 'testcommand'
>   message: 'Container exited with status 0'
>   source: SOURCE_EXECUTOR
> {code}
> Relevant agent output that shows the executor segfault:
> {code}
> [...]
> I0125 13:44:16.249191 14823 slave.cpp:4328] Got exited event for executor(1)@192.99.40.208:33529
> I0125 13:44:16.347095 14830 docker.cpp:2358] Executor for container 396282a9-7bf0-48ee-ba07-3ff2ca801d53 has exited
> I0125 13:44:16.347127 14830 docker.cpp:2052] Destroying container 396282a9-7bf0-48ee-ba07-3ff2ca801d53
> I0125 13:44:16.347439 14830 docker.cpp:2179] Running docker stop on container 396282a9-7bf0-48ee-ba07-3ff2ca801d53
> I0125 13:44:16.349215 14826 slave.cpp:4691] Executor 'testcommand' of framework 57596743-06f4-45f1-a975-348cf70589b1-0000 terminated with signal Segmentation fault (core dumped)
> [...]
> {code}
> The complete task stderr:
> {code}
> $ cat /tmp/jp/mesos/slaves/57596743-06f4-45f1-a975-348cf70589b1-S0/frameworks/57596743-06f4-45f1-a975-348cf70589b1-0000/executors/testcommand/runs/latest/stderr
> I0125 13:44:12.850073 15030 exec.cpp:162] Version: 1.2.0
> I0125 13:44:12.864229 15050 exec.cpp:237] Executor registered on agent 57596743-06f4-45f1-a975-348cf70589b1-S0
> I0125 13:44:12.865842 15054 docker.cpp:850] Running docker -H unix:///var/run/docker.sock run --cpu-shares 1024 --memory 134217728 --env-file /tmp/xFZ8G9 -v /tmp/jp/mesos/slaves/57596743-06f4-45f1-a975-348cf70589b1-S0/frameworks/57596743-06f4-45f1-a975-348cf70589b1-0000/executors/testcommand/runs/396282a9-7bf0-48ee-ba07-3ff2ca801d53:/mnt/mesos/sandbox --net host --entrypoint /bin/sh --name mesos-57596743-06f4-45f1-a975-348cf70589b1-S0.396282a9-7bf0-48ee-ba07-3ff2ca801d53 debian -c env
> I0125 13:44:15.248721 15064 exec.cpp:410] Executor asked to shutdown
> *** Aborted at 1485369856 (unix time) try "date -d @1485369856" if you are using GNU date ***
> PC: @     0x7fb38f153dd0 (unknown)
> *** SIGSEGV (@0x68) received by PID 15030 (TID 0x7fb3961a88c0) from PID 104; stack trace: ***
>     @     0x7fb38f15b5c0 (unknown)
>     @     0x7fb38f153dd0 (unknown)
>     @     0x7fb39332c607 __gthread_mutex_lock()
>     @     0x7fb39332c657 __gthread_recursive_mutex_lock()
>     @     0x7fb39332edca std::recursive_mutex::lock()
>     @     0x7fb393337bd8 _ZZ11synchronizeISt15recursive_mutexE12SynchronizedIT_EPS2_ENKUlPS0_E_clES5_
>     @     0x7fb393337bf8 _ZZ11synchronizeISt15recursive_mutexE12SynchronizedIT_EPS2_ENUlPS0_E_4_FUNES5_
>     @     0x7fb39333ba6b Synchronized<>::Synchronized()
>     @     0x7fb393337cac synchronize<>()
>     @     0x7fb39492f15c process::ProcessManager::wait()
>     @     0x7fb3949353f0 process::wait()
>     @     0x55fd63f31fe5 process::wait()
>     @     0x7fb39332ce3c mesos::MesosExecutorDriver::~MesosExecutorDriver()
>     @     0x55fd63f2bd86 main
>     @     0x7fb38e4fc401 __libc_start_main
>     @     0x55fd63f2ab5a _start
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
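
Triage note on the executor stderr quoted in the report: the two `***` banner lines carry the abort time, the signal, the faulting address and the PID. The Python sketch below pulls those fields out so the crash can be lined up with the agent log. The helper name `parse_abort` and the regular expressions are illustrative assumptions, not part of Mesos or glog; the input is copied verbatim from the stderr in the report.

```python
import re
from datetime import datetime, timezone

# Banner lines copied verbatim from the executor stderr in the report.
STDERR = """\
*** Aborted at 1485369856 (unix time) try "date -d @1485369856" if you are using GNU date ***
*** SIGSEGV (@0x68) received by PID 15030 (TID 0x7fb3961a88c0) from PID 104; stack trace: ***
"""


def parse_abort(text):
    """Hypothetical helper: extract abort time, signal, fault address and PID."""
    when = re.search(r"Aborted at (\d+) \(unix time\)", text)
    sig = re.search(
        r"\*\*\* (SIG\w+) \(@(0x[0-9a-fA-F]+)\) received by PID (\d+)", text
    )
    if when is None or sig is None:
        raise ValueError("no failure banner found in input")
    return {
        "time_utc": datetime.fromtimestamp(int(when.group(1)), tz=timezone.utc),
        "signal": sig.group(1),
        "fault_address": int(sig.group(2), 16),
        "pid": int(sig.group(3)),
    }


info = parse_abort(STDERR)
print(info["time_utc"].isoformat())  # 2017-01-25T18:44:16+00:00
print(info["signal"], hex(info["fault_address"]), info["pid"])

# Heuristic only, not a diagnosis: a fault address far below one page (4096)
# often means a member was accessed through a null object pointer.
print("fault address below one page:", info["fault_address"] < 4096)
```

The decoded time, 18:44:16 UTC, matches the 13:44:16 agent log entries if the host clock runs at UTC-5, which is an inference from the logs rather than something the report states. The small fault address (0x68) is consistent with the top of the stack trace, where `process::ProcessManager::wait()` locks a mutex, but the report itself does not confirm which pointer was invalid.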