From: "Greg Mann (JIRA)"
To: issues@mesos.apache.org
Date: Fri, 27 Jan 2017 00:55:24 +0000 (UTC)
Subject: [jira] [Commented] (MESOS-6985) os::getenv() can segfault

    [ https://issues.apache.org/jira/browse/MESOS-6985?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15840771#comment-15840771 ]

Greg Mann commented on MESOS-6985:
----------------------------------
Yep, it's definitely occurring in {{::getenv}}. Here's the result of a failed test run within {{gdb}}:

{code}
[ RUN ] MasterTest.MultipleExecutors
I0127 00:39:33.120487 1809 cluster.cpp:160] Creating default 'local' authorizer
I0127 00:39:33.122427 1815 master.cpp:383] Master ac440d30-722b-43a5-9f61-cea98b3e576a (vagrant-ubuntu-trusty-64) started on 10.0.2.15:51845
I0127 00:39:33.122498 1815 master.cpp:385] Flags at startup: --acls="" --agent_ping_timeout="15secs" --agent_reregister_timeout="10mins" --allocation_interval="1secs" --allocator="HierarchicalDRF" --authenticate_agents="true" --authenticate_frameworks="true" --authenticate_http_frameworks="true" --authenticate_http_readonly="true" --authenticate_http_readwrite="true" --authenticators="crammd5" --authorizers="local" --credentials="/tmp/b7WHq9/credentials" --framework_sorter="drf" --help="false" --hostname_lookup="true" --http_authenticators="basic" --http_framework_authenticators="basic" --initialize_driver_logging="true" --log_auto_initialize="true" --logbufsecs="0" --logging_level="INFO" --max_agent_ping_timeouts="5" --max_completed_frameworks="50" --max_completed_tasks_per_framework="1000" --quiet="false" --recovery_agent_removal_limit="100%" --registry="in_memory" --registry_fetch_timeout="1mins" --registry_gc_interval="15mins" --registry_max_agent_age="2weeks" --registry_max_agent_count="102400" --registry_store_timeout="100secs" --registry_strict="false" --root_submissions="true" --user_sorter="drf" --version="false" --webui_dir="/usr/local/share/mesos/webui" --work_dir="/tmp/b7WHq9/master" --zk_session_timeout="10secs"
I0127 00:39:33.122836 1815 master.cpp:435] Master only allowing authenticated frameworks to register
I0127 00:39:33.122858 1815 master.cpp:449] Master only allowing authenticated agents to register
I0127 00:39:33.122875 1815 master.cpp:462] Master only allowing authenticated HTTP frameworks to register
I0127 00:39:33.122891 1815 credentials.hpp:37] Loading credentials for authentication from '/tmp/b7WHq9/credentials'
I0127 00:39:33.123128 1815 master.cpp:507] Using default 'crammd5' authenticator
I0127 00:39:33.123265 1815 http.cpp:922] Using default 'basic' HTTP authenticator for realm 'mesos-master-readonly'
I0127 00:39:33.123394 1815 http.cpp:922] Using default 'basic' HTTP authenticator for realm 'mesos-master-readwrite'
I0127 00:39:33.123631 1815 http.cpp:922] Using default 'basic' HTTP authenticator for realm 'mesos-master-scheduler'
I0127 00:39:33.123884 1815 master.cpp:587] Authorization enabled
I0127 00:39:33.127008 1819 master.cpp:2119] Elected as the leading master!
I0127 00:39:33.127084 1819 master.cpp:1641] Recovering from registrar
I0127 00:39:33.127766 1818 registrar.cpp:362] Successfully fetched the registry (0B) in 408832ns
I0127 00:39:33.127883 1818 registrar.cpp:461] Applied 1 operations in 22092ns; attempting to update the registry
I0127 00:39:33.130798 1818 registrar.cpp:506] Successfully updated the registry in 2.779136ms
I0127 00:39:33.130934 1818 registrar.cpp:392] Successfully recovered registrar
I0127 00:39:33.131573 1818 master.cpp:1757] Recovered 0 agents from the registry (153B); allowing 10mins for agents to re-register
I0127 00:39:33.134503 1809 cluster.cpp:446] Creating default 'local' authorizer
I0127 00:39:33.135774 1818 slave.cpp:209] Mesos agent started on (8)@10.0.2.15:51845
I0127 00:39:33.135824 1818 slave.cpp:210] Flags at startup: --acls="" --appc_simple_discovery_uri_prefix="http://" --appc_store_dir="/tmp/mesos/store/appc" --authenticate_http_readonly="true" --authenticate_http_readwrite="true" --authenticatee="crammd5" --authentication_backoff_factor="1secs" --authorizer="local" --cgroups_cpu_enable_pids_and_tids_count="false" --cgroups_enable_cfs="false" --cgroups_hierarchy="/sys/fs/cgroup" --cgroups_limit_swap="false" --cgroups_root="mesos" --container_disk_watch_interval="15secs" --containerizers="mesos" --credential="/tmp/MasterTest_MultipleExecutors_ruv9Vu/credential" --default_role="*" --disk_watch_interval="1mins" --docker="docker" --docker_kill_orphans="true" --docker_registry="https://registry-1.docker.io" --docker_remove_delay="6hrs" --docker_socket="/var/run/docker.sock" --docker_stop_timeout="0ns" --docker_store_dir="/tmp/mesos/store/docker" --docker_volume_checkpoint_dir="/var/run/mesos/isolators/docker/volume" --enforce_container_disk_quota="false" --executor_registration_timeout="1mins" --executor_shutdown_grace_period="5secs" --fetcher_cache_dir="/tmp/MasterTest_MultipleExecutors_ruv9Vu/fetch" --fetcher_cache_size="2GB" --frameworks_home="" --gc_delay="1weeks" --gc_disk_headroom="0.1" --hadoop_home="" --help="false" --hostname_lookup="true" --http_authenticators="basic" --http_command_executor="false" --http_credentials="/tmp/MasterTest_MultipleExecutors_ruv9Vu/http_credentials" --http_heartbeat_interval="30secs" --image_provisioner_backend="copy" --initialize_driver_logging="true" --isolation="posix/cpu,posix/mem" --launcher="posix" --launcher_dir="/home/vagrant/src/mesos/build/src" --logbufsecs="0" --logging_level="INFO" --max_completed_executors_per_framework="150" --oversubscribed_resources_interval="15secs" --perf_duration="10secs" --perf_interval="1mins" --qos_correction_interval_min="0ns" --quiet="false" --recover="reconnect" --recovery_timeout="15mins" --registration_backoff_factor="10ms" --resources="cpus:2;gpus:0;mem:1024;disk:1024;ports:[31000-32000]" --revocable_cpu_low_priority="true" --runtime_dir="/tmp/MasterTest_MultipleExecutors_ruv9Vu" --sandbox_directory="/mnt/mesos/sandbox" --strict="true" --switch_user="true" --systemd_enable_support="true" --systemd_runtime_directory="/run/systemd/system" --version="false" --work_dir="/tmp/MasterTest_MultipleExecutors_1wuqbP"
I0127 00:39:33.136175 1818 credentials.hpp:86] Loading credential for authentication from '/tmp/MasterTest_MultipleExecutors_ruv9Vu/credential'
I0127 00:39:33.136325 1818 slave.cpp:352] Agent using credential for: test-principal
I0127 00:39:33.136358 1818 credentials.hpp:37] Loading credentials for authentication from '/tmp/MasterTest_MultipleExecutors_ruv9Vu/http_credentials'
I0127 00:39:33.136541 1818 http.cpp:922] Using default 'basic' HTTP authenticator for realm 'mesos-agent-readonly'
I0127 00:39:33.138916 1818 http.cpp:922] Using default 'basic' HTTP authenticator for realm 'mesos-agent-readwrite'
I0127 00:39:33.142987 1818 slave.cpp:539] Agent resources: cpus(*):2; mem(*):1024; disk(*):1024; ports(*):[31000-32000]
I0127 00:39:33.143088 1818 slave.cpp:547] Agent attributes: [ ]
I0127 00:39:33.143151 1818 slave.cpp:552] Agent hostname: vagrant-ubuntu-trusty-64
I0127 00:39:33.143090 1809 sched.cpp:232] Version: 1.2.0
I0127 00:39:33.143712 1817 status_update_manager.cpp:177] Pausing sending status updates
I0127 00:39:33.144261 1817 sched.cpp:336] New master detected at master@10.0.2.15:51845
I0127 00:39:33.144701 1817 sched.cpp:407] Authenticating with master master@10.0.2.15:51845
I0127 00:39:33.144754 1817 sched.cpp:414] Using default CRAM-MD5 authenticatee
I0127 00:39:33.144836 1819 state.cpp:60] Recovering state from '/tmp/MasterTest_MultipleExecutors_1wuqbP/meta'
I0127 00:39:33.145293 1819 status_update_manager.cpp:203] Recovering status update manager
I0127 00:39:33.145570 1814 authenticatee.cpp:121] Creating new client SASL connection
I0127 00:39:33.146090 1814 master.cpp:6842] Authenticating scheduler-9d9e54ce-c21e-408b-9277-7fb55c3ea844@10.0.2.15:51845
I0127 00:39:33.146564 1817 slave.cpp:5422] Finished recovery
I0127 00:39:33.147352 1814 authenticator.cpp:98] Creating new server SASL connection
I0127 00:39:33.148704 1815 authenticatee.cpp:213] Received SASL authentication mechanisms: CRAM-MD5
I0127 00:39:33.149062 1815 authenticatee.cpp:239] Attempting to authenticate with mechanism 'CRAM-MD5'
I0127 00:39:33.149545 1815 authenticator.cpp:204] Received SASL authentication start
I0127 00:39:33.150210 1815 authenticator.cpp:326] Authentication requires more steps
I0127 00:39:33.152232 1815 authenticatee.cpp:259] Received SASL authentication step
I0127 00:39:33.152844 1814 slave.cpp:929] New master detected at master@10.0.2.15:51845
I0127 00:39:33.153264 1820 status_update_manager.cpp:177] Pausing sending status updates
I0127 00:39:33.153064 1815 authenticator.cpp:232] Received SASL authentication step
I0127 00:39:33.153442 1814 slave.cpp:964] Detecting new master
I0127 00:39:33.153686 1815 authenticator.cpp:318] Authentication success
I0127 00:39:33.154338 1813 authenticatee.cpp:299] Authentication success
I0127 00:39:33.154717 1818 master.cpp:6872] Successfully authenticated principal 'test-principal' at scheduler-9d9e54ce-c21e-408b-9277-7fb55c3ea844@10.0.2.15:51845
I0127 00:39:33.155275 1814 sched.cpp:513] Successfully authenticated with master master@10.0.2.15:51845
I0127 00:39:33.155483 1819 master.cpp:2707] Received SUBSCRIBE call for framework 'default' at scheduler-9d9e54ce-c21e-408b-9277-7fb55c3ea844@10.0.2.15:51845
I0127 00:39:33.155555 1819 master.cpp:2155] Authorizing framework principal 'test-principal' to receive offers for role '*'
I0127 00:39:33.156003 1819 master.cpp:2783] Subscribing framework default with checkpointing disabled and capabilities [ ]
I0127 00:39:33.156581 1814 hierarchical.cpp:271] Added framework ac440d30-722b-43a5-9f61-cea98b3e576a-0000
I0127 00:39:33.156581 1819 sched.cpp:759] Framework registered with ac440d30-722b-43a5-9f61-cea98b3e576a-0000
I0127 00:39:33.163875 1818 slave.cpp:991] Authenticating with master master@10.0.2.15:51845
I0127 00:39:33.163997 1818 slave.cpp:1002] Using default CRAM-MD5 authenticatee
I0127 00:39:33.164427 1818 authenticatee.cpp:121] Creating new client SASL connection
I0127 00:39:33.164808 1818 master.cpp:6842] Authenticating slave(8)@10.0.2.15:51845
I0127 00:39:33.165102 1818 authenticator.cpp:98] Creating new server SASL connection
I0127 00:39:33.165536 1818 authenticatee.cpp:213] Received SASL authentication mechanisms: CRAM-MD5
I0127 00:39:33.165603 1818 authenticatee.cpp:239] Attempting to authenticate with mechanism 'CRAM-MD5'
I0127 00:39:33.165796 1813 authenticator.cpp:204] Received SASL authentication start
I0127 00:39:33.165879 1813 authenticator.cpp:326] Authentication requires more steps
I0127 00:39:33.165999 1813 authenticatee.cpp:259] Received SASL authentication step
I0127 00:39:33.166175 1816 authenticator.cpp:232] Received SASL authentication step
I0127 00:39:33.166364 1816 authenticator.cpp:318] Authentication success
I0127 00:39:33.166671 1813 master.cpp:6872] Successfully authenticated principal 'test-principal' at slave(8)@10.0.2.15:51845
I0127 00:39:33.166739 1816 authenticatee.cpp:299] Authentication success
I0127 00:39:33.167352 1817 slave.cpp:1086] Successfully authenticated with master master@10.0.2.15:51845
I0127 00:39:33.167836 1816 master.cpp:5232] Registering agent at slave(8)@10.0.2.15:51845 (vagrant-ubuntu-trusty-64) with id ac440d30-722b-43a5-9f61-cea98b3e576a-S0
I0127 00:39:33.168298 1816 registrar.cpp:461] Applied 1 operations in 62732ns; attempting to update the registry
I0127 00:39:33.169097 1820 registrar.cpp:506] Successfully updated the registry in 716032ns
I0127 00:39:33.170994 1813 master.cpp:5303] Registered agent ac440d30-722b-43a5-9f61-cea98b3e576a-S0 at slave(8)@10.0.2.15:51845 (vagrant-ubuntu-trusty-64) with cpus(*):2; mem(*):1024; disk(*):1024; ports(*):[31000-32000]
I0127 00:39:33.171192 1815 slave.cpp:1132] Registered with master master@10.0.2.15:51845; given agent ID ac440d30-722b-43a5-9f61-cea98b3e576a-S0
I0127 00:39:33.173738 1814 status_update_manager.cpp:184] Resuming sending status updates
I0127 00:39:33.174046 1815 slave.cpp:1198] Forwarding total oversubscribed resources {}
I0127 00:39:33.174124 1817 hierarchical.cpp:478] Added agent ac440d30-722b-43a5-9f61-cea98b3e576a-S0 (vagrant-ubuntu-trusty-64) with cpus(*):2; mem(*):1024; disk(*):1024; ports(*):[31000-32000] (allocated: {})
I0127 00:39:33.174309 1815 master.cpp:5710] Received update of agent ac440d30-722b-43a5-9f61-cea98b3e576a-S0 at slave(8)@10.0.2.15:51845 (vagrant-ubuntu-trusty-64) with total oversubscribed resources {}
I0127 00:39:33.176139 1817 hierarchical.cpp:548] Agent ac440d30-722b-43a5-9f61-cea98b3e576a-S0 (vagrant-ubuntu-trusty-64) updated with oversubscribed resources {} (total: cpus(*):2; mem(*):1024; disk(*):1024; ports(*):[31000-32000], allocated: cpus(*):2; mem(*):1024; disk(*):1024; ports(*):[31000-32000])
I0127 00:39:33.176378 1814 master.cpp:6671] Sending 1 offers to framework ac440d30-722b-43a5-9f61-cea98b3e576a-0000 (default) at scheduler-9d9e54ce-c21e-408b-9277-7fb55c3ea844@10.0.2.15:51845
I0127 00:39:33.178370 1818 master.cpp:3661] Processing ACCEPT call for offers: [ ac440d30-722b-43a5-9f61-cea98b3e576a-O0 ] on agent ac440d30-722b-43a5-9f61-cea98b3e576a-S0 at slave(8)@10.0.2.15:51845 (vagrant-ubuntu-trusty-64) for framework ac440d30-722b-43a5-9f61-cea98b3e576a-0000 (default) at scheduler-9d9e54ce-c21e-408b-9277-7fb55c3ea844@10.0.2.15:51845
I0127 00:39:33.178455 1818 master.cpp:3249] Authorizing framework principal 'test-principal' to launch task 1
I0127 00:39:33.178591 1818 master.cpp:3249] Authorizing framework principal 'test-principal' to launch task 2
W0127 00:39:33.181143 1814 validation.cpp:995] Executor 'executor-1' for task '1' uses less CPUs (None) than the minimum required (0.01). Please update your executor, as this will be mandatory in future releases.
W0127 00:39:33.181447 1814 validation.cpp:1007] Executor 'executor-1' for task '1' uses less memory (None) than the minimum required (32MB). Please update your executor, as this will be mandatory in future releases.
I0127 00:39:33.181901 1814 master.cpp:8584] Adding task 1 with resources cpus(*):1; mem(*):512 on agent ac440d30-722b-43a5-9f61-cea98b3e576a-S0 at slave(8)@10.0.2.15:51845 (vagrant-ubuntu-trusty-64)
I0127 00:39:33.182237 1814 master.cpp:4311] Launching task 1 of framework ac440d30-722b-43a5-9f61-cea98b3e576a-0000 (default) at scheduler-9d9e54ce-c21e-408b-9277-7fb55c3ea844@10.0.2.15:51845 with resources cpus(*):1; mem(*):512 on agent ac440d30-722b-43a5-9f61-cea98b3e576a-S0 at slave(8)@10.0.2.15:51845 (vagrant-ubuntu-trusty-64)
I0127 00:39:33.182725 1815 slave.cpp:1576] Got assigned task '1' for framework ac440d30-722b-43a5-9f61-cea98b3e576a-0000
W0127 00:39:33.183140 1814 validation.cpp:995] Executor 'executor-2' for task '2' uses less CPUs (None) than the minimum required (0.01). Please update your executor, as this will be mandatory in future releases.
W0127 00:39:33.183409 1814 validation.cpp:1007] Executor 'executor-2' for task '2' uses less memory (None) than the minimum required (32MB). Please update your executor, as this will be mandatory in future releases.
I0127 00:39:33.183221 1815 slave.cpp:1736] Launching task '1' for framework ac440d30-722b-43a5-9f61-cea98b3e576a-0000
I0127 00:39:33.184008 1815 paths.cpp:547] Trying to chown '/tmp/MasterTest_MultipleExecutors_1wuqbP/slaves/ac440d30-722b-43a5-9f61-cea98b3e576a-S0/frameworks/ac440d30-722b-43a5-9f61-cea98b3e576a-0000/executors/executor-1/runs/d1f9a0da-39af-4264-8679-6feeb54a9bd2' to user 'vagrant'
I0127 00:39:33.184008 1814 master.cpp:8584] Adding task 2 with resources cpus(*):1; mem(*):512 on agent ac440d30-722b-43a5-9f61-cea98b3e576a-S0 at slave(8)@10.0.2.15:51845 (vagrant-ubuntu-trusty-64)
I0127 00:39:33.184370 1815 slave.cpp:6350] Launching executor 'executor-1' of framework ac440d30-722b-43a5-9f61-cea98b3e576a-0000 with resources {} in work directory '/tmp/MasterTest_MultipleExecutors_1wuqbP/slaves/ac440d30-722b-43a5-9f61-cea98b3e576a-S0/frameworks/ac440d30-722b-43a5-9f61-cea98b3e576a-0000/executors/executor-1/runs/d1f9a0da-39af-4264-8679-6feeb54a9bd2'
I0127 00:39:33.184882 1814 master.cpp:4311] Launching task 2 of framework ac440d30-722b-43a5-9f61-cea98b3e576a-0000 (default) at scheduler-9d9e54ce-c21e-408b-9277-7fb55c3ea844@10.0.2.15:51845 with resources cpus(*):1; mem(*):512 on agent ac440d30-722b-43a5-9f61-cea98b3e576a-S0 at slave(8)@10.0.2.15:51845 (vagrant-ubuntu-trusty-64)
I0127 00:39:33.185616 1815 slave.cpp:2058] Queued task '1' for executor 'executor-1' of framework ac440d30-722b-43a5-9f61-cea98b3e576a-0000
I0127 00:39:33.185811 1815 slave.cpp:1576] Got assigned task '2' for framework ac440d30-722b-43a5-9f61-cea98b3e576a-0000
I0127 00:39:33.186208 1815 slave.cpp:1736] Launching task '2' for framework ac440d30-722b-43a5-9f61-cea98b3e576a-0000
I0127 00:39:33.186472 1815 paths.cpp:547] Trying to chown '/tmp/MasterTest_MultipleExecutors_1wuqbP/slaves/ac440d30-722b-43a5-9f61-cea98b3e576a-S0/frameworks/ac440d30-722b-43a5-9f61-cea98b3e576a-0000/executors/executor-2/runs/f1c7564c-d22a-4609-942c-b53f77061d99' to user 'vagrant'
I0127 00:39:33.187806 1815 slave.cpp:6350] Launching executor 'executor-2' of framework ac440d30-722b-43a5-9f61-cea98b3e576a-0000 with resources {} in work directory '/tmp/MasterTest_MultipleExecutors_1wuqbP/slaves/ac440d30-722b-43a5-9f61-cea98b3e576a-S0/frameworks/ac440d30-722b-43a5-9f61-cea98b3e576a-0000/executors/executor-2/runs/f1c7564c-d22a-4609-942c-b53f77061d99'

Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7fffe711d700 (LWP 1815)]
__GI_getenv (name=0x7fffc0064e6a "BPROCESS_IP") at getenv.c:85
85	getenv.c: No such file or directory.
(gdb) inf locals
ep_start =
len = 11
ep = 0x2da66c0
name_start = 18764
(gdb) bt
#0  __GI_getenv (name=0x7fffc0064e6a "BPROCESS_IP") at getenv.c:85
#1  0x0000000000affbce in os::getenv ()
#2  0x00007ffff5a8fe91 in mesos::internal::slave::executorEnvironment () from /home/vagrant/src/mesos/build/src/.libs/libmesos-1.2.0.so
#3  0x00007ffff5a8ad9a in mesos::internal::slave::Framework::launchExecutor () from /home/vagrant/src/mesos/build/src/.libs/libmesos-1.2.0.so
#4  0x00007ffff5a65a47 in mesos::internal::slave::Slave::_run () from /home/vagrant/src/mesos/build/src/.libs/libmesos-1.2.0.so
#5  0x00007ffff5abdc0d in void process::dispatch const&, mesos::FrameworkInfo const&, mesos::ExecutorInfo const&, Option const&, Option const&, process::Future, mesos::FrameworkInfo, mesos::ExecutorInfo, Option, Option >(process::PID const&, void (mesos::internal::slave::Slave::*)(process::Future const&, mesos::FrameworkInfo const&, mesos::ExecutorInfo const&, Option const&, Option const&), process::Future, mesos::FrameworkInfo, mesos::ExecutorInfo, Option, Option)::{lambda(process::ProcessBase*)#1}::operator()(process::ProcessBase*) const () from /home/vagrant/src/mesos/build/src/.libs/libmesos-1.2.0.so
#6  0x00007ffff5af1de9 in std::_Function_handler const&, mesos::FrameworkInfo const&, mesos::ExecutorInfo const&, Option const&, Option const&, process::Future, mesos::FrameworkInfo, mesos::ExecutorInfo, Option, Option >(process::PID const&, void (mesos::internal::slave::Slave::*)(process::Future const&, mesos::FrameworkInfo const&, mesos::ExecutorInfo const&, Option const&, Option const&), process::Future, mesos::FrameworkInfo, mesos::ExecutorInfo, Option, Option)::{lambda(process::ProcessBase*)#1}>::_M_invoke(std::_Any_data const&, process::ProcessBase*) () from /home/vagrant/src/mesos/build/src/.libs/libmesos-1.2.0.so
#7  0x00007ffff67e3a2b in std::function::operator()(process::ProcessBase*) const () from /home/vagrant/src/mesos/build/src/.libs/libmesos-1.2.0.so
#8  0x00007ffff67c982d in process::ProcessBase::visit () from /home/vagrant/src/mesos/build/src/.libs/libmesos-1.2.0.so
#9  0x00007ffff67d40ac in process::DispatchEvent::visit () from /home/vagrant/src/mesos/build/src/.libs/libmesos-1.2.0.so
#10 0x0000000000ad3f14 in process::ProcessBase::serve ()
#11 0x00007ffff67c5b1a in process::ProcessManager::resume () from /home/vagrant/src/mesos/build/src/.libs/libmesos-1.2.0.so
#12 0x00007ffff67c235e in operator() () from /home/vagrant/src/mesos/build/src/.libs/libmesos-1.2.0.so
#13 0x00007ffff67d37e6 in _M_invoke<>(void) () from /home/vagrant/src/mesos/build/src/.libs/libmesos-1.2.0.so
#14 0x00007ffff67d373d in operator() () from /home/vagrant/src/mesos/build/src/.libs/libmesos-1.2.0.so
#15 0x00007ffff67d36d6 in _M_run () from /home/vagrant/src/mesos/build/src/.libs/libmesos-1.2.0.so
#16 0x00007ffff096ea60 in ?? () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#17 0x00007ffff018b184 in start_thread (arg=0x7fffe711d700) at pthread_create.c:312
#18 0x00007fffefeb837d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111
{code}

If we look at {{getenv.c}}, we find the following:

{code}
 26 /* Return the value of the environment variable NAME. This implementation
 27    is tuned a bit in that it assumes no environment variable has an empty
 28    name which of course should always be true. We have a special case for
 29    one character names so that for the general case we can assume at least
 30    two characters which we can access. By doing this we can avoid using the
 31    `strncmp' most of the time. */
 32 char *
 33 getenv (name)
 34      const char *name;
 35 {
 36   size_t len = strlen (name);
 37   char **ep;
 38   uint16_t name_start;
 39
 40   if (__environ == NULL || name[0] == '\0')
 41     return NULL;
 42
 43   if (name[1] == '\0')
 44     {
 45       /* The name of the variable consists of only one character. Therefore
 46          the first two characters of the environment entry are this character
 47          and a '=' character. */
 48 #if __BYTE_ORDER == __LITTLE_ENDIAN || !_STRING_ARCH_unaligned
 49       name_start = ('=' << 8) | *(const unsigned char *) name;
 50 #else
 51       name_start = '=' | ((*(const unsigned char *) name) << 8);
 52 #endif
 53       for (ep = __environ; *ep != NULL; ++ep)
 54         {
 55 #if _STRING_ARCH_unaligned
 56           uint16_t ep_start = *(uint16_t *) *ep;
 57 #else
 58           uint16_t ep_start = (((unsigned char *) *ep)[0]
 59                                | (((unsigned char *) *ep)[1] << 8));
 60 #endif
 61           if (name_start == ep_start)
 62             return &(*ep)[2];
 63         }
 64     }
 65   else
 66     {
 67 #if _STRING_ARCH_unaligned
 68       name_start = *(const uint16_t *) name;
 69 #else
 70       name_start = (((const unsigned char *) name)[0]
 71                     | (((const unsigned char *) name)[1] << 8));
 72 #endif
 73       len -= 2;
 74       name += 2;
 75
 76       for (ep = __environ; *ep != NULL; ++ep)
 77         {
 78 #if _STRING_ARCH_unaligned
 79           uint16_t ep_start = *(uint16_t *) *ep;
 80 #else
 81           uint16_t ep_start = (((unsigned char *) *ep)[0]
 82                                | (((unsigned char *) *ep)[1] << 8));
 83 #endif
 84
 85           if (name_start == ep_start && !strncmp (*ep + 2, name, len)
 86               && (*ep)[len + 2] == '=')
 87             return &(*ep)[len + 3];
 88         }
 89     }
 90
 91   return NULL;
 92 }
 93 libc_hidden_def (getenv)
{code}

Sure enough, at line 85 we are attempting to read {{ep_start}}, which is pointing to a place in memory somewhere in the array pointed to by the global {{__environ}}.
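As an aside, the truncated {{name=="BPROCESS_IP"}} and {{name_start == 18764}} (0x494c, i.e. "LI") are consistent with a lookup of {{LIBPROCESS_IP}}, since {{getenv()}} advances {{name}} by two characters before the scan. To see why a single corrupted entry in that array is enough to produce this fault, here is a minimal standalone sketch (not Mesos code; the variable name is arbitrary) that crashes on the same kind of two-byte load:

{code}
// Standalone illustration (not Mesos code): one bad pointer in the global
// environment array makes glibc's getenv() fault on the `ep_start' load,
// because the scan assumes every entry of __environ points at a valid
// "NAME=value" string.
#include <cstdio>
#include <cstdlib>

extern char** environ;

int main()
{
  // Simulate a dangling or garbage entry, e.g. one left behind if the
  // array is swapped or mutated while another thread is reading it.
  // (Assumes at least one environment variable is set.)
  environ[0] = reinterpret_cast<char*>(0x1);

  // getenv() walks __environ and reads the first two bytes of each entry;
  // with the bogus entry above this is expected to die with SIGSEGV
  // instead of returning NULL.
  const char* value = ::getenv("SOME_VARIABLE");

  std::printf("%s\n", value == nullptr ? "(not set)" : value);
  return 0;
}
{code}

Run under {{gdb}}, this faults inside {{__GI_getenv}} on the same read through an entry of {{__environ}}.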
When we create a subprocess, we pass {{char** envp}} directly from the parent process into the cloned process, and then temporarily reassign the child process's {{environ}} pointer while we perform {{execvp}}:

{code}
inline int execvpe(const char* file, char** argv, char** envp)
{
  char** saved = os::raw::environment();

  *os::raw::environmentp() = envp;

  int result = execvp(file, argv);

  *os::raw::environmentp() = saved;

  return result;
}
{code}


> os::getenv() can segfault
> -------------------------
>
>                 Key: MESOS-6985
>                 URL: https://issues.apache.org/jira/browse/MESOS-6985
>             Project: Mesos
>          Issue Type: Bug
>          Components: stout
>         Environment: ASF CI, Ubuntu 14.04 and CentOS 7 both with and without libevent/SSL
>            Reporter: Greg Mann
>              Labels: stout
>         Attachments: MasterMaintenanceTest.InverseOffersFilters-truncated.txt, MasterTest.MultipleExecutors.txt
>
>
> This was observed on ASF CI. The segfault first showed up on CI on 9/20/16 and has been produced by the tests {{MasterTest.MultipleExecutors}} and {{MasterMaintenanceTest.InverseOffersFilters}}. In both cases, {{os::getenv()}} segfaults with the same stack trace:
> {code}
> *** Aborted at 1485241617 (unix time) try "date -d @1485241617" if you are using GNU date ***
> PC: @ 0x2ad59e3ae82d (unknown)
> I0124 07:06:57.422080 28619 exec.cpp:162] Version: 1.2.0
> *** SIGSEGV (@0xf0) received by PID 28591 (TID 0x2ad5a7b87700) from PID 240; stack trace: ***
> I0124 07:06:57.422336 28615 exec.cpp:212] Executor started at: executor(75)@172.17.0.2:45752 with pid 28591
> @ 0x2ad5ab953197 (unknown)
> @ 0x2ad5ab957479 (unknown)
> @ 0x2ad59e165330 (unknown)
> @ 0x2ad59e3ae82d (unknown)
> @ 0x2ad594631358 os::getenv()
> @ 0x2ad59aba6acf mesos::internal::slave::executorEnvironment()
> @ 0x2ad59ab845c0 mesos::internal::slave::Framework::launchExecutor()
> @ 0x2ad59ab818a2 mesos::internal::slave::Slave::_run()
> @ 0x2ad59ac1ec10 _ZZN7process8dispatchIN5mesos8internal5slave5SlaveERKNS_6FutureIbEERKNS1_13FrameworkInfoERKNS1_12ExecutorInfoERK6OptionINS1_8TaskInfoEERKSF_INS1_13TaskGroupInfoEES6_S9_SC_SH_SL_EEvRKNS_3PIDIT_EEMSP_FvT0_T1_T2_T3_T4_ET5_T6_T7_T8_T9_ENKUlPNS_11ProcessBaseEE_clES16_
> @ 0x2ad59ac1e6bf _ZNSt17_Function_handlerIFvPN7process11ProcessBaseEEZNS0_8dispatchIN5mesos8internal5slave5SlaveERKNS0_6FutureIbEERKNS5_13FrameworkInfoERKNS5_12ExecutorInfoERK6OptionINS5_8TaskInfoEERKSJ_INS5_13TaskGroupInfoEESA_SD_SG_SL_SP_EEvRKNS0_3PIDIT_EEMST_FvT0_T1_T2_T3_T4_ET5_T6_T7_T8_T9_EUlS2_E_E9_M_invokeERKSt9_Any_dataS2_
> @ 0x2ad59bce2304 std::function<>::operator()()
> @ 0x2ad59bcc9824 process::ProcessBase::visit()
> @ 0x2ad59bd4028e process::DispatchEvent::visit()
> @ 0x2ad594616df1 process::ProcessBase::serve()
> @ 0x2ad59bcc72b7 process::ProcessManager::resume()
> @ 0x2ad59bcd567c process::ProcessManager::init_threads()::$_2::operator()()
> @ 0x2ad59bcd5585 _ZNSt12_Bind_simpleIFZN7process14ProcessManager12init_threadsEvE3$_2vEE9_M_invokeIJEEEvSt12_Index_tupleIJXspT_EEE
> @ 0x2ad59bcd5555 std::_Bind_simple<>::operator()()
> @ 0x2ad59bcd552c std::thread::_Impl<>::_M_run()
> @ 0x2ad59d9e6a60 (unknown)
> @ 0x2ad59e15d184 start_thread
> @ 0x2ad59e46d37d (unknown)
> make[4]: *** [check-local] Segmentation fault
> {code}
> Find attached the full log from a failed run of {{MasterTest.MultipleExecutors}} and a truncated log from a failed run of {{MasterMaintenanceTest.InverseOffersFilters}}.
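For illustration, here is a minimal, self-contained sketch (not Mesos code, and not a claim about the confirmed root cause) of the hazard created by repointing the global {{environ}} at a short-lived array while another thread may be inside {{getenv()}} in the same address space. Whether that interleaving is actually reachable from the agent depends on how the subprocess is cloned; the loop counts and names below are made up for the sketch:

{code}
// Standalone sketch of the swap/restore hazard (not Mesos code). This is a
// deliberate data race: it may or may not crash on any given run; it only
// demonstrates the interleaving.
#include <cstdlib>
#include <cstring>
#include <thread>

#include <unistd.h>

extern char** environ;

int main()
{
  // Thread A: keeps calling getenv(), standing in for os::getenv() being
  // called while the agent builds an executor's environment. The variable
  // queried doesn't matter; LIBPROCESS_IP just matches the backtrace above.
  std::thread reader([]() {
    for (int i = 0; i < 1000000; i++) {
      ::getenv("LIBPROCESS_IP");
    }
  });

  // Thread B: mirrors the swap/restore in the execvpe() shim above --
  // point `environ` at a caller-owned array, attempt the exec, restore.
  for (int i = 0; i < 1000000; i++) {
    char* entry = strdup("FOO=bar");
    char* envp[] = {entry, nullptr};
    char* file = strdup("no-such-binary");
    char* args[] = {file, nullptr};

    char** saved = environ;
    environ = envp;        // getenv() in thread A may now scan `envp`,
    ::execvp(file, args);  // which only lives until the end of this
    environ = saved;       // iteration...

    free(entry);           // ...after which any stale read of it is a
    free(file);            // use-after-free.
  }

  reader.join();
  return 0;
}
{code}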