mesos-issues mailing list archives

From "Greg Mann (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (MESOS-6985) os::getenv() can segfault
Date Fri, 27 Jan 2017 00:55:24 GMT

    [ https://issues.apache.org/jira/browse/MESOS-6985?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15840771#comment-15840771 ]

Greg Mann commented on MESOS-6985:
----------------------------------

Yep, it's definitely occurring in {{::getenv}}. Here's the result of a failed test run within {{gdb}}:
{code}
[ RUN      ] MasterTest.MultipleExecutors
I0127 00:39:33.120487  1809 cluster.cpp:160] Creating default 'local' authorizer
I0127 00:39:33.122427  1815 master.cpp:383] Master ac440d30-722b-43a5-9f61-cea98b3e576a (vagrant-ubuntu-trusty-64)
started on 10.0.2.15:51845
I0127 00:39:33.122498  1815 master.cpp:385] Flags at startup: --acls="" --agent_ping_timeout="15secs"
--agent_reregister_timeout="10mins" --allocation_interval="1secs" --allocator="HierarchicalDRF"
--authenticate_agents="true" --authenticate_frameworks="true" --authenticate_http_frameworks="true"
--authenticate_http_readonly="true" --authenticate_http_readwrite="true" --authenticators="crammd5"
--authorizers="local" --credentials="/tmp/b7WHq9/credentials" --framework_sorter="drf" --help="false"
--hostname_lookup="true" --http_authenticators="basic" --http_framework_authenticators="basic"
--initialize_driver_logging="true" --log_auto_initialize="true" --logbufsecs="0" --logging_level="INFO"
--max_agent_ping_timeouts="5" --max_completed_frameworks="50" --max_completed_tasks_per_framework="1000"
--quiet="false" --recovery_agent_removal_limit="100%" --registry="in_memory" --registry_fetch_timeout="1mins"
--registry_gc_interval="15mins" --registry_max_agent_age="2weeks" --registry_max_agent_count="102400"
--registry_store_timeout="100secs" --registry_strict="false" --root_submissions="true" --user_sorter="drf"
--version="false" --webui_dir="/usr/local/share/mesos/webui" --work_dir="/tmp/b7WHq9/master"
--zk_session_timeout="10secs"
I0127 00:39:33.122836  1815 master.cpp:435] Master only allowing authenticated frameworks
to register
I0127 00:39:33.122858  1815 master.cpp:449] Master only allowing authenticated agents to register
I0127 00:39:33.122875  1815 master.cpp:462] Master only allowing authenticated HTTP frameworks
to register
I0127 00:39:33.122891  1815 credentials.hpp:37] Loading credentials for authentication from
'/tmp/b7WHq9/credentials'
I0127 00:39:33.123128  1815 master.cpp:507] Using default 'crammd5' authenticator
I0127 00:39:33.123265  1815 http.cpp:922] Using default 'basic' HTTP authenticator for realm
'mesos-master-readonly'
I0127 00:39:33.123394  1815 http.cpp:922] Using default 'basic' HTTP authenticator for realm
'mesos-master-readwrite'
I0127 00:39:33.123631  1815 http.cpp:922] Using default 'basic' HTTP authenticator for realm
'mesos-master-scheduler'
I0127 00:39:33.123884  1815 master.cpp:587] Authorization enabled
I0127 00:39:33.127008  1819 master.cpp:2119] Elected as the leading master!
I0127 00:39:33.127084  1819 master.cpp:1641] Recovering from registrar
I0127 00:39:33.127766  1818 registrar.cpp:362] Successfully fetched the registry (0B) in 408832ns
I0127 00:39:33.127883  1818 registrar.cpp:461] Applied 1 operations in 22092ns; attempting
to update the registry
I0127 00:39:33.130798  1818 registrar.cpp:506] Successfully updated the registry in 2.779136ms
I0127 00:39:33.130934  1818 registrar.cpp:392] Successfully recovered registrar
I0127 00:39:33.131573  1818 master.cpp:1757] Recovered 0 agents from the registry (153B);
allowing 10mins for agents to re-register
I0127 00:39:33.134503  1809 cluster.cpp:446] Creating default 'local' authorizer
I0127 00:39:33.135774  1818 slave.cpp:209] Mesos agent started on (8)@10.0.2.15:51845
I0127 00:39:33.135824  1818 slave.cpp:210] Flags at startup: --acls="" --appc_simple_discovery_uri_prefix="http://"
--appc_store_dir="/tmp/mesos/store/appc" --authenticate_http_readonly="true" --authenticate_http_readwrite="true"
--authenticatee="crammd5" --authentication_backoff_factor="1secs" --authorizer="local" --cgroups_cpu_enable_pids_and_tids_count="false"
--cgroups_enable_cfs="false" --cgroups_hierarchy="/sys/fs/cgroup" --cgroups_limit_swap="false"
--cgroups_root="mesos" --container_disk_watch_interval="15secs" --containerizers="mesos" --credential="/tmp/MasterTest_MultipleExecutors_ruv9Vu/credential"
--default_role="*" --disk_watch_interval="1mins" --docker="docker" --docker_kill_orphans="true"
--docker_registry="https://registry-1.docker.io" --docker_remove_delay="6hrs" --docker_socket="/var/run/docker.sock"
--docker_stop_timeout="0ns" --docker_store_dir="/tmp/mesos/store/docker" --docker_volume_checkpoint_dir="/var/run/mesos/isolators/docker/volume"
--enforce_container_disk_quota="false" --executor_registration_timeout="1mins" --executor_shutdown_grace_period="5secs"
--fetcher_cache_dir="/tmp/MasterTest_MultipleExecutors_ruv9Vu/fetch" --fetcher_cache_size="2GB"
--frameworks_home="" --gc_delay="1weeks" --gc_disk_headroom="0.1" --hadoop_home="" --help="false"
--hostname_lookup="true" --http_authenticators="basic" --http_command_executor="false" --http_credentials="/tmp/MasterTest_MultipleExecutors_ruv9Vu/http_credentials"
--http_heartbeat_interval="30secs" --image_provisioner_backend="copy" --initialize_driver_logging="true"
--isolation="posix/cpu,posix/mem" --launcher="posix" --launcher_dir="/home/vagrant/src/mesos/build/src"
--logbufsecs="0" --logging_level="INFO" --max_completed_executors_per_framework="150" --oversubscribed_resources_interval="15secs"
--perf_duration="10secs" --perf_interval="1mins" --qos_correction_interval_min="0ns" --quiet="false"
--recover="reconnect" --recovery_timeout="15mins" --registration_backoff_factor="10ms" --resources="cpus:2;gpus:0;mem:1024;disk:1024;ports:[31000-32000]"
--revocable_cpu_low_priority="true" --runtime_dir="/tmp/MasterTest_MultipleExecutors_ruv9Vu"
--sandbox_directory="/mnt/mesos/sandbox" --strict="true" --switch_user="true" --systemd_enable_support="true"
--systemd_runtime_directory="/run/systemd/system" --version="false" --work_dir="/tmp/MasterTest_MultipleExecutors_1wuqbP"
I0127 00:39:33.136175  1818 credentials.hpp:86] Loading credential for authentication from
'/tmp/MasterTest_MultipleExecutors_ruv9Vu/credential'
I0127 00:39:33.136325  1818 slave.cpp:352] Agent using credential for: test-principal
I0127 00:39:33.136358  1818 credentials.hpp:37] Loading credentials for authentication from
'/tmp/MasterTest_MultipleExecutors_ruv9Vu/http_credentials'
I0127 00:39:33.136541  1818 http.cpp:922] Using default 'basic' HTTP authenticator for realm
'mesos-agent-readonly'
I0127 00:39:33.138916  1818 http.cpp:922] Using default 'basic' HTTP authenticator for realm
'mesos-agent-readwrite'
I0127 00:39:33.142987  1818 slave.cpp:539] Agent resources: cpus(*):2; mem(*):1024; disk(*):1024;
ports(*):[31000-32000]
I0127 00:39:33.143088  1818 slave.cpp:547] Agent attributes: [  ]
I0127 00:39:33.143151  1818 slave.cpp:552] Agent hostname: vagrant-ubuntu-trusty-64
I0127 00:39:33.143090  1809 sched.cpp:232] Version: 1.2.0
I0127 00:39:33.143712  1817 status_update_manager.cpp:177] Pausing sending status updates
I0127 00:39:33.144261  1817 sched.cpp:336] New master detected at master@10.0.2.15:51845
I0127 00:39:33.144701  1817 sched.cpp:407] Authenticating with master master@10.0.2.15:51845
I0127 00:39:33.144754  1817 sched.cpp:414] Using default CRAM-MD5 authenticatee
I0127 00:39:33.144836  1819 state.cpp:60] Recovering state from '/tmp/MasterTest_MultipleExecutors_1wuqbP/meta'
I0127 00:39:33.145293  1819 status_update_manager.cpp:203] Recovering status update manager
I0127 00:39:33.145570  1814 authenticatee.cpp:121] Creating new client SASL connection
I0127 00:39:33.146090  1814 master.cpp:6842] Authenticating scheduler-9d9e54ce-c21e-408b-9277-7fb55c3ea844@10.0.2.15:51845
I0127 00:39:33.146564  1817 slave.cpp:5422] Finished recovery
I0127 00:39:33.147352  1814 authenticator.cpp:98] Creating new server SASL connection
I0127 00:39:33.148704  1815 authenticatee.cpp:213] Received SASL authentication mechanisms:
CRAM-MD5
I0127 00:39:33.149062  1815 authenticatee.cpp:239] Attempting to authenticate with mechanism
'CRAM-MD5'
I0127 00:39:33.149545  1815 authenticator.cpp:204] Received SASL authentication start
I0127 00:39:33.150210  1815 authenticator.cpp:326] Authentication requires more steps
I0127 00:39:33.152232  1815 authenticatee.cpp:259] Received SASL authentication step
I0127 00:39:33.152844  1814 slave.cpp:929] New master detected at master@10.0.2.15:51845
I0127 00:39:33.153264  1820 status_update_manager.cpp:177] Pausing sending status updates
I0127 00:39:33.153064  1815 authenticator.cpp:232] Received SASL authentication step
I0127 00:39:33.153442  1814 slave.cpp:964] Detecting new master
I0127 00:39:33.153686  1815 authenticator.cpp:318] Authentication success
I0127 00:39:33.154338  1813 authenticatee.cpp:299] Authentication success
I0127 00:39:33.154717  1818 master.cpp:6872] Successfully authenticated principal 'test-principal'
at scheduler-9d9e54ce-c21e-408b-9277-7fb55c3ea844@10.0.2.15:51845
I0127 00:39:33.155275  1814 sched.cpp:513] Successfully authenticated with master master@10.0.2.15:51845
I0127 00:39:33.155483  1819 master.cpp:2707] Received SUBSCRIBE call for framework 'default'
at scheduler-9d9e54ce-c21e-408b-9277-7fb55c3ea844@10.0.2.15:51845
I0127 00:39:33.155555  1819 master.cpp:2155] Authorizing framework principal 'test-principal'
to receive offers for role '*'
I0127 00:39:33.156003  1819 master.cpp:2783] Subscribing framework default with checkpointing
disabled and capabilities [  ]
I0127 00:39:33.156581  1814 hierarchical.cpp:271] Added framework ac440d30-722b-43a5-9f61-cea98b3e576a-0000
I0127 00:39:33.156581  1819 sched.cpp:759] Framework registered with ac440d30-722b-43a5-9f61-cea98b3e576a-0000
I0127 00:39:33.163875  1818 slave.cpp:991] Authenticating with master master@10.0.2.15:51845
I0127 00:39:33.163997  1818 slave.cpp:1002] Using default CRAM-MD5 authenticatee
I0127 00:39:33.164427  1818 authenticatee.cpp:121] Creating new client SASL connection
I0127 00:39:33.164808  1818 master.cpp:6842] Authenticating slave(8)@10.0.2.15:51845
I0127 00:39:33.165102  1818 authenticator.cpp:98] Creating new server SASL connection
I0127 00:39:33.165536  1818 authenticatee.cpp:213] Received SASL authentication mechanisms:
CRAM-MD5
I0127 00:39:33.165603  1818 authenticatee.cpp:239] Attempting to authenticate with mechanism
'CRAM-MD5'
I0127 00:39:33.165796  1813 authenticator.cpp:204] Received SASL authentication start
I0127 00:39:33.165879  1813 authenticator.cpp:326] Authentication requires more steps
I0127 00:39:33.165999  1813 authenticatee.cpp:259] Received SASL authentication step
I0127 00:39:33.166175  1816 authenticator.cpp:232] Received SASL authentication step
I0127 00:39:33.166364  1816 authenticator.cpp:318] Authentication success
I0127 00:39:33.166671  1813 master.cpp:6872] Successfully authenticated principal 'test-principal'
at slave(8)@10.0.2.15:51845
I0127 00:39:33.166739  1816 authenticatee.cpp:299] Authentication success
I0127 00:39:33.167352  1817 slave.cpp:1086] Successfully authenticated with master master@10.0.2.15:51845
I0127 00:39:33.167836  1816 master.cpp:5232] Registering agent at slave(8)@10.0.2.15:51845
(vagrant-ubuntu-trusty-64) with id ac440d30-722b-43a5-9f61-cea98b3e576a-S0
I0127 00:39:33.168298  1816 registrar.cpp:461] Applied 1 operations in 62732ns; attempting
to update the registry
I0127 00:39:33.169097  1820 registrar.cpp:506] Successfully updated the registry in 716032ns
I0127 00:39:33.170994  1813 master.cpp:5303] Registered agent ac440d30-722b-43a5-9f61-cea98b3e576a-S0
at slave(8)@10.0.2.15:51845 (vagrant-ubuntu-trusty-64) with cpus(*):2; mem(*):1024; disk(*):1024;
ports(*):[31000-32000]
I0127 00:39:33.171192  1815 slave.cpp:1132] Registered with master master@10.0.2.15:51845;
given agent ID ac440d30-722b-43a5-9f61-cea98b3e576a-S0
I0127 00:39:33.173738  1814 status_update_manager.cpp:184] Resuming sending status updates
I0127 00:39:33.174046  1815 slave.cpp:1198] Forwarding total oversubscribed resources {}
I0127 00:39:33.174124  1817 hierarchical.cpp:478] Added agent ac440d30-722b-43a5-9f61-cea98b3e576a-S0
(vagrant-ubuntu-trusty-64) with cpus(*):2; mem(*):1024; disk(*):1024; ports(*):[31000-32000]
(allocated: {})
I0127 00:39:33.174309  1815 master.cpp:5710] Received update of agent ac440d30-722b-43a5-9f61-cea98b3e576a-S0
at slave(8)@10.0.2.15:51845 (vagrant-ubuntu-trusty-64) with total oversubscribed resources
{}
I0127 00:39:33.176139  1817 hierarchical.cpp:548] Agent ac440d30-722b-43a5-9f61-cea98b3e576a-S0
(vagrant-ubuntu-trusty-64) updated with oversubscribed resources {} (total: cpus(*):2; mem(*):1024;
disk(*):1024; ports(*):[31000-32000], allocated: cpus(*):2; mem(*):1024; disk(*):1024; ports(*):[31000-32000])
I0127 00:39:33.176378  1814 master.cpp:6671] Sending 1 offers to framework ac440d30-722b-43a5-9f61-cea98b3e576a-0000
(default) at scheduler-9d9e54ce-c21e-408b-9277-7fb55c3ea844@10.0.2.15:51845
I0127 00:39:33.178370  1818 master.cpp:3661] Processing ACCEPT call for offers: [ ac440d30-722b-43a5-9f61-cea98b3e576a-O0
] on agent ac440d30-722b-43a5-9f61-cea98b3e576a-S0 at slave(8)@10.0.2.15:51845 (vagrant-ubuntu-trusty-64)
for framework ac440d30-722b-43a5-9f61-cea98b3e576a-0000 (default) at scheduler-9d9e54ce-c21e-408b-9277-7fb55c3ea844@10.0.2.15:51845
I0127 00:39:33.178455  1818 master.cpp:3249] Authorizing framework principal 'test-principal'
to launch task 1
I0127 00:39:33.178591  1818 master.cpp:3249] Authorizing framework principal 'test-principal'
to launch task 2
W0127 00:39:33.181143  1814 validation.cpp:995] Executor 'executor-1' for task '1' uses less
CPUs (None) than the minimum required (0.01). Please update your executor, as this will be
mandatory in future releases.
W0127 00:39:33.181447  1814 validation.cpp:1007] Executor 'executor-1' for task '1' uses less
memory (None) than the minimum required (32MB). Please update your executor, as this will
be mandatory in future releases.
I0127 00:39:33.181901  1814 master.cpp:8584] Adding task 1 with resources cpus(*):1; mem(*):512
on agent ac440d30-722b-43a5-9f61-cea98b3e576a-S0 at slave(8)@10.0.2.15:51845 (vagrant-ubuntu-trusty-64)
I0127 00:39:33.182237  1814 master.cpp:4311] Launching task 1 of framework ac440d30-722b-43a5-9f61-cea98b3e576a-0000
(default) at scheduler-9d9e54ce-c21e-408b-9277-7fb55c3ea844@10.0.2.15:51845 with resources
cpus(*):1; mem(*):512 on agent ac440d30-722b-43a5-9f61-cea98b3e576a-S0 at slave(8)@10.0.2.15:51845
(vagrant-ubuntu-trusty-64)
I0127 00:39:33.182725  1815 slave.cpp:1576] Got assigned task '1' for framework ac440d30-722b-43a5-9f61-cea98b3e576a-0000
W0127 00:39:33.183140  1814 validation.cpp:995] Executor 'executor-2' for task '2' uses less
CPUs (None) than the minimum required (0.01). Please update your executor, as this will be
mandatory in future releases.
W0127 00:39:33.183409  1814 validation.cpp:1007] Executor 'executor-2' for task '2' uses less
memory (None) than the minimum required (32MB). Please update your executor, as this will
be mandatory in future releases.
I0127 00:39:33.183221  1815 slave.cpp:1736] Launching task '1' for framework ac440d30-722b-43a5-9f61-cea98b3e576a-0000
I0127 00:39:33.184008  1815 paths.cpp:547] Trying to chown '/tmp/MasterTest_MultipleExecutors_1wuqbP/slaves/ac440d30-722b-43a5-9f61-cea98b3e576a-S0/frameworks/ac440d30-722b-43a5-9f61-cea98b3e576a-0000/executors/executor-1/runs/d1f9a0da-39af-4264-8679-6feeb54a9bd2'
to user 'vagrant'
I0127 00:39:33.184008  1814 master.cpp:8584] Adding task 2 with resources cpus(*):1; mem(*):512
on agent ac440d30-722b-43a5-9f61-cea98b3e576a-S0 at slave(8)@10.0.2.15:51845 (vagrant-ubuntu-trusty-64)
I0127 00:39:33.184370  1815 slave.cpp:6350] Launching executor 'executor-1' of framework ac440d30-722b-43a5-9f61-cea98b3e576a-0000
with resources {} in work directory '/tmp/MasterTest_MultipleExecutors_1wuqbP/slaves/ac440d30-722b-43a5-9f61-cea98b3e576a-S0/frameworks/ac440d30-722b-43a5-9f61-cea98b3e576a-0000/executors/executor-1/runs/d1f9a0da-39af-4264-8679-6feeb54a9bd2'
I0127 00:39:33.184882  1814 master.cpp:4311] Launching task 2 of framework ac440d30-722b-43a5-9f61-cea98b3e576a-0000
(default) at scheduler-9d9e54ce-c21e-408b-9277-7fb55c3ea844@10.0.2.15:51845 with resources
cpus(*):1; mem(*):512 on agent ac440d30-722b-43a5-9f61-cea98b3e576a-S0 at slave(8)@10.0.2.15:51845
(vagrant-ubuntu-trusty-64)
I0127 00:39:33.185616  1815 slave.cpp:2058] Queued task '1' for executor 'executor-1' of framework
ac440d30-722b-43a5-9f61-cea98b3e576a-0000
I0127 00:39:33.185811  1815 slave.cpp:1576] Got assigned task '2' for framework ac440d30-722b-43a5-9f61-cea98b3e576a-0000
I0127 00:39:33.186208  1815 slave.cpp:1736] Launching task '2' for framework ac440d30-722b-43a5-9f61-cea98b3e576a-0000
I0127 00:39:33.186472  1815 paths.cpp:547] Trying to chown '/tmp/MasterTest_MultipleExecutors_1wuqbP/slaves/ac440d30-722b-43a5-9f61-cea98b3e576a-S0/frameworks/ac440d30-722b-43a5-9f61-cea98b3e576a-0000/executors/executor-2/runs/f1c7564c-d22a-4609-942c-b53f77061d99'
to user 'vagrant'
I0127 00:39:33.187806  1815 slave.cpp:6350] Launching executor 'executor-2' of framework ac440d30-722b-43a5-9f61-cea98b3e576a-0000
with resources {} in work directory '/tmp/MasterTest_MultipleExecutors_1wuqbP/slaves/ac440d30-722b-43a5-9f61-cea98b3e576a-S0/frameworks/ac440d30-722b-43a5-9f61-cea98b3e576a-0000/executors/executor-2/runs/f1c7564c-d22a-4609-942c-b53f77061d99'

Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7fffe711d700 (LWP 1815)]
__GI_getenv (name=0x7fffc0064e6a "BPROCESS_IP") at getenv.c:85
85	getenv.c: No such file or directory.
(gdb) inf locals
ep_start = <error reading variable ep_start (Cannot access memory at address 0x110)>
len = 11
ep = 0x2da66c0
name_start = 18764
(gdb) bt
#0  __GI_getenv (name=0x7fffc0064e6a "BPROCESS_IP") at getenv.c:85
#1  0x0000000000affbce in os::getenv ()
#2  0x00007ffff5a8fe91 in mesos::internal::slave::executorEnvironment () from /home/vagrant/src/mesos/build/src/.libs/libmesos-1.2.0.so
#3  0x00007ffff5a8ad9a in mesos::internal::slave::Framework::launchExecutor () from /home/vagrant/src/mesos/build/src/.libs/libmesos-1.2.0.so
#4  0x00007ffff5a65a47 in mesos::internal::slave::Slave::_run () from /home/vagrant/src/mesos/build/src/.libs/libmesos-1.2.0.so
#5  0x00007ffff5abdc0d in void process::dispatch<mesos::internal::slave::Slave, process::Future<bool>
const&, mesos::FrameworkInfo const&, mesos::ExecutorInfo const&, Option<mesos::TaskInfo>
const&, Option<mesos::TaskGroupInfo> const&, process::Future<bool>, mesos::FrameworkInfo,
mesos::ExecutorInfo, Option<mesos::TaskInfo>, Option<mesos::TaskGroupInfo> >(process::PID<mesos::internal::slave::Slave>
const&, void (mesos::internal::slave::Slave::*)(process::Future<bool> const&,
mesos::FrameworkInfo const&, mesos::ExecutorInfo const&, Option<mesos::TaskInfo>
const&, Option<mesos::TaskGroupInfo> const&), process::Future<bool>, mesos::FrameworkInfo,
mesos::ExecutorInfo, Option<mesos::TaskInfo>, Option<mesos::TaskGroupInfo>)::{lambda(process::ProcessBase*)#1}::operator()(process::ProcessBase*)
const () from /home/vagrant/src/mesos/build/src/.libs/libmesos-1.2.0.so
#6  0x00007ffff5af1de9 in std::_Function_handler<void (process::ProcessBase*), void process::dispatch<mesos::internal::slave::Slave,
process::Future<bool> const&, mesos::FrameworkInfo const&, mesos::ExecutorInfo
const&, Option<mesos::TaskInfo> const&, Option<mesos::TaskGroupInfo> const&,
process::Future<bool>, mesos::FrameworkInfo, mesos::ExecutorInfo, Option<mesos::TaskInfo>,
Option<mesos::TaskGroupInfo> >(process::PID<mesos::internal::slave::Slave>
const&, void (mesos::internal::slave::Slave::*)(process::Future<bool> const&,
mesos::FrameworkInfo const&, mesos::ExecutorInfo const&, Option<mesos::TaskInfo>
const&, Option<mesos::TaskGroupInfo> const&), process::Future<bool>, mesos::FrameworkInfo,
mesos::ExecutorInfo, Option<mesos::TaskInfo>, Option<mesos::TaskGroupInfo>)::{lambda(process::ProcessBase*)#1}>::_M_invoke(std::_Any_data
const&, process::ProcessBase*) () from /home/vagrant/src/mesos/build/src/.libs/libmesos-1.2.0.so
#7  0x00007ffff67e3a2b in std::function<void (process::ProcessBase*)>::operator()(process::ProcessBase*)
const () from /home/vagrant/src/mesos/build/src/.libs/libmesos-1.2.0.so
#8  0x00007ffff67c982d in process::ProcessBase::visit () from /home/vagrant/src/mesos/build/src/.libs/libmesos-1.2.0.so
#9  0x00007ffff67d40ac in process::DispatchEvent::visit () from /home/vagrant/src/mesos/build/src/.libs/libmesos-1.2.0.so
#10 0x0000000000ad3f14 in process::ProcessBase::serve ()
#11 0x00007ffff67c5b1a in process::ProcessManager::resume () from /home/vagrant/src/mesos/build/src/.libs/libmesos-1.2.0.so
#12 0x00007ffff67c235e in operator() () from /home/vagrant/src/mesos/build/src/.libs/libmesos-1.2.0.so
#13 0x00007ffff67d37e6 in _M_invoke<>(void) () from /home/vagrant/src/mesos/build/src/.libs/libmesos-1.2.0.so
#14 0x00007ffff67d373d in operator() () from /home/vagrant/src/mesos/build/src/.libs/libmesos-1.2.0.so
#15 0x00007ffff67d36d6 in _M_run () from /home/vagrant/src/mesos/build/src/.libs/libmesos-1.2.0.so
#16 0x00007ffff096ea60 in ?? () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#17 0x00007ffff018b184 in start_thread (arg=0x7fffe711d700) at pthread_create.c:312
#18 0x00007fffefeb837d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111
{code}
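
For reference, frame #1 above ({{os::getenv()}}) is just a thin wrapper around the libc call, so the crash really is inside glibc's {{getenv}} walking the environment. Roughly (a sketch from memory, not the exact stout source; {{std::optional}} stands in for stout's {{Option}}):
{code}
#include <cstdlib>
#include <optional>
#include <string>

// Sketch of what stout's os::getenv() amounts to: delegate to ::getenv()
// and wrap the result, so any fault happens while libc walks the global
// `environ` array.
inline std::optional<std::string> getenv_sketch(const std::string& key)
{
  char* value = ::getenv(key.c_str());

  if (value == nullptr) {
    return std::nullopt;
  }

  return std::string(value);
}
{code}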

If we look at {{getenv.c}}, we find the following:
{code}
 26 /* Return the value of the environment variable NAME.  This implementation
 27    is tuned a bit in that it assumes no environment variable has an empty
 28    name which of course should always be true.  We have a special case for
 29    one character names so that for the general case we can assume at least
 30    two characters which we can access.  By doing this we can avoid using the
 31    `strncmp' most of the time.  */
 32 char *
 33 getenv (name)
 34      const char *name;
 35 {
 36   size_t len = strlen (name);
 37   char **ep;
 38   uint16_t name_start;
 39
 40   if (__environ == NULL || name[0] == '\0')
 41     return NULL;
 42
 43   if (name[1] == '\0')
 44     {
 45       /* The name of the variable consists of only one character.  Therefore
 46    the first two characters of the environment entry are this character
 47    and a '=' character.  */
 48 #if __BYTE_ORDER == __LITTLE_ENDIAN || !_STRING_ARCH_unaligned
 49       name_start = ('=' << 8) | *(const unsigned char *) name;
 50 #else
 51       name_start = '=' | ((*(const unsigned char *) name) << 8);
 52 #endif
 53       for (ep = __environ; *ep != NULL; ++ep)
 54   {
 55 #if _STRING_ARCH_unaligned
 56     uint16_t ep_start = *(uint16_t *) *ep;
 57 #else
 58     uint16_t ep_start = (((unsigned char *) *ep)[0]
 59              | (((unsigned char *) *ep)[1] << 8));
 60 #endif
 61     if (name_start == ep_start)
 62       return &(*ep)[2];
 63   }
 64     }
 65   else
 66     {
 67 #if _STRING_ARCH_unaligned
 68       name_start = *(const uint16_t *) name;
 69 #else
 70       name_start = (((const unsigned char *) name)[0]
 71         | (((const unsigned char *) name)[1] << 8));
 72 #endif
 73       len -= 2;
 74       name += 2;
 75
 76       for (ep = __environ; *ep != NULL; ++ep)
 77   {
 78 #if _STRING_ARCH_unaligned
 79     uint16_t ep_start = *(uint16_t *) *ep;
 80 #else
 81     uint16_t ep_start = (((unsigned char *) *ep)[0]
 82              | (((unsigned char *) *ep)[1] << 8));
 83 #endif
 84
 85     if (name_start == ep_start && !strncmp (*ep + 2, name, len)
 86         && (*ep)[len + 2] == '=')
 87       return &(*ep)[len + 3];
 88   }
 89     }
 90
 91   return NULL;
 92 }
 93 libc_hidden_def (getenv)
{code}

Sure enough, at line 85 we are attempting to read {{ep_start}}, which is loaded by dereferencing {{*ep}}, an entry in the array pointed to by the global {{__environ}}; the {{gdb}} locals above show that read failing ({{Cannot access memory at address 0x110}}). When we create a subprocess, we pass {{char** envp}} directly from the parent process into the cloned process, and then temporarily reassign the child process's {{environ}} pointer while we perform {{execvp}}:
{code}
inline int execvpe(const char* file, char** argv, char** envp)
{
  // Save the current global `environ` pointer.
  char** saved = os::raw::environment();

  // Point the global `environ` at the caller-supplied environment so that
  // `execvp` (which reads `environ`) picks it up.
  *os::raw::environmentp() = envp;

  int result = execvp(file, argv);

  // Only reached if `execvp` failed: restore the original pointer.
  *os::raw::environmentp() = saved;

  return result;
}
{code}
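
That save/swap/restore is only safe if nothing else touches {{environ}} concurrently. As a minimal standalone sketch (not Mesos code; the variable name and toy environment array below are made up for illustration), this is the kind of race that can leave a reader inside {{::getenv}} holding a dangling pointer:
{code}
// Hypothetical repro of the hazard: one thread walks `environ` via
// ::getenv() while another temporarily repoints it, mirroring the
// save/swap/restore pattern shown above.
#include <atomic>
#include <cstdlib>
#include <cstring>
#include <thread>

extern char** environ;

int main()
{
  std::atomic<bool> done(false);

  // Reader: stands in for a libprocess worker thread calling os::getenv().
  std::thread reader([&done]() {
    while (!done.load()) {
      ::getenv("LIBPROCESS_IP");  // May observe `environ` mid-swap.
    }
  });

  char** saved = environ;

  for (int i = 0; i < 100000; i++) {
    // Short-lived environment, analogous to the `envp` handed to execvpe().
    char* entry = strdup("LIBPROCESS_IP=10.0.2.15");
    char* temp[] = {entry, nullptr};

    environ = temp;   // Reader may now walk an array on our stack...
    environ = saved;  // ...or catch the restore half-way.

    free(entry);      // ...and may still be dereferencing this entry.
  }

  done.store(true);
  reader.join();

  return 0;
}
{code}
Whether this exact interleaving is what the failed tests hit is a separate question, but it illustrates why temporarily reassigning the global pointer is risky while other threads may be inside {{getenv}}.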

> os::getenv() can segfault
> -------------------------
>
>                 Key: MESOS-6985
>                 URL: https://issues.apache.org/jira/browse/MESOS-6985
>             Project: Mesos
>          Issue Type: Bug
>          Components: stout
>         Environment: ASF CI, Ubuntu 14.04 and CentOS 7 both with and without libevent/SSL
>            Reporter: Greg Mann
>              Labels: stout
>         Attachments: MasterMaintenanceTest.InverseOffersFilters-truncated.txt, MasterTest.MultipleExecutors.txt
>
>
> This was observed on ASF CI. The segfault first showed up on CI on 9/20/16 and has been
produced by the tests {{MasterTest.MultipleExecutors}} and {{MasterMaintenanceTest.InverseOffersFilters}}.
In both cases, {{os::getenv()}} segfaults with the same stack trace:
> {code}
> *** Aborted at 1485241617 (unix time) try "date -d @1485241617" if you are using GNU
date ***
> PC: @     0x2ad59e3ae82d (unknown)
> I0124 07:06:57.422080 28619 exec.cpp:162] Version: 1.2.0
> *** SIGSEGV (@0xf0) received by PID 28591 (TID 0x2ad5a7b87700) from PID 240; stack trace:
***
> I0124 07:06:57.422336 28615 exec.cpp:212] Executor started at: executor(75)@172.17.0.2:45752
with pid 28591
>     @     0x2ad5ab953197 (unknown)
>     @     0x2ad5ab957479 (unknown)
>     @     0x2ad59e165330 (unknown)
>     @     0x2ad59e3ae82d (unknown)
>     @     0x2ad594631358 os::getenv()
>     @     0x2ad59aba6acf mesos::internal::slave::executorEnvironment()
>     @     0x2ad59ab845c0 mesos::internal::slave::Framework::launchExecutor()
>     @     0x2ad59ab818a2 mesos::internal::slave::Slave::_run()
>     @     0x2ad59ac1ec10 _ZZN7process8dispatchIN5mesos8internal5slave5SlaveERKNS_6FutureIbEERKNS1_13FrameworkInfoERKNS1_12ExecutorInfoERK6OptionINS1_8TaskInfoEERKSF_INS1_13TaskGroupInfoEES6_S9_SC_SH_SL_EEvRKNS_3PIDIT_EEMSP_FvT0_T1_T2_T3_T4_ET5_T6_T7_T8_T9_ENKUlPNS_11ProcessBaseEE_clES16_
>     @     0x2ad59ac1e6bf _ZNSt17_Function_handlerIFvPN7process11ProcessBaseEEZNS0_8dispatchIN5mesos8internal5slave5SlaveERKNS0_6FutureIbEERKNS5_13FrameworkInfoERKNS5_12ExecutorInfoERK6OptionINS5_8TaskInfoEERKSJ_INS5_13TaskGroupInfoEESA_SD_SG_SL_SP_EEvRKNS0_3PIDIT_EEMST_FvT0_T1_T2_T3_T4_ET5_T6_T7_T8_T9_EUlS2_E_E9_M_invokeERKSt9_Any_dataS2_
>     @     0x2ad59bce2304 std::function<>::operator()()
>     @     0x2ad59bcc9824 process::ProcessBase::visit()
>     @     0x2ad59bd4028e process::DispatchEvent::visit()
>     @     0x2ad594616df1 process::ProcessBase::serve()
>     @     0x2ad59bcc72b7 process::ProcessManager::resume()
>     @     0x2ad59bcd567c process::ProcessManager::init_threads()::$_2::operator()()
>     @     0x2ad59bcd5585 _ZNSt12_Bind_simpleIFZN7process14ProcessManager12init_threadsEvE3$_2vEE9_M_invokeIJEEEvSt12_Index_tupleIJXspT_EEE
>     @     0x2ad59bcd5555 std::_Bind_simple<>::operator()()
>     @     0x2ad59bcd552c std::thread::_Impl<>::_M_run()
>     @     0x2ad59d9e6a60 (unknown)
>     @     0x2ad59e15d184 start_thread
>     @     0x2ad59e46d37d (unknown)
> make[4]: *** [check-local] Segmentation fault
> {code}
> Find attached the full log from a failed run of {{MasterTest.MultipleExecutors}} and
a truncated log from a failed run of {{MasterMaintenanceTest.InverseOffersFilters}}.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
