mesos-issues mailing list archives

From "Yan Xu (JIRA)" <j...@apache.org>
Subject [jira] [Deleted] (MESOS-6564) Running tasks in Mesos tests requires > 32MB mem
Date Wed, 09 Nov 2016 01:45:58 GMT

     [ https://issues.apache.org/jira/browse/MESOS-6564?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yan Xu deleted MESOS-6564:
--------------------------


> Running tasks in Mesos tests requires > 32MB mem
> ------------------------------------------------
>
>                 Key: MESOS-6564
>                 URL: https://issues.apache.org/jira/browse/MESOS-6564
>             Project: Mesos
>          Issue Type: Bug
>            Reporter: Yan Xu
>
> As we put more long-running components (e.g., {{mesos-containerizer launch}}) into the container, the container's memory requirement grows.
> Meanwhile, some tests specify only {{mem:32}}, and they OOMed during my {{make check}} run. One example:
> {noformat:title=SlaveRecoveryTest/0.ROOT_CGROUPS_ReconnectDefaultExecutor}
> [==========] Running 1 test from 1 test case.
> [----------] Global test environment set-up.
> [----------] 1 test from SlaveRecoveryTest/0, where TypeParam = mesos::internal::slave::MesosContainerizer
> [ RUN      ] SlaveRecoveryTest/0.ROOT_CGROUPS_ReconnectDefaultExecutor
> I1108 17:31:03.343797 1411299 cgroups.cpp:2726] Freezing cgroup /cgroup/freezer/mesos_test_d0258bb2-aae1-4c5b-b3b1-399b23d99388
> I1108 17:31:03.346014 1411287 cgroups.cpp:1439] Successfully froze cgroup /cgroup/freezer/mesos_test_d0258bb2-aae1-4c5b-b3b1-399b23d99388
after 2.109184ms
> I1108 17:31:03.348575 1411297 cgroups.cpp:2744] Thawing cgroup /cgroup/freezer/mesos_test_d0258bb2-aae1-4c5b-b3b1-399b23d99388
> I1108 17:31:03.350499 1411309 cgroups.cpp:1468] Successfully thawed cgroup /cgroup/freezer/mesos_test_d0258bb2-aae1-4c5b-b3b1-399b23d99388
after 1.845248ms
> I1108 17:31:03.366251 1411265 cluster.cpp:158] Creating default 'local' authorizer
> I1108 17:31:03.369767 1411265 replica.cpp:776] Replica recovered with log positions 0
-> 0 with 1 holes and 0 unlearned
> I1108 17:31:03.371731 1411283 recover.cpp:451] Starting replica recovery
> I1108 17:31:03.372555 1411298 recover.cpp:477] Replica is in EMPTY status
> I1108 17:31:03.374980 1411282 replica.cpp:673] Replica in EMPTY status received a broadcasted
recover request from __req_res__(1)@<ip>:44689
> I1108 17:31:03.375905 1411309 master.cpp:380] Master c950f7d7-4efa-4ed5-83d6-eeaeff82e9ca
(batch002.usspk02.pie.apple.com) started on <ip>:44689
> I1108 17:31:03.376400 1411296 recover.cpp:197] Received a recover response from a replica
in EMPTY status
> I1108 17:31:03.377233 1411298 recover.cpp:568] Updating replica status to STARTING
> I1108 17:31:03.377995 1411297 replica.cpp:320] Persisted replica status to STARTING
> I1108 17:31:03.378324 1411295 recover.cpp:477] Replica is in STARTING status
> I1108 17:31:03.375933 1411309 master.cpp:382] Flags at startup: --acls="" --agent_ping_timeout="15secs"
--agent_reregister_timeout="10mins" --allocation_interval="1secs" --allocator="HierarchicalDRF"
--authenticate_agents="true" --authenticate_frameworks="true" --authenticate_http_frameworks="true"
--authenticate_http_readonly="true" --authenticate_http_readwrite="true" --authenticators="crammd5"
--authorizers="local" --credentials="/tmp/WsaXcw/credentials" --framework_sorter="drf" --help="false"
--hostname_lookup="true" --http_authenticators="basic" --http_framework_authenticators="basic"
--initialize_driver_logging="true" --log_auto_initialize="true" --logbufsecs="0" --logging_level="INFO"
--max_agent_ping_timeouts="5" --max_completed_frameworks="50" --max_completed_tasks_per_framework="1000"
--quiet="false" --recovery_agent_removal_limit="100%" --registry="replicated_log" --registry_fetch_timeout="1mins"
--registry_gc_interval="15mins" --registry_max_agent_age="2weeks" --registry_max_agent_count="102400"
--registry_store_timeout="100secs" --registry_strict="false" --root_submissions="true" --user_sorter="drf"
--version="false" --webui_dir="/usr/local/share/mesos/webui" --work_dir="/tmp/WsaXcw/master"
--zk_session_timeout="10secs"
> I1108 17:31:03.378505 1411309 master.cpp:432] Master only allowing authenticated frameworks
to register
> I1108 17:31:03.378527 1411309 master.cpp:446] Master only allowing authenticated agents
to register
> I1108 17:31:03.378540 1411309 master.cpp:459] Master only allowing authenticated HTTP
frameworks to register
> I1108 17:31:03.378554 1411309 credentials.hpp:37] Loading credentials for authentication
from '/tmp/WsaXcw/credentials'
> I1108 17:31:03.379101 1411297 replica.cpp:673] Replica in STARTING status received a
broadcasted recover request from __req_res__(2)@<ip>:44689
> I1108 17:31:03.379477 1411309 master.cpp:504] Using default 'crammd5' authenticator
> I1108 17:31:03.379565 1411308 recover.cpp:197] Received a recover response from a replica
in STARTING status
> I1108 17:31:03.379755 1411309 authenticator.cpp:519] Initializing server SASL
> I1108 17:31:03.380175 1411285 recover.cpp:568] Updating replica status to VOTING
> I1108 17:31:03.380550 1411296 replica.cpp:320] Persisted replica status to VOTING
> I1108 17:31:03.380756 1411292 recover.cpp:582] Successfully joined the Paxos group
> I1108 17:31:03.381338 1411309 http.cpp:887] Using default 'basic' HTTP authenticator
for realm 'mesos-master-readonly'
> I1108 17:31:03.381695 1411309 http.cpp:887] Using default 'basic' HTTP authenticator
for realm 'mesos-master-readwrite'
> I1108 17:31:03.381886 1411309 http.cpp:887] Using default 'basic' HTTP authenticator
for realm 'mesos-master-scheduler'
> I1108 17:31:03.382094 1411309 master.cpp:584] Authorization enabled
> I1108 17:31:03.393326 1411289 master.cpp:2033] Elected as the leading master!
> I1108 17:31:03.393400 1411289 master.cpp:1560] Recovering from registrar
> I1108 17:31:03.395020 1411298 log.cpp:553] Attempting to start the writer
> I1108 17:31:03.397505 1411304 replica.cpp:493] Replica received implicit promise request
from __req_res__(3)@<ip>:44689 with proposal 1
> I1108 17:31:03.397716 1411304 replica.cpp:342] Persisted promised to 1
> I1108 17:31:03.398790 1411301 coordinator.cpp:238] Coordinator attempting to fill missing
positions
> I1108 17:31:03.400727 1411310 replica.cpp:388] Replica received explicit promise request
from __req_res__(4)@<ip>:44689 for position 0 with proposal 2
> I1108 17:31:03.402539 1411290 replica.cpp:537] Replica received write request for position
0 from __req_res__(5)@<ip>:44689
> I1108 17:31:03.403987 1411293 replica.cpp:691] Replica received learned notice for position
0 from @0.0.0.0:0
> I1108 17:31:03.404970 1411308 log.cpp:569] Writer started with ending position 0
> I1108 17:31:03.411018 1411305 registrar.cpp:362] Successfully fetched the registry (0B)
in 17.332992ms
> I1108 17:31:03.411242 1411305 registrar.cpp:461] Applied 1 operations in 40745ns; attempting
to update the registry
> I1108 17:31:03.414239 1411300 coordinator.cpp:348] Coordinator attempting to write APPEND
action at position 1
> I1108 17:31:03.415271 1411290 replica.cpp:537] Replica received write request for position
1 from __req_res__(6)@<ip>:44689
> I1108 17:31:03.416229 1411307 replica.cpp:691] Replica received learned notice for position
1 from @0.0.0.0:0
> I1108 17:31:03.417839 1411298 registrar.cpp:506] Successfully updated the registry in
6.471936ms
> I1108 17:31:03.418028 1411298 registrar.cpp:392] Successfully recovered registrar
> I1108 17:31:03.418296 1411297 coordinator.cpp:348] Coordinator attempting to write TRUNCATE
action at position 2
> I1108 17:31:03.418588 1411308 master.cpp:1676] Recovered 0 agents from the registry (176B);
allowing 10mins for agents to re-register
> I1108 17:31:03.419508 1411284 replica.cpp:537] Replica received write request for position
2 from __req_res__(7)@<ip>:44689
> I1108 17:31:03.420653 1411303 replica.cpp:691] Replica received learned notice for position
2 from @0.0.0.0:0
> I1108 17:31:03.424302 1411265 containerizer.cpp:201] Using isolation: cgroups/cpu,cgroups/mem,filesystem/posix,network/cni
> I1108 17:31:03.428274 1411265 linux_launcher.cpp:150] Using /cgroup/freezer as the freezer
hierarchy for the Linux launcher
> I1108 17:31:03.447060 1411265 cluster.cpp:435] Creating default 'local' authorizer
> I1108 17:31:03.448750 1411303 slave.cpp:208] Mesos agent started on @<ip>:44689
> I1108 17:31:03.448770 1411303 slave.cpp:209] Flags at startup: --acls="" --agent_subsystems="memory,cpuacct"
--appc_simple_discovery_uri_prefix="http://" --appc_store_dir="/tmp/mesos/store/appc" --authenticate_http_readonly="true"
--authenticate_http_readwrite="false" --authenticatee="crammd5" --authentication_backoff_factor="1secs"
--authorizer="local" --cgroups_cpu_enable_pids_and_tids_count="false" --cgroups_enable_cfs="false"
--cgroups_hierarchy="/cgroup" --cgroups_limit_swap="false" --cgroups_root="mesos_test_347c5d8d-e8f7-4c83-9ab3-a8cc331d78f9"
--container_disk_watch_interval="15secs" --containerizers="mesos" --credential="/tmp/SlaveRecoveryTest_0_ROOT_CGROUPS_ReconnectDefaultExecutor_qzTPJq/credential"
--default_role="*" --disk_watch_interval="1mins" --docker="docker" --docker_kill_orphans="true"
--docker_registry="https://registry-1.docker.io" --docker_remove_delay="6hrs" --docker_socket="/var/run/docker.sock"
--docker_stop_timeout="0ns" --docker_store_dir="/tmp/mesos/store/docker" --docker_volume_checkpoint_dir="/var/run/mesos/isolators/docker/volume"
--enforce_container_disk_quota="false" --executor_registration_timeout="1mins" --executor_shutdown_grace_period="5secs"
--fetcher_cache_dir="/tmp/SlaveRecoveryTest_0_ROOT_CGROUPS_ReconnectDefaultExecutor_qzTPJq/fetch"
--fetcher_cache_size="2GB" --frameworks_home="" --gc_delay="1weeks" --gc_disk_headroom="0.1"
--hadoop_home="" --help="false" --hostname_lookup="true" --http_authenticators="basic" --http_command_executor="false"
--http_credentials="/tmp/SlaveRecoveryTest_0_ROOT_CGROUPS_ReconnectDefaultExecutor_qzTPJq/http_credentials"
--image_provisioner_backend="copy" --initialize_driver_logging="true" --isolation="cgroups/cpu,cgroups/mem"
--launcher="linux" --launcher_dir="<workspace>/mesos/build/src" --logbufsecs="0" --logging_level="INFO"
--max_completed_executors_per_framework="150" --oversubscribed_resources_interval="15secs"
--perf_duration="10secs" --perf_interval="1mins" --qos_correction_interval_min="0ns" --quiet="false"
--recover="reconnect" --recovery_timeout="15mins" --registration_backoff_factor="10ms" --resources="cpus:2;gpus:0;mem:1024;disk:1024;ports:[31000-32000]"
--revocable_cpu_low_priority="true" --runtime_dir="/tmp/SlaveRecoveryTest_0_ROOT_CGROUPS_ReconnectDefaultExecutor_qzTPJq"
--sandbox_directory="/mnt/mesos/sandbox" --strict="true" --switch_user="true" --systemd_enable_support="true"
--systemd_runtime_directory="/run/systemd/system" --version="false" --work_dir="/tmp/SlaveRecoveryTest_0_ROOT_CGROUPS_ReconnectDefaultExecutor_xeeott"
> I1108 17:31:03.449337 1411303 slave.cpp:229] Moving agent process into its own cgroup
for subsystem: memory
> I1108 17:31:03.449960 1411265 scheduler.cpp:176] Version: 1.2.0
> I1108 17:31:03.450623 1411300 scheduler.cpp:469] New master detected at master@<ip>:44689
> I1108 17:31:03.458760 1411303 slave.cpp:229] Moving agent process into its own cgroup
for subsystem: cpuacct
> I1108 17:31:03.465198 1411303 credentials.hpp:86] Loading credential for authentication
from '/tmp/SlaveRecoveryTest_0_ROOT_CGROUPS_ReconnectDefaultExecutor_qzTPJq/credential'
> I1108 17:31:03.465389 1411303 slave.cpp:346] Agent using credential for: test-principal
> I1108 17:31:03.465414 1411303 credentials.hpp:37] Loading credentials for authentication
from '/tmp/SlaveRecoveryTest_0_ROOT_CGROUPS_ReconnectDefaultExecutor_qzTPJq/http_credentials'
> I1108 17:31:03.465651 1411303 http.cpp:887] Using default 'basic' HTTP authenticator
for realm 'mesos-agent-readonly'
> I1108 17:31:03.466915 1411303 slave.cpp:533] Agent resources: cpus(*):2; mem(*):1024;
disk(*):1024; ports(*):[31000-32000]
> I1108 17:31:03.467031 1411303 slave.cpp:541] Agent attributes: [  ]
> I1108 17:31:03.467053 1411303 slave.cpp:546] Agent hostname: batch002.usspk02.pie.apple.com
> I1108 17:31:03.470763 1411286 state.cpp:57] Recovering state from '/tmp/SlaveRecoveryTest_0_ROOT_CGROUPS_ReconnectDefaultExecutor_xeeott/meta'
> I1108 17:31:03.471282 1411285 status_update_manager.cpp:203] Recovering status update
manager
> I1108 17:31:03.471503 1411306 containerizer.cpp:557] Recovering containerizer
> I1108 17:31:03.476634 1411294 provisioner.cpp:253] Provisioner recovery complete
> I1108 17:31:03.477052 1411285 slave.cpp:5399] Finished recovery
> I1108 17:31:03.477795 1411298 status_update_manager.cpp:177] Pausing sending status updates
> I1108 17:31:03.477824 1411285 slave.cpp:915] New master detected at master@<ip>:44689
> I1108 17:31:03.477849 1411285 slave.cpp:974] Authenticating with master master@<ip>:44689
> I1108 17:31:03.477915 1411285 slave.cpp:985] Using default CRAM-MD5 authenticatee
> I1108 17:31:03.478124 1411285 slave.cpp:947] Detecting new master
> I1108 17:31:03.478376 1411311 authenticatee.cpp:97] Initializing client SASL
> I1108 17:31:03.478543 1411311 authenticatee.cpp:121] Creating new client SASL connection
> I1108 17:31:03.478826 1411289 master.cpp:6745] Authenticating agent@<ip>:44689
> I1108 17:31:03.479334 1411284 authenticator.cpp:98] Creating new server SASL connection
> I1108 17:31:03.479547 1411311 authenticatee.cpp:213] Received SASL authentication mechanisms:
CRAM-MD5
> I1108 17:31:03.479574 1411311 authenticatee.cpp:239] Attempting to authenticate with
mechanism 'CRAM-MD5'
> I1108 17:31:03.479742 1411298 authenticator.cpp:204] Received SASL authentication start
> I1108 17:31:03.479820 1411298 authenticator.cpp:326] Authentication requires more steps
> I1108 17:31:03.479938 1411288 authenticatee.cpp:259] Received SASL authentication step
> I1108 17:31:03.480058 1411286 authenticator.cpp:232] Received SASL authentication step
> I1108 17:31:03.480180 1411286 authenticator.cpp:318] Authentication success
> I1108 17:31:03.480278 1411302 authenticatee.cpp:299] Authentication success
> I1108 17:31:03.480448 1411306 master.cpp:6775] Successfully authenticated principal 'test-principal'
at agent@<ip>:44689
> I1108 17:31:03.480928 1411283 slave.cpp:1069] Successfully authenticated with master
master@<ip>:44689
> I1108 17:31:03.481401 1411311 master.cpp:5154] Registering agent at agent@<ip>:44689
(batch002.usspk02.pie.apple.com) with id c950f7d7-4efa-4ed5-83d6-eeaeff82e9ca-S0
> I1108 17:31:03.481951 1411290 registrar.cpp:461] Applied 1 operations in 49746ns; attempting
to update the registry
> I1108 17:31:03.483994 1411287 coordinator.cpp:348] Coordinator attempting to write APPEND
action at position 3
> I1108 17:31:03.484766 1411300 replica.cpp:537] Replica received write request for position
3 from __req_res__(8)@<ip>:44689
> I1108 17:31:03.485657 1411294 replica.cpp:691] Replica received learned notice for position
3 from @0.0.0.0:0
> I1108 17:31:03.486795 1411301 registrar.cpp:506] Successfully updated the registry in
4.77696ms
> I1108 17:31:03.487300 1411292 coordinator.cpp:348] Coordinator attempting to write TRUNCATE
action at position 4
> I1108 17:31:03.487941 1411309 master.cpp:5225] Registered agent c950f7d7-4efa-4ed5-83d6-eeaeff82e9ca-S0
at agent@<ip>:44689 (batch002.usspk02.pie.apple.com) with cpus(*):2; mem(*):1024; disk(*):1024;
ports(*):[31000-32000]
> I1108 17:31:03.488003 1411289 slave.cpp:1115] Registered with master master@<ip>:44689;
given agent ID c950f7d7-4efa-4ed5-83d6-eeaeff82e9ca-S0
> I1108 17:31:03.488147 1411296 status_update_manager.cpp:184] Resuming sending status
updates
> I1108 17:31:03.488363 1411280 replica.cpp:537] Replica received write request for position
4 from __req_res__(9)@<ip>:44689
> I1108 17:31:03.488416 1411289 slave.cpp:1175] Forwarding total oversubscribed resources
{}
> I1108 17:31:03.488423 1411298 hierarchical.cpp:485] Added agent c950f7d7-4efa-4ed5-83d6-eeaeff82e9ca-S0
(batch002.usspk02.pie.apple.com) with cpus(*):2; mem(*):1024; disk(*):1024; ports(*):[31000-32000]
(allocated: {})
> I1108 17:31:03.488595 1411301 master.cpp:5624] Received update of agent c950f7d7-4efa-4ed5-83d6-eeaeff82e9ca-S0
at agent@<ip>:44689 (batch002.usspk02.pie.apple.com) with total oversubscribed resources
{}
> I1108 17:31:03.488963 1411298 hierarchical.cpp:555] Agent c950f7d7-4efa-4ed5-83d6-eeaeff82e9ca-S0
(batch002.usspk02.pie.apple.com) updated with oversubscribed resources {} (total: cpus(*):2;
mem(*):1024; disk(*):1024; ports(*):[31000-32000], allocated: {})
> I1108 17:31:03.489298 1411281 replica.cpp:691] Replica received learned notice for position
4 from @0.0.0.0:0
> I1108 17:31:03.825922 1411292 http.cpp:391] HTTP POST for /master/api/v1/scheduler from
<ip>:60315
> I1108 17:31:03.828692 1411292 master.cpp:2329] Received subscription request for HTTP
framework 'default'
> I1108 17:31:03.828912 1411292 master.cpp:2069] Authorizing framework principal 'test-principal'
to receive offers for role '*'
> I1108 17:31:03.829923 1411304 master.cpp:2427] Subscribing framework 'default' with checkpointing
enabled and capabilities [  ]
> I1108 17:31:03.831852 1411282 hierarchical.cpp:275] Added framework c950f7d7-4efa-4ed5-83d6-eeaeff82e9ca-0000
> I1108 17:31:03.834179 1411289 master.cpp:6574] Sending 1 offers to framework c950f7d7-4efa-4ed5-83d6-eeaeff82e9ca-0000
(default)
> [libprotobuf ERROR google/protobuf/message_lite.cc:123] Can't parse message of type "mesos.v1.executor.Call"
because it is missing required fields: framework_id, subscribe.unacknowledged_tasks[0].task_id,
subscribe.unacknowledged_tasks[0].agent_id, subscribe.unacknowledged_updates[0].status
> [libprotobuf ERROR google/protobuf/message_lite.cc:123] Can't parse message of type "mesos.v1.executor.Call"
because it is missing required fields: framework_id, subscribe.unacknowledged_tasks[0].task_id,
subscribe.unacknowledged_tasks[0].agent_id, subscribe.unacknowledged_updates[0].status
> I1108 17:31:03.852727 1411304 http.cpp:391] HTTP POST for /master/api/v1/scheduler from
<ip>:60314
> I1108 17:31:03.854548 1411304 master.cpp:3581] Processing ACCEPT call for offers: [ c950f7d7-4efa-4ed5-83d6-eeaeff82e9ca-O0
] on agent c950f7d7-4efa-4ed5-83d6-eeaeff82e9ca-S0 at agent@<ip>:44689 (batch002.usspk02.pie.apple.com)
for framework c950f7d7-4efa-4ed5-83d6-eeaeff82e9ca-0000 (default)
> I1108 17:31:03.854723 1411304 master.cpp:3173] Authorizing framework principal 'test-principal'
to launch task 8d0a55d7-95d7-4fbc-a2cd-7883dc5b2142
> I1108 17:31:03.855190 1411304 master.cpp:3173] Authorizing framework principal 'test-principal'
to launch task 7c4d5c66-d8a0-4610-9725-f10e84729444
> I1108 17:31:03.859946 1411304 master.cpp:8337] Adding task 8d0a55d7-95d7-4fbc-a2cd-7883dc5b2142
with resources cpus(*):0.1; mem(*):32; disk(*):32 on agent c950f7d7-4efa-4ed5-83d6-eeaeff82e9ca-S0
(batch002.usspk02.pie.apple.com)
> I1108 17:31:03.860445 1411304 master.cpp:8337] Adding task 7c4d5c66-d8a0-4610-9725-f10e84729444
with resources cpus(*):0.1; mem(*):32; disk(*):32 on agent c950f7d7-4efa-4ed5-83d6-eeaeff82e9ca-S0
(batch002.usspk02.pie.apple.com)
> I1108 17:31:03.860625 1411304 master.cpp:4438] Launching task group { 7c4d5c66-d8a0-4610-9725-f10e84729444,
8d0a55d7-95d7-4fbc-a2cd-7883dc5b2142 } of framework c950f7d7-4efa-4ed5-83d6-eeaeff82e9ca-0000
(default) with resources cpus(*):0.2; mem(*):64; disk(*):64 on agent c950f7d7-4efa-4ed5-83d6-eeaeff82e9ca-S0
at agent@<ip>:44689 (batch002.usspk02.pie.apple.com)
> I1108 17:31:03.861354 1411293 slave.cpp:1547] Got assigned task group containing tasks
[ 8d0a55d7-95d7-4fbc-a2cd-7883dc5b2142, 7c4d5c66-d8a0-4610-9725-f10e84729444 ] for framework
c950f7d7-4efa-4ed5-83d6-eeaeff82e9ca-0000
> I1108 17:31:03.863292 1411293 slave.cpp:1709] Launching task group containing tasks [
8d0a55d7-95d7-4fbc-a2cd-7883dc5b2142, 7c4d5c66-d8a0-4610-9725-f10e84729444 ] for framework
c950f7d7-4efa-4ed5-83d6-eeaeff82e9ca-0000
> I1108 17:31:03.866000 1411293 paths.cpp:530] Trying to chown '/tmp/SlaveRecoveryTest_0_ROOT_CGROUPS_ReconnectDefaultExecutor_xeeott/slaves/c950f7d7-4efa-4ed5-83d6-eeaeff82e9ca-S0/frameworks/c950f7d7-4efa-4ed5-83d6-eeaeff82e9ca-0000/executors/default/runs/729ab77b-6634-4cd4-9a5b-143ef8611a1a'
to user 'root'
> I1108 17:31:03.878134 1411293 slave.cpp:6307] Launching executor 'default' of framework
c950f7d7-4efa-4ed5-83d6-eeaeff82e9ca-0000 with resources cpus(*):0.1; mem(*):32; disk(*):32
in work directory '/tmp/SlaveRecoveryTest_0_ROOT_CGROUPS_ReconnectDefaultExecutor_xeeott/slaves/c950f7d7-4efa-4ed5-83d6-eeaeff82e9ca-S0/frameworks/c950f7d7-4efa-4ed5-83d6-eeaeff82e9ca-0000/executors/default/runs/729ab77b-6634-4cd4-9a5b-143ef8611a1a'
> I1108 17:31:03.879158 1411308 containerizer.cpp:940] Starting container 729ab77b-6634-4cd4-9a5b-143ef8611a1a
for executor 'default' of framework c950f7d7-4efa-4ed5-83d6-eeaeff82e9ca-0000
> I1108 17:31:03.880475 1411293 slave.cpp:2031] Queued task group containing tasks [ 8d0a55d7-95d7-4fbc-a2cd-7883dc5b2142,
7c4d5c66-d8a0-4610-9725-f10e84729444 ] for executor 'default' of framework c950f7d7-4efa-4ed5-83d6-eeaeff82e9ca-0000
> I1108 17:31:03.887890 1411288 memory.cpp:451] Started listening for OOM events for container
729ab77b-6634-4cd4-9a5b-143ef8611a1a
> I1108 17:31:03.888701 1411288 memory.cpp:562] Started listening on 'low' memory pressure
events for container 729ab77b-6634-4cd4-9a5b-143ef8611a1a
> I1108 17:31:03.889335 1411288 memory.cpp:562] Started listening on 'medium' memory pressure
events for container 729ab77b-6634-4cd4-9a5b-143ef8611a1a
> I1108 17:31:03.889950 1411288 memory.cpp:562] Started listening on 'critical' memory
pressure events for container 729ab77b-6634-4cd4-9a5b-143ef8611a1a
> I1108 17:31:03.891623 1411288 memory.cpp:199] Updated 'memory.soft_limit_in_bytes' to
32MB for container 729ab77b-6634-4cd4-9a5b-143ef8611a1a
> I1108 17:31:03.892608 1411288 memory.cpp:251] Updated 'memory.limit_in_bytes' to 32MB
for container 729ab77b-6634-4cd4-9a5b-143ef8611a1a
> I1108 17:31:03.893195 1411288 cpu.cpp:101] Updated 'cpu.shares' to 102 (cpus 0.1) for
container 729ab77b-6634-4cd4-9a5b-143ef8611a1a
> I1108 17:31:03.897088 1411286 linux_launcher.cpp:421] Launching container 729ab77b-6634-4cd4-9a5b-143ef8611a1a
and cloning with namespaces 
> I1108 17:31:03.901765 1411292 containerizer.cpp:1517] Checkpointing container's forked
pid 1411344 to '/tmp/SlaveRecoveryTest_0_ROOT_CGROUPS_ReconnectDefaultExecutor_xeeott/meta/slaves/c950f7d7-4efa-4ed5-83d6-eeaeff82e9ca-S0/frameworks/c950f7d7-4efa-4ed5-83d6-eeaeff82e9ca-0000/executors/default/runs/729ab77b-6634-4cd4-9a5b-143ef8611a1a/pids/forked.pid'
> I1108 17:31:04.235466 1411305 memory.cpp:488] OOM detected for container 729ab77b-6634-4cd4-9a5b-143ef8611a1a
> I1108 17:31:04.237541 1411305 memory.cpp:528] Memory limit exceeded: Requested: 32MB
Maximum Used: 32MB
> MEMORY STATISTICS: 
> cache 0
> rss 33554432
> rss_huge 0
> mapped_file 0
> writeback 0
> swap 0
> pgpgin 9922
> pgpgout 1730
> pgfault 13423
> pgmajfault 0
> inactive_anon 0
> active_anon 33005568
> inactive_file 0
> active_file 0
> unevictable 0
> hierarchical_memory_limit 33554432
> hierarchical_memsw_limit 9223372036854771712
> total_cache 0
> total_rss 33554432
> total_rss_huge 0
> total_mapped_file 0
> total_writeback 0
> total_swap 0
> total_pgpgin 9922
> total_pgpgout 1730
> total_pgfault 13423
> total_pgmajfault 0
> total_inactive_anon 0
> total_active_anon 33005568
> total_inactive_file 0
> total_active_file 0
> total_unevictable 0
> I1108 17:31:04.238314 1411291 containerizer.cpp:2358] Container 729ab77b-6634-4cd4-9a5b-143ef8611a1a
has reached its limit for resource mem(*):32 and will be terminated
> I1108 17:31:04.238451 1411291 containerizer.cpp:1978] Destroying container 729ab77b-6634-4cd4-9a5b-143ef8611a1a
in RUNNING state
> I1108 17:31:04.239295 1411293 linux_launcher.cpp:498] Asked to destroy container 729ab77b-6634-4cd4-9a5b-143ef8611a1a
> I1108 17:31:04.239933 1411293 linux_launcher.cpp:541] Using freezer to destroy cgroup
mesos_test_347c5d8d-e8f7-4c83-9ab3-a8cc331d78f9/729ab77b-6634-4cd4-9a5b-143ef8611a1a
> I1108 17:31:04.241255 1411303 cgroups.cpp:2726] Freezing cgroup /cgroup/freezer/mesos_test_347c5d8d-e8f7-4c83-9ab3-a8cc331d78f9/729ab77b-6634-4cd4-9a5b-143ef8611a1a
> I1108 17:31:04.546821 1411305 cgroups.cpp:1439] Successfully froze cgroup /cgroup/freezer/mesos_test_347c5d8d-e8f7-4c83-9ab3-a8cc331d78f9/729ab77b-6634-4cd4-9a5b-143ef8611a1a
after 305.496064ms
> I1108 17:31:04.548313 1411284 cgroups.cpp:2744] Thawing cgroup /cgroup/freezer/mesos_test_347c5d8d-e8f7-4c83-9ab3-a8cc331d78f9/729ab77b-6634-4cd4-9a5b-143ef8611a1a
> I1108 17:31:04.550299 1411310 cgroups.cpp:1468] Successfully thawed cgroup /cgroup/freezer/mesos_test_347c5d8d-e8f7-4c83-9ab3-a8cc331d78f9/729ab77b-6634-4cd4-9a5b-143ef8611a1a
after 1.934848ms
> I1108 17:31:04.579886 1411311 containerizer.cpp:2341] Container 729ab77b-6634-4cd4-9a5b-143ef8611a1a
has exited
> I1108 17:31:04.594172 1411294 slave.cpp:4660] Executor 'default' of framework c950f7d7-4efa-4ed5-83d6-eeaeff82e9ca-0000
terminated with signal Killed
> I1108 17:31:04.596371 1411294 slave.cpp:3740] Handling status update TASK_FAILED (UUID:
fbd90f4a-80f4-4a22-90d6-1b9d8e16be66) for task 8d0a55d7-95d7-4fbc-a2cd-7883dc5b2142 of framework
c950f7d7-4efa-4ed5-83d6-eeaeff82e9ca-0000 from @0.0.0.0:0
> I1108 17:31:04.597141 1411294 slave.cpp:3740] Handling status update TASK_FAILED (UUID:
456b0e51-865c-4350-8a94-895dc25ddc83) for task 7c4d5c66-d8a0-4610-9725-f10e84729444 of framework
c950f7d7-4efa-4ed5-83d6-eeaeff82e9ca-0000 from @0.0.0.0:0
> I1108 17:31:04.597650 1411281 master.cpp:5884] Executor 'default' of framework c950f7d7-4efa-4ed5-83d6-eeaeff82e9ca-0000
on agent c950f7d7-4efa-4ed5-83d6-eeaeff82e9ca-S0 at agent@<ip>:44689 (batch002.usspk02.pie.apple.com):
terminated with signal Killed
> I1108 17:31:04.597705 1411281 master.cpp:7840] Removing executor 'default' with resources
cpus(*):0.1; mem(*):32; disk(*):32 of framework c950f7d7-4efa-4ed5-83d6-eeaeff82e9ca-0000
on agent c950f7d7-4efa-4ed5-83d6-eeaeff82e9ca-S0 at agent@<ip>:44689 (batch002.usspk02.pie.apple.com)
> W1108 17:31:04.598186 1411290 containerizer.cpp:1788] Ignoring update for unknown container
729ab77b-6634-4cd4-9a5b-143ef8611a1a
> W1108 17:31:04.598666 1411286 containerizer.cpp:1788] Ignoring update for unknown container
729ab77b-6634-4cd4-9a5b-143ef8611a1a
> I1108 17:31:04.599019 1411300 status_update_manager.cpp:323] Received status update TASK_FAILED
(UUID: fbd90f4a-80f4-4a22-90d6-1b9d8e16be66) for task 8d0a55d7-95d7-4fbc-a2cd-7883dc5b2142
of framework c950f7d7-4efa-4ed5-83d6-eeaeff82e9ca-0000
> I1108 17:31:04.599648 1411300 status_update_manager.cpp:832] Checkpointing UPDATE for
status update TASK_FAILED (UUID: fbd90f4a-80f4-4a22-90d6-1b9d8e16be66) for task 8d0a55d7-95d7-4fbc-a2cd-7883dc5b2142
of framework c950f7d7-4efa-4ed5-83d6-eeaeff82e9ca-0000
> GMOCK WARNING:
> Uninteresting mock function call - returning directly.
>     Function call: failure(0x7ffef2571180, @0x7f06ec004e10 48-byte object <10-94 8E-38
07-7F 00-00 00-00 00-00 00-00 00-00 07-00 00-00 00-00 00-00 70-39 00-EC 06-7F 00-00 A0-5C
00-EC 06-7F 00-00 09-00 00-00 00-00 00-00>)
> Stack trace:
> I1108 17:31:04.600095 1411302 slave.cpp:4169] Forwarding the update TASK_FAILED (UUID:
fbd90f4a-80f4-4a22-90d6-1b9d8e16be66) for task 8d0a55d7-95d7-4fbc-a2cd-7883dc5b2142 of framework
c950f7d7-4efa-4ed5-83d6-eeaeff82e9ca-0000 to master@<ip>:44689
> I1108 17:31:04.600188 1411300 status_update_manager.cpp:323] Received status update TASK_FAILED
(UUID: 456b0e51-865c-4350-8a94-895dc25ddc83) for task 7c4d5c66-d8a0-4610-9725-f10e84729444
of framework c950f7d7-4efa-4ed5-83d6-eeaeff82e9ca-0000
> I1108 17:31:04.600491 1411300 status_update_manager.cpp:832] Checkpointing UPDATE for
status update TASK_FAILED (UUID: 456b0e51-865c-4350-8a94-895dc25ddc83) for task 7c4d5c66-d8a0-4610-9725-f10e84729444
of framework c950f7d7-4efa-4ed5-83d6-eeaeff82e9ca-0000
> I1108 17:31:04.600514 1411291 master.cpp:5760] Status update TASK_FAILED (UUID: fbd90f4a-80f4-4a22-90d6-1b9d8e16be66)
for task 8d0a55d7-95d7-4fbc-a2cd-7883dc5b2142 of framework c950f7d7-4efa-4ed5-83d6-eeaeff82e9ca-0000
from agent c950f7d7-4efa-4ed5-83d6-eeaeff82e9ca-S0 at agent@<ip>:44689 (batch002.usspk02.pie.apple.com)
> I1108 17:31:04.600574 1411291 master.cpp:5822] Forwarding status update TASK_FAILED (UUID:
fbd90f4a-80f4-4a22-90d6-1b9d8e16be66) for task 8d0a55d7-95d7-4fbc-a2cd-7883dc5b2142 of framework
c950f7d7-4efa-4ed5-83d6-eeaeff82e9ca-0000
> I1108 17:31:04.600740 1411307 slave.cpp:4169] Forwarding the update TASK_FAILED (UUID:
456b0e51-865c-4350-8a94-895dc25ddc83) for task 7c4d5c66-d8a0-4610-9725-f10e84729444 of framework
c950f7d7-4efa-4ed5-83d6-eeaeff82e9ca-0000 to master@<ip>:44689
> I1108 17:31:04.601114 1411291 master.cpp:7715] Updating the state of task 8d0a55d7-95d7-4fbc-a2cd-7883dc5b2142
of framework c950f7d7-4efa-4ed5-83d6-eeaeff82e9ca-0000 (latest state: TASK_FAILED, status
update state: TASK_FAILED)
> I1108 17:31:04.602102 1411291 master.cpp:5760] Status update TASK_FAILED (UUID: 456b0e51-865c-4350-8a94-895dc25ddc83)
for task 7c4d5c66-d8a0-4610-9725-f10e84729444 of framework c950f7d7-4efa-4ed5-83d6-eeaeff82e9ca-0000
from agent c950f7d7-4efa-4ed5-83d6-eeaeff82e9ca-S0 at agent@<ip>:44689 (batch002.usspk02.pie.apple.com)
> I1108 17:31:04.602162 1411291 master.cpp:5822] Forwarding status update TASK_FAILED (UUID:
456b0e51-865c-4350-8a94-895dc25ddc83) for task 7c4d5c66-d8a0-4610-9725-f10e84729444 of framework
c950f7d7-4efa-4ed5-83d6-eeaeff82e9ca-0000
> GMOCK WARNING:
> Uninteresting mock function call - returning directly.
>     Function call: update(0x7ffef2571180, @0x7f07000044e0 32-byte object <90-95 8E-38
07-7F 00-00 00-00 00-00 00-00 00-00 01-00 00-00 00-00 00-00 50-6D 01-00 07-7F 00-00>)
> Stack trace:
> I1108 17:31:04.602488 1411291 master.cpp:7715] Updating the state of task 7c4d5c66-d8a0-4610-9725-f10e84729444
of framework c950f7d7-4efa-4ed5-83d6-eeaeff82e9ca-0000 (latest state: TASK_FAILED, status
update state: TASK_FAILED)
> GMOCK WARNING:
> Uninteresting mock function call - returning directly.
>     Function call: update(0x7ffef2571180, @0x7f07080020c0 32-byte object <90-95 8E-38
07-7F 00-00 00-00 00-00 00-00 00-00 01-00 00-00 00-00 00-00 50-3F 00-08 07-7F 00-00>)
> Stack trace:
> I1108 17:31:05.385900 1411289 master.cpp:6574] Sending 1 offers to framework c950f7d7-4efa-4ed5-83d6-eeaeff82e9ca-0000
(default)
> W1108 17:31:14.601280 1411292 status_update_manager.cpp:478] Resending status update
TASK_FAILED (UUID: 456b0e51-865c-4350-8a94-895dc25ddc83) for task 7c4d5c66-d8a0-4610-9725-f10e84729444
of framework c950f7d7-4efa-4ed5-83d6-eeaeff82e9ca-0000
> W1108 17:31:14.601706 1411292 status_update_manager.cpp:478] Resending status update
TASK_FAILED (UUID: fbd90f4a-80f4-4a22-90d6-1b9d8e16be66) for task 8d0a55d7-95d7-4fbc-a2cd-7883dc5b2142
of framework c950f7d7-4efa-4ed5-83d6-eeaeff82e9ca-0000
> I1108 17:31:14.601807 1411299 slave.cpp:4169] Forwarding the update TASK_FAILED (UUID:
456b0e51-865c-4350-8a94-895dc25ddc83) for task 7c4d5c66-d8a0-4610-9725-f10e84729444 of framework
c950f7d7-4efa-4ed5-83d6-eeaeff82e9ca-0000 to master@<ip>:44689
> I1108 17:31:14.602236 1411299 slave.cpp:4169] Forwarding the update TASK_FAILED (UUID:
fbd90f4a-80f4-4a22-90d6-1b9d8e16be66) for task 8d0a55d7-95d7-4fbc-a2cd-7883dc5b2142 of framework
c950f7d7-4efa-4ed5-83d6-eeaeff82e9ca-0000 to master@<ip>:44689
> I1108 17:31:14.602450 1411306 master.cpp:5760] Status update TASK_FAILED (UUID: 456b0e51-865c-4350-8a94-895dc25ddc83)
for task 7c4d5c66-d8a0-4610-9725-f10e84729444 of framework c950f7d7-4efa-4ed5-83d6-eeaeff82e9ca-0000
from agent c950f7d7-4efa-4ed5-83d6-eeaeff82e9ca-S0 at agent@<ip>:44689 (batch002.usspk02.pie.apple.com)
> I1108 17:31:14.602545 1411306 master.cpp:5822] Forwarding status update TASK_FAILED (UUID:
456b0e51-865c-4350-8a94-895dc25ddc83) for task 7c4d5c66-d8a0-4610-9725-f10e84729444 of framework
c950f7d7-4efa-4ed5-83d6-eeaeff82e9ca-0000
> I1108 17:31:14.603198 1411306 master.cpp:7715] Updating the state of task 7c4d5c66-d8a0-4610-9725-f10e84729444
of framework c950f7d7-4efa-4ed5-83d6-eeaeff82e9ca-0000 (latest state: TASK_FAILED, status
update state: TASK_FAILED)
> I1108 17:31:14.603487 1411306 master.cpp:5760] Status update TASK_FAILED (UUID: fbd90f4a-80f4-4a22-90d6-1b9d8e16be66)
for task 8d0a55d7-95d7-4fbc-a2cd-7883dc5b2142 of framework c950f7d7-4efa-4ed5-83d6-eeaeff82e9ca-0000
from agent c950f7d7-4efa-4ed5-83d6-eeaeff82e9ca-S0 at agent@<ip>:44689 (batch002.usspk02.pie.apple.com)
> I1108 17:31:14.603564 1411306 master.cpp:5822] Forwarding status update TASK_FAILED (UUID:
fbd90f4a-80f4-4a22-90d6-1b9d8e16be66) for task 8d0a55d7-95d7-4fbc-a2cd-7883dc5b2142 of framework
c950f7d7-4efa-4ed5-83d6-eeaeff82e9ca-0000
> I1108 17:31:14.603998 1411306 master.cpp:7715] Updating the state of task 8d0a55d7-95d7-4fbc-a2cd-7883dc5b2142
of framework c950f7d7-4efa-4ed5-83d6-eeaeff82e9ca-0000 (latest state: TASK_FAILED, status
update state: TASK_FAILED)
> GMOCK WARNING:
> Uninteresting mock function call - returning directly.
>     Function call: update(0x7ffef2571180, @0x7f06ec004ab0 32-byte object <90-95 8E-38
07-7F 00-00 00-00 00-00 00-00 00-00 01-00 00-00 00-00 00-00 C0-4D 00-EC 06-7F 00-00>)
> Stack trace:
> GMOCK WARNING:
> Uninteresting mock function call - returning directly.
>     Function call: update(0x7ffef2571180, @0x7f0694000fb0 32-byte object <90-95 8E-38
07-7F 00-00 00-00 00-00 00-00 00-00 01-00 00-00 00-00 00-00 90-2F 00-94 06-7F 00-00>)
> Stack trace:
> ../../src/tests/slave_recovery_tests.cpp:665: Failure
> Failed to wait 15secs for updateCall1
> *** Aborted at 1478626278 (unix time) try "date -d @1478626278" if you are using GNU
date ***
> PC: @          0x1b3d7d8 testing::UnitTest::AddTestPartResult()
> *** SIGSEGV (@0x0) received by PID 1411265 (TID 0x7f0738cca840) from PID 0; stack trace:
***
>     @     0x7f07308d17e0 (unknown)
>     @          0x1b3d7d8 testing::UnitTest::AddTestPartResult()
>     @          0x1b32211 testing::internal::AssertHelper::operator=()
>     @          0x1728a8c mesos::internal::tests::SlaveRecoveryTest_ROOT_CGROUPS_ReconnectDefaultExecutor_Test<>::TestBody()
>     @          0x1b5afbc testing::internal::HandleSehExceptionsInMethodIfSupported<>()
>     @          0x1b561ba testing::internal::HandleExceptionsInMethodIfSupported<>()
>     @          0x1b375b1 testing::Test::Run()
>     @          0x1b37d3f testing::TestInfo::Run()
>     @          0x1b3837a testing::TestCase::Run()
>     @          0x1b3ecc9 testing::internal::UnitTestImpl::RunAllTests()
>     @          0x1b5bc4b testing::internal::HandleSehExceptionsInMethodIfSupported<>()
>     @          0x1b56d10 testing::internal::HandleExceptionsInMethodIfSupported<>()
>     @          0x1b3d9f9 testing::UnitTest::Run()
>     @          0x114aacd RUN_ALL_TESTS()
>     @          0x114a69c main
>     @     0x7f072f98ed5d __libc_start_main
>     @           0xa82d69 (unknown)
> {noformat}
> We should at least double the memory requirement in these tests.
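
For readers who want to check the numbers, here is a minimal sketch (not part of the original report) that parses a few of the values from the MEMORY STATISTICS dump above and confirms that rss had reached hierarchical_memory_limit when the OOM fired. The helper name parse_memory_stat and the proposed_mib value are illustrative assumptions; the report only asks to double the requirement "at least" and does not say what the final value should be.

```python
# Values copied verbatim from the MEMORY STATISTICS dump in the log above.
MEMORY_STATS = """\
cache 0
rss 33554432
active_anon 33005568
hierarchical_memory_limit 33554432
"""

MIB = 1024 * 1024


def parse_memory_stat(text):
    """Parse 'key value' lines (cgroup memory.stat style) into a dict of ints.

    Hypothetical helper for illustration; it is not a Mesos API.
    """
    stats = {}
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        key, value = line.split()
        stats[key] = int(value)
    return stats


stats = parse_memory_stat(MEMORY_STATS)

limit_mib = stats["hierarchical_memory_limit"] // MIB
rss_mib = stats["rss"] // MIB

# The container was killed because resident memory hit the cgroup limit.
at_limit = stats["rss"] >= stats["hierarchical_memory_limit"]

# Assumption: "double at least" read as exactly 2x the current mem:32.
proposed_mib = 2 * limit_mib

print(f"limit={limit_mib}MB rss={rss_mib}MB at_limit={at_limit} "
      f"proposed=mem:{proposed_mib}")
```

The anonymous pages (active_anon) account for nearly all of the 32MB, and cache is 0, so the usage is anonymous (non-file-backed) memory, not page cache.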



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
