mesos-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Vinod Kone (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (MESOS-6521) MasterTest.RecoverResourcesOrphanedTask is flaky
Date Mon, 31 Oct 2016 23:54:58 GMT

     [ https://issues.apache.org/jira/browse/MESOS-6521?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Vinod Kone updated MESOS-6521:
------------------------------
    Description: 
Observed this on ASF CI.

Looks like the acknowledgement of TASK_RUNNING update was never received by the executor. Not sure why. 
{code}
[ RUN      ] MasterTest.RecoverResourcesOrphanedTask
I1031 14:55:42.624336 30664 cluster.cpp:158] Creating default 'local' authorizer
I1031 14:55:42.627467 30664 leveldb.cpp:174] Opened db in 2.723103ms
I1031 14:55:42.628058 30664 leveldb.cpp:181] Compacted db in 560866ns
I1031 14:55:42.628105 30664 leveldb.cpp:196] Created db iterator in 17967ns
I1031 14:55:42.628128 30664 leveldb.cpp:202] Seeked to beginning of db in 2025ns
I1031 14:55:42.628139 30664 leveldb.cpp:271] Iterated through 0 keys in the db in 355ns
I1031 14:55:42.628175 30664 replica.cpp:776] Replica recovered with log positions 0 -> 0 with 1 holes and 0 unlearned
I1031 14:55:42.628563 30696 recover.cpp:451] Starting replica recovery
I1031 14:55:42.629137 30694 recover.cpp:477] Replica is in EMPTY status
I1031 14:55:42.630564 30693 replica.cpp:673] Replica in EMPTY status received a broadcasted recover request from __req_res__(3298)@172.17.0.3:45279
I1031 14:55:42.631057 30695 recover.cpp:197] Received a recover response from a replica in EMPTY status
I1031 14:55:42.631556 30694 recover.cpp:568] Updating replica status to STARTING
I1031 14:55:42.631809 30698 master.cpp:380] Master 598309b4-b22f-47d3-9a90-7afc2df0d89f (169c0a15c915) started on 172.17.0.3:45279
I1031 14:55:42.632272 30687 leveldb.cpp:304] Persisting metadata (8 bytes) to leveldb took 423123ns
I1031 14:55:42.631837 30698 master.cpp:382] Flags at startup: --acls="" --agent_ping_timeout="15secs" --agent_reregister_timeout="10mins" --allocation_interval="1secs" --allocator="HierarchicalDRF" --authenticate_agents="true" --authenticate_frameworks="true" --
authenticate_http_frameworks="true" --authenticate_http_readonly="true" --authenticate_http_readwrite="true" --authenticators="crammd5" --authorizers="local" --credentials="/tmp/oUy0jO/credentials" --framework_sorter="drf" --help="false" --hostname_lookup="true"
 --http_authenticators="basic" --http_framework_authenticators="basic" --initialize_driver_logging="true" --log_auto_initialize="true" --logbufsecs="0" --logging_level="INFO" --max_agent_ping_timeouts="5" --max_completed_frameworks="50" --max_completed_tasks_per
_framework="1000" --quiet="false" --recovery_agent_removal_limit="100%" --registry="replicated_log" --registry_fetch_timeout="1mins" --registry_gc_interval="15mins" --registry_max_agent_age="2weeks" --registry_max_agent_count="102400" --registry_store_timeout="1
00secs" --registry_strict="false" --root_submissions="true" --user_sorter="drf" --version="false" --webui_dir="/mesos/mesos-1.2.0/_inst/share/mesos/webui" --work_dir="/tmp/oUy0jO/master" --zk_session_timeout="10secs"
I1031 14:55:42.632298 30687 replica.cpp:320] Persisted replica status to STARTING
I1031 14:55:42.632345 30698 master.cpp:432] Master only allowing authenticated frameworks to register
I1031 14:55:42.632359 30698 master.cpp:446] Master only allowing authenticated agents to register
I1031 14:55:42.632367 30698 master.cpp:459] Master only allowing authenticated HTTP frameworks to register
I1031 14:55:42.632375 30698 credentials.hpp:37] Loading credentials for authentication from '/tmp/oUy0jO/credentials'
I1031 14:55:42.632555 30686 recover.cpp:477] Replica is in STARTING status
I1031 14:55:42.632690 30698 master.cpp:504] Using default 'crammd5' authenticator
I1031 14:55:42.632850 30698 http.cpp:887] Using default 'basic' HTTP authenticator for realm 'mesos-master-readonly'
I1031 14:55:42.633018 30698 http.cpp:887] Using default 'basic' HTTP authenticator for realm 'mesos-master-readwrite'
I1031 14:55:42.633174 30698 http.cpp:887] Using default 'basic' HTTP authenticator for realm 'mesos-master-scheduler'
I1031 14:55:42.633335 30698 master.cpp:584] Authorization enabled
I1031 14:55:42.633500 30697 hierarchical.cpp:149] Initialized hierarchical allocator process
I1031 14:55:42.633513 30683 whitelist_watcher.cpp:77] No whitelist given
I1031 14:55:42.633719 30683 replica.cpp:673] Replica in STARTING status received a broadcasted recover request from __req_res__(3299)@172.17.0.3:45279
I1031 14:55:42.634234 30689 recover.cpp:197] Received a recover response from a replica in STARTING status
I1031 14:55:42.634668 30694 recover.cpp:568] Updating replica status to VOTING
I1031 14:55:42.635254 30697 leveldb.cpp:304] Persisting metadata (8 bytes) to leveldb took 338613ns
I1031 14:55:42.635290 30697 replica.cpp:320] Persisted replica status to VOTING
I1031 14:55:42.635447 30683 recover.cpp:582] Successfully joined the Paxos group
I1031 14:55:42.635709 30683 recover.cpp:466] Recover process terminated
I1031 14:55:42.636854 30697 master.cpp:2033] Elected as the leading master!
I1031 14:55:42.636881 30697 master.cpp:1560] Recovering from registrar
I1031 14:55:42.636993 30685 registrar.cpp:329] Recovering registrar
I1031 14:55:42.637830 30695 log.cpp:553] Attempting to start the writer
I1031 14:55:42.639220 30686 replica.cpp:493] Replica received implicit promise request from __req_res__(3300)@172.17.0.3:45279 with proposal 1
I1031 14:55:42.639647 30686 leveldb.cpp:304] Persisting metadata (8 bytes) to leveldb took 390697ns
I1031 14:55:42.639677 30686 replica.cpp:342] Persisted promised to 1
I1031 14:55:42.640534 30686 coordinator.cpp:238] Coordinator attempting to fill missing positions
I1031 14:55:42.641780 30688 replica.cpp:388] Replica received explicit promise request from __req_res__(3301)@172.17.0.3:45279 for position 0 with proposal 2
I1031 14:55:42.642138 30688 leveldb.cpp:341] Persisting action (8 bytes) to leveldb took 287987ns
I1031 14:55:42.642166 30688 replica.cpp:708] Persisted action NOP at position 0
I1031 14:55:42.643610 30698 replica.cpp:537] Replica received write request for position 0 from __req_res__(3302)@172.17.0.3:45279
I1031 14:55:42.643682 30698 leveldb.cpp:436] Reading position from leveldb took 37680ns
I1031 14:55:42.643851 30698 leveldb.cpp:341] Persisting action (14 bytes) to leveldb took 126627ns
I1031 14:55:42.643872 30698 replica.cpp:708] Persisted action NOP at position 0
I1031 14:55:42.644600 30684 replica.cpp:691] Replica received learned notice for position 0 from @0.0.0.0:0
I1031 14:55:42.644754 30684 leveldb.cpp:341] Persisting action (16 bytes) to leveldb took 125716ns
I1031 14:55:42.644778 30684 replica.cpp:708] Persisted action NOP at position 0
I1031 14:55:42.645606 30693 log.cpp:569] Writer started with ending position 0
I1031 14:55:42.646730 30696 leveldb.cpp:436] Reading position from leveldb took 36848ns
I1031 14:55:42.647701 30691 registrar.cpp:362] Successfully fetched the registry (0B) in 10.651904ms
I1031 14:55:42.647816 30691 registrar.cpp:461] Applied 1 operations in 17440ns; attempting to update the registry
I1031 14:55:42.648564 30693 log.cpp:577] Attempting to append 168 bytes to the log
I1031 14:55:42.648675 30687 coordinator.cpp:348] Coordinator attempting to write APPEND action at position 1
I1031 14:55:42.649667 30688 replica.cpp:537] Replica received write request for position 1 from __req_res__(3303)@172.17.0.3:45279
I1031 14:55:42.650169 30688 leveldb.cpp:341] Persisting action (187 bytes) to leveldb took 454939ns
I1031 14:55:42.650202 30688 replica.cpp:708] Persisted action APPEND at position 1
I1031 14:55:42.651391 30689 replica.cpp:691] Replica received learned notice for position 1 from @0.0.0.0:0
I1031 14:55:42.651859 30689 leveldb.cpp:341] Persisting action (189 bytes) to leveldb took 430513ns
I1031 14:55:42.651890 30689 replica.cpp:708] Persisted action APPEND at position 1
I1031 14:55:42.653036 30694 registrar.cpp:506] Successfully updated the registry in 5.159168ms
I1031 14:55:42.653187 30694 registrar.cpp:392] Successfully recovered registrar
I1031 14:55:42.653368 30695 log.cpp:596] Attempting to truncate the log to 1
I1031 14:55:42.653604 30688 coordinator.cpp:348] Coordinator attempting to write TRUNCATE action at position 2
I1031 14:55:42.653679 30698 master.cpp:1676] Recovered 0 agents from the registry (129B); allowing 10mins for agents to re-register
I1031 14:55:42.653715 30691 hierarchical.cpp:176] Skipping recovery of hierarchical allocator: nothing to recover
I1031 14:55:42.654757 30685 replica.cpp:537] Replica received write request for position 2 from __req_res__(3304)@172.17.0.3:45279
I1031 14:55:42.655082 30685 leveldb.cpp:341] Persisting action (16 bytes) to leveldb took 289879ns
I1031 14:55:42.655103 30685 replica.cpp:708] Persisted action TRUNCATE at position 2
I1031 14:55:42.655730 30695 replica.cpp:691] Replica received learned notice for position 2 from @0.0.0.0:0
I1031 14:55:42.655913 30695 leveldb.cpp:341] Persisting action (18 bytes) to leveldb took 157628ns
I1031 14:55:42.655958 30695 leveldb.cpp:399] Deleting ~1 keys from leveldb took 24969ns
I1031 14:55:42.655974 30695 replica.cpp:708] Persisted action TRUNCATE at position 2
I1031 14:55:42.661589 30664 cluster.cpp:435] Creating default 'local' authorizer
I1031 14:55:42.663805 30691 slave.cpp:208] Mesos agent started on (226)@172.17.0.3:45279
I1031 14:55:42.663830 30691 slave.cpp:209] Flags at startup: --acls="" --appc_simple_discovery_uri_prefix="http://" --appc_store_dir="/tmp/mesos/store/appc" --authenticate_http_readonly="true" --authenticate_http_readwrite="true" --authenticatee="crammd5" --authentication_backoff_factor="1secs" --authorizer="local" --cgroups_cpu_enable_pids_and_tids_count="false" --cgroups_enable_cfs="false" --cgroups_hierarchy="/sys/fs/cgroup" --cgroups_limit_swap="false" --cgroups_root="mesos" --container_disk_watch_interval="15secs" --containerizers="mesos" --credential="/tmp/MasterTest_RecoverResourcesOrphanedTask_OvkYzN/credential" --default_role="*" --disk_watch_interval="1mins" --docker="docker" --docker_kill_orphans="true" --docker_registry="https://registry-1.docker.io" --docker_remove_delay="6hrs" --docker_socket="/var/run/docker.sock" --docker_stop_timeout="0ns" --docker_store_dir="/tmp/mesos/store/docker" --docker_volume_checkpoint_dir="/var/run/mesos/isolators/docker/volume" --enforce_container_disk_quota="false" --executor_registration_timeout="1mins" --executor_shutdown_grace_period="5secs" --fetcher_cache_dir="/tmp/MasterTest_RecoverResourcesOrphanedTask_OvkYzN/fetch" --fetcher_cache_size="2GB" --frameworks_home="" --gc_delay="1weeks" --gc_disk_headroom="0.1" --hadoop_home="" --help="false" --hostname_lookup="true" --http_authenticators="basic" --http_command_executor="false" --http_credentials="/tmp/MasterTest_RecoverResourcesOrphanedTask_OvkYzN/http_credentials" --image_provisioner_backend="copy" --initialize_driver_logging="true" --isolation="posix/cpu,posix/mem" --launcher="posix" --launcher_dir="/mesos/mesos-1.2.0/_build/src" --logbufsecs="0" --logging_level="INFO" --max_completed_executors_per_framework="150" --oversubscribed_resources_interval="15secs" --perf_duration="10secs" --perf_interval="1mins" --qos_correction_interval_min="0ns" --quiet="false" --recover="reconnect" --recovery_timeout="15mins" --registration_backoff_factor="10ms" --resources="cpus:2;gpus:0;mem:1024;disk:1024;ports:[31000-32000]" --revocable_cpu_low_priority="true" --runtime_dir="/tmp/MasterTest_RecoverResourcesOrphanedTask_OvkYzN" --sandbox_directory="/mnt/mesos/sandbox" --strict="true" --switch_user="true" --systemd_enable_support="true" --systemd_runtime_directory="/run/systemd/system" --version="false" --work_dir="/tmp/MasterTest_RecoverResourcesOrphanedTask_ORRYWi"
I1031 14:55:42.664497 30691 credentials.hpp:86] Loading credential for authentication from '/tmp/MasterTest_RecoverResourcesOrphanedTask_OvkYzN/credential'
I1031 14:55:42.664716 30691 slave.cpp:346] Agent using credential for: test-principal
I1031 14:55:42.664737 30691 credentials.hpp:37] Loading credentials for authentication from '/tmp/MasterTest_RecoverResourcesOrphanedTask_OvkYzN/http_credentials'
I1031 14:55:42.664749 30664 scheduler.cpp:176] Version: 1.2.0
I1031 14:55:42.665066 30691 http.cpp:887] Using default 'basic' HTTP authenticator for realm 'mesos-agent-readonly'
I1031 14:55:42.665258 30691 http.cpp:887] Using default 'basic' HTTP authenticator for realm 'mesos-agent-readwrite'
I1031 14:55:42.665680 30695 scheduler.cpp:465] New master detected at master@172.17.0.3:45279
I1031 14:55:42.665711 30695 scheduler.cpp:474] Waiting for 0ns before initiating a re-(connection) attempt with the master
I1031 14:55:42.666929 30691 slave.cpp:533] Agent resources: cpus(*):2; mem(*):1024; disk(*):1024; ports(*):[31000-32000]
I1031 14:55:42.667012 30691 slave.cpp:541] Agent attributes: [  ]
I1031 14:55:42.667052 30691 slave.cpp:546] Agent hostname: 169c0a15c915
I1031 14:55:42.667279 30697 scheduler.cpp:353] Connected with the master at http://172.17.0.3:45279/master/api/v1/scheduler
I1031 14:55:42.668727 30683 scheduler.cpp:235] Sending SUBSCRIBE call to http://172.17.0.3:45279/master/api/v1/scheduler
I1031 14:55:42.668860 30690 state.cpp:57] Recovering state from '/tmp/MasterTest_RecoverResourcesOrphanedTask_ORRYWi/meta'
I1031 14:55:42.669205 30685 status_update_manager.cpp:203] Recovering status update manager
I1031 14:55:42.669450 30687 slave.cpp:5399] Finished recovery
I1031 14:55:42.669751 30692 process.cpp:3570] Handling HTTP event for process 'master' with path: '/master/api/v1/scheduler'
I1031 14:55:42.669924 30687 slave.cpp:5573] Querying resource estimator for oversubscribable resources
I1031 14:55:42.670189 30695 status_update_manager.cpp:177] Pausing sending status updates
I1031 14:55:42.670199 30684 slave.cpp:915] New master detected at master@172.17.0.3:45279
I1031 14:55:42.670224 30684 slave.cpp:974] Authenticating with master master@172.17.0.3:45279
I1031 14:55:42.670284 30684 slave.cpp:985] Using default CRAM-MD5 authenticatee
I1031 14:55:42.670425 30684 slave.cpp:947] Detecting new master
I1031 14:55:42.670496 30686 authenticatee.cpp:121] Creating new client SASL connection
I1031 14:55:42.670532 30684 slave.cpp:5587] Received oversubscribable resources {} from the resource estimator
I1031 14:55:42.670723 30692 master.cpp:6742] Authenticating slave(226)@172.17.0.3:45279
I1031 14:55:42.670821 30694 authenticator.cpp:414] Starting authentication session for crammd5-authenticatee(514)@172.17.0.3:45279
I1031 14:55:42.670892 30692 http.cpp:391] HTTP POST for /master/api/v1/scheduler from 172.17.0.3:39926
I1031 14:55:42.671062 30689 authenticator.cpp:98] Creating new server SASL connection
I1031 14:55:42.671073 30692 master.cpp:2329] Received subscription request for HTTP framework 'default'
I1031 14:55:42.671124 30692 master.cpp:2069] Authorizing framework principal 'test-principal' to receive offers for role '*'
I1031 14:55:42.671303 30683 authenticatee.cpp:213] Received SASL authentication mechanisms: CRAM-MD5
I1031 14:55:42.671332 30683 authenticatee.cpp:239] Attempting to authenticate with mechanism 'CRAM-MD5'
I1031 14:55:42.671437 30688 authenticator.cpp:204] Received SASL authentication start
I1031 14:55:42.671500 30688 authenticator.cpp:326] Authentication requires more steps
I1031 14:55:42.671597 30695 authenticatee.cpp:259] Received SASL authentication step
I1031 14:55:42.671645 30692 master.cpp:2427] Subscribing framework 'default' with checkpointing disabled and capabilities [  ]
I1031 14:55:42.671841 30691 authenticator.cpp:232] Received SASL authentication step
I1031 14:55:42.671878 30691 auxprop.cpp:109] Request to lookup properties for user: 'test-principal' realm: '169c0a15c915' server FQDN: '169c0a15c915' SASL_AUXPROP_VERIFY_AGAINST_HASH: false SASL_AUXPROP_OVERRIDE: false SASL_AUXPROP_AUTHZID: false 
I1031 14:55:42.671895 30691 auxprop.cpp:181] Looking up auxiliary property '*userPassword'
I1031 14:55:42.671932 30691 auxprop.cpp:181] Looking up auxiliary property '*cmusaslsecretCRAM-MD5'
I1031 14:55:42.671957 30691 auxprop.cpp:109] Request to lookup properties for user: 'test-principal' realm: '169c0a15c915' server FQDN: '169c0a15c915' SASL_AUXPROP_VERIFY_AGAINST_HASH: false SASL_AUXPROP_OVERRIDE: false SASL_AUXPROP_AUTHZID: true 
I1031 14:55:42.671969 30691 auxprop.cpp:131] Skipping auxiliary property '*userPassword' since SASL_AUXPROP_AUTHZID == true
I1031 14:55:42.671978 30691 auxprop.cpp:131] Skipping auxiliary property '*cmusaslsecretCRAM-MD5' since SASL_AUXPROP_AUTHZID == true
I1031 14:55:42.672001 30691 authenticator.cpp:318] Authentication success
I1031 14:55:42.672106 30698 authenticatee.cpp:299] Authentication success
I1031 14:55:42.672196 30688 master.hpp:2161] Sending heartbeat to 598309b4-b22f-47d3-9a90-7afc2df0d89f-0000
I1031 14:55:42.672237 30692 master.cpp:6772] Successfully authenticated principal 'test-principal' at slave(226)@172.17.0.3:45279
I1031 14:55:42.672256 30684 hierarchical.cpp:275] Added framework 598309b4-b22f-47d3-9a90-7afc2df0d89f-0000
I1031 14:55:42.672273 30693 authenticator.cpp:432] Authentication session cleanup for crammd5-authenticatee(514)@172.17.0.3:45279
I1031 14:55:42.672332 30684 hierarchical.cpp:1694] No allocations performed
I1031 14:55:42.672360 30684 hierarchical.cpp:1789] No inverse offers to send out!
I1031 14:55:42.672405 30684 hierarchical.cpp:1286] Performed allocation for 0 agents in 127100ns
I1031 14:55:42.672673 30695 slave.cpp:1069] Successfully authenticated with master master@172.17.0.3:45279
I1031 14:55:42.672861 30695 slave.cpp:1483] Will retry registration in 7.30742ms if necessary
I1031 14:55:42.673007 30690 scheduler.cpp:671] Enqueuing event SUBSCRIBED received from http://172.17.0.3:45279/master/api/v1/scheduler
I1031 14:55:42.673056 30684 master.cpp:5151] Registering agent at slave(226)@172.17.0.3:45279 (169c0a15c915) with id 598309b4-b22f-47d3-9a90-7afc2df0d89f-S0
I1031 14:55:42.673661 30686 registrar.cpp:461] Applied 1 operations in 62726ns; attempting to update the registry
I1031 14:55:42.673653 30690 scheduler.cpp:671] Enqueuing event HEARTBEAT received from http://172.17.0.3:45279/master/api/v1/scheduler
I1031 14:55:42.674896 30690 log.cpp:577] Attempting to append 337 bytes to the log
I1031 14:55:42.675058 30686 coordinator.cpp:348] Coordinator attempting to write APPEND action at position 3
I1031 14:55:42.676040 30687 replica.cpp:537] Replica received write request for position 3 from __req_res__(3305)@172.17.0.3:45279
I1031 14:55:42.676291 30687 leveldb.cpp:341] Persisting action (356 bytes) to leveldb took 204345ns
I1031 14:55:42.676321 30687 replica.cpp:708] Persisted action APPEND at position 3
I1031 14:55:42.676970 30688 replica.cpp:691] Replica received learned notice for position 3 from @0.0.0.0:0
I1031 14:55:42.677459 30688 leveldb.cpp:341] Persisting action (358 bytes) to leveldb took 452504ns
I1031 14:55:42.677486 30688 replica.cpp:708] Persisted action APPEND at position 3
I1031 14:55:42.679132 30696 registrar.cpp:506] Successfully updated the registry in 5.394176ms
I1031 14:55:42.679390 30693 log.cpp:596] Attempting to truncate the log to 3
I1031 14:55:42.679533 30696 coordinator.cpp:348] Coordinator attempting to write TRUNCATE action at position 4
I1031 14:55:42.680057 30687 slave.cpp:4251] Received ping from slave-observer(231)@172.17.0.3:45279
I1031 14:55:42.680407 30687 slave.cpp:1115] Registered with master master@172.17.0.3:45279; given agent ID 598309b4-b22f-47d3-9a90-7afc2df0d89f-S0
I1031 14:55:42.680209 30695 master.cpp:5222] Registered agent 598309b4-b22f-47d3-9a90-7afc2df0d89f-S0 at slave(226)@172.17.0.3:45279 (169c0a15c915) with cpus(*):2; mem(*):1024; disk(*):1024; ports(*):[31000-32000]
I1031 14:55:42.680435 30687 fetcher.cpp:86] Clearing fetcher cache
I1031 14:55:42.680584 30694 status_update_manager.cpp:184] Resuming sending status updates
I1031 14:55:42.680702 30686 hierarchical.cpp:485] Added agent 598309b4-b22f-47d3-9a90-7afc2df0d89f-S0 (169c0a15c915) with cpus(*):2; mem(*):1024; disk(*):1024; ports(*):[31000-32000] (allocated: {})
I1031 14:55:42.680755 30687 slave.cpp:1138] Checkpointing SlaveInfo to '/tmp/MasterTest_RecoverResourcesOrphanedTask_ORRYWi/meta/slaves/598309b4-b22f-47d3-9a90-7afc2df0d89f-S0/slave.info'
I1031 14:55:42.680755 30695 replica.cpp:537] Replica received write request for position 4 from __req_res__(3306)@172.17.0.3:45279
I1031 14:55:42.681140 30695 leveldb.cpp:341] Persisting action (16 bytes) to leveldb took 340426ns
I1031 14:55:42.681162 30695 replica.cpp:708] Persisted action TRUNCATE at position 4
I1031 14:55:42.681190 30687 slave.cpp:1175] Forwarding total oversubscribed resources {}
I1031 14:55:42.681368 30695 master.cpp:5621] Received update of agent 598309b4-b22f-47d3-9a90-7afc2df0d89f-S0 at slave(226)@172.17.0.3:45279 (169c0a15c915) with total oversubscribed resources {}
I1031 14:55:42.681936 30686 hierarchical.cpp:1789] No inverse offers to send out!
I1031 14:55:42.682029 30686 hierarchical.cpp:1309] Performed allocation for agent 598309b4-b22f-47d3-9a90-7afc2df0d89f-S0 in 1.273471ms
I1031 14:55:42.682063 30687 replica.cpp:691] Replica received learned notice for position 4 from @0.0.0.0:0
I1031 14:55:42.682261 30686 hierarchical.cpp:555] Agent 598309b4-b22f-47d3-9a90-7afc2df0d89f-S0 (169c0a15c915) updated with oversubscribed resources {} (total: cpus(*):2; mem(*):1024; disk(*):1024; ports(*):[31000-32000], allocated: cpus(*):2; mem(*):1024; disk(*):1024; ports(*):[31000-32000])
I1031 14:55:42.682442 30686 hierarchical.cpp:1694] No allocations performed
I1031 14:55:42.682487 30686 hierarchical.cpp:1789] No inverse offers to send out!
I1031 14:55:42.682512 30687 leveldb.cpp:341] Persisting action (18 bytes) to leveldb took 416785ns
I1031 14:55:42.682555 30686 hierarchical.cpp:1309] Performed allocation for agent 598309b4-b22f-47d3-9a90-7afc2df0d89f-S0 in 225997ns
I1031 14:55:42.682579 30687 leveldb.cpp:399] Deleting ~2 keys from leveldb took 40883ns
I1031 14:55:42.682600 30687 replica.cpp:708] Persisted action TRUNCATE at position 4
I1031 14:55:42.682672 30694 master.cpp:6571] Sending 1 offers to framework 598309b4-b22f-47d3-9a90-7afc2df0d89f-0000 (default)
I1031 14:55:42.684432 30687 scheduler.cpp:671] Enqueuing event OFFERS received from http://172.17.0.3:45279/master/api/v1/scheduler
I1031 14:55:42.687191 30696 scheduler.cpp:235] Sending ACCEPT call to http://172.17.0.3:45279/master/api/v1/scheduler
I1031 14:55:42.688298 30687 process.cpp:3570] Handling HTTP event for process 'master' with path: '/master/api/v1/scheduler'
I1031 14:55:42.689232 30686 http.cpp:391] HTTP POST for /master/api/v1/scheduler from 172.17.0.3:39925
I1031 14:55:42.690259 30686 master.cpp:3581] Processing ACCEPT call for offers: [ 598309b4-b22f-47d3-9a90-7afc2df0d89f-O0 ] on agent 598309b4-b22f-47d3-9a90-7afc2df0d89f-S0 at slave(226)@172.17.0.3:45279 (169c0a15c915) for framework 598309b4-b22f-47d3-9a90-7afc2df0d89f-0000 (default)
I1031 14:55:42.690366 30686 master.cpp:3173] Authorizing framework principal 'test-principal' to launch task 21145c68-5a97-4620-a4e0-d84c6f3b8870
W1031 14:55:42.692777 30686 validation.cpp:920] Executor 'default' for task '21145c68-5a97-4620-a4e0-d84c6f3b8870' uses less CPUs (None) than the minimum required (0.01). Please update your executor, as this will be mandatory in future releases.
W1031 14:55:42.692811 30686 validation.cpp:932] Executor 'default' for task '21145c68-5a97-4620-a4e0-d84c6f3b8870' uses less memory (None) than the minimum required (32MB). Please update your executor, as this will be mandatory in future releases.
I1031 14:55:42.693357 30686 master.cpp:8334] Adding task 21145c68-5a97-4620-a4e0-d84c6f3b8870 with resources cpus(*):2; mem(*):1024; disk(*):1024; ports(*):[31000-32000] on agent 598309b4-b22f-47d3-9a90-7afc2df0d89f-S0 (169c0a15c915)
I1031 14:55:42.693691 30686 master.cpp:4230] Launching task 21145c68-5a97-4620-a4e0-d84c6f3b8870 of framework 598309b4-b22f-47d3-9a90-7afc2df0d89f-0000 (default) with resources cpus(*):2; mem(*):1024; disk(*):1024; ports(*):[31000-32000] on agent 598309b4-b22f-47d3-9a90-7afc2df0d89f-S0 at slave(226)@172.17.0.3:45279 (169c0a15c915)
I1031 14:55:42.694245 30688 slave.cpp:1547] Got assigned task '21145c68-5a97-4620-a4e0-d84c6f3b8870' for framework 598309b4-b22f-47d3-9a90-7afc2df0d89f-0000
I1031 14:55:42.694960 30688 slave.cpp:1709] Launching task '21145c68-5a97-4620-a4e0-d84c6f3b8870' for framework 598309b4-b22f-47d3-9a90-7afc2df0d89f-0000
I1031 14:55:42.695686 30688 paths.cpp:536] Trying to chown '/tmp/MasterTest_RecoverResourcesOrphanedTask_ORRYWi/slaves/598309b4-b22f-47d3-9a90-7afc2df0d89f-S0/frameworks/598309b4-b22f-47d3-9a90-7afc2df0d89f-0000/executors/default/runs/8d61dee3-8e4c-434b-937b-c30457814503' to user 'mesos'
I1031 14:55:42.702708 30688 slave.cpp:6307] Launching executor 'default' of framework 598309b4-b22f-47d3-9a90-7afc2df0d89f-0000 with resources {} in work directory '/tmp/MasterTest_RecoverResourcesOrphanedTask_ORRYWi/slaves/598309b4-b22f-47d3-9a90-7afc2df0d89f-S0/frameworks/598309b4-b22f-47d3-9a90-7afc2df0d89f-0000/executors/default/runs/8d61dee3-8e4c-434b-937b-c30457814503'
I1031 14:55:42.705585 30688 executor.cpp:189] Version: 1.2.0
I1031 14:55:42.706229 30688 slave.cpp:2031] Queued task '21145c68-5a97-4620-a4e0-d84c6f3b8870' for executor 'default' of framework 598309b4-b22f-47d3-9a90-7afc2df0d89f-0000
I1031 14:55:42.706313 30688 slave.cpp:868] Successfully attached file '/tmp/MasterTest_RecoverResourcesOrphanedTask_ORRYWi/slaves/598309b4-b22f-47d3-9a90-7afc2df0d89f-S0/frameworks/598309b4-b22f-47d3-9a90-7afc2df0d89f-0000/executors/default/runs/8d61dee3-8e4c-434b-937b-c30457814503'
I1031 14:55:42.708365 30687 executor.cpp:390] Connected with the agent
I1031 14:55:42.709179 30698 executor.cpp:291] Sending SUBSCRIBE call to http://172.17.0.3:45279/slave(226)/api/v1/executor
I1031 14:55:42.710675 30696 process.cpp:3570] Handling HTTP event for process 'slave(226)' with path: '/slave(226)/api/v1/executor'
I1031 14:55:42.711470 30698 http.cpp:277] HTTP POST for /slave(226)/api/v1/executor from 172.17.0.3:39927
I1031 14:55:42.711701 30698 slave.cpp:3086] Received Subscribe request for HTTP executor 'default' of framework 598309b4-b22f-47d3-9a90-7afc2df0d89f-0000
I1031 14:55:42.713021 30698 slave.cpp:2247] Sending queued task '21145c68-5a97-4620-a4e0-d84c6f3b8870' to executor 'default' of framework 598309b4-b22f-47d3-9a90-7afc2df0d89f-0000 (via HTTP)
I1031 14:55:42.714972 30687 executor.cpp:705] Enqueuing event SUBSCRIBED received from http://172.17.0.3:45279/slave(226)/api/v1/executor
I1031 14:55:42.715925 30688 executor.cpp:705] Enqueuing event LAUNCH received from http://172.17.0.3:45279/slave(226)/api/v1/executor
I1031 14:55:42.717298 30684 executor.cpp:291] Sending UPDATE call to http://172.17.0.3:45279/slave(226)/api/v1/executor
I1031 14:55:42.718502 30683 process.cpp:3570] Handling HTTP event for process 'slave(226)' with path: '/slave(226)/api/v1/executor'
I1031 14:55:42.719336 30698 http.cpp:277] HTTP POST for /slave(226)/api/v1/executor from 172.17.0.3:39928
I1031 14:55:42.719491 30698 slave.cpp:3740] Handling status update TASK_RUNNING (UUID: 44e6ff49-72ce-48b1-849d-42fc63c4b73b) for task 21145c68-5a97-4620-a4e0-d84c6f3b8870 of framework 598309b4-b22f-47d3-9a90-7afc2df0d89f-0000
I1031 14:55:42.720316 30697 status_update_manager.cpp:323] Received status update TASK_RUNNING (UUID: 44e6ff49-72ce-48b1-849d-42fc63c4b73b) for task 21145c68-5a97-4620-a4e0-d84c6f3b8870 of framework 598309b4-b22f-47d3-9a90-7afc2df0d89f-0000
I1031 14:55:42.720379 30697 status_update_manager.cpp:500] Creating StatusUpdate stream for task 21145c68-5a97-4620-a4e0-d84c6f3b8870 of framework 598309b4-b22f-47d3-9a90-7afc2df0d89f-0000
I1031 14:55:42.720957 30697 status_update_manager.cpp:377] Forwarding update TASK_RUNNING (UUID: 44e6ff49-72ce-48b1-849d-42fc63c4b73b) for task 21145c68-5a97-4620-a4e0-d84c6f3b8870 of framework 598309b4-b22f-47d3-9a90-7afc2df0d89f-0000 to the agent
I1031 14:55:42.721329 30695 slave.cpp:4169] Forwarding the update TASK_RUNNING (UUID: 44e6ff49-72ce-48b1-849d-42fc63c4b73b) for task 21145c68-5a97-4620-a4e0-d84c6f3b8870 of framework 598309b4-b22f-47d3-9a90-7afc2df0d89f-0000 to master@172.17.0.3:45279
I1031 14:55:42.721566 30695 slave.cpp:4063] Status update manager successfully handled status update TASK_RUNNING (UUID: 44e6ff49-72ce-48b1-849d-42fc63c4b73b) for task 21145c68-5a97-4620-a4e0-d84c6f3b8870 of framework 598309b4-b22f-47d3-9a90-7afc2df0d89f-0000
I1031 14:55:42.721755 30684 master.cpp:5757] Status update TASK_RUNNING (UUID: 44e6ff49-72ce-48b1-849d-42fc63c4b73b) for task 21145c68-5a97-4620-a4e0-d84c6f3b8870 of framework 598309b4-b22f-47d3-9a90-7afc2df0d89f-0000 from agent 598309b4-b22f-47d3-9a90-7afc2df0d89f-S0 at slave(226)@172.17.0.3:45279 (169c0a15c915)
I../../src/tests/master_tests.cpp:5412: Failure
Failed to wait 15secs for acknowledged
1031 14:55:42.721860 30684 master.cpp:5819] Forwarding status update TASK_RUNNING (UUID: 44e6ff49-72ce-48b1-849d-42fc63c4b73b) for task 21145c68-5a97-4620-a4e0-d84c6f3b8870 of framework 598309b4-b22f-47d3-9a90-7afc2df0d89f-0000
I1031 14:55:42.722574 30685 executor.cpp:705] Enqueuing event ACKNOWLEDGED received from http://172.17.0.3:45279/slave(226)/api/v1/executor
I1031 14:55:43.634330 30687 hierarchical.cpp:1694] No allocations performed
I1031 14:56:04.675415 30687 hierarchical.cpp:1789] No inverse offers to send out!
I1031 14:56:04.675559 30687 hierarchical.cpp:1286] Performed allocation for 1 agents in 21.041389234secs
I1031 14:56:04.675912 30684 master.cpp:7712] Updating the state of task 21145c68-5a97-4620-a4e0-d84c6f3b8870 of framework 598309b4-b22f-47d3-9a90-7afc2df0d89f-0000 (latest state: TASK_RUNNING, status update state: TASK_RUNNING)
W1031 14:55:52.721946 30690 status_update_manager.cpp:478] Resending status update TASK_RUNNING (UUID: 44e6ff49-72ce-48b1-849d-42fc63c4b73b) for task 21145c68-5a97-4620-a4e0-d84c6f3b8870 of framework 598309b4-b22f-47d3-9a90-7afc2df0d89f-0000
I1031 14:56:04.676465 30690 status_update_manager.cpp:377] Forwarding update TASK_RUNNING (UUID: 44e6ff49-72ce-48b1-849d-42fc63c4b73b) for task 21145c68-5a97-4620-a4e0-d84c6f3b8870 of framework 598309b4-b22f-47d3-9a90-7afc2df0d89f-0000 to the agent
I1031 14:55:57.671268 30689 slave.cpp:5573] Querying resource estimator for oversubscribable resources
I1031 14:55:57.673494 30692 master.hpp:2161] Sending heartbeat to 598309b4-b22f-47d3-9a90-7afc2df0d89f-0000
I1031 14:56:04.678186 30689 slave.cpp:4251] Received ping from slave-observer(231)@172.17.0.3:45279
{code}

Interestingly there seems to be some out of order log messages in the output. 14:55:43 -> 14:56:04 -> 14:55:52

{code}
I1031 14:55:42.721566 30695 slave.cpp:4063] Status update manager successfully handled status update TASK_RUNNING (UUID: 44e6ff49-72ce-48b1-849d-42fc63c4b73b) for task 21145c68-5a97-4620-a4e0-d84c6f3b8870 of framework 598309b4-b22f-47d3-9a90-7afc2df0d89f-0000
I1031 14:55:42.721755 30684 master.cpp:5757] Status update TASK_RUNNING (UUID: 44e6ff49-72ce-48b1-849d-42fc63c4b73b) for task 21145c68-5a97-4620-a4e0-d84c6f3b8870 of framework 598309b4-b22f-47d3-9a90-7afc2df0d89f-0000 from agent 598309b4-b22f-47d3-9a90-7afc2df0d89f-S0 at slave(226)@172.17.0.3:45279 (169c0a15c915)
I../../src/tests/master_tests.cpp:5412: Failure
Failed to wait 15secs for acknowledged
1031 14:55:42.721860 30684 master.cpp:5819] Forwarding status update TASK_RUNNING (UUID: 44e6ff49-72ce-48b1-849d-42fc63c4b73b) for task 21145c68-5a97-4620-a4e0-d84c6f3b8870 of framework 598309b4-b22f-47d3-9a90-7afc2df0d89f-0000
I1031 14:55:42.722574 30685 executor.cpp:705] Enqueuing event ACKNOWLEDGED received from http://172.17.0.3:45279/slave(226)/api/v1/executor
I1031 14:55:43.634330 30687 hierarchical.cpp:1694] No allocations performed
I1031 14:56:04.675415 30687 hierarchical.cpp:1789] No inverse offers to send out!
I1031 14:56:04.675559 30687 hierarchical.cpp:1286] Performed allocation for 1 agents in 21.041389234secs
I1031 14:56:04.675912 30684 master.cpp:7712] Updating the state of task 21145c68-5a97-4620-a4e0-d84c6f3b8870 of framework 598309b4-b22f-47d3-9a90-7afc2df0d89f-0000 (latest state: TASK_RUNNING, status update state: TASK_RUNNING)
W1031 14:55:52.721946 30690 status_update_manager.cpp:478] Resending status update TASK_RUNNING (UUID: 44e6ff49-72ce-48b1-849d-42fc63c4b73b) for task 21145c68-5a97-4620-a4e0-d84c6f3b8870 of framework 598309b4-b22f-47d3-9a90-7afc2df0d89f-0000
I1031 14:56:04.676465 30690 status_update_manager.cpp:377] Forwarding update TASK_RUNNING (UUID: 44e6ff49-72ce-48b1-849d-42fc63c4b73b) for task 21145c68-5a97-4620-a4e0-d84c6f3b8870 of framework 598309b4-b22f-47d3-9a90-7afc2df0d89f-0000 to the agent
I1031 14:55:57.671268 30689 slave.cpp:5573] Querying resource estimator for oversubscribable resources
I1031 14:55:57.673494 30692 master.hpp:2161] Sending heartbeat to 598309b4-b22f-47d3-9a90-7afc2df0d89f-0000
{code}

  was:
Observed this on ASF CI.

Looks like the acknowledgement of TASK_RUNNING update was never received by the executor. Not sure why. 
{code}
[ RUN      ] MasterTest.RecoverResourcesOrphanedTask
I1031 14:55:42.624336 30664 cluster.cpp:158] Creating default 'local' authorizer
I1031 14:55:42.627467 30664 leveldb.cpp:174] Opened db in 2.723103ms
I1031 14:55:42.628058 30664 leveldb.cpp:181] Compacted db in 560866ns
I1031 14:55:42.628105 30664 leveldb.cpp:196] Created db iterator in 17967ns
I1031 14:55:42.628128 30664 leveldb.cpp:202] Seeked to beginning of db in 2025ns
I1031 14:55:42.628139 30664 leveldb.cpp:271] Iterated through 0 keys in the db in 355ns
I1031 14:55:42.628175 30664 replica.cpp:776] Replica recovered with log positions 0 -> 0 with 1 holes and 0 unlearned
I1031 14:55:42.628563 30696 recover.cpp:451] Starting replica recovery
I1031 14:55:42.629137 30694 recover.cpp:477] Replica is in EMPTY status
I1031 14:55:42.630564 30693 replica.cpp:673] Replica in EMPTY status received a broadcasted recover request from __req_res__(3298)@172.17.0.3:45279
I1031 14:55:42.631057 30695 recover.cpp:197] Received a recover response from a replica in EMPTY status
I1031 14:55:42.631556 30694 recover.cpp:568] Updating replica status to STARTING
I1031 14:55:42.631809 30698 master.cpp:380] Master 598309b4-b22f-47d3-9a90-7afc2df0d89f (169c0a15c915) started on 172.17.0.3:45279
I1031 14:55:42.632272 30687 leveldb.cpp:304] Persisting metadata (8 bytes) to leveldb took 423123ns
I1031 14:55:42.631837 30698 master.cpp:382] Flags at startup: --acls="" --agent_ping_timeout="15secs" --agent_reregister_timeout="10mins" --allocation_interval="1secs" --allocator="HierarchicalDRF" --authenticate_agents="true" --authenticate_frameworks="true" --
authenticate_http_frameworks="true" --authenticate_http_readonly="true" --authenticate_http_readwrite="true" --authenticators="crammd5" --authorizers="local" --credentials="/tmp/oUy0jO/credentials" --framework_sorter="drf" --help="false" --hostname_lookup="true"
 --http_authenticators="basic" --http_framework_authenticators="basic" --initialize_driver_logging="true" --log_auto_initialize="true" --logbufsecs="0" --logging_level="INFO" --max_agent_ping_timeouts="5" --max_completed_frameworks="50" --max_completed_tasks_per
_framework="1000" --quiet="false" --recovery_agent_removal_limit="100%" --registry="replicated_log" --registry_fetch_timeout="1mins" --registry_gc_interval="15mins" --registry_max_agent_age="2weeks" --registry_max_agent_count="102400" --registry_store_timeout="1
00secs" --registry_strict="false" --root_submissions="true" --user_sorter="drf" --version="false" --webui_dir="/mesos/mesos-1.2.0/_inst/share/mesos/webui" --work_dir="/tmp/oUy0jO/master" --zk_session_timeout="10secs"
I1031 14:55:42.632298 30687 replica.cpp:320] Persisted replica status to STARTING
I1031 14:55:42.632345 30698 master.cpp:432] Master only allowing authenticated frameworks to register
I1031 14:55:42.632359 30698 master.cpp:446] Master only allowing authenticated agents to register
I1031 14:55:42.632367 30698 master.cpp:459] Master only allowing authenticated HTTP frameworks to register
I1031 14:55:42.632375 30698 credentials.hpp:37] Loading credentials for authentication from '/tmp/oUy0jO/credentials'
I1031 14:55:42.632555 30686 recover.cpp:477] Replica is in STARTING status
I1031 14:55:42.632690 30698 master.cpp:504] Using default 'crammd5' authenticator
I1031 14:55:42.632850 30698 http.cpp:887] Using default 'basic' HTTP authenticator for realm 'mesos-master-readonly'
I1031 14:55:42.633018 30698 http.cpp:887] Using default 'basic' HTTP authenticator for realm 'mesos-master-readwrite'
I1031 14:55:42.633174 30698 http.cpp:887] Using default 'basic' HTTP authenticator for realm 'mesos-master-scheduler'
I1031 14:55:42.633335 30698 master.cpp:584] Authorization enabled
I1031 14:55:42.633500 30697 hierarchical.cpp:149] Initialized hierarchical allocator process
I1031 14:55:42.633513 30683 whitelist_watcher.cpp:77] No whitelist given
I1031 14:55:42.633719 30683 replica.cpp:673] Replica in STARTING status received a broadcasted recover request from __req_res__(3299)@172.17.0.3:45279
I1031 14:55:42.634234 30689 recover.cpp:197] Received a recover response from a replica in STARTING status
I1031 14:55:42.634668 30694 recover.cpp:568] Updating replica status to VOTING
I1031 14:55:42.635254 30697 leveldb.cpp:304] Persisting metadata (8 bytes) to leveldb took 338613ns
I1031 14:55:42.635290 30697 replica.cpp:320] Persisted replica status to VOTING
I1031 14:55:42.635447 30683 recover.cpp:582] Successfully joined the Paxos group
I1031 14:55:42.635709 30683 recover.cpp:466] Recover process terminated
I1031 14:55:42.636854 30697 master.cpp:2033] Elected as the leading master!
I1031 14:55:42.636881 30697 master.cpp:1560] Recovering from registrar
I1031 14:55:42.636993 30685 registrar.cpp:329] Recovering registrar
I1031 14:55:42.637830 30695 log.cpp:553] Attempting to start the writer
I1031 14:55:42.639220 30686 replica.cpp:493] Replica received implicit promise request from __req_res__(3300)@172.17.0.3:45279 with proposal 1
I1031 14:55:42.639647 30686 leveldb.cpp:304] Persisting metadata (8 bytes) to leveldb took 390697ns
I1031 14:55:42.639677 30686 replica.cpp:342] Persisted promised to 1
I1031 14:55:42.640534 30686 coordinator.cpp:238] Coordinator attempting to fill missing positions
I1031 14:55:42.641780 30688 replica.cpp:388] Replica received explicit promise request from __req_res__(3301)@172.17.0.3:45279 for position 0 with proposal 2
I1031 14:55:42.642138 30688 leveldb.cpp:341] Persisting action (8 bytes) to leveldb took 287987ns
I1031 14:55:42.642166 30688 replica.cpp:708] Persisted action NOP at position 0
I1031 14:55:42.643610 30698 replica.cpp:537] Replica received write request for position 0 from __req_res__(3302)@172.17.0.3:45279
I1031 14:55:42.643682 30698 leveldb.cpp:436] Reading position from leveldb took 37680ns
I1031 14:55:42.643851 30698 leveldb.cpp:341] Persisting action (14 bytes) to leveldb took 126627ns
I1031 14:55:42.643872 30698 replica.cpp:708] Persisted action NOP at position 0
I1031 14:55:42.644600 30684 replica.cpp:691] Replica received learned notice for position 0 from @0.0.0.0:0
I1031 14:55:42.644754 30684 leveldb.cpp:341] Persisting action (16 bytes) to leveldb took 125716ns
I1031 14:55:42.644778 30684 replica.cpp:708] Persisted action NOP at position 0
I1031 14:55:42.645606 30693 log.cpp:569] Writer started with ending position 0
I1031 14:55:42.646730 30696 leveldb.cpp:436] Reading position from leveldb took 36848ns
I1031 14:55:42.647701 30691 registrar.cpp:362] Successfully fetched the registry (0B) in 10.651904ms
I1031 14:55:42.647816 30691 registrar.cpp:461] Applied 1 operations in 17440ns; attempting to update the registry
I1031 14:55:42.648564 30693 log.cpp:577] Attempting to append 168 bytes to the log
I1031 14:55:42.648675 30687 coordinator.cpp:348] Coordinator attempting to write APPEND action at position 1
I1031 14:55:42.649667 30688 replica.cpp:537] Replica received write request for position 1 from __req_res__(3303)@172.17.0.3:45279
I1031 14:55:42.650169 30688 leveldb.cpp:341] Persisting action (187 bytes) to leveldb took 454939ns
I1031 14:55:42.650202 30688 replica.cpp:708] Persisted action APPEND at position 1
I1031 14:55:42.651391 30689 replica.cpp:691] Replica received learned notice for position 1 from @0.0.0.0:0
I1031 14:55:42.651859 30689 leveldb.cpp:341] Persisting action (189 bytes) to leveldb took 430513ns
I1031 14:55:42.651890 30689 replica.cpp:708] Persisted action APPEND at position 1
I1031 14:55:42.653036 30694 registrar.cpp:506] Successfully updated the registry in 5.159168ms
I1031 14:55:42.653187 30694 registrar.cpp:392] Successfully recovered registrar
I1031 14:55:42.653368 30695 log.cpp:596] Attempting to truncate the log to 1
I1031 14:55:42.653604 30688 coordinator.cpp:348] Coordinator attempting to write TRUNCATE action at position 2
I1031 14:55:42.653679 30698 master.cpp:1676] Recovered 0 agents from the registry (129B); allowing 10mins for agents to re-register
I1031 14:55:42.653715 30691 hierarchical.cpp:176] Skipping recovery of hierarchical allocator: nothing to recover
I1031 14:55:42.654757 30685 replica.cpp:537] Replica received write request for position 2 from __req_res__(3304)@172.17.0.3:45279
I1031 14:55:42.655082 30685 leveldb.cpp:341] Persisting action (16 bytes) to leveldb took 289879ns
I1031 14:55:42.655103 30685 replica.cpp:708] Persisted action TRUNCATE at position 2
I1031 14:55:42.655730 30695 replica.cpp:691] Replica received learned notice for position 2 from @0.0.0.0:0
I1031 14:55:42.655913 30695 leveldb.cpp:341] Persisting action (18 bytes) to leveldb took 157628ns
I1031 14:55:42.655958 30695 leveldb.cpp:399] Deleting ~1 keys from leveldb took 24969ns
I1031 14:55:42.655974 30695 replica.cpp:708] Persisted action TRUNCATE at position 2
I1031 14:55:42.661589 30664 cluster.cpp:435] Creating default 'local' authorizer
I1031 14:55:42.663805 30691 slave.cpp:208] Mesos agent started on (226)@172.17.0.3:45279
I1031 14:55:42.663830 30691 slave.cpp:209] Flags at startup: --acls="" --appc_simple_discovery_uri_prefix="http://" --appc_store_dir="/tmp/mesos/store/appc" --authenticate_http_readonly="true" --authenticate_http_readwrite="true" --authenticatee="crammd5" --authentication_backoff_factor="1secs" --authorizer="local" --cgroups_cpu_enable_pids_and_tids_count="false" --cgroups_enable_cfs="false" --cgroups_hierarchy="/sys/fs/cgroup" --cgroups_limit_swap="false" --cgroups_root="mesos" --container_disk_watch_interval="15secs" --containerizers="mesos" --credential="/tmp/MasterTest_RecoverResourcesOrphanedTask_OvkYzN/credential" --default_role="*" --disk_watch_interval="1mins" --docker="docker" --docker_kill_orphans="true" --docker_registry="https://registry-1.docker.io" --docker_remove_delay="6hrs" --docker_socket="/var/run/docker.sock" --docker_stop_timeout="0ns" --docker_store_dir="/tmp/mesos/store/docker" --docker_volume_checkpoint_dir="/var/run/mesos/isolators/docker/volume" --enforce_container_disk_quota="false" --executor_registration_timeout="1mins" --executor_shutdown_grace_period="5secs" --fetcher_cache_dir="/tmp/MasterTest_RecoverResourcesOrphanedTask_OvkYzN/fetch" --fetcher_cache_size="2GB" --frameworks_home="" --gc_delay="1weeks" --gc_disk_headroom="0.1" --hadoop_home="" --help="false" --hostname_lookup="true" --http_authenticators="basic" --http_command_executor="false" --http_credentials="/tmp/MasterTest_RecoverResourcesOrphanedTask_OvkYzN/http_credentials" --image_provisioner_backend="copy" --initialize_driver_logging="true" --isolation="posix/cpu,posix/mem" --launcher="posix" --launcher_dir="/mesos/mesos-1.2.0/_build/src" --logbufsecs="0" --logging_level="INFO" --max_completed_executors_per_framework="150" --oversubscribed_resources_interval="15secs" --perf_duration="10secs" --perf_interval="1mins" --qos_correction_interval_min="0ns" --quiet="false" --recover="reconnect" --recovery_timeout="15mins" --registration_backoff_factor="10ms" --resources="cpus:2;gpus:0;mem:1024;disk:1024;ports:[31000-32000]" --revocable_cpu_low_priority="true" --runtime_dir="/tmp/MasterTest_RecoverResourcesOrphanedTask_OvkYzN" --sandbox_directory="/mnt/mesos/sandbox" --strict="true" --switch_user="true" --systemd_enable_support="true" --systemd_runtime_directory="/run/systemd/system" --version="false" --work_dir="/tmp/MasterTest_RecoverResourcesOrphanedTask_ORRYWi"
I1031 14:55:42.664497 30691 credentials.hpp:86] Loading credential for authentication from '/tmp/MasterTest_RecoverResourcesOrphanedTask_OvkYzN/credential'
I1031 14:55:42.664716 30691 slave.cpp:346] Agent using credential for: test-principal
I1031 14:55:42.664737 30691 credentials.hpp:37] Loading credentials for authentication from '/tmp/MasterTest_RecoverResourcesOrphanedTask_OvkYzN/http_credentials'
I1031 14:55:42.664749 30664 scheduler.cpp:176] Version: 1.2.0
I1031 14:55:42.665066 30691 http.cpp:887] Using default 'basic' HTTP authenticator for realm 'mesos-agent-readonly'
I1031 14:55:42.665258 30691 http.cpp:887] Using default 'basic' HTTP authenticator for realm 'mesos-agent-readwrite'
I1031 14:55:42.665680 30695 scheduler.cpp:465] New master detected at master@172.17.0.3:45279
I1031 14:55:42.665711 30695 scheduler.cpp:474] Waiting for 0ns before initiating a re-(connection) attempt with the master
I1031 14:55:42.666929 30691 slave.cpp:533] Agent resources: cpus(*):2; mem(*):1024; disk(*):1024; ports(*):[31000-32000]
I1031 14:55:42.667012 30691 slave.cpp:541] Agent attributes: [  ]
I1031 14:55:42.667052 30691 slave.cpp:546] Agent hostname: 169c0a15c915
I1031 14:55:42.667279 30697 scheduler.cpp:353] Connected with the master at http://172.17.0.3:45279/master/api/v1/scheduler
I1031 14:55:42.668727 30683 scheduler.cpp:235] Sending SUBSCRIBE call to http://172.17.0.3:45279/master/api/v1/scheduler
I1031 14:55:42.668860 30690 state.cpp:57] Recovering state from '/tmp/MasterTest_RecoverResourcesOrphanedTask_ORRYWi/meta'
I1031 14:55:42.669205 30685 status_update_manager.cpp:203] Recovering status update manager
I1031 14:55:42.669450 30687 slave.cpp:5399] Finished recovery
I1031 14:55:42.669751 30692 process.cpp:3570] Handling HTTP event for process 'master' with path: '/master/api/v1/scheduler'
I1031 14:55:42.669924 30687 slave.cpp:5573] Querying resource estimator for oversubscribable resources
I1031 14:55:42.670189 30695 status_update_manager.cpp:177] Pausing sending status updates
I1031 14:55:42.670199 30684 slave.cpp:915] New master detected at master@172.17.0.3:45279
I1031 14:55:42.670224 30684 slave.cpp:974] Authenticating with master master@172.17.0.3:45279
I1031 14:55:42.670284 30684 slave.cpp:985] Using default CRAM-MD5 authenticatee
I1031 14:55:42.670425 30684 slave.cpp:947] Detecting new master
I1031 14:55:42.670496 30686 authenticatee.cpp:121] Creating new client SASL connection
I1031 14:55:42.670532 30684 slave.cpp:5587] Received oversubscribable resources {} from the resource estimator
I1031 14:55:42.670723 30692 master.cpp:6742] Authenticating slave(226)@172.17.0.3:45279
I1031 14:55:42.670821 30694 authenticator.cpp:414] Starting authentication session for crammd5-authenticatee(514)@172.17.0.3:45279
I1031 14:55:42.670892 30692 http.cpp:391] HTTP POST for /master/api/v1/scheduler from 172.17.0.3:39926
I1031 14:55:42.671062 30689 authenticator.cpp:98] Creating new server SASL connection
I1031 14:55:42.671073 30692 master.cpp:2329] Received subscription request for HTTP framework 'default'
I1031 14:55:42.671124 30692 master.cpp:2069] Authorizing framework principal 'test-principal' to receive offers for role '*'
I1031 14:55:42.671303 30683 authenticatee.cpp:213] Received SASL authentication mechanisms: CRAM-MD5
I1031 14:55:42.671332 30683 authenticatee.cpp:239] Attempting to authenticate with mechanism 'CRAM-MD5'
I1031 14:55:42.671437 30688 authenticator.cpp:204] Received SASL authentication start
I1031 14:55:42.671500 30688 authenticator.cpp:326] Authentication requires more steps
I1031 14:55:42.671597 30695 authenticatee.cpp:259] Received SASL authentication step
I1031 14:55:42.671645 30692 master.cpp:2427] Subscribing framework 'default' with checkpointing disabled and capabilities [  ]
I1031 14:55:42.671841 30691 authenticator.cpp:232] Received SASL authentication step
I1031 14:55:42.671878 30691 auxprop.cpp:109] Request to lookup properties for user: 'test-principal' realm: '169c0a15c915' server FQDN: '169c0a15c915' SASL_AUXPROP_VERIFY_AGAINST_HASH: false SASL_AUXPROP_OVERRIDE: false SASL_AUXPROP_AUTHZID: false 
I1031 14:55:42.671895 30691 auxprop.cpp:181] Looking up auxiliary property '*userPassword'
I1031 14:55:42.671932 30691 auxprop.cpp:181] Looking up auxiliary property '*cmusaslsecretCRAM-MD5'
I1031 14:55:42.671957 30691 auxprop.cpp:109] Request to lookup properties for user: 'test-principal' realm: '169c0a15c915' server FQDN: '169c0a15c915' SASL_AUXPROP_VERIFY_AGAINST_HASH: false SASL_AUXPROP_OVERRIDE: false SASL_AUXPROP_AUTHZID: true 
I1031 14:55:42.671969 30691 auxprop.cpp:131] Skipping auxiliary property '*userPassword' since SASL_AUXPROP_AUTHZID == true
I1031 14:55:42.671978 30691 auxprop.cpp:131] Skipping auxiliary property '*cmusaslsecretCRAM-MD5' since SASL_AUXPROP_AUTHZID == true
I1031 14:55:42.672001 30691 authenticator.cpp:318] Authentication success
I1031 14:55:42.672106 30698 authenticatee.cpp:299] Authentication success
I1031 14:55:42.672196 30688 master.hpp:2161] Sending heartbeat to 598309b4-b22f-47d3-9a90-7afc2df0d89f-0000
I1031 14:55:42.672237 30692 master.cpp:6772] Successfully authenticated principal 'test-principal' at slave(226)@172.17.0.3:45279
I1031 14:55:42.672256 30684 hierarchical.cpp:275] Added framework 598309b4-b22f-47d3-9a90-7afc2df0d89f-0000
I1031 14:55:42.672273 30693 authenticator.cpp:432] Authentication session cleanup for crammd5-authenticatee(514)@172.17.0.3:45279
I1031 14:55:42.672332 30684 hierarchical.cpp:1694] No allocations performed
I1031 14:55:42.672360 30684 hierarchical.cpp:1789] No inverse offers to send out!
I1031 14:55:42.672405 30684 hierarchical.cpp:1286] Performed allocation for 0 agents in 127100ns
I1031 14:55:42.672673 30695 slave.cpp:1069] Successfully authenticated with master master@172.17.0.3:45279
I1031 14:55:42.672861 30695 slave.cpp:1483] Will retry registration in 7.30742ms if necessary
I1031 14:55:42.673007 30690 scheduler.cpp:671] Enqueuing event SUBSCRIBED received from http://172.17.0.3:45279/master/api/v1/scheduler
I1031 14:55:42.673056 30684 master.cpp:5151] Registering agent at slave(226)@172.17.0.3:45279 (169c0a15c915) with id 598309b4-b22f-47d3-9a90-7afc2df0d89f-S0
I1031 14:55:42.673661 30686 registrar.cpp:461] Applied 1 operations in 62726ns; attempting to update the registry
I1031 14:55:42.673653 30690 scheduler.cpp:671] Enqueuing event HEARTBEAT received from http://172.17.0.3:45279/master/api/v1/scheduler
I1031 14:55:42.674896 30690 log.cpp:577] Attempting to append 337 bytes to the log
I1031 14:55:42.675058 30686 coordinator.cpp:348] Coordinator attempting to write APPEND action at position 3
I1031 14:55:42.676040 30687 replica.cpp:537] Replica received write request for position 3 from __req_res__(3305)@172.17.0.3:45279
I1031 14:55:42.676291 30687 leveldb.cpp:341] Persisting action (356 bytes) to leveldb took 204345ns
I1031 14:55:42.676321 30687 replica.cpp:708] Persisted action APPEND at position 3
I1031 14:55:42.676970 30688 replica.cpp:691] Replica received learned notice for position 3 from @0.0.0.0:0
I1031 14:55:42.677459 30688 leveldb.cpp:341] Persisting action (358 bytes) to leveldb took 452504ns
I1031 14:55:42.677486 30688 replica.cpp:708] Persisted action APPEND at position 3
I1031 14:55:42.679132 30696 registrar.cpp:506] Successfully updated the registry in 5.394176ms
I1031 14:55:42.679390 30693 log.cpp:596] Attempting to truncate the log to 3
I1031 14:55:42.679533 30696 coordinator.cpp:348] Coordinator attempting to write TRUNCATE action at position 4
I1031 14:55:42.680057 30687 slave.cpp:4251] Received ping from slave-observer(231)@172.17.0.3:45279
I1031 14:55:42.680407 30687 slave.cpp:1115] Registered with master master@172.17.0.3:45279; given agent ID 598309b4-b22f-47d3-9a90-7afc2df0d89f-S0
I1031 14:55:42.680209 30695 master.cpp:5222] Registered agent 598309b4-b22f-47d3-9a90-7afc2df0d89f-S0 at slave(226)@172.17.0.3:45279 (169c0a15c915) with cpus(*):2; mem(*):1024; disk(*):1024; ports(*):[31000-32000]
I1031 14:55:42.680435 30687 fetcher.cpp:86] Clearing fetcher cache
I1031 14:55:42.680584 30694 status_update_manager.cpp:184] Resuming sending status updates
I1031 14:55:42.680702 30686 hierarchical.cpp:485] Added agent 598309b4-b22f-47d3-9a90-7afc2df0d89f-S0 (169c0a15c915) with cpus(*):2; mem(*):1024; disk(*):1024; ports(*):[31000-32000] (allocated: {})
I1031 14:55:42.680755 30687 slave.cpp:1138] Checkpointing SlaveInfo to '/tmp/MasterTest_RecoverResourcesOrphanedTask_ORRYWi/meta/slaves/598309b4-b22f-47d3-9a90-7afc2df0d89f-S0/slave.info'
I1031 14:55:42.680755 30695 replica.cpp:537] Replica received write request for position 4 from __req_res__(3306)@172.17.0.3:45279
I1031 14:55:42.681140 30695 leveldb.cpp:341] Persisting action (16 bytes) to leveldb took 340426ns
I1031 14:55:42.681162 30695 replica.cpp:708] Persisted action TRUNCATE at position 4
I1031 14:55:42.681190 30687 slave.cpp:1175] Forwarding total oversubscribed resources {}
I1031 14:55:42.681368 30695 master.cpp:5621] Received update of agent 598309b4-b22f-47d3-9a90-7afc2df0d89f-S0 at slave(226)@172.17.0.3:45279 (169c0a15c915) with total oversubscribed resources {}
I1031 14:55:42.681936 30686 hierarchical.cpp:1789] No inverse offers to send out!
I1031 14:55:42.682029 30686 hierarchical.cpp:1309] Performed allocation for agent 598309b4-b22f-47d3-9a90-7afc2df0d89f-S0 in 1.273471ms
I1031 14:55:42.682063 30687 replica.cpp:691] Replica received learned notice for position 4 from @0.0.0.0:0
I1031 14:55:42.682261 30686 hierarchical.cpp:555] Agent 598309b4-b22f-47d3-9a90-7afc2df0d89f-S0 (169c0a15c915) updated with oversubscribed resources {} (total: cpus(*):2; mem(*):1024; disk(*):1024; ports(*):[31000-32000], allocated: cpus(*):2; mem(*):1024; disk(*):1024; ports(*):[31000-32000])
I1031 14:55:42.682442 30686 hierarchical.cpp:1694] No allocations performed
I1031 14:55:42.682487 30686 hierarchical.cpp:1789] No inverse offers to send out!
I1031 14:55:42.682512 30687 leveldb.cpp:341] Persisting action (18 bytes) to leveldb took 416785ns
I1031 14:55:42.682555 30686 hierarchical.cpp:1309] Performed allocation for agent 598309b4-b22f-47d3-9a90-7afc2df0d89f-S0 in 225997ns
I1031 14:55:42.682579 30687 leveldb.cpp:399] Deleting ~2 keys from leveldb took 40883ns
I1031 14:55:42.682600 30687 replica.cpp:708] Persisted action TRUNCATE at position 4
I1031 14:55:42.682672 30694 master.cpp:6571] Sending 1 offers to framework 598309b4-b22f-47d3-9a90-7afc2df0d89f-0000 (default)
I1031 14:55:42.684432 30687 scheduler.cpp:671] Enqueuing event OFFERS received from http://172.17.0.3:45279/master/api/v1/scheduler
I1031 14:55:42.687191 30696 scheduler.cpp:235] Sending ACCEPT call to http://172.17.0.3:45279/master/api/v1/scheduler
I1031 14:55:42.688298 30687 process.cpp:3570] Handling HTTP event for process 'master' with path: '/master/api/v1/scheduler'
I1031 14:55:42.689232 30686 http.cpp:391] HTTP POST for /master/api/v1/scheduler from 172.17.0.3:39925
I1031 14:55:42.690259 30686 master.cpp:3581] Processing ACCEPT call for offers: [ 598309b4-b22f-47d3-9a90-7afc2df0d89f-O0 ] on agent 598309b4-b22f-47d3-9a90-7afc2df0d89f-S0 at slave(226)@172.17.0.3:45279 (169c0a15c915) for framework 598309b4-b22f-47d3-9a90-7afc2df0d89f-0000 (default)
I1031 14:55:42.690366 30686 master.cpp:3173] Authorizing framework principal 'test-principal' to launch task 21145c68-5a97-4620-a4e0-d84c6f3b8870
W1031 14:55:42.692777 30686 validation.cpp:920] Executor 'default' for task '21145c68-5a97-4620-a4e0-d84c6f3b8870' uses less CPUs (None) than the minimum required (0.01). Please update your executor, as this will be mandatory in future releases.
W1031 14:55:42.692811 30686 validation.cpp:932] Executor 'default' for task '21145c68-5a97-4620-a4e0-d84c6f3b8870' uses less memory (None) than the minimum required (32MB). Please update your executor, as this will be mandatory in future releases.
I1031 14:55:42.693357 30686 master.cpp:8334] Adding task 21145c68-5a97-4620-a4e0-d84c6f3b8870 with resources cpus(*):2; mem(*):1024; disk(*):1024; ports(*):[31000-32000] on agent 598309b4-b22f-47d3-9a90-7afc2df0d89f-S0 (169c0a15c915)
I1031 14:55:42.693691 30686 master.cpp:4230] Launching task 21145c68-5a97-4620-a4e0-d84c6f3b8870 of framework 598309b4-b22f-47d3-9a90-7afc2df0d89f-0000 (default) with resources cpus(*):2; mem(*):1024; disk(*):1024; ports(*):[31000-32000] on agent 598309b4-b22f-47d3-9a90-7afc2df0d89f-S0 at slave(226)@172.17.0.3:45279 (169c0a15c915)
I1031 14:55:42.694245 30688 slave.cpp:1547] Got assigned task '21145c68-5a97-4620-a4e0-d84c6f3b8870' for framework 598309b4-b22f-47d3-9a90-7afc2df0d89f-0000
I1031 14:55:42.694960 30688 slave.cpp:1709] Launching task '21145c68-5a97-4620-a4e0-d84c6f3b8870' for framework 598309b4-b22f-47d3-9a90-7afc2df0d89f-0000
I1031 14:55:42.695686 30688 paths.cpp:536] Trying to chown '/tmp/MasterTest_RecoverResourcesOrphanedTask_ORRYWi/slaves/598309b4-b22f-47d3-9a90-7afc2df0d89f-S0/frameworks/598309b4-b22f-47d3-9a90-7afc2df0d89f-0000/executors/default/runs/8d61dee3-8e4c-434b-937b-c30457814503' to user 'mesos'
I1031 14:55:42.702708 30688 slave.cpp:6307] Launching executor 'default' of framework 598309b4-b22f-47d3-9a90-7afc2df0d89f-0000 with resources {} in work directory '/tmp/MasterTest_RecoverResourcesOrphanedTask_ORRYWi/slaves/598309b4-b22f-47d3-9a90-7afc2df0d89f-S0/frameworks/598309b4-b22f-47d3-9a90-7afc2df0d89f-0000/executors/default/runs/8d61dee3-8e4c-434b-937b-c30457814503'
I1031 14:55:42.705585 30688 executor.cpp:189] Version: 1.2.0
I1031 14:55:42.706229 30688 slave.cpp:2031] Queued task '21145c68-5a97-4620-a4e0-d84c6f3b8870' for executor 'default' of framework 598309b4-b22f-47d3-9a90-7afc2df0d89f-0000
I1031 14:55:42.706313 30688 slave.cpp:868] Successfully attached file '/tmp/MasterTest_RecoverResourcesOrphanedTask_ORRYWi/slaves/598309b4-b22f-47d3-9a90-7afc2df0d89f-S0/frameworks/598309b4-b22f-47d3-9a90-7afc2df0d89f-0000/executors/default/runs/8d61dee3-8e4c-434b-937b-c30457814503'
I1031 14:55:42.708365 30687 executor.cpp:390] Connected with the agent
I1031 14:55:42.709179 30698 executor.cpp:291] Sending SUBSCRIBE call to http://172.17.0.3:45279/slave(226)/api/v1/executor
I1031 14:55:42.710675 30696 process.cpp:3570] Handling HTTP event for process 'slave(226)' with path: '/slave(226)/api/v1/executor'
I1031 14:55:42.711470 30698 http.cpp:277] HTTP POST for /slave(226)/api/v1/executor from 172.17.0.3:39927
I1031 14:55:42.711701 30698 slave.cpp:3086] Received Subscribe request for HTTP executor 'default' of framework 598309b4-b22f-47d3-9a90-7afc2df0d89f-0000
I1031 14:55:42.713021 30698 slave.cpp:2247] Sending queued task '21145c68-5a97-4620-a4e0-d84c6f3b8870' to executor 'default' of framework 598309b4-b22f-47d3-9a90-7afc2df0d89f-0000 (via HTTP)
I1031 14:55:42.714972 30687 executor.cpp:705] Enqueuing event SUBSCRIBED received from http://172.17.0.3:45279/slave(226)/api/v1/executor
I1031 14:55:42.715925 30688 executor.cpp:705] Enqueuing event LAUNCH received from http://172.17.0.3:45279/slave(226)/api/v1/executor
I1031 14:55:42.717298 30684 executor.cpp:291] Sending UPDATE call to http://172.17.0.3:45279/slave(226)/api/v1/executor
I1031 14:55:42.718502 30683 process.cpp:3570] Handling HTTP event for process 'slave(226)' with path: '/slave(226)/api/v1/executor'
I1031 14:55:42.719336 30698 http.cpp:277] HTTP POST for /slave(226)/api/v1/executor from 172.17.0.3:39928
I1031 14:55:42.719491 30698 slave.cpp:3740] Handling status update TASK_RUNNING (UUID: 44e6ff49-72ce-48b1-849d-42fc63c4b73b) for task 21145c68-5a97-4620-a4e0-d84c6f3b8870 of framework 598309b4-b22f-47d3-9a90-7afc2df0d89f-0000
I1031 14:55:42.720316 30697 status_update_manager.cpp:323] Received status update TASK_RUNNING (UUID: 44e6ff49-72ce-48b1-849d-42fc63c4b73b) for task 21145c68-5a97-4620-a4e0-d84c6f3b8870 of framework 598309b4-b22f-47d3-9a90-7afc2df0d89f-0000
I1031 14:55:42.720379 30697 status_update_manager.cpp:500] Creating StatusUpdate stream for task 21145c68-5a97-4620-a4e0-d84c6f3b8870 of framework 598309b4-b22f-47d3-9a90-7afc2df0d89f-0000
I1031 14:55:42.720957 30697 status_update_manager.cpp:377] Forwarding update TASK_RUNNING (UUID: 44e6ff49-72ce-48b1-849d-42fc63c4b73b) for task 21145c68-5a97-4620-a4e0-d84c6f3b8870 of framework 598309b4-b22f-47d3-9a90-7afc2df0d89f-0000 to the agent
I1031 14:55:42.721329 30695 slave.cpp:4169] Forwarding the update TASK_RUNNING (UUID: 44e6ff49-72ce-48b1-849d-42fc63c4b73b) for task 21145c68-5a97-4620-a4e0-d84c6f3b8870 of framework 598309b4-b22f-47d3-9a90-7afc2df0d89f-0000 to master@172.17.0.3:45279
I1031 14:55:42.721566 30695 slave.cpp:4063] Status update manager successfully handled status update TASK_RUNNING (UUID: 44e6ff49-72ce-48b1-849d-42fc63c4b73b) for task 21145c68-5a97-4620-a4e0-d84c6f3b8870 of framework 598309b4-b22f-47d3-9a90-7afc2df0d89f-0000
I1031 14:55:42.721755 30684 master.cpp:5757] Status update TASK_RUNNING (UUID: 44e6ff49-72ce-48b1-849d-42fc63c4b73b) for task 21145c68-5a97-4620-a4e0-d84c6f3b8870 of framework 598309b4-b22f-47d3-9a90-7afc2df0d89f-0000 from agent 598309b4-b22f-47d3-9a90-7afc2df0d89f-S0 at slave(226)@172.17.0.3:45279 (169c0a15c915)
I../../src/tests/master_tests.cpp:5412: Failure
Failed to wait 15secs for acknowledged
1031 14:55:42.721860 30684 master.cpp:5819] Forwarding status update TASK_RUNNING (UUID: 44e6ff49-72ce-48b1-849d-42fc63c4b73b) for task 21145c68-5a97-4620-a4e0-d84c6f3b8870 of framework 598309b4-b22f-47d3-9a90-7afc2df0d89f-0000
I1031 14:55:42.722574 30685 executor.cpp:705] Enqueuing event ACKNOWLEDGED received from http://172.17.0.3:45279/slave(226)/api/v1/executor
I1031 14:55:43.634330 30687 hierarchical.cpp:1694] No allocations performed
I1031 14:56:04.675415 30687 hierarchical.cpp:1789] No inverse offers to send out!
I1031 14:56:04.675559 30687 hierarchical.cpp:1286] Performed allocation for 1 agents in 21.041389234secs
I1031 14:56:04.675912 30684 master.cpp:7712] Updating the state of task 21145c68-5a97-4620-a4e0-d84c6f3b8870 of framework 598309b4-b22f-47d3-9a90-7afc2df0d89f-0000 (latest state: TASK_RUNNING, status update state: TASK_RUNNING)
W1031 14:55:52.721946 30690 status_update_manager.cpp:478] Resending status update TASK_RUNNING (UUID: 44e6ff49-72ce-48b1-849d-42fc63c4b73b) for task 21145c68-5a97-4620-a4e0-d84c6f3b8870 of framework 598309b4-b22f-47d3-9a90-7afc2df0d89f-0000
I1031 14:56:04.676465 30690 status_update_manager.cpp:377] Forwarding update TASK_RUNNING (UUID: 44e6ff49-72ce-48b1-849d-42fc63c4b73b) for task 21145c68-5a97-4620-a4e0-d84c6f3b8870 of framework 598309b4-b22f-47d3-9a90-7afc2df0d89f-0000 to the agent
I1031 14:55:57.671268 30689 slave.cpp:5573] Querying resource estimator for oversubscribable resources
I1031 14:55:57.673494 30692 master.hpp:2161] Sending heartbeat to 598309b4-b22f-47d3-9a90-7afc2df0d89f-0000
I1031 14:56:04.678186 30689 slave.cpp:4251] Received ping from slave-observer(231)@172.17.0.3:45279
{code}

Interestingly there seems to be some out of order log messages in the output

{code}
I1031 14:55:42.719336 30698 http.cpp:277] HTTP POST for /slave(226)/api/v1/executor from 172.17.0.3:39928
I1031 14:55:42.719491 30698 slave.cpp:3740] Handling status update TASK_RUNNING (UUID: 44e6ff49-72ce-48b1-849d-42fc63c4b73b) for task 21145c68-5a97-4620-a4e0-d84c6f3b8870 of framework 598309b4-b22f-47d3-9a90-7afc2df0d89f-0000
I1031 14:55:42.720316 30697 status_update_manager.cpp:323] Received status update TASK_RUNNING (UUID: 44e6ff49-72ce-48b1-849d-42fc63c4b73b) for task 21145c68-5a97-4620-a4e0-d84c6f3b8870 of framework 598309b4-b22f-47d3-9a90-7afc2df0d89f-0000
I1031 14:55:42.720379 30697 status_update_manager.cpp:500] Creating StatusUpdate stream for task 21145c68-5a97-4620-a4e0-d84c6f3b8870 of framework 598309b4-b22f-47d3-9a90-7afc2df0d89f-0000
I1031 14:55:42.720957 30697 status_update_manager.cpp:377] Forwarding update TASK_RUNNING (UUID: 44e6ff49-72ce-48b1-849d-42fc63c4b73b) for task 21145c68-5a97-4620-a4e0-d84c6f3b8870 of framework 598309b4-b22f-47d3-9a90-7afc2df0d89f-0000 to the agent
I1031 14:55:42.721329 30695 slave.cpp:4169] Forwarding the update TASK_RUNNING (UUID: 44e6ff49-72ce-48b1-849d-42fc63c4b73b) for task 21145c68-5a97-4620-a4e0-d84c6f3b8870 of framework 598309b4-b22f-47d3-9a90-7afc2df0d89f-0000 to master@172.17.0.3:45279
I1031 14:55:42.721566 30695 slave.cpp:4063] Status update manager successfully handled status update TASK_RUNNING (UUID: 44e6ff49-72ce-48b1-849d-42fc63c4b73b) for task 21145c68-5a97-4620-a4e0-d84c6f3b8870 of framework 598309b4-b22f-47d3-9a90-7afc2df0d89f-0000
I1031 14:55:42.721755 30684 master.cpp:5757] Status update TASK_RUNNING (UUID: 44e6ff49-72ce-48b1-849d-42fc63c4b73b) for task 21145c68-5a97-4620-a4e0-d84c6f3b8870 of framework 598309b4-b22f-47d3-9a90-7afc2df0d89f-0000 from agent 598309b4-b22f-47d3-9a90-7afc2df0d89f-S0 at slave(226)@172.17.0.3:45279 (169c0a15c915)
I../../src/tests/master_tests.cpp:5412: Failure
Failed to wait 15secs for acknowledged
1031 14:55:42.721860 30684 master.cpp:5819] Forwarding status update TASK_RUNNING (UUID: 44e6ff49-72ce-48b1-849d-42fc63c4b73b) for task 21145c68-5a97-4620-a4e0-d84c6f3b8870 of framework 598309b4-b22f-47d3-9a90-7afc2df0d89f-0000
I1031 14:55:42.722574 30685 executor.cpp:705] Enqueuing event ACKNOWLEDGED received from http://172.17.0.3:45279/slave(226)/api/v1/executor
I1031 14:55:43.634330 30687 hierarchical.cpp:1694] No allocations performed
I1031 14:56:04.675415 30687 hierarchical.cpp:1789] No inverse offers to send out!
I1031 14:56:04.675559 30687 hierarchical.cpp:1286] Performed allocation for 1 agents in 21.041389234secs
I1031 14:56:04.675912 30684 master.cpp:7712] Updating the state of task 21145c68-5a97-4620-a4e0-d84c6f3b8870 of framework 598309b4-b22f-47d3-9a90-7afc2df0d89f-0000 (latest state: TASK_RUNNING, status update state: TASK_RUNNING)
W1031 14:55:52.721946 30690 status_update_manager.cpp:478] Resending status update TASK_RUNNING (UUID: 44e6ff49-72ce-48b1-849d-42fc63c4b73b) for task 21145c68-5a97-4620-a4e0-d84c6f3b8870 of framework 598309b4-b22f-47d3-9a90-7afc2df0d89f-0000
I1031 14:56:04.676465 30690 status_update_manager.cpp:377] Forwarding update TASK_RUNNING (UUID: 44e6ff49-72ce-48b1-849d-42fc63c4b73b) for task 21145c68-5a97-4620-a4e0-d84c6f3b8870 of framework 598309b4-b22f-47d3-9a90-7afc2df0d89f-0000 to the agent
I1031 14:55:57.671268 30689 slave.cpp:5573] Querying resource estimator for oversubscribable resources
I1031 14:55:57.673494 30692 master.hpp:2161] Sending heartbeat to 598309b4-b22f-47d3-9a90-7afc2df0d89f-0000
{code}


> MasterTest.RecoverResourcesOrphanedTask is flaky
> ------------------------------------------------
>
>                 Key: MESOS-6521
>                 URL: https://issues.apache.org/jira/browse/MESOS-6521
>             Project: Mesos
>          Issue Type: Bug
>    Affects Versions: 1.2.0
>            Reporter: Vinod Kone
>
> Observed this on ASF CI.
> Looks like the acknowledgement of TASK_RUNNING update was never received by the executor. Not sure why. 
> {code}
> [ RUN      ] MasterTest.RecoverResourcesOrphanedTask
> I1031 14:55:42.624336 30664 cluster.cpp:158] Creating default 'local' authorizer
> I1031 14:55:42.627467 30664 leveldb.cpp:174] Opened db in 2.723103ms
> I1031 14:55:42.628058 30664 leveldb.cpp:181] Compacted db in 560866ns
> I1031 14:55:42.628105 30664 leveldb.cpp:196] Created db iterator in 17967ns
> I1031 14:55:42.628128 30664 leveldb.cpp:202] Seeked to beginning of db in 2025ns
> I1031 14:55:42.628139 30664 leveldb.cpp:271] Iterated through 0 keys in the db in 355ns
> I1031 14:55:42.628175 30664 replica.cpp:776] Replica recovered with log positions 0 -> 0 with 1 holes and 0 unlearned
> I1031 14:55:42.628563 30696 recover.cpp:451] Starting replica recovery
> I1031 14:55:42.629137 30694 recover.cpp:477] Replica is in EMPTY status
> I1031 14:55:42.630564 30693 replica.cpp:673] Replica in EMPTY status received a broadcasted recover request from __req_res__(3298)@172.17.0.3:45279
> I1031 14:55:42.631057 30695 recover.cpp:197] Received a recover response from a replica in EMPTY status
> I1031 14:55:42.631556 30694 recover.cpp:568] Updating replica status to STARTING
> I1031 14:55:42.631809 30698 master.cpp:380] Master 598309b4-b22f-47d3-9a90-7afc2df0d89f (169c0a15c915) started on 172.17.0.3:45279
> I1031 14:55:42.632272 30687 leveldb.cpp:304] Persisting metadata (8 bytes) to leveldb took 423123ns
> I1031 14:55:42.631837 30698 master.cpp:382] Flags at startup: --acls="" --agent_ping_timeout="15secs" --agent_reregister_timeout="10mins" --allocation_interval="1secs" --allocator="HierarchicalDRF" --authenticate_agents="true" --authenticate_frameworks="true" --
> authenticate_http_frameworks="true" --authenticate_http_readonly="true" --authenticate_http_readwrite="true" --authenticators="crammd5" --authorizers="local" --credentials="/tmp/oUy0jO/credentials" --framework_sorter="drf" --help="false" --hostname_lookup="true"
>  --http_authenticators="basic" --http_framework_authenticators="basic" --initialize_driver_logging="true" --log_auto_initialize="true" --logbufsecs="0" --logging_level="INFO" --max_agent_ping_timeouts="5" --max_completed_frameworks="50" --max_completed_tasks_per
> _framework="1000" --quiet="false" --recovery_agent_removal_limit="100%" --registry="replicated_log" --registry_fetch_timeout="1mins" --registry_gc_interval="15mins" --registry_max_agent_age="2weeks" --registry_max_agent_count="102400" --registry_store_timeout="1
> 00secs" --registry_strict="false" --root_submissions="true" --user_sorter="drf" --version="false" --webui_dir="/mesos/mesos-1.2.0/_inst/share/mesos/webui" --work_dir="/tmp/oUy0jO/master" --zk_session_timeout="10secs"
> I1031 14:55:42.632298 30687 replica.cpp:320] Persisted replica status to STARTING
> I1031 14:55:42.632345 30698 master.cpp:432] Master only allowing authenticated frameworks to register
> I1031 14:55:42.632359 30698 master.cpp:446] Master only allowing authenticated agents to register
> I1031 14:55:42.632367 30698 master.cpp:459] Master only allowing authenticated HTTP frameworks to register
> I1031 14:55:42.632375 30698 credentials.hpp:37] Loading credentials for authentication from '/tmp/oUy0jO/credentials'
> I1031 14:55:42.632555 30686 recover.cpp:477] Replica is in STARTING status
> I1031 14:55:42.632690 30698 master.cpp:504] Using default 'crammd5' authenticator
> I1031 14:55:42.632850 30698 http.cpp:887] Using default 'basic' HTTP authenticator for realm 'mesos-master-readonly'
> I1031 14:55:42.633018 30698 http.cpp:887] Using default 'basic' HTTP authenticator for realm 'mesos-master-readwrite'
> I1031 14:55:42.633174 30698 http.cpp:887] Using default 'basic' HTTP authenticator for realm 'mesos-master-scheduler'
> I1031 14:55:42.633335 30698 master.cpp:584] Authorization enabled
> I1031 14:55:42.633500 30697 hierarchical.cpp:149] Initialized hierarchical allocator process
> I1031 14:55:42.633513 30683 whitelist_watcher.cpp:77] No whitelist given
> I1031 14:55:42.633719 30683 replica.cpp:673] Replica in STARTING status received a broadcasted recover request from __req_res__(3299)@172.17.0.3:45279
> I1031 14:55:42.634234 30689 recover.cpp:197] Received a recover response from a replica in STARTING status
> I1031 14:55:42.634668 30694 recover.cpp:568] Updating replica status to VOTING
> I1031 14:55:42.635254 30697 leveldb.cpp:304] Persisting metadata (8 bytes) to leveldb took 338613ns
> I1031 14:55:42.635290 30697 replica.cpp:320] Persisted replica status to VOTING
> I1031 14:55:42.635447 30683 recover.cpp:582] Successfully joined the Paxos group
> I1031 14:55:42.635709 30683 recover.cpp:466] Recover process terminated
> I1031 14:55:42.636854 30697 master.cpp:2033] Elected as the leading master!
> I1031 14:55:42.636881 30697 master.cpp:1560] Recovering from registrar
> I1031 14:55:42.636993 30685 registrar.cpp:329] Recovering registrar
> I1031 14:55:42.637830 30695 log.cpp:553] Attempting to start the writer
> I1031 14:55:42.639220 30686 replica.cpp:493] Replica received implicit promise request from __req_res__(3300)@172.17.0.3:45279 with proposal 1
> I1031 14:55:42.639647 30686 leveldb.cpp:304] Persisting metadata (8 bytes) to leveldb took 390697ns
> I1031 14:55:42.639677 30686 replica.cpp:342] Persisted promised to 1
> I1031 14:55:42.640534 30686 coordinator.cpp:238] Coordinator attempting to fill missing positions
> I1031 14:55:42.641780 30688 replica.cpp:388] Replica received explicit promise request from __req_res__(3301)@172.17.0.3:45279 for position 0 with proposal 2
> I1031 14:55:42.642138 30688 leveldb.cpp:341] Persisting action (8 bytes) to leveldb took 287987ns
> I1031 14:55:42.642166 30688 replica.cpp:708] Persisted action NOP at position 0
> I1031 14:55:42.643610 30698 replica.cpp:537] Replica received write request for position 0 from __req_res__(3302)@172.17.0.3:45279
> I1031 14:55:42.643682 30698 leveldb.cpp:436] Reading position from leveldb took 37680ns
> I1031 14:55:42.643851 30698 leveldb.cpp:341] Persisting action (14 bytes) to leveldb took 126627ns
> I1031 14:55:42.643872 30698 replica.cpp:708] Persisted action NOP at position 0
> I1031 14:55:42.644600 30684 replica.cpp:691] Replica received learned notice for position 0 from @0.0.0.0:0
> I1031 14:55:42.644754 30684 leveldb.cpp:341] Persisting action (16 bytes) to leveldb took 125716ns
> I1031 14:55:42.644778 30684 replica.cpp:708] Persisted action NOP at position 0
> I1031 14:55:42.645606 30693 log.cpp:569] Writer started with ending position 0
> I1031 14:55:42.646730 30696 leveldb.cpp:436] Reading position from leveldb took 36848ns
> I1031 14:55:42.647701 30691 registrar.cpp:362] Successfully fetched the registry (0B) in 10.651904ms
> I1031 14:55:42.647816 30691 registrar.cpp:461] Applied 1 operations in 17440ns; attempting to update the registry
> I1031 14:55:42.648564 30693 log.cpp:577] Attempting to append 168 bytes to the log
> I1031 14:55:42.648675 30687 coordinator.cpp:348] Coordinator attempting to write APPEND action at position 1
> I1031 14:55:42.649667 30688 replica.cpp:537] Replica received write request for position 1 from __req_res__(3303)@172.17.0.3:45279
> I1031 14:55:42.650169 30688 leveldb.cpp:341] Persisting action (187 bytes) to leveldb took 454939ns
> I1031 14:55:42.650202 30688 replica.cpp:708] Persisted action APPEND at position 1
> I1031 14:55:42.651391 30689 replica.cpp:691] Replica received learned notice for position 1 from @0.0.0.0:0
> I1031 14:55:42.651859 30689 leveldb.cpp:341] Persisting action (189 bytes) to leveldb took 430513ns
> I1031 14:55:42.651890 30689 replica.cpp:708] Persisted action APPEND at position 1
> I1031 14:55:42.653036 30694 registrar.cpp:506] Successfully updated the registry in 5.159168ms
> I1031 14:55:42.653187 30694 registrar.cpp:392] Successfully recovered registrar
> I1031 14:55:42.653368 30695 log.cpp:596] Attempting to truncate the log to 1
> I1031 14:55:42.653604 30688 coordinator.cpp:348] Coordinator attempting to write TRUNCATE action at position 2
> I1031 14:55:42.653679 30698 master.cpp:1676] Recovered 0 agents from the registry (129B); allowing 10mins for agents to re-register
> I1031 14:55:42.653715 30691 hierarchical.cpp:176] Skipping recovery of hierarchical allocator: nothing to recover
> I1031 14:55:42.654757 30685 replica.cpp:537] Replica received write request for position 2 from __req_res__(3304)@172.17.0.3:45279
> I1031 14:55:42.655082 30685 leveldb.cpp:341] Persisting action (16 bytes) to leveldb took 289879ns
> I1031 14:55:42.655103 30685 replica.cpp:708] Persisted action TRUNCATE at position 2
> I1031 14:55:42.655730 30695 replica.cpp:691] Replica received learned notice for position 2 from @0.0.0.0:0
> I1031 14:55:42.655913 30695 leveldb.cpp:341] Persisting action (18 bytes) to leveldb took 157628ns
> I1031 14:55:42.655958 30695 leveldb.cpp:399] Deleting ~1 keys from leveldb took 24969ns
> I1031 14:55:42.655974 30695 replica.cpp:708] Persisted action TRUNCATE at position 2
> I1031 14:55:42.661589 30664 cluster.cpp:435] Creating default 'local' authorizer
> I1031 14:55:42.663805 30691 slave.cpp:208] Mesos agent started on (226)@172.17.0.3:45279
> I1031 14:55:42.663830 30691 slave.cpp:209] Flags at startup: --acls="" --appc_simple_discovery_uri_prefix="http://" --appc_store_dir="/tmp/mesos/store/appc" --authenticate_http_readonly="true" --authenticate_http_readwrite="true" --authenticatee="crammd5" --authentication_backoff_factor="1secs" --authorizer="local" --cgroups_cpu_enable_pids_and_tids_count="false" --cgroups_enable_cfs="false" --cgroups_hierarchy="/sys/fs/cgroup" --cgroups_limit_swap="false" --cgroups_root="mesos" --container_disk_watch_interval="15secs" --containerizers="mesos" --credential="/tmp/MasterTest_RecoverResourcesOrphanedTask_OvkYzN/credential" --default_role="*" --disk_watch_interval="1mins" --docker="docker" --docker_kill_orphans="true" --docker_registry="https://registry-1.docker.io" --docker_remove_delay="6hrs" --docker_socket="/var/run/docker.sock" --docker_stop_timeout="0ns" --docker_store_dir="/tmp/mesos/store/docker" --docker_volume_checkpoint_dir="/var/run/mesos/isolators/docker/volume" --enforce_container_disk_quota="false" --executor_registration_timeout="1mins" --executor_shutdown_grace_period="5secs" --fetcher_cache_dir="/tmp/MasterTest_RecoverResourcesOrphanedTask_OvkYzN/fetch" --fetcher_cache_size="2GB" --frameworks_home="" --gc_delay="1weeks" --gc_disk_headroom="0.1" --hadoop_home="" --help="false" --hostname_lookup="true" --http_authenticators="basic" --http_command_executor="false" --http_credentials="/tmp/MasterTest_RecoverResourcesOrphanedTask_OvkYzN/http_credentials" --image_provisioner_backend="copy" --initialize_driver_logging="true" --isolation="posix/cpu,posix/mem" --launcher="posix" --launcher_dir="/mesos/mesos-1.2.0/_build/src" --logbufsecs="0" --logging_level="INFO" --max_completed_executors_per_framework="150" --oversubscribed_resources_interval="15secs" --perf_duration="10secs" --perf_interval="1mins" --qos_correction_interval_min="0ns" --quiet="false" --recover="reconnect" --recovery_timeout="15mins" --registration_backoff_factor="10ms" --resources="cpus:2;gpus:0;mem:1024;disk:1024;ports:[31000-32000]" --revocable_cpu_low_priority="true" --runtime_dir="/tmp/MasterTest_RecoverResourcesOrphanedTask_OvkYzN" --sandbox_directory="/mnt/mesos/sandbox" --strict="true" --switch_user="true" --systemd_enable_support="true" --systemd_runtime_directory="/run/systemd/system" --version="false" --work_dir="/tmp/MasterTest_RecoverResourcesOrphanedTask_ORRYWi"
> I1031 14:55:42.664497 30691 credentials.hpp:86] Loading credential for authentication from '/tmp/MasterTest_RecoverResourcesOrphanedTask_OvkYzN/credential'
> I1031 14:55:42.664716 30691 slave.cpp:346] Agent using credential for: test-principal
> I1031 14:55:42.664737 30691 credentials.hpp:37] Loading credentials for authentication from '/tmp/MasterTest_RecoverResourcesOrphanedTask_OvkYzN/http_credentials'
> I1031 14:55:42.664749 30664 scheduler.cpp:176] Version: 1.2.0
> I1031 14:55:42.665066 30691 http.cpp:887] Using default 'basic' HTTP authenticator for realm 'mesos-agent-readonly'
> I1031 14:55:42.665258 30691 http.cpp:887] Using default 'basic' HTTP authenticator for realm 'mesos-agent-readwrite'
> I1031 14:55:42.665680 30695 scheduler.cpp:465] New master detected at master@172.17.0.3:45279
> I1031 14:55:42.665711 30695 scheduler.cpp:474] Waiting for 0ns before initiating a re-(connection) attempt with the master
> I1031 14:55:42.666929 30691 slave.cpp:533] Agent resources: cpus(*):2; mem(*):1024; disk(*):1024; ports(*):[31000-32000]
> I1031 14:55:42.667012 30691 slave.cpp:541] Agent attributes: [  ]
> I1031 14:55:42.667052 30691 slave.cpp:546] Agent hostname: 169c0a15c915
> I1031 14:55:42.667279 30697 scheduler.cpp:353] Connected with the master at http://172.17.0.3:45279/master/api/v1/scheduler
> I1031 14:55:42.668727 30683 scheduler.cpp:235] Sending SUBSCRIBE call to http://172.17.0.3:45279/master/api/v1/scheduler
> I1031 14:55:42.668860 30690 state.cpp:57] Recovering state from '/tmp/MasterTest_RecoverResourcesOrphanedTask_ORRYWi/meta'
> I1031 14:55:42.669205 30685 status_update_manager.cpp:203] Recovering status update manager
> I1031 14:55:42.669450 30687 slave.cpp:5399] Finished recovery
> I1031 14:55:42.669751 30692 process.cpp:3570] Handling HTTP event for process 'master' with path: '/master/api/v1/scheduler'
> I1031 14:55:42.669924 30687 slave.cpp:5573] Querying resource estimator for oversubscribable resources
> I1031 14:55:42.670189 30695 status_update_manager.cpp:177] Pausing sending status updates
> I1031 14:55:42.670199 30684 slave.cpp:915] New master detected at master@172.17.0.3:45279
> I1031 14:55:42.670224 30684 slave.cpp:974] Authenticating with master master@172.17.0.3:45279
> I1031 14:55:42.670284 30684 slave.cpp:985] Using default CRAM-MD5 authenticatee
> I1031 14:55:42.670425 30684 slave.cpp:947] Detecting new master
> I1031 14:55:42.670496 30686 authenticatee.cpp:121] Creating new client SASL connection
> I1031 14:55:42.670532 30684 slave.cpp:5587] Received oversubscribable resources {} from the resource estimator
> I1031 14:55:42.670723 30692 master.cpp:6742] Authenticating slave(226)@172.17.0.3:45279
> I1031 14:55:42.670821 30694 authenticator.cpp:414] Starting authentication session for crammd5-authenticatee(514)@172.17.0.3:45279
> I1031 14:55:42.670892 30692 http.cpp:391] HTTP POST for /master/api/v1/scheduler from 172.17.0.3:39926
> I1031 14:55:42.671062 30689 authenticator.cpp:98] Creating new server SASL connection
> I1031 14:55:42.671073 30692 master.cpp:2329] Received subscription request for HTTP framework 'default'
> I1031 14:55:42.671124 30692 master.cpp:2069] Authorizing framework principal 'test-principal' to receive offers for role '*'
> I1031 14:55:42.671303 30683 authenticatee.cpp:213] Received SASL authentication mechanisms: CRAM-MD5
> I1031 14:55:42.671332 30683 authenticatee.cpp:239] Attempting to authenticate with mechanism 'CRAM-MD5'
> I1031 14:55:42.671437 30688 authenticator.cpp:204] Received SASL authentication start
> I1031 14:55:42.671500 30688 authenticator.cpp:326] Authentication requires more steps
> I1031 14:55:42.671597 30695 authenticatee.cpp:259] Received SASL authentication step
> I1031 14:55:42.671645 30692 master.cpp:2427] Subscribing framework 'default' with checkpointing disabled and capabilities [  ]
> I1031 14:55:42.671841 30691 authenticator.cpp:232] Received SASL authentication step
> I1031 14:55:42.671878 30691 auxprop.cpp:109] Request to lookup properties for user: 'test-principal' realm: '169c0a15c915' server FQDN: '169c0a15c915' SASL_AUXPROP_VERIFY_AGAINST_HASH: false SASL_AUXPROP_OVERRIDE: false SASL_AUXPROP_AUTHZID: false 
> I1031 14:55:42.671895 30691 auxprop.cpp:181] Looking up auxiliary property '*userPassword'
> I1031 14:55:42.671932 30691 auxprop.cpp:181] Looking up auxiliary property '*cmusaslsecretCRAM-MD5'
> I1031 14:55:42.671957 30691 auxprop.cpp:109] Request to lookup properties for user: 'test-principal' realm: '169c0a15c915' server FQDN: '169c0a15c915' SASL_AUXPROP_VERIFY_AGAINST_HASH: false SASL_AUXPROP_OVERRIDE: false SASL_AUXPROP_AUTHZID: true 
> I1031 14:55:42.671969 30691 auxprop.cpp:131] Skipping auxiliary property '*userPassword' since SASL_AUXPROP_AUTHZID == true
> I1031 14:55:42.671978 30691 auxprop.cpp:131] Skipping auxiliary property '*cmusaslsecretCRAM-MD5' since SASL_AUXPROP_AUTHZID == true
> I1031 14:55:42.672001 30691 authenticator.cpp:318] Authentication success
> I1031 14:55:42.672106 30698 authenticatee.cpp:299] Authentication success
> I1031 14:55:42.672196 30688 master.hpp:2161] Sending heartbeat to 598309b4-b22f-47d3-9a90-7afc2df0d89f-0000
> I1031 14:55:42.672237 30692 master.cpp:6772] Successfully authenticated principal 'test-principal' at slave(226)@172.17.0.3:45279
> I1031 14:55:42.672256 30684 hierarchical.cpp:275] Added framework 598309b4-b22f-47d3-9a90-7afc2df0d89f-0000
> I1031 14:55:42.672273 30693 authenticator.cpp:432] Authentication session cleanup for crammd5-authenticatee(514)@172.17.0.3:45279
> I1031 14:55:42.672332 30684 hierarchical.cpp:1694] No allocations performed
> I1031 14:55:42.672360 30684 hierarchical.cpp:1789] No inverse offers to send out!
> I1031 14:55:42.672405 30684 hierarchical.cpp:1286] Performed allocation for 0 agents in 127100ns
> I1031 14:55:42.672673 30695 slave.cpp:1069] Successfully authenticated with master master@172.17.0.3:45279
> I1031 14:55:42.672861 30695 slave.cpp:1483] Will retry registration in 7.30742ms if necessary
> I1031 14:55:42.673007 30690 scheduler.cpp:671] Enqueuing event SUBSCRIBED received from http://172.17.0.3:45279/master/api/v1/scheduler
> I1031 14:55:42.673056 30684 master.cpp:5151] Registering agent at slave(226)@172.17.0.3:45279 (169c0a15c915) with id 598309b4-b22f-47d3-9a90-7afc2df0d89f-S0
> I1031 14:55:42.673661 30686 registrar.cpp:461] Applied 1 operations in 62726ns; attempting to update the registry
> I1031 14:55:42.673653 30690 scheduler.cpp:671] Enqueuing event HEARTBEAT received from http://172.17.0.3:45279/master/api/v1/scheduler
> I1031 14:55:42.674896 30690 log.cpp:577] Attempting to append 337 bytes to the log
> I1031 14:55:42.675058 30686 coordinator.cpp:348] Coordinator attempting to write APPEND action at position 3
> I1031 14:55:42.676040 30687 replica.cpp:537] Replica received write request for position 3 from __req_res__(3305)@172.17.0.3:45279
> I1031 14:55:42.676291 30687 leveldb.cpp:341] Persisting action (356 bytes) to leveldb took 204345ns
> I1031 14:55:42.676321 30687 replica.cpp:708] Persisted action APPEND at position 3
> I1031 14:55:42.676970 30688 replica.cpp:691] Replica received learned notice for position 3 from @0.0.0.0:0
> I1031 14:55:42.677459 30688 leveldb.cpp:341] Persisting action (358 bytes) to leveldb took 452504ns
> I1031 14:55:42.677486 30688 replica.cpp:708] Persisted action APPEND at position 3
> I1031 14:55:42.679132 30696 registrar.cpp:506] Successfully updated the registry in 5.394176ms
> I1031 14:55:42.679390 30693 log.cpp:596] Attempting to truncate the log to 3
> I1031 14:55:42.679533 30696 coordinator.cpp:348] Coordinator attempting to write TRUNCATE action at position 4
> I1031 14:55:42.680057 30687 slave.cpp:4251] Received ping from slave-observer(231)@172.17.0.3:45279
> I1031 14:55:42.680407 30687 slave.cpp:1115] Registered with master master@172.17.0.3:45279; given agent ID 598309b4-b22f-47d3-9a90-7afc2df0d89f-S0
> I1031 14:55:42.680209 30695 master.cpp:5222] Registered agent 598309b4-b22f-47d3-9a90-7afc2df0d89f-S0 at slave(226)@172.17.0.3:45279 (169c0a15c915) with cpus(*):2; mem(*):1024; disk(*):1024; ports(*):[31000-32000]
> I1031 14:55:42.680435 30687 fetcher.cpp:86] Clearing fetcher cache
> I1031 14:55:42.680584 30694 status_update_manager.cpp:184] Resuming sending status updates
> I1031 14:55:42.680702 30686 hierarchical.cpp:485] Added agent 598309b4-b22f-47d3-9a90-7afc2df0d89f-S0 (169c0a15c915) with cpus(*):2; mem(*):1024; disk(*):1024; ports(*):[31000-32000] (allocated: {})
> I1031 14:55:42.680755 30687 slave.cpp:1138] Checkpointing SlaveInfo to '/tmp/MasterTest_RecoverResourcesOrphanedTask_ORRYWi/meta/slaves/598309b4-b22f-47d3-9a90-7afc2df0d89f-S0/slave.info'
> I1031 14:55:42.680755 30695 replica.cpp:537] Replica received write request for position 4 from __req_res__(3306)@172.17.0.3:45279
> I1031 14:55:42.681140 30695 leveldb.cpp:341] Persisting action (16 bytes) to leveldb took 340426ns
> I1031 14:55:42.681162 30695 replica.cpp:708] Persisted action TRUNCATE at position 4
> I1031 14:55:42.681190 30687 slave.cpp:1175] Forwarding total oversubscribed resources {}
> I1031 14:55:42.681368 30695 master.cpp:5621] Received update of agent 598309b4-b22f-47d3-9a90-7afc2df0d89f-S0 at slave(226)@172.17.0.3:45279 (169c0a15c915) with total oversubscribed resources {}
> I1031 14:55:42.681936 30686 hierarchical.cpp:1789] No inverse offers to send out!
> I1031 14:55:42.682029 30686 hierarchical.cpp:1309] Performed allocation for agent 598309b4-b22f-47d3-9a90-7afc2df0d89f-S0 in 1.273471ms
> I1031 14:55:42.682063 30687 replica.cpp:691] Replica received learned notice for position 4 from @0.0.0.0:0
> I1031 14:55:42.682261 30686 hierarchical.cpp:555] Agent 598309b4-b22f-47d3-9a90-7afc2df0d89f-S0 (169c0a15c915) updated with oversubscribed resources {} (total: cpus(*):2; mem(*):1024; disk(*):1024; ports(*):[31000-32000], allocated: cpus(*):2; mem(*):1024; disk(*):1024; ports(*):[31000-32000])
> I1031 14:55:42.682442 30686 hierarchical.cpp:1694] No allocations performed
> I1031 14:55:42.682487 30686 hierarchical.cpp:1789] No inverse offers to send out!
> I1031 14:55:42.682512 30687 leveldb.cpp:341] Persisting action (18 bytes) to leveldb took 416785ns
> I1031 14:55:42.682555 30686 hierarchical.cpp:1309] Performed allocation for agent 598309b4-b22f-47d3-9a90-7afc2df0d89f-S0 in 225997ns
> I1031 14:55:42.682579 30687 leveldb.cpp:399] Deleting ~2 keys from leveldb took 40883ns
> I1031 14:55:42.682600 30687 replica.cpp:708] Persisted action TRUNCATE at position 4
> I1031 14:55:42.682672 30694 master.cpp:6571] Sending 1 offers to framework 598309b4-b22f-47d3-9a90-7afc2df0d89f-0000 (default)
> I1031 14:55:42.684432 30687 scheduler.cpp:671] Enqueuing event OFFERS received from http://172.17.0.3:45279/master/api/v1/scheduler
> I1031 14:55:42.687191 30696 scheduler.cpp:235] Sending ACCEPT call to http://172.17.0.3:45279/master/api/v1/scheduler
> I1031 14:55:42.688298 30687 process.cpp:3570] Handling HTTP event for process 'master' with path: '/master/api/v1/scheduler'
> I1031 14:55:42.689232 30686 http.cpp:391] HTTP POST for /master/api/v1/scheduler from 172.17.0.3:39925
> I1031 14:55:42.690259 30686 master.cpp:3581] Processing ACCEPT call for offers: [ 598309b4-b22f-47d3-9a90-7afc2df0d89f-O0 ] on agent 598309b4-b22f-47d3-9a90-7afc2df0d89f-S0 at slave(226)@172.17.0.3:45279 (169c0a15c915) for framework 598309b4-b22f-47d3-9a90-7afc2df0d89f-0000 (default)
> I1031 14:55:42.690366 30686 master.cpp:3173] Authorizing framework principal 'test-principal' to launch task 21145c68-5a97-4620-a4e0-d84c6f3b8870
> W1031 14:55:42.692777 30686 validation.cpp:920] Executor 'default' for task '21145c68-5a97-4620-a4e0-d84c6f3b8870' uses less CPUs (None) than the minimum required (0.01). Please update your executor, as this will be mandatory in future releases.
> W1031 14:55:42.692811 30686 validation.cpp:932] Executor 'default' for task '21145c68-5a97-4620-a4e0-d84c6f3b8870' uses less memory (None) than the minimum required (32MB). Please update your executor, as this will be mandatory in future releases.
> I1031 14:55:42.693357 30686 master.cpp:8334] Adding task 21145c68-5a97-4620-a4e0-d84c6f3b8870 with resources cpus(*):2; mem(*):1024; disk(*):1024; ports(*):[31000-32000] on agent 598309b4-b22f-47d3-9a90-7afc2df0d89f-S0 (169c0a15c915)
> I1031 14:55:42.693691 30686 master.cpp:4230] Launching task 21145c68-5a97-4620-a4e0-d84c6f3b8870 of framework 598309b4-b22f-47d3-9a90-7afc2df0d89f-0000 (default) with resources cpus(*):2; mem(*):1024; disk(*):1024; ports(*):[31000-32000] on agent 598309b4-b22f-47d3-9a90-7afc2df0d89f-S0 at slave(226)@172.17.0.3:45279 (169c0a15c915)
> I1031 14:55:42.694245 30688 slave.cpp:1547] Got assigned task '21145c68-5a97-4620-a4e0-d84c6f3b8870' for framework 598309b4-b22f-47d3-9a90-7afc2df0d89f-0000
> I1031 14:55:42.694960 30688 slave.cpp:1709] Launching task '21145c68-5a97-4620-a4e0-d84c6f3b8870' for framework 598309b4-b22f-47d3-9a90-7afc2df0d89f-0000
> I1031 14:55:42.695686 30688 paths.cpp:536] Trying to chown '/tmp/MasterTest_RecoverResourcesOrphanedTask_ORRYWi/slaves/598309b4-b22f-47d3-9a90-7afc2df0d89f-S0/frameworks/598309b4-b22f-47d3-9a90-7afc2df0d89f-0000/executors/default/runs/8d61dee3-8e4c-434b-937b-c30457814503' to user 'mesos'
> I1031 14:55:42.702708 30688 slave.cpp:6307] Launching executor 'default' of framework 598309b4-b22f-47d3-9a90-7afc2df0d89f-0000 with resources {} in work directory '/tmp/MasterTest_RecoverResourcesOrphanedTask_ORRYWi/slaves/598309b4-b22f-47d3-9a90-7afc2df0d89f-S0/frameworks/598309b4-b22f-47d3-9a90-7afc2df0d89f-0000/executors/default/runs/8d61dee3-8e4c-434b-937b-c30457814503'
> I1031 14:55:42.705585 30688 executor.cpp:189] Version: 1.2.0
> I1031 14:55:42.706229 30688 slave.cpp:2031] Queued task '21145c68-5a97-4620-a4e0-d84c6f3b8870' for executor 'default' of framework 598309b4-b22f-47d3-9a90-7afc2df0d89f-0000
> I1031 14:55:42.706313 30688 slave.cpp:868] Successfully attached file '/tmp/MasterTest_RecoverResourcesOrphanedTask_ORRYWi/slaves/598309b4-b22f-47d3-9a90-7afc2df0d89f-S0/frameworks/598309b4-b22f-47d3-9a90-7afc2df0d89f-0000/executors/default/runs/8d61dee3-8e4c-434b-937b-c30457814503'
> I1031 14:55:42.708365 30687 executor.cpp:390] Connected with the agent
> I1031 14:55:42.709179 30698 executor.cpp:291] Sending SUBSCRIBE call to http://172.17.0.3:45279/slave(226)/api/v1/executor
> I1031 14:55:42.710675 30696 process.cpp:3570] Handling HTTP event for process 'slave(226)' with path: '/slave(226)/api/v1/executor'
> I1031 14:55:42.711470 30698 http.cpp:277] HTTP POST for /slave(226)/api/v1/executor from 172.17.0.3:39927
> I1031 14:55:42.711701 30698 slave.cpp:3086] Received Subscribe request for HTTP executor 'default' of framework 598309b4-b22f-47d3-9a90-7afc2df0d89f-0000
> I1031 14:55:42.713021 30698 slave.cpp:2247] Sending queued task '21145c68-5a97-4620-a4e0-d84c6f3b8870' to executor 'default' of framework 598309b4-b22f-47d3-9a90-7afc2df0d89f-0000 (via HTTP)
> I1031 14:55:42.714972 30687 executor.cpp:705] Enqueuing event SUBSCRIBED received from http://172.17.0.3:45279/slave(226)/api/v1/executor
> I1031 14:55:42.715925 30688 executor.cpp:705] Enqueuing event LAUNCH received from http://172.17.0.3:45279/slave(226)/api/v1/executor
> I1031 14:55:42.717298 30684 executor.cpp:291] Sending UPDATE call to http://172.17.0.3:45279/slave(226)/api/v1/executor
> I1031 14:55:42.718502 30683 process.cpp:3570] Handling HTTP event for process 'slave(226)' with path: '/slave(226)/api/v1/executor'
> I1031 14:55:42.719336 30698 http.cpp:277] HTTP POST for /slave(226)/api/v1/executor from 172.17.0.3:39928
> I1031 14:55:42.719491 30698 slave.cpp:3740] Handling status update TASK_RUNNING (UUID: 44e6ff49-72ce-48b1-849d-42fc63c4b73b) for task 21145c68-5a97-4620-a4e0-d84c6f3b8870 of framework 598309b4-b22f-47d3-9a90-7afc2df0d89f-0000
> I1031 14:55:42.720316 30697 status_update_manager.cpp:323] Received status update TASK_RUNNING (UUID: 44e6ff49-72ce-48b1-849d-42fc63c4b73b) for task 21145c68-5a97-4620-a4e0-d84c6f3b8870 of framework 598309b4-b22f-47d3-9a90-7afc2df0d89f-0000
> I1031 14:55:42.720379 30697 status_update_manager.cpp:500] Creating StatusUpdate stream for task 21145c68-5a97-4620-a4e0-d84c6f3b8870 of framework 598309b4-b22f-47d3-9a90-7afc2df0d89f-0000
> I1031 14:55:42.720957 30697 status_update_manager.cpp:377] Forwarding update TASK_RUNNING (UUID: 44e6ff49-72ce-48b1-849d-42fc63c4b73b) for task 21145c68-5a97-4620-a4e0-d84c6f3b8870 of framework 598309b4-b22f-47d3-9a90-7afc2df0d89f-0000 to the agent
> I1031 14:55:42.721329 30695 slave.cpp:4169] Forwarding the update TASK_RUNNING (UUID: 44e6ff49-72ce-48b1-849d-42fc63c4b73b) for task 21145c68-5a97-4620-a4e0-d84c6f3b8870 of framework 598309b4-b22f-47d3-9a90-7afc2df0d89f-0000 to master@172.17.0.3:45279
> I1031 14:55:42.721566 30695 slave.cpp:4063] Status update manager successfully handled status update TASK_RUNNING (UUID: 44e6ff49-72ce-48b1-849d-42fc63c4b73b) for task 21145c68-5a97-4620-a4e0-d84c6f3b8870 of framework 598309b4-b22f-47d3-9a90-7afc2df0d89f-0000
> I1031 14:55:42.721755 30684 master.cpp:5757] Status update TASK_RUNNING (UUID: 44e6ff49-72ce-48b1-849d-42fc63c4b73b) for task 21145c68-5a97-4620-a4e0-d84c6f3b8870 of framework 598309b4-b22f-47d3-9a90-7afc2df0d89f-0000 from agent 598309b4-b22f-47d3-9a90-7afc2df0d89f-S0 at slave(226)@172.17.0.3:45279 (169c0a15c915)
> I../../src/tests/master_tests.cpp:5412: Failure
> Failed to wait 15secs for acknowledged
> 1031 14:55:42.721860 30684 master.cpp:5819] Forwarding status update TASK_RUNNING (UUID: 44e6ff49-72ce-48b1-849d-42fc63c4b73b) for task 21145c68-5a97-4620-a4e0-d84c6f3b8870 of framework 598309b4-b22f-47d3-9a90-7afc2df0d89f-0000
> I1031 14:55:42.722574 30685 executor.cpp:705] Enqueuing event ACKNOWLEDGED received from http://172.17.0.3:45279/slave(226)/api/v1/executor
> I1031 14:55:43.634330 30687 hierarchical.cpp:1694] No allocations performed
> I1031 14:56:04.675415 30687 hierarchical.cpp:1789] No inverse offers to send out!
> I1031 14:56:04.675559 30687 hierarchical.cpp:1286] Performed allocation for 1 agents in 21.041389234secs
> I1031 14:56:04.675912 30684 master.cpp:7712] Updating the state of task 21145c68-5a97-4620-a4e0-d84c6f3b8870 of framework 598309b4-b22f-47d3-9a90-7afc2df0d89f-0000 (latest state: TASK_RUNNING, status update state: TASK_RUNNING)
> W1031 14:55:52.721946 30690 status_update_manager.cpp:478] Resending status update TASK_RUNNING (UUID: 44e6ff49-72ce-48b1-849d-42fc63c4b73b) for task 21145c68-5a97-4620-a4e0-d84c6f3b8870 of framework 598309b4-b22f-47d3-9a90-7afc2df0d89f-0000
> I1031 14:56:04.676465 30690 status_update_manager.cpp:377] Forwarding update TASK_RUNNING (UUID: 44e6ff49-72ce-48b1-849d-42fc63c4b73b) for task 21145c68-5a97-4620-a4e0-d84c6f3b8870 of framework 598309b4-b22f-47d3-9a90-7afc2df0d89f-0000 to the agent
> I1031 14:55:57.671268 30689 slave.cpp:5573] Querying resource estimator for oversubscribable resources
> I1031 14:55:57.673494 30692 master.hpp:2161] Sending heartbeat to 598309b4-b22f-47d3-9a90-7afc2df0d89f-0000
> I1031 14:56:04.678186 30689 slave.cpp:4251] Received ping from slave-observer(231)@172.17.0.3:45279
> {code}
> Interestingly there seems to be some out of order log messages in the output. 14:55:43 -> 14:56:04 -> 14:55:52
> {code}
> I1031 14:55:42.721566 30695 slave.cpp:4063] Status update manager successfully handled status update TASK_RUNNING (UUID: 44e6ff49-72ce-48b1-849d-42fc63c4b73b) for task 21145c68-5a97-4620-a4e0-d84c6f3b8870 of framework 598309b4-b22f-47d3-9a90-7afc2df0d89f-0000
> I1031 14:55:42.721755 30684 master.cpp:5757] Status update TASK_RUNNING (UUID: 44e6ff49-72ce-48b1-849d-42fc63c4b73b) for task 21145c68-5a97-4620-a4e0-d84c6f3b8870 of framework 598309b4-b22f-47d3-9a90-7afc2df0d89f-0000 from agent 598309b4-b22f-47d3-9a90-7afc2df0d89f-S0 at slave(226)@172.17.0.3:45279 (169c0a15c915)
> I../../src/tests/master_tests.cpp:5412: Failure
> Failed to wait 15secs for acknowledged
> 1031 14:55:42.721860 30684 master.cpp:5819] Forwarding status update TASK_RUNNING (UUID: 44e6ff49-72ce-48b1-849d-42fc63c4b73b) for task 21145c68-5a97-4620-a4e0-d84c6f3b8870 of framework 598309b4-b22f-47d3-9a90-7afc2df0d89f-0000
> I1031 14:55:42.722574 30685 executor.cpp:705] Enqueuing event ACKNOWLEDGED received from http://172.17.0.3:45279/slave(226)/api/v1/executor
> I1031 14:55:43.634330 30687 hierarchical.cpp:1694] No allocations performed
> I1031 14:56:04.675415 30687 hierarchical.cpp:1789] No inverse offers to send out!
> I1031 14:56:04.675559 30687 hierarchical.cpp:1286] Performed allocation for 1 agents in 21.041389234secs
> I1031 14:56:04.675912 30684 master.cpp:7712] Updating the state of task 21145c68-5a97-4620-a4e0-d84c6f3b8870 of framework 598309b4-b22f-47d3-9a90-7afc2df0d89f-0000 (latest state: TASK_RUNNING, status update state: TASK_RUNNING)
> W1031 14:55:52.721946 30690 status_update_manager.cpp:478] Resending status update TASK_RUNNING (UUID: 44e6ff49-72ce-48b1-849d-42fc63c4b73b) for task 21145c68-5a97-4620-a4e0-d84c6f3b8870 of framework 598309b4-b22f-47d3-9a90-7afc2df0d89f-0000
> I1031 14:56:04.676465 30690 status_update_manager.cpp:377] Forwarding update TASK_RUNNING (UUID: 44e6ff49-72ce-48b1-849d-42fc63c4b73b) for task 21145c68-5a97-4620-a4e0-d84c6f3b8870 of framework 598309b4-b22f-47d3-9a90-7afc2df0d89f-0000 to the agent
> I1031 14:55:57.671268 30689 slave.cpp:5573] Querying resource estimator for oversubscribable resources
> I1031 14:55:57.673494 30692 master.hpp:2161] Sending heartbeat to 598309b4-b22f-47d3-9a90-7afc2df0d89f-0000
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Mime
View raw message