mesos-dev mailing list archives

From "Benjamin Mahler (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (MESOS-424) CgroupsIsolatorTest.BalloonFramework runs forever
Date Sat, 04 May 2013 22:34:16 GMT

    [ https://issues.apache.org/jira/browse/MESOS-424?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13649191#comment-13649191
] 

Benjamin Mahler commented on MESOS-424:
---------------------------------------

Vinod, the TaskKiller attempts the following set of steps to kill the cgroup:

freeze -> kill -> thaw -> empty?

If the cgroup is not empty, it retries the chain. There is no upper bound on the retries, so this repeats until the cgroup is finally empty.

  void killTasks() {
    // Chain together the steps needed to kill the tasks. Note that we
    // ignore the return values of freeze, kill, and thaw because,
    // provided there are no errors, we'll just retry the chain as
    // long as tasks still exist.
    chain = freeze()                      // Freeze the cgroup.
      .then(defer(self(), &Self::kill))   // Send kill signals to all tasks.
      .then(defer(self(), &Self::thaw))   // Thaw cgroup to deliver signals.
      .then(defer(self(), &Self::empty)); // Wait until cgroup is empty.

    chain.onAny(defer(self(), &Self::finished, lambda::_1));
  }

Then in finished:

  void finished(const Future<bool>& empty)
  {
    CHECK(!empty.isPending() && !empty.isDiscarded());
    if (empty.isFailed()) {
      promise.fail(empty.failure());
      terminate(self());
    } else if (empty.get()) {
      promise.set(true);
      terminate(self());
    } else {
      // The cgroup was not empty after the retry limit.
      // We need to re-attempt the freeze/kill/thaw/watch chain.
      killTasks(); // <------------ RETRY THE CHAIN
    }
  }
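
To make the retry behaviour concrete, here is a minimal synchronous sketch of the same loop. It is not the Mesos code: the real chain is asynchronous and built on futures, whereas this version just calls four hypothetical callbacks in order. The names Steps and runChain and the maxRounds cap are assumptions added only so the example terminates; the TaskKiller above has no such cap.

```cpp
#include <cassert>
#include <functional>

// Hypothetical stand-ins for the four steps of the chain.
struct Steps {
  std::function<bool()> freeze;
  std::function<bool()> kill;
  std::function<bool()> thaw;
  std::function<bool()> empty;  // Returns true once the cgroup has no tasks.
};

// Runs freeze -> kill -> thaw -> empty? until the cgroup is empty.
// Returns the number of rounds used, or -1 if the cgroup was still
// non-empty after maxRounds. The cap exists only in this sketch.
int runChain(const Steps& steps, int maxRounds) {
  assert(maxRounds > 0);
  for (int round = 1; round <= maxRounds; ++round) {
    // As in killTasks(), the results of freeze, kill and thaw are
    // ignored; only the emptiness check decides whether to retry.
    steps.freeze();
    steps.kill();
    steps.thaw();
    if (steps.empty()) {
      return round;
    }
  }
  return -1;
}
```

If empty never returns true, as in the log below where the freeze keeps timing out, the uncapped version of this loop never exits.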
                
> CgroupsIsolatorTest.BalloonFramework runs forever
> -------------------------------------------------
>
>                 Key: MESOS-424
>                 URL: https://issues.apache.org/jira/browse/MESOS-424
>             Project: Mesos
>          Issue Type: Bug
>            Reporter: Thomas Marshall
>
> On Ubuntu 12.04 Server, running as root:
> bin/mesos-tests.sh --gtest_filter=*Balloon* --verbose
> Source directory: /root/mesos
> Build directory: /root/mesos/build
> Note: Google Test filter = *Balloon*-
> [==========] Running 1 test from 1 test case.
> [----------] Global test environment set-up.
> [----------] 1 test from CgroupsIsolatorTest
> [ RUN      ] CgroupsIsolatorTest.ROOT_CGROUPS_BalloonFramework
> Using temporary directory '/tmp/CgroupsIsolatorTest_ROOT_CGROUPS_BalloonFramework_1JMuXO'
> Launched master at 1770
> I0402 15:20:23.570971  1770 main.cpp:116] Build: 2013-04-02 14:41:50 by root
> I0402 15:20:23.571444  1770 main.cpp:117] Starting Mesos master
> I0402 15:20:23.572792  1788 master.cpp:309] Master started on 127.0.1.1:5432
> I0402 15:20:23.573097  1788 master.cpp:324] Master ID: 201304021520-16842879-5432-1770
> W0402 15:20:23.574090  1787 master.cpp:81] No whitelist given. Advertising offers for
all slaves
> I0402 15:20:23.577419  1788 master.cpp:603] Elected as master!
> Launched slave at 1790
> I0402 15:20:25.570708  1790 main.cpp:124] Creating "cgroups" isolator
> I0402 15:20:25.571761  1790 main.cpp:132] Build: 2013-04-02 14:41:50 by root
> I0402 15:20:25.571790  1790 main.cpp:133] Starting Mesos slave
> I0402 15:20:25.574848  1808 slave.cpp:203] Slave started on 1)@127.0.1.1:51739
> I0402 15:20:25.574906  1808 slave.cpp:204] Slave resources: cpus=1; mem=96; ports=[31000-32000];
disk=7572
> I0402 15:20:25.575526  1805 cgroups_isolator.cpp:236] Using /cgroup as cgroups hierarchy
root
> I0402 15:20:25.577657  1807 slave.cpp:453] New master detected at master@127.0.0.1:5432
> I0402 15:20:25.577888  1807 status_update_manager.cpp:132] New master detected at master@127.0.0.1:5432
> I0402 15:20:25.586076  1805 cgroups_isolator.cpp:690] Recovering isolator
> I0402 15:20:25.586915  1808 slave.cpp:377] Finished recovery
> I0402 15:20:25.588171  1787 master.cpp:968] Attempting to register slave on ubuntu at
slave(1)@127.0.1.1:51739
> I0402 15:20:25.588276  1787 master.cpp:1224] Master now considering a slave at ubuntu:51739
as active
> I0402 15:20:25.589035  1787 master.cpp:1862] Adding slave 201304021520-16842879-5432-1770-0
at ubuntu with cpus=1; mem=96; ports=[31000-32000]; disk=7572
> I0402 15:20:25.589582  1787 hierarchical_allocator_process.hpp:395] Added slave 201304021520-16842879-5432-1770-0
(ubuntu) with cpus=1; mem=96; ports=[31000-32000]; disk=7572 (and cpus=1; mem=96; ports=[31000-32000];
disk=7572 available)
> I0402 15:20:25.589867  1807 slave.cpp:487] Registered with master; given slave ID 201304021520-16842879-5432-1770-0
> I0402 15:20:27.567234  1786 master.cpp:646] Registering framework 201304021520-16842879-5432-1770-0000
at scheduler(1)@127.0.1.1:54177
> I0402 15:20:27.567627  1786 hierarchical_allocator_process.hpp:268] Added framework 201304021520-16842879-5432-1770-0000
> I0402 15:20:27.568018  1786 master.hpp:309] Adding offer with resources cpus=1; mem=96;
ports=[31000-32000]; disk=7572 on slave 201304021520-16842879-5432-1770-0
> Registered
> I0402 15:20:27.568243  1786 master.cpp:1327] Sending 1 offers to framework 201304021520-16842879-5432-1770-0000
> Resource offers received
> Starting the task
> I0402 15:20:27.569226  1788 master.cpp:1534] Processing reply for offer 201304021520-16842879-5432-1770-0
on slave 201304021520-16842879-5432-1770-0 (ubuntu) for framework 201304021520-16842879-5432-1770-0000
> I0402 15:20:27.569449  1788 master.hpp:289] Adding task with resources mem=32 on slave
201304021520-16842879-5432-1770-0
> I0402 15:20:27.569537  1788 master.cpp:1651] Launching task 1 of framework 201304021520-16842879-5432-1770-0000
with resources mem=32 on slave 201304021520-16842879-5432-1770-0 (ubuntu)
> I0402 15:20:27.569792  1788 master.hpp:318] Removing offer with resources cpus=1; mem=96;
ports=[31000-32000]; disk=7572 on slave 201304021520-16842879-5432-1770-0
> I0402 15:20:27.569903  1785 hierarchical_allocator_process.hpp:497] Framework 201304021520-16842879-5432-1770-0000
filtered slave 201304021520-16842879-5432-1770-0 for 5.00secs
> I0402 15:20:27.570047  1805 slave.cpp:587] Got assigned task 1 for framework 201304021520-16842879-5432-1770-0000
> I0402 15:20:27.572463  1805 paths.hpp:302] Created executor directory '/tmp/mesos/slaves/201304021520-16842879-5432-1770-0/frameworks/201304021520-16842879-5432-1770-0000/executors/default/runs/a6115dfa-8195-4cf4-b044-f9b7e7531e9e'
> I0402 15:20:27.573072  1805 slave.cpp:436] Successfully attached file '/tmp/mesos/slaves/201304021520-16842879-5432-1770-0/frameworks/201304021520-16842879-5432-1770-0000/executors/default/runs/a6115dfa-8195-4cf4-b044-f9b7e7531e9e'
> I0402 15:20:27.573310  1806 cgroups_isolator.cpp:488] Launching default (/root/mesos/build/src/balloon-executor)
in /tmp/mesos/slaves/201304021520-16842879-5432-1770-0/frameworks/201304021520-16842879-5432-1770-0000/executors/default/runs/a6115dfa-8195-4cf4-b044-f9b7e7531e9e
with resources mem=64 for framework 201304021520-16842879-5432-1770-0000 in cgroup mesos/framework_201304021520-16842879-5432-1770-0000_executor_default_tag_a6115dfa-8195-4cf4-b044-f9b7e7531e9e
> I0402 15:20:27.573943  1806 cgroups_isolator.cpp:631] Changing cgroup controls for executor
default of framework 201304021520-16842879-5432-1770-0000 with resources mem=64
> I0402 15:20:27.574291  1806 cgroups_isolator.cpp:898] Updated 'memory.limit_in_bytes'
to 67108864 for executor default of framework 201304021520-16842879-5432-1770-0000
> I0402 15:20:27.574923  1806 cgroups_isolator.cpp:924] Started listening for OOM events
for executor default of framework 201304021520-16842879-5432-1770-0000
> I0402 15:20:27.575889  1806 cgroups_isolator.cpp:517] Forked executor at = 1829
> Fetching resources into '/tmp/mesos/slaves/201304021520-16842879-5432-1770-0/frameworks/201304021520-16842879-5432-1770-0000/executors/default/runs/a6115dfa-8195-4cf4-b044-f9b7e7531e9e'
> I0402 15:20:27.641137  1808 slave.cpp:1046] Got registration for executor 'default' of
framework 201304021520-16842879-5432-1770-0000
> I0402 15:20:27.641315  1808 slave.cpp:1121] Flushing queued tasks for framework 201304021520-16842879-5432-1770-0000
> I0402 15:20:27.641386  1808 cgroups_isolator.cpp:631] Changing cgroup controls for executor
default of framework 201304021520-16842879-5432-1770-0000 with resources mem=96
> I0402 15:20:27.641913  1808 cgroups_isolator.cpp:898] Updated 'memory.limit_in_bytes'
to 100663296 for executor default of framework 201304021520-16842879-5432-1770-0000
> W0402 15:20:28.575875  1788 master.cpp:81] No whitelist given. Advertising offers for
all slaves
> I0402 15:20:28.897797  1807 cgroups_isolator.cpp:944] OOM notifier is triggered for executor
default of framework 201304021520-16842879-5432-1770-0000 with uuid a6115dfa-8195-4cf4-b044-f9b7e7531e9e
> I0402 15:20:28.897902  1807 cgroups_isolator.cpp:989] OOM detected for executor default
of framework 201304021520-16842879-5432-1770-0000 with uuid a6115dfa-8195-4cf4-b044-f9b7e7531e9e
> I0402 15:20:28.899562  1807 cgroups_isolator.cpp:1030] Memory limit exceeded: Requested:
96MB Used: 96MB
> MEMORY STATISTICS: 
> cache 0
> rss 100663296
> mapped_file 0
> swap 2424832
> pgpgin 25936
> pgpgout 1360
> pgfault 31673
> pgmajfault 1
> inactive_anon 0
> active_anon 0
> inactive_file 0
> active_file 0
> unevictable 100663296
> hierarchical_memory_limit 100663296
> hierarchical_memsw_limit 9223372036854775807
> total_cache 0
> total_rss 100663296
> total_mapped_file 0
> total_swap 2424832
> total_pgpgin 25936
> total_pgpgout 1360
> total_pgfault 31673
> total_pgmajfault 1
> total_inactive_anon 0
> total_active_anon 0
> total_inactive_file 0
> total_active_file 0
> total_unevictable 100663296
> I0402 15:20:28.899739  1807 cgroups_isolator.cpp:596] Killing executor default of framework
201304021520-16842879-5432-1770-0000
> I0402 15:20:28.901882  1805 cgroups.cpp:1175] Trying to freeze cgroup /cgroup/mesos/framework_201304021520-16842879-5432-1770-0000_executor_default_tag_a6115dfa-8195-4cf4-b044-f9b7e7531e9e
> W0402 15:20:32.578037  1807 monitor.cpp:212] Failed to collect resource usage for executor
'default' of framework '201304021520-16842879-5432-1770-0000': 1
> W0402 15:20:33.578172  1788 master.cpp:81] No whitelist given. Advertising offers for
all slaves
> W0402 15:20:34.065656  1805 cgroups.cpp:1261] Unable to freeze /cgroup/mesos/framework_201304021520-16842879-5432-1770-0000_executor_default_tag_a6115dfa-8195-4cf4-b044-f9b7e7531e9e
within 51 attempts
> I0402 15:20:34.067944  1805 cgroups.cpp:1190] Trying to thaw cgroup /cgroup/mesos/framework_201304021520-16842879-5432-1770-0000_executor_default_tag_a6115dfa-8195-4cf4-b044-f9b7e7531e9e
> I0402 15:20:34.068300  1805 cgroups.cpp:1298] Successfully thawed /cgroup/mesos/framework_201304021520-16842879-5432-1770-0000_executor_default_tag_a6115dfa-8195-4cf4-b044-f9b7e7531e9e
> I0402 15:20:34.582098  1805 cgroups_isolator.cpp:766] Executor default of framework 201304021520-16842879-5432-1770-0000
terminated with status 9
> W0402 15:20:37.579793  1807 monitor.cpp:212] Failed to collect resource usage for executor
'default' of framework '201304021520-16842879-5432-1770-0000': 1
> W0402 15:20:38.580425  1787 master.cpp:81] No whitelist given. Advertising offers for
all slaves
> I0402 15:20:39.216334  1807 cgroups.cpp:1175] Trying to freeze cgroup /cgroup/mesos/framework_201304021520-16842879-5432-1770-0000_executor_default_tag_a6115dfa-8195-4cf4-b044-f9b7e7531e9e
> W0402 15:20:42.580739  1806 monitor.cpp:212] Failed to collect resource usage for executor
'default' of framework '201304021520-16842879-5432-1770-0000': 1
> W0402 15:20:43.582556  1788 master.cpp:81] No whitelist given. Advertising offers for
all slaves
> W0402 15:20:44.377604  1808 cgroups.cpp:1261] Unable to freeze /cgroup/mesos/framework_201304021520-16842879-5432-1770-0000_executor_default_tag_a6115dfa-8195-4cf4-b044-f9b7e7531e9e
within 51 attempts
> I0402 15:20:44.379775  1805 cgroups.cpp:1190] Trying to thaw cgroup /cgroup/mesos/framework_201304021520-16842879-5432-1770-0000_executor_default_tag_a6115dfa-8195-4cf4-b044-f9b7e7531e9e
> I0402 15:20:44.379935  1805 cgroups.cpp:1298] Successfully thawed /cgroup/mesos/framework_201304021520-16842879-5432-1770-0000_executor_default_tag_a6115dfa-8195-4cf4-b044-f9b7e7531e9e
> W0402 15:20:47.581902  1807 monitor.cpp:212] Failed to collect resource usage for executor
'default' of framework '201304021520-16842879-5432-1770-0000': 1
> W0402 15:20:48.584782  1786 master.cpp:81] No whitelist given. Advertising offers for
all slaves
> I0402 15:20:49.528096  1807 cgroups.cpp:1175] Trying to freeze cgroup /cgroup/mesos/framework_201304021520-16842879-5432-1770-0000_executor_default_tag_a6115dfa-8195-4cf4-b044-f9b7e7531e9e
> W0402 15:20:52.583258  1806 monitor.cpp:212] Failed to collect resource usage for executor
'default' of framework '201304021520-16842879-5432-1770-0000': 1
> W0402 15:20:53.586912  1787 master.cpp:81] No whitelist given. Advertising offers for
all slaves
> W0402 15:20:54.691306  1808 cgroups.cpp:1261] Unable to freeze /cgroup/mesos/framework_201304021520-16842879-5432-1770-0000_executor_default_tag_a6115dfa-8195-4cf4-b044-f9b7e7531e9e
within 51 attempts
> I0402 15:20:54.693431  1808 cgroups.cpp:1190] Trying to thaw cgroup /cgroup/mesos/framework_201304021520-16842879-5432-1770-0000_executor_default_tag_a6115dfa-8195-4cf4-b044-f9b7e7531e9e
> I0402 15:20:54.693737  1808 cgroups.cpp:1298] Successfully thawed /cgroup/mesos/framework_201304021520-16842879-5432-1770-0000_executor_default_tag_a6115dfa-8195-4cf4-b044-f9b7e7531e9e
> W0402 15:20:57.584837  1806 monitor.cpp:212] Failed to collect resource usage for executor
'default' of framework '201304021520-16842879-5432-1770-0000': 1
> W0402 15:20:58.588543  1788 master.cpp:81] No whitelist given. Advertising offers for
all slaves
> I0402 15:20:59.842075  1807 cgroups.cpp:1175] Trying to freeze cgroup /cgroup/mesos/framework_201304021520-16842879-5432-1770-0000_executor_default_tag_a6115dfa-8195-4cf4-b044-f9b7e7531e9e
> W0402 15:21:02.586467  1806 monitor.cpp:212] Failed to collect resource usage for executor
'default' of framework '201304021520-16842879-5432-1770-0000': 1
> W0402 15:21:03.590638  1787 master.cpp:81] No whitelist given. Advertising offers for
all slaves
> W0402 15:21:05.003955  1806 cgroups.cpp:1261] Unable to freeze /cgroup/mesos/framework_201304021520-16842879-5432-1770-0000_executor_default_tag_a6115dfa-8195-4cf4-b044-f9b7e7531e9e
within 51 attempts
> I0402 15:21:05.006346  1807 cgroups.cpp:1190] Trying to thaw cgroup /cgroup/mesos/framework_201304021520-16842879-5432-1770-0000_executor_default_tag_a6115dfa-8195-4cf4-b044-f9b7e7531e9e
> I0402 15:21:05.006577  1807 cgroups.cpp:1298] Successfully thawed /cgroup/mesos/framework_201304021520-16842879-5432-1770-0000_executor_default_tag_a6115dfa-8195-4cf4-b044-f9b7e7531e9e
> W0402 15:21:07.588361  1807 monitor.cpp:212] Failed to collect resource usage for executor
'default' of framework '201304021520-16842879-5432-1770-0000': 1
> W0402 15:21:08.592641  1786 master.cpp:81] No whitelist given. Advertising offers for
all slaves
> I0402 15:21:10.155868  1807 cgroups.cpp:1175] Trying to freeze cgroup /cgroup/mesos/framework_201304021520-16842879-5432-1770-0000_executor_default_tag_a6115dfa-8195-4cf4-b044-f9b7e7531e9e
> W0402 15:21:12.590788  1806 monitor.cpp:212] Failed to collect resource usage for executor
'default' of framework '201304021520-16842879-5432-1770-0000': 1
> W0402 15:21:13.594530  1787 master.cpp:81] No whitelist given. Advertising offers for
all slaves
> W0402 15:21:15.316937  1807 cgroups.cpp:1261] Unable to freeze /cgroup/mesos/framework_201304021520-16842879-5432-1770-0000_executor_default_tag_a6115dfa-8195-4cf4-b044-f9b7e7531e9e
within 51 attempts
> I0402 15:21:15.319368  1808 cgroups.cpp:1190] Trying to thaw cgroup /cgroup/mesos/framework_201304021520-16842879-5432-1770-0000_executor_default_tag_a6115dfa-8195-4cf4-b044-f9b7e7531e9e
> I0402 15:21:15.319533  1808 cgroups.cpp:1298] Successfully thawed /cgroup/mesos/framework_201304021520-16842879-5432-1770-0000_executor_default_tag_a6115dfa-8195-4cf4-b044-f9b7e7531e9e
> W0402 15:21:17.591588  1805 monitor.cpp:212] Failed to collect resource usage for executor
'default' of framework '201304021520-16842879-5432-1770-0000': 1
> ...

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
