Mailing-List: contact mesos-dev-help@incubator.apache.org; run by ezmlm
Precedence: bulk
Reply-To: mesos-dev@incubator.apache.org
Content-Type: multipart/alternative; boundary="===============6499848903886557133=="
MIME-Version: 1.0
Subject: Review Request: Send TASK_FAILED updates when an executor is destroyed by the isolation module
From: "Vinod Kone"
To: "Benjamin Hindman", "Ben Mahler"
Cc: "mesos", "Vinod Kone"
Date: Tue, 06 Nov 2012 01:58:59 -0000
Message-ID: <20121106015859.24874.69802@reviews.apache.org>
X-ReviewBoard-URL: https://reviews.apache.org
Auto-Submitted: auto-generated
Sender: "Vinod Kone"
X-ReviewGroup: mesos
X-ReviewRequest-URL: https://reviews.apache.org/r/7887/
X-Sender: "Vinod Kone"
Reply-To: "Vinod Kone"

--===============6499848903886557133==
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: quoted-printable

-----------------------------------------------------------
This is an automatically generated e-mail.
To reply, visit: https://reviews.apache.org/r/7887/
-----------------------------------------------------------

Review request for mesos, Benjamin Hindman and Ben Mahler.


Description
-------

See summary


Diffs
-----

  src/common/protobuf_utils.hpp 77b300d7c1a02a836100d3365e205889c48ae99a
  src/examples/balloon_framework.cpp e9b60de0c7d3a96381aff37340e0f5ac499850dd
  src/slave/cgroups_isolation_module.hpp dd4703a1ca584d2347efac95bcdfae9a84544d4a
  src/slave/cgroups_isolation_module.cpp 3d10ee568b8f194543707374f34f21bd3a927958
  src/slave/lxc_isolation_module.cpp 36d86e08f7b511371a9a2053ddf43477063a79f1
  src/slave/process_based_isolation_module.cpp b0b6a81c93acc68d1f4acbdda5ab2f9f96b5fb5a
  src/slave/slave.hpp be0d7cc239e51636bb07e12c3046e0751a958787
  src/slave/slave.cpp 2bd2dbce538a6108dd9fe607829cfbdab33e0778
  src/tests/fault_tolerance_tests.cpp a01d1aef012b636f2ced64d4d2ffabfb6ce42644
  src/tests/gc_tests.cpp b61b2de621e227f327ce546b62f8dfc528f3894e
  src/tests/master_tests.cpp d9cd09c5650234351f570f0a035f4b61cd2d00f5

Diff: https://reviews.apache.org/r/7887/diff/


Testing
-------

make check (CentOS)

[vinod@smfd-aki-27-sr1:~/mesos/build] $ sudo GLOG_v=1 ./bin/mesos-tests.sh --gtest_filter="*CgroupsIsolationTest*" --verbose

...
...
I1106 01:53:54.852120 61941 cgroups_isolation_module.cpp:617] OOM notifier is triggered for executor default of framework 201211060153-2081170186-5432-61885-0000 with tag bf7fc2e7-a9c4-4240-8300-18acb99490dc
I1106 01:53:54.852165 61941 cgroups_isolation_module.cpp:662] OOM detected for executor default of framework 201211060153-2081170186-5432-61885-0000 with tag bf7fc2e7-a9c4-4240-8300-18acb99490dc
I1106 01:53:54.852854 61941 cgroups_isolation_module.cpp:689] MEMORY LIMIT: 100663296 bytes
MEMORY USAGE: 100663296 bytes
MEMORY STATISTICS:
cache 245760
rss 100417536
mapped_file 24576
pgpgin 7320
pgpgout 6250
inactive_anon 0
active_anon 1826816
inactive_file 192512
active_file 53248
unevictable 98590720
hierarchical_memory_limit 100663296
total_cache 245760
total_rss 100417536
total_mapped_file 24576
total_pgpgin 7320
total_pgpgout 6250
total_inactive_anon 0
total_active_anon 1826816
total_inactive_file 192512
total_active_file 53248
total_unevictable 98590720

I1106 01:53:54.852898 61941 cgroups_isolation_module.cpp:408] Killing executor default of framework 201211060153-2081170186-5432-61885-0000
I1106 01:53:54.855185 61937 cgroups.cpp:1116] Attempting to freeze cgroup 'mesos/framework_201211060153-2081170186-5432-61885-0000_executor_default_tag_bf7fc2e7-a9c4-4240-8300-18acb99490dc'
I1106 01:53:55.536480 61907 hierarchical_allocator_process.hpp:608] No resources available to allocate!
I1106 01:53:55.536576 61907 hierarchical_allocator_process.hpp:543] Performed allocation for 1 slaves in 130.08us
I1106 01:53:56.537866 61903 hierarchical_allocator_process.hpp:608] No resources available to allocate!
I1106 01:53:56.537951 61903 hierarchical_allocator_process.hpp:543] Performed allocation for 1 slaves in 103.18us
I1106 01:53:57.538408 61912 hierarchical_allocator_process.hpp:608] No resources available to allocate!
I1106 01:53:57.538483 61912 hierarchical_allocator_process.hpp:543] Performed allocation for 1 slaves in 93.44us
I1106 01:53:58.539499 61908 hierarchical_allocator_process.hpp:608] No resources available to allocate!
I1106 01:53:58.539593 61908 hierarchical_allocator_process.hpp:543] Performed allocation for 1 slaves in 113.75us
W1106 01:53:59.532685 61903 master.cpp:79] No whitelist given. Advertising offers for all slaves
I1106 01:53:59.540832 61912 hierarchical_allocator_process.hpp:608] No resources available to allocate!
I1106 01:53:59.540907 61912 hierarchical_allocator_process.hpp:543] Performed allocation for 1 slaves in 91.56us
W1106 01:54:00.020642 61941 cgroups.cpp:1201] Unable to freeze cgroup 'mesos/framework_201211060153-2081170186-5432-61885-0000_executor_default_tag_bf7fc2e7-a9c4-4240-8300-18acb99490dc' within 51 attempts
I1106 01:54:00.022102 61937 cgroups.cpp:1131] Attempting to thaw cgroup 'mesos/framework_201211060153-2081170186-5432-61885-0000_executor_default_tag_bf7fc2e7-a9c4-4240-8300-18acb99490dc'
I1106 01:54:00.022274 61937 cgroups.cpp:1237] Successfully thawed cgroup 'mesos/framework_201211060153-2081170186-5432-61885-0000_executor_default_tag_bf7fc2e7-a9c4-4240-8300-18acb99490dc'
I1106 01:54:00.030532 61948 process.cpp:872] Socket closed while receiving
I1106 01:54:00.129642 61936 cgroups_isolation_module.cpp:705] Successfully destroyed the cgroup mesos/framework_201211060153-2081170186-5432-61885-0000_executor_default_tag_bf7fc2e7-a9c4-4240-8300-18acb99490dc
I1106 01:54:00.539801 61944 cgroups_isolation_module.cpp:468] Telling slave of terminated executor default of framework 201211060153-2081170186-5432-61885-0000
I1106 01:54:00.539939 61934 slave.cpp:1008] Executor 'default' of framework 201211060153-2081170186-5432-61885-0000 has terminated with signal Killed
I1106 01:54:00.541018 61934 slave.cpp:833] Status update: task 1 of framework 201211060153-2081170186-5432-61885-0000 is now in state TASK_FAILED
I1106 01:54:00.541290 61944 cgroups_isolation_module.cpp:441] Asked to update resources for an unknown/terminated executor
I1106 01:54:00.541384 61904 hierarchical_allocator_process.hpp:608] No resources available to allocate!
I1106 01:54:00.541460 61904 hierarchical_allocator_process.hpp:543] Performed allocation for 1 slaves in 87.63us
I1106 01:54:00.541471 61936 gc.cpp:97] Scheduling /tmp/mesos/slaves/201211060153-2081170186-5432-61885-0/frameworks/201211060153-2081170186-5432-61885-0000/executors/default/runs/c842b51d-d962-4b20-a80a-bfe484f6dc95 for removal
I1106 01:54:00.541610 61907 master.cpp:1024] Status update from slave(1)@10.35.12.124:36146: task 1 of framework 201211060153-2081170186-5432-61885-0000 is now in state TASK_FAILED
I1106 01:54:00.541759 61907 master.hpp:288] Removing task with resources mem=32 on slave 201211060153-2081170186-5432-61885-0
I1106 01:54:00.541872 61907 master.cpp:1125] Executor default of framework 201211060153-2081170186-5432-61885-0000 on slave 201211060153-2081170186-5432-61885-0 (smfd-aki-27-sr1.devel.twitter.com) exited with status 9
I1106 01:54:00.541872 61912 hierarchical_allocator_process.hpp:491] Recovered mem=32 on slave 201211060153-2081170186-5432-61885-0 from framework 201211060153-2081170186-5432-61885-0000
I1106 01:54:00.541967 61912 hierarchical_allocator_process.hpp:491] Recovered mem=64 on slave 201211060153-2081170186-5432-61885-0 from framework 201211060153-2081170186-5432-61885-0000
I1106 01:54:00.542150 61984 sched.cpp:326] Status update: task 1 of framework 201211060153-2081170186-5432-61885-0000 is now in state TASK_FAILED
Task in state TASK_FAILED
Reason: MEMORY LIMIT: 100663296 bytes
MEMORY USAGE: 100663296 bytes
MEMORY STATISTICS:
cache 245760
rss 100417536
mapped_file 24576
pgpgin 7320
pgpgout 6250
inactive_anon 0
active_anon 1826816
inactive_file 192512
active_file 53248
unevictable 98590720
hierarchical_memory_limit 100663296
total_cache 245760
total_rss 100417536
total_mapped_file 24576
total_pgpgin 7320
total_pgpgout 6250
total_inactive_anon 0
total_active_anon 1826816
total_inactive_file 192512
total_active_file 53248
total_unevictable 98590720


Thanks,

Vinod Kone

--===============6499848903886557133==--