From: Benjamin Mahler
Date: Wed, 10 Apr 2013 09:58:29 -0700
Subject: Re: long lived framework can not launch task
To: "mesos-dev@incubator.apache.org"

Yes, each executor ID can have only one 'source'.
That is the only constraint, which means you can share the same 'source'
across different executors.

To see the benefit of the 'source' field, take the example below. Say we
have a framework and we want to spin up three executors to run a job:

1: ExecutorID: "e_1", source: "page_rank_job"
2: ExecutorID: "e_2", source: "page_rank_job"
3: ExecutorID: "e_3", source: "page_rank_job"

This allows us to identify the source of each executor. The source is also
exported as part of the monitoring stats on each slave.

$ curl :5051/monitor/usage.json
[
 {"executor_id":"e_1",
  "executor_name":"task_tracker",
  "source":"page_rank_job",
  "framework_id":"201301240028-2449582272-5050-61291-0000",
  "resource_usage":
    {"cpu_time":1.0,       // CPU time (user + system) in seconds.
     "cpu_usage":0.5,      // CPU usage [0.5 ~ 50% of 1 core].
     "memory_rss":1048576, // RSS in bytes.
     // ... Additional stats in the future.
    },
 },
 {"executor_id":"e_2",
  "executor_name":"task_tracker",
  "source":"page_rank_job",
  "framework_id":"201301240028-2449582272-5050-61291-0000",
  "resource_usage":
    {"cpu_time":1.0,       // CPU time (user + system) in seconds.
     "cpu_usage":16.0,     // CPU usage [16.0 ~ 1600% usage ~ 100% of 16 cores]
     "memory_rss":1048576, // RSS in bytes.
     // ... Additional stats in the future.
    },
 },

Here we can aggregate all the resource usage for the page_rank_job by
asking every slave for its usage information and summing the entries whose
source is "page_rank_job".

But if you're looking to write a framework, you can do without setting the
source field for now, and set it later if the need arises.

I'll eventually have some more documentation around this:
https://issues.apache.org/jira/browse/MESOS-373


On Wed, Apr 10, 2013 at 2:28 AM, 王国栋 wrote:

> Hi Ben,
>
> I find some comment in mesos.proto file,
> // Source is an identifier style string used by frameworks to track
> // the source of an executor.
>
> Does this mean each executorId can only have the identical source ?
> In the old long-lived-framework, we are trying to assign different sources
> to the same executor. So the error happens.
>
> Am I right?
>
> Thanks.
>
> Guodong
>
>
> On Wed, Apr 10, 2013 at 3:02 PM, 王国栋 wrote:
>
> > Hi Ben,
> >
> > It works now. Thank you for your reply.
> >
> > I am trying to learn to write a framework on mesos. But I can not find the
> > exact meaning of each param in the API. eg. I do not know the meaning of
> > "source". Where can I find the docs about the programming guide?
> >
> > Thanks.
> >
> >
> > Guodong
> >
> >
> > On Wed, Apr 10, 2013 at 2:09 AM, Benjamin Mahler <
> > benjamin.mahler@gmail.com> wrote:
> >
> >> Thanks for the report!
> >>
> >> The bug here is that we set the source for each task:
> >>
> >>     TaskInfo task;
> >>     task.set_name("Task " + lexical_cast<string>(taskId));
> >>     task.mutable_task_id()->set_value(lexical_cast<string>(taskId));
> >>     task.mutable_slave_id()->MergeFrom(offer.slave_id());
> >>     task.mutable_executor()->MergeFrom(executor);
> >>     *task.mutable_executor()->set_source("task_" + stringify(taskId));*
> >>
> >> I'll have a review out shortly to fix this.
> >>
> >>
> >> On Tue, Apr 9, 2013 at 2:43 AM, 王国栋 wrote:
> >>
> >> > hi
> >> >
> >> > I am trying to run long-lived-framework in the trunk.
> >> > But I find the following error after task 1 finished.
> >> >
> >> > W0409 17:18:03.841472 15305 master.cpp:1566] Error validating task 1: Task
> >> > has invalid ExecutorInfo (existing ExecutorInfo with same ExecutorID is not
> >> > compatible)
> >> >
> >> > Then all the tasks will be lost.
> >> >
> >> > The log of the framework is as follow:
> >> > Registered!
> >> > .Starting task 0 on guodong-Vostro-3400
> >> > Task 0 is in state 1
> >> > Task 0 is in state 2
> >> > .Starting task 1 on guodong-Vostro-3400
> >> > Task 1 is in state 5
> >> > .Starting task 2 on guodong-Vostro-3400
> >> > Task 2 is in state 5
> >> > .Starting task 3 on guodong-Vostro-3400
> >> > Task 3 is in state 5
> >> >
> >> >
> >> > I also go through the code of LongLivedFramework. And I can not understand
> >> > this error, since the ExecutorInfo is passed as the constructor arguments
> >> > of Scheduler.
> >> >
> >> > Best regards.
> >> >
> >> > Guodong
> >> >
> >>
> >
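
For reference, here is a minimal sketch of the two ideas in this thread,
written as standalone C++ rather than against the Mesos API: an executor ID
must keep a single source, and usage entries gathered from the slaves can be
summed per source. The struct and function names (ExecutorSketch,
UsageSketch, Totals, isCompatible, aggregateBySource) are assumptions made
for illustration only, and the check models just the 'source' field, not the
full ExecutorInfo comparison the master performs.

```cpp
// Sketch only: plain structs stand in for the Mesos protobuf messages and for
// parsed entries of /monitor/usage.json. None of these types are Mesos APIs.
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-in for the ExecutorInfo fields discussed in the thread.
struct ExecutorSketch {
  std::string executorId;
  std::string source;
};

// Hypothetical stand-in for one entry of the usage.json output.
struct UsageSketch {
  std::string executorId;
  std::string source;
  double cpuTime;       // CPU time (user + system) in seconds.
  long long memoryRss;  // RSS in bytes.
};

// Running totals for one source.
struct Totals {
  double cpuTime = 0.0;
  long long memoryRss = 0;
};

// Returns true if `candidate` may be used given the executors seen so far:
// an executor ID that is already known must carry the same source.
bool isCompatible(const std::map<std::string, std::string>& sourceById,
                  const ExecutorSketch& candidate) {
  auto it = sourceById.find(candidate.executorId);
  return it == sourceById.end() || it->second == candidate.source;
}

// Sums CPU time and RSS per source across entries collected from all slaves.
std::map<std::string, Totals> aggregateBySource(
    const std::vector<UsageSketch>& entries) {
  std::map<std::string, Totals> totals;
  for (const UsageSketch& entry : entries) {
    Totals& t = totals[entry.source];  // Creates a zeroed entry on first use.
    t.cpuTime += entry.cpuTime;
    t.memoryRss += entry.memoryRss;
  }
  return totals;
}
```

With a per-task source such as "task_" + taskId, the second task reuses the
executor ID with a different source, so isCompatible returns false; with one
fixed source per executor ID it returns true. Several executor IDs may share
a source, which is what lets aggregateBySource fold them into one total.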