Date: Thu, 28 Feb 2013 00:55:12 +0000 (UTC)
From: "Benjamin Mahler (JIRA)"
To: mesos-dev@incubator.apache.org
Reply-To: mesos-dev@incubator.apache.org
Subject: [jira] [Created] (MESOS-367) Invalid StatusUpdateMessage from missing slave id.

Benjamin Mahler created MESOS-367:
-------------------------------------

             Summary: Invalid StatusUpdateMessage from missing slave id.
                 Key: MESOS-367
                 URL: https://issues.apache.org/jira/browse/MESOS-367
             Project: Mesos
          Issue Type: Bug
            Reporter: Benjamin Mahler
            Priority: Critical

It looks like the ExecutorProcess sets its internal slaveId upon registration:

  void registered(const ExecutorInfo& executorInfo,
                  const FrameworkID& frameworkId,
                  const FrameworkInfo& frameworkInfo,
                  const SlaveID& slaveId,
                  const SlaveInfo& slaveInfo)
  {
    if (aborted) {
      VLOG(1) << "Ignoring registered message from slave " << slaveId
              << " because the driver is aborted!";
      return;
    }

    VLOG(1) << "Executor registered on slave " << slaveId;

    **** this->slaveId = slaveId; ****

    executor->registered(driver, executorInfo, frameworkInfo, slaveInfo);
  }

A result of this is that if the registration is delayed, the executor can come up and send a status update (before the slaveId is set), resulting in an incomplete protobuf:

  void sendStatusUpdate(const TaskStatus& status)
  {
    VLOG(1) << "Executor sending status update for task "
            << status.task_id() << " in state " << status.state();

    if (status.state() == TASK_STAGING) {
      VLOG(1) << "Executor is not allowed to send "
              << "TASK_STAGING status updates. Aborting!";

      driver->abort();
      executor->error(driver, "Attempted to send TASK_STAGING status update");
      return;
    }

    StatusUpdateMessage message;
    StatusUpdate* update = message.mutable_update();
    update->mutable_framework_id()->MergeFrom(frameworkId);
    update->mutable_executor_id()->MergeFrom(executorId);
    **** update->mutable_slave_id()->MergeFrom(slaveId); ****
    update->mutable_status()->MergeFrom(status);
    update->set_timestamp(Clock::now());
    update->set_uuid(UUID::random().toBytes());

    send(slave, message);
  }

The ExecutorProcess should take the slaveId in its constructor to avoid this issue.
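To illustrate the proposal, here is a minimal, self-contained sketch that contrasts the two approaches. It does not use the real Mesos or protobuf types: SlaveID, StatusUpdate, isInitialized(), LateBoundExecutor and ConstructorBoundExecutor are all hypothetical stand-ins invented for this example, and isInitialized() only approximates protobuf's required-field check.

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <utility>

// Hypothetical stand-in for the SlaveID protobuf: 'value' models a
// required field that may be left unset.
struct SlaveID {
  std::optional<std::string> value;
};

// Hypothetical stand-in for StatusUpdate, reduced to the slave_id field.
struct StatusUpdate {
  SlaveID slave_id;

  // Approximates a required-field check: true only if slave_id.value is set.
  bool isInitialized() const { return slave_id.value.has_value(); }
};

// Models the behaviour described above: slaveId is assigned only when
// registered() is called, so an earlier status update carries an unset id.
class LateBoundExecutor {
public:
  void registered(const SlaveID& id) { slaveId = id; }

  StatusUpdate makeStatusUpdate() const {
    StatusUpdate update;
    update.slave_id = slaveId;  // may still be the default (unset) id
    return update;
  }

private:
  SlaveID slaveId;  // default-constructed: value is unset
};

// Models the proposed fix: slaveId is supplied at construction, so every
// status update is built with a populated id.
class ConstructorBoundExecutor {
public:
  explicit ConstructorBoundExecutor(SlaveID id) : slaveId(std::move(id)) {}

  StatusUpdate makeStatusUpdate() const {
    StatusUpdate update;
    update.slave_id = slaveId;
    return update;
  }

private:
  const SlaveID slaveId;
};
```

In this sketch, a LateBoundExecutor that builds a status update before registered() runs produces an update whose isInitialized() is false, while a ConstructorBoundExecutor cannot reach that state provided the caller passes a populated id.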
Here are the relevant log lines:

I0227 23:45:56.547392 38406 slave.cpp:762] Got registration for executor 'thermos-1362008747374-wickman-seizure-4-933a8193-96b1-411f-9392-3e4bd2cda6f0' of framework 201103282247-0000000019-0000
I0227 23:45:56.547610 38411 cgroups_isolation_module.cpp:571] Changing cgroup controls for executor thermos-1362008747374-wickman-seizure-4-933a8193-96b1-411f-9392-3e4bd2cda6f0 of framework 201103282247-0000000019-0000 with resources cpus=0.35; mem=176; disk=512; ports=[31385-31385]
I0227 23:45:56.547863 38406 slave.cpp:820] Flushing queued tasks for framework 201103282247-0000000019-0000
I0227 23:45:56.548074 38411 cgroups_isolation_module.cpp:676] Updated 'cpu.shares' to 358 for executor thermos-1362008747374-wickman-seizure-4-933a8193-96b1-411f-9392-3e4bd2cda6f0 of framework 201103282247-0000000019-0000
I0227 23:45:56.548812 38411 cgroups_isolation_module.cpp:774] Updated 'memory.limit_in_bytes' to 184549376 for executor thermos-1362008747374-wickman-seizure-4-933a8193-96b1-411f-9392-3e4bd2cda6f0 of framework 201103282247-0000000019-0000
libprotobuf ERROR google/protobuf/message_lite.cc:123] Can't parse message of type "mesos.internal.StatusUpdateMessage" because it is missing required fields: update.slave_id.value
W0227 23:45:56.663353 38408 protobuf.hpp:252] Initialization errors: update.slave_id.value
libprotobuf ERROR google/protobuf/message_lite.cc:123] Can't parse message of type "mesos.internal.StatusUpdateMessage" because it is missing required fields: update.slave_id.value
W0227 23:45:56.673761 38400 protobuf.hpp:252] Initialization errors: update.slave_id.value

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators.
For more information on JIRA, see: http://www.atlassian.com/software/jira