Date: Fri, 23 Mar 2012 16:25:30 +0000 (UTC)
From: "Jessica J (Commented) (JIRA)"
To: mesos-dev@incubator.apache.org
Message-ID: <1272640417.8625.1332519930149.JavaMail.tomcat@hel.zones.apache.org>
In-Reply-To: <60617292.13688.1331743841112.JavaMail.tomcat@hel.zones.apache.org>
Subject: [jira] [Commented] (MESOS-165) Slaves die after initial registration with master with "Network is unreachable" error

    [ https://issues.apache.org/jira/browse/MESOS-165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13236738#comment-13236738 ]

Jessica J commented on MESOS-165:
---------------------------------

We may be misunderstanding one another... I'm talking about using a full cluster, not just a single machine. If I set LIBPROCESS_IP in mesos-env.sh to the internal IP address of my master and copy that same file (including the LIBPROCESS_IP setting) to my slaves, the slaves fail to start because the IP address defined by LIBPROCESS_IP does not belong to them. (The error is "cannot assign requested address.")

> Slaves die after initial registration with master with "Network is unreachable" error
> -------------------------------------------------------------------------------------
>
>                 Key: MESOS-165
>                 URL: https://issues.apache.org/jira/browse/MESOS-165
>             Project: Mesos
>          Issue Type: Bug
>          Components: master, slave
>         Environment: Scientific Linux 6.2 internal cluster
>            Reporter: Jessica J
>            Assignee: Charles Reiss
>            Priority: Blocker
>
> I am using a cluster in which only the master is externally accessible, so when I start the master, I set --ip to one of its internal IP addresses so that it can communicate with its slaves. I have also tried setting this IP address in mesos-env.sh (in the deploy directory) by setting LIBPROCESS_IP, but each time the master starts, it says that it is running at the external IP address (as if it is ignoring the --ip or LIBPROCESS_IP options).
> When I start a slave, I tell it that the master is at an internal IP address (no matter what the master says it's running at), so the initial connection is successful. (Both the slave and the master output messages saying the connection was successful.) However, after registering, the slave *immediately* dies.
> My guess is that upon successful connection, the master tells the slave to communicate with it on the external IP address, but since the slave has no access to the Internet, any further communication fails.
> The following is the error message the slave gives when it dies:
> F0314 12:25:45.196940 13406 process.cpp:1576] Failed to link, connect: Network is unreachable [101]
> *** Check failure stack trace: ***
>     @     0x7f7d6be3342d  google::LogMessage::Fail()
>     @     0x7f7d6be36ae7  google::LogMessage::SendToLog()
>     @     0x7f7d6be36066  google::LogMessage::Flush()
>     @     0x7f7d6be36279  google::LogMessage::~LogMessage()
>     @     0x7f7d6be39351  google::ErrnoLogMessage::~ErrnoLogMessage()
>     @     0x7f7d6be47319  process::SocketManager::link()
>     @     0x7f7d6be4bc88  process::ProcessManager::link()
>     @     0x7f7d6be4ed98  process::ProcessBase::link()
>     @     0x7f7d6bcaf575  mesos::internal::slave::Slave::newMasterDetected()
>     @     0x7f7d6bcbbd7f  ProtobufProcess<>::handler1<>()
>     @     0x7f7d6bcbe477  ProtobufProcess<>::visit()
>     @     0x7f7d6be504e0  process::MessageEvent::visit()
>     @     0x7f7d6be4b448  process::ProcessManager::resume()
>     @     0x7f7d6be43bae  process::schedule()
>     @     0x7f7d6b5a77f1  start_thread
>     @     0x7f7d6a93c92d  clone
> Aborted
> I have looked at the code (master.cpp, process.cpp, main.cpp, slave.cpp, mesos-master.sh, etc.) and tried to determine why the ip option is getting ignored, but I have thus far been unsuccessful.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
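
For reference, the "cannot assign requested address" failure mentioned in the comment is the error a bind reports (errno EADDRNOTAVAIL) when the requested address is not configured on the local host, which is consistent with a master-only LIBPROCESS_IP value being copied to the slaves. The following is only a rough sketch of how one might check this per host and choose a host-local internal address instead. The function names `can_bind` and `pick_internal_ip`, and the sample addresses and subnet, are hypothetical illustrations and not part of Mesos or libprocess; the sketch also does not address why the master appears to ignore --ip.

```python
import errno
import ipaddress
import socket


def can_bind(ip: str) -> bool:
    """Return True if this host can bind a TCP socket to `ip` on any free port.

    Hypothetical helper: a False result corresponds to the
    "cannot assign requested address" (EADDRNOTAVAIL) error.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            sock.bind((ip, 0))  # port 0 lets the OS pick a free port
        except OSError as exc:
            if exc.errno == errno.EADDRNOTAVAIL:
                return False
            raise  # any other failure is unrelated to address ownership
    return True


def pick_internal_ip(local_ips, internal_cidr):
    """Return the first of this host's addresses inside the internal subnet.

    `local_ips` is a list of address strings assumed to belong to this host;
    `internal_cidr` is the cluster's internal network (an assumption here).
    Returns None when no address falls inside that network.
    """
    network = ipaddress.ip_network(internal_cidr)
    for ip in local_ips:
        if ipaddress.ip_address(ip) in network:
            return ip
    return None


if __name__ == "__main__":
    # Made-up addresses: one external, one internal.
    chosen = pick_internal_ip(["203.0.113.7", "10.0.0.5"], "10.0.0.0/24")
    print("candidate for LIBPROCESS_IP on this host:", chosen)
```

Under these assumptions, each machine would derive its own value rather than sharing one across the cluster, and `can_bind` could be used on a slave to confirm that the address it was given actually belongs to it.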