From: Vinod Kone
Date: Wed, 17 Apr 2013 12:30:14 -0700
Subject: Re: Mesos master keeps adding and removing slave or just segfaults
To: "mesos-dev@incubator.apache.org"

Also, can you try starting the slave with its IP address specified manually
via the "--ip" flag? That seems to have done the trick for other people on
this list in a similar situation.

On Wed, Apr 17, 2013 at 12:26 PM, Benjamin Mahler wrote:

> In terms of the connectivity issue, can you re-run with GLOG_v=2 and report
> back?
>
> On Tue, Apr 16, 2013 at 6:41 PM, Vinod Kone wrote:
>
> > On Tue, Apr 16, 2013 at 6:41 PM, Vinod Kone wrote:
> >
> > > Hi John,
> > >
> > > You seem to have hit a couple of known issues:
> > > https://issues.apache.org/jira/browse/MESOS-300
> > > https://issues.apache.org/jira/browse/MESOS-435
> > >
> > > Unfortunately, we haven't been able to reproduce these bugs consistently
> > > on our end, so we were never able to find the root cause and fix :/ Please
> > > add your data to the above tickets, so that we can diagnose/fix these.
> > >
> > > @vinodkone
> > >
> > > On Tue, Apr 16, 2013 at 6:21 AM, John B. Wyatt IV wrote:
> > >
> > >> Greetings,
> > >>
> > >> I've been spending some time trying to get Mesos up and running on
> > >> Vagrant (a nice frontend for headless VirtualBox). I have the master set up
> > >> locally on 33.33.13.38:5050 and one slave set up on 33.33.13.39:5050.
> > >> They're able to communicate with each other and the web display on the
> > >> master works. The problem is that the master keeps adding and removing the
> > >> slave or just segfaults sometimes. The web interface doesn't register the
> > >> slave (maybe removed too quickly?). I'm not too sure what to do at this
> > >> point and I was hoping for some help. I'm using Mesos 0.10.
> > >>
> > >> Here is the output from the master:
> > >>
> > >> I0416 10:09:01.794397 2040 dominant_share_allocator.cpp:417] Performed allocation for 0 slaves in 0.018916 milliseconds
> > >> I0416 10:09:02.099568 2038 master.cpp:906] Attempting to register slave on vagrant-ubuntu.vagrantup.com at slave(1)@127.0.1.1:57599
> > >> I0416 10:09:02.100764 2038 master.cpp:1142] Master now considering a slave at vagrant-ubuntu.vagrantup.com:57599 as active
> > >> I0416 10:09:02.101080 2038 master.cpp:1721] Adding slave 201304161008-16842879-5050-2023-56 at vagrant-ubuntu.vagrantup.com with cpus=2; mem=979; ports=[31000-32000]
> > >> I0416 10:09:02.104706 2038 master.cpp:513] Slave 201304161008-16842879-5050-2023-56(vagrant-ubuntu.vagrantup.com) disconnected
> > >> I0416 10:09:02.105237 2037 dominant_share_allocator.cpp:244] Added slave 201304161008-16842879-5050-2023-56 (vagrant-ubuntu.vagrantup.com) with cpus=2; mem=979; ports=[31000-32000] (and cpus=2; mem=979; ports=[31000-32000] available)
> > >> I0416 10:09:02.105865 2037 dominant_share_allocator.cpp:435] Performed allocation for slave 201304161008-16842879-5050-2023-56 in 0.011817 milliseconds
> > >> I0416 10:09:02.106258 2037 dominant_share_allocator.cpp:269] Removed slave 201304161008-16842879-5050-2023-56
> > >> I0416 10:09:02.797294 2038 dominant_share_allocator.cpp:417] Performed allocation for 0 slaves in 0.017615 milliseconds
> > >> I0416 10:09:03.101245 2040 master.cpp:906] Attempting to register slave on vagrant-ubuntu.vagrantup.com at slave(1)@127.0.1.1:57599
> > >> I0416 10:09:03.102088 2040 master.cpp:1142] Master now considering a slave at vagrant-ubuntu.vagrantup.com:57599 as active
> > >> I0416 10:09:03.103230 2040 master.cpp:1721] Adding slave 201304161008-16842879-5050-2023-57 at vagrant-ubuntu.vagrantup.com with cpus=2; mem=979; ports=[31000-32000]
> > >> I0416 10:09:03.106045 2040 master.cpp:513] Slave 201304161008-16842879-5050-2023-57(vagrant-ubuntu.vagrantup.com) disconnected
> > >> I0416 10:09:03.106202 2039 dominant_share_allocator.cpp:244] Added slave 201304161008-16842879-5050-2023-57 (vagrant-ubuntu.vagrantup.com) with cpus=2; mem=979; ports=[31000-32000] (and cpus=2; mem=979; ports=[31000-32000] available)
> > >> I0416 10:09:03.107240 2039 dominant_share_allocator.cpp:435] Performed allocation for slave 201304161008-16842879-5050-2023-57 in 0.011276 milliseconds
> > >> I0416 10:09:03.107650 2039 dominant_share_allocator.cpp:269] Removed slave 201304161008-16842879-5050-2023-57
> > >> I0416 10:09:03.799612 2040 dominant_share_allocator.cpp:417] Performed allocation for 0 slaves in 0.024916 milliseconds
> > >>
> > >> Here is the output from the slave:
> > >> I0416 10:19:46.207093 1867 main.cpp:123] Creating "process" isolation module
> > >> I0416 10:19:46.209199 1867 main.cpp:131] Build: 2013-04-16 07:41:31 by vagrant
> > >> I0416 10:19:46.209410 1867 main.cpp:132] Starting Mesos slave
> > >> I0416 10:19:46.210247 1883 slave.cpp:175] Slave started on 1)@127.0.1.1:56701
> > >> I0416 10:19:46.210842 1883 slave.cpp:176] Slave resources: cpus=2; mem=979; ports=[31000-32000]
> > >> I0416 10:19:46.213693 1883 slave.cpp:352] New master detected at master@33.33.13.38:5050
> > >> Loading webui script at '/home/vagrant/mesos-0.10.0/src/webui/slave/webui.py'
> > >> Bottle server starting up (using WSGIRefServer())...
> > >> Listening on http://0.0.0.0:8081/
> > >> Use Ctrl-C to quit.
> > >>
> > >> Sometimes the master just quits
> > >>
> > >> master:
> > >> I0416 10:19:58.244128 2545 master.cpp:513] Slave 201304161019-16842879-5050-2531-12(vagrant-ubuntu.vagrantup.com) disconnected
> > >> I0416 10:19:58.245954 2545 dominant_share_allocator.cpp:269] Removed slave 201304161019-16842879-5050-2531-12
> > >> F0416 10:19:58.719403 2549 process.cpp:1828] Check failed: outgoing.count(s) > 0
> > >> *** Check failure stack trace: ***
> > >> @ 0x7f554933c0ad google::LogMessage::Fail()
> > >> @ 0x7f554933e83f google::LogMessage::SendToLog()
> > >> @ 0x7f554933bcab google::LogMessage::Flush()
> > >> @ 0x7f554933f0cd google::LogMessageFatal::~LogMessageFatal()
> > >> @ 0x7f5549227484 process::SocketManager::next()
> > >> @ 0x7f55492216bf process::send_data()
> > >> @ 0x7f554937b9df ev_invoke_pending
> > >> @ 0x7f554937fd14 ev_loop
> > >> @ 0x7f554922292c process::serve()
> > >> @ 0x7f5548a9ae9a start_thread
> > >> @ 0x7f5547fb5cbd (unknown)
> > >>
> > >> Additional from slave:
> > >> I0416 10:19:58.808632 1884 slave.cpp:1141] Process exited: @0.0.0.0:0
> > >> W0416 10:19:58.808785 1884 slave.cpp:1144] WARNING! Master disconnected! Waiting for a new master to be elected.
> > >>
> > >> --
> > >> John
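
The registration lines quoted above show the slave announcing itself as
slave(1)@127.0.1.1, a loopback address, even though the slave VM is at
33.33.13.39. Whether that is what makes the master drop the slave is an
assumption, not something established in this thread, but it would fit the
"--ip" suggestion. As a minimal sketch (Python; the function name
loopback_registrations and the regular expression are illustrative and not
part of Mesos), the following scans master log lines for registrations that
advertise a loopback address:

```python
import ipaddress
import re

# Matches master log entries of the form seen in this thread:
# "Attempting to register slave on <host> at slave(1)@<ip>:<port>"
REGISTER_RE = re.compile(
    r"Attempting to register slave on (\S+) at \S+@([\d.]+):(\d+)"
)


def loopback_registrations(log_lines):
    """Return (hostname, ip, port) for each registration from a loopback IP."""
    hits = []
    for line in log_lines:
        match = REGISTER_RE.search(line)
        if match is None:
            continue  # not a registration entry
        host, ip, port = match.group(1), match.group(2), int(match.group(3))
        # 127.0.0.0/8 is loopback, so 127.0.1.1 counts as well as 127.0.0.1.
        if ipaddress.ip_address(ip).is_loopback:
            hits.append((host, ip, port))
    return hits


# Two entries copied from the master output quoted in this thread.
sample = [
    "I0416 10:09:02.099568 2038 master.cpp:906] Attempting to register slave "
    "on vagrant-ubuntu.vagrantup.com at slave(1)@127.0.1.1:57599",
    "I0416 10:09:02.104706 2038 master.cpp:513] Slave "
    "201304161008-16842879-5050-2023-56(vagrant-ubuntu.vagrantup.com) "
    "disconnected",
]

for host, ip, port in loopback_registrations(sample):
    print(f"{host} registered from loopback address {ip}:{port}")
```

If the check reports anything, restarting the slave with "--ip" set to its
routable address is the first thing to try, and a run with GLOG_v=2 should
give more detail on the connectivity side.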