From: Timothy Chen
Date: Thu, 30 Mar 2017 20:33:42 -0700
Subject: Re: [Spark on mesos] Spark framework not re-registered and lost after mesos master restarted
To: Yu Wei
Cc: dev, "users@spark.apache.org"

Hi Yu,

As mentioned earlier, the Spark framework currently will not re-register
because failover_timeout is not set, and there is no configuration
available for it yet. It's only enabled in MesosClusterScheduler, since
that is meant to be an HA framework.

We should add that configuration for users who want their Spark
frameworks to be able to fail over in case of master failover, network
disconnect, etc.

Tim

On Thu, Mar 30, 2017 at 8:25 PM, Yu Wei wrote:
> Hi Tim,
>
> I tested the scenario again with settings as below,
>
> [dcos@agent spark-2.0.2-bin-hadoop2.7]$ cat conf/spark-defaults.conf
> spark.deploy.recoveryMode ZOOKEEPER
> spark.deploy.zookeeper.url 192.168.111.53:2181
> spark.deploy.zookeeper.dir /spark
> spark.executor.memory 512M
> spark.mesos.principal agent-dev-1
>
>
> However, the case still failed. After master restarted, spark framework did
> not re-register.
> From spark framework log, it seemed that below method in
> MesosClusterScheduler was not called.
> override def reregistered(driver: SchedulerDriver, masterInfo: MasterInfo): Unit
>
> Did I miss something? Any advice?
>
>
> Thanks,
>
> Jared, (韦煜）
> Software developer
> Interested in open source software, big data, Linux
>
>
>
> ________________________________
> From: Timothy Chen
> Sent: Friday, March 31, 2017 5:13 AM
> To: Yu Wei
> Cc: users@spark.apache.org; dev
> Subject: Re: [Spark on mesos] Spark framework not re-registered and lost
> after mesos master restarted
>
> I think failover isn't enabled on regular Spark job framework, since we
> assume jobs are more ephemeral.
>
> It could be a good setting to add to the Spark framework to enable failover.
>
> Tim
>
> On Mar 30, 2017, at 10:18 AM, Yu Wei wrote:
>
> Hi guys,
>
> I encountered a problem about spark on mesos.
>
> I setup mesos cluster and launched spark framework on mesos successfully.
>
> Then mesos master was killed and started again.
>
> However, spark framework couldn't be re-registered again as mesos agent
> does. I also couldn't find any error logs.
>
> And MesosClusterDispatcher is still running there.
>
>
> I suspect this is spark framework issue.
>
> What's your opinion?
>
>
>
> Thanks,
>
> Jared, (韦煜）
> Software developer
> Interested in open source software, big data, Linux
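
For readers following the thread, here is a minimal sketch of what
"setting failover_timeout" could look like when a scheduler builds its
Mesos FrameworkInfo. It is not Spark's code and not the configuration
proposed above. The object name FailoverFrameworkInfo, the helper build,
and its parameters are hypothetical; only the Mesos protobuf builder
calls are real, and the sketch assumes the Mesos Java bindings are on
the classpath.

```scala
import org.apache.mesos.Protos.{FrameworkID, FrameworkInfo}

// Hypothetical helper for illustration only; not part of Spark or Mesos.
object FailoverFrameworkInfo {

  /**
   * Builds a FrameworkInfo that asks the Mesos master to wait up to
   * `failoverTimeoutSecs` seconds for a disconnected scheduler to come
   * back before removing the framework. The Mesos default is 0.0.
   *
   * `previousId` is the FrameworkID assigned at first registration.
   * Passing it lets the scheduler re-register as the same framework
   * instead of as a new one.
   */
  def build(
      name: String,
      failoverTimeoutSecs: Double,
      previousId: Option[String]): FrameworkInfo = {
    val builder = FrameworkInfo.newBuilder()
      .setUser("") // an empty user lets Mesos fill in the current user
      .setName(name)
      .setFailoverTimeout(failoverTimeoutSecs)

    // Only set an ID when one was saved from an earlier registration.
    previousId.foreach { id =>
      builder.setId(FrameworkID.newBuilder().setValue(id).build())
    }
    builder.build()
  }
}
```

The result would be passed to the scheduler driver, for example
new MesosSchedulerDriver(scheduler, info, masterUrl). The timeout value
and the place where the framework ID is persisted between registrations
are left to the caller here; both are assumptions of this sketch rather
than anything decided in the thread.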