Subject: Re: Spark 1.3 Dynamic Allocation - Requesting 0 new executor(s) because tasks are backlogged
From: Manoj Samel <manojsameltech@gmail.com>
To: Marcelo Vanzin <vanzin@cloudera.com>
Cc: user@spark.apache.org
Date: Mon, 23 Mar 2015 14:51:51 -0700

The log shows stack traces that match the assert in the JIRA, so it seems I am hitting that issue. Thanks for the heads up ...

15/03/23 20:29:50 ERROR actor.OneForOneStrategy: assertion failed: Allocator killed more executors than are allocated!
java.lang.AssertionError: assertion failed: Allocator killed more executors than are allocated!
        at scala.Predef$.assert(Predef.scala:179)
        at org.apache.spark.deploy.yarn.YarnAllocator.killExecutor(YarnAllocator.scala:152)
        at org.apache.spark.deploy.yarn.ApplicationMaster$AMActor$$anonfun$receive$1$$anonfun$applyOrElse$6.apply(ApplicationMaster.scala:547)
        at org.apache.spark.deploy.yarn.ApplicationMaster$AMActor$$anonfun$receive$1$$anonfun$applyOrElse$6.apply(ApplicationMaster.scala:547)
        at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
        at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
        at org.apache.spark.deploy.yarn.ApplicationMaster$AMActor$$anonfun$receive$1.applyOrElse(ApplicationMaster.scala:547)
        at akka.actor.Actor$class.aroundReceive(Actor.scala:465)
        at org.apache.spark.deploy.yarn.ApplicationMaster$AMActor.aroundReceive(ApplicationMaster.scala:506)
        at akka.actor.ActorCell.receiveMessage(ActorCell.scala:516)
        at akka.actor.ActorCell.invoke(ActorCell.scala:487)
        at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:238)
        at akka.dispatch.Mailbox.run(Mailbox.scala:220)
        at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:393)
        at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
        at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
        at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)

On Mon, Mar 23, 2015 at 2:25 PM, Marcelo Vanzin <vanzin@cloudera.com> wrote:
> On Mon, Mar 23, 2015 at 2:15 PM, Manoj Samel <manojsameltech@gmail.com>
> wrote:
> > Found the cause of the above error - the setting for spark_shuffle was
> > incomplete.
> >
> > Now it is able to ask for and get additional executors. The issue is that
> > once they are released, it is not able to proceed with the next query.
>
> That looks like SPARK-6325, which unfortunately was not fixed in time
> for 1.3.0...
>
> --
> Marcelo
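
[Editor's note] For readers hitting the same "incomplete spark_shuffle setting" problem: dynamic allocation on YARN requires the external shuffle service to be registered as a NodeManager auxiliary service under the name spark_shuffle. The following is a minimal sketch of that wiring, based on the Spark-on-YARN documentation of that era; exact values (especially any pre-existing aux-services such as mapreduce_shuffle) depend on your cluster.

In yarn-site.xml on every NodeManager (the spark-<version>-yarn-shuffle.jar must also be on the NodeManager classpath):

    <property>
      <name>yarn.nodemanager.aux-services</name>
      <value>mapreduce_shuffle,spark_shuffle</value>
    </property>
    <property>
      <name>yarn.nodemanager.aux-services.spark_shuffle.class</name>
      <value>org.apache.spark.network.yarn.YarnShuffleService</value>
    </property>

And in spark-defaults.conf (or the equivalent --conf flags):

    spark.dynamicAllocation.enabled   true
    spark.shuffle.service.enabled     true

If only the first half of this is in place (for example, spark_shuffle is listed as an aux-service but its class is missing, or the shuffle jar is absent from the NodeManager classpath), executor requests can fail in exactly the "incomplete spark_shuffle" way described above.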