Subject: Standalone scheduler issue - one job occupies the whole cluster somehow
From: Mikhail Strebkov
To: user@spark.apache.org
Date: Mon, 25 Jan 2016 13:57:24 -0800

Hi all,

Recently we started having issues with one of our background processing scripts which we run on Spark. The cluster runs only two jobs: one runs for days, the other usually takes a couple of hours. Both jobs run on a cron schedule. The cluster is small, just 2 slaves, 24 cores, 25.4 GB of memory. Each job takes 6 cores and 6 GB per worker, so when both jobs are running that's 12 cores out of 24 and 24 GB out of 25.4 GB. But sometimes I see this:

https://www.dropbox.com/s/6uad4hrchqpihp4/Screen%20Shot%202016-01-25%20at%201.16.19%20PM.png

So basically the long-running job has somehow occupied the whole cluster, and the fast one can't make any progress because the cluster doesn't have the resources. This is what I see in the logs:

> 16/01/25 21:26:48 WARN TaskSchedulerImpl: Initial job has not accepted any
> resources; check your cluster UI to ensure that workers are registered and
> have sufficient resources

When I log in to the slaves I see this:

slave 1:

> /usr/lib/jvm/java/bin/java -cp <some_jars> -Xms6144M -Xmx6144M
> -Dspark.driver.port=42548 -Drun.mode=production -XX:MaxPermSize=256m
> org.apache.spark.executor.CoarseGrainedExecutorBackend
> --driver-url akka.tcp://sparkDriver@10.233.17.48:42548/user/CoarseGrainedScheduler
> --executor-id 450 --hostname 10.191.4.151 *--cores 1 --app-id app-20160124152439-1468*
> --worker-url akka.tcp://sparkWorker@10.191.4.151:53144/user/Worker
>
> /usr/lib/jvm/java/bin/java -cp <some_jars> -Xms6144M -Xmx6144M
> -Dspark.driver.port=42548 -Drun.mode=production -XX:MaxPermSize=256m
> org.apache.spark.executor.CoarseGrainedExecutorBackend
> --driver-url akka.tcp://sparkDriver@10.233.17.48:42548/user/CoarseGrainedScheduler
> --executor-id 451 --hostname 10.191.4.151 *--cores 1 --app-id app-20160124152439-1468*
> --worker-url akka.tcp://sparkWorker@10.191.4.151:53144/user/Worker

slave 2:

> /usr/lib/jvm/java/bin/java -cp <some_jars> -Xms6144M -Xmx6144M
> -Dspark.driver.port=42548 -Drun.mode=production -XX:MaxPermSize=256m
> org.apache.spark.executor.CoarseGrainedExecutorBackend
> --driver-url akka.tcp://sparkDriver@10.233.17.48:42548/user/CoarseGrainedScheduler
> --executor-id 1 --hostname 10.253.142.59 *--cores 3 --app-id app-20160124152439-1468*
> --worker-url akka.tcp://sparkWorker@10.253.142.59:33265/user/Worker
>
> /usr/lib/jvm/java/bin/java -cp <some_jars> -Xms6144M -Xmx6144M
> -Dspark.driver.port=42548 -Drun.mode=production -XX:MaxPermSize=256m
> org.apache.spark.executor.CoarseGrainedExecutorBackend
> --driver-url akka.tcp://sparkDriver@10.233.17.48:42548/user/CoarseGrainedScheduler
> --executor-id 448 --hostname 10.253.142.59 *--cores 1 --app-id app-20160124152439-1468*
> --worker-url akka.tcp://sparkWorker@10.253.142.59:33265/user/Worker

So somehow Spark created 4 executors, 2 on each machine (1 core + 1 core and 3 cores + 1 core), to get the total of 6 cores. But because the 6 GB setting is per executor, the job ends up occupying 24 GB instead of the 12 GB it would use with 2 executors (3 cores + 3 cores), and that blocks the other Spark job.
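For reference, here is roughly how I believe each job is configured, plus the back-of-the-envelope memory math for the layout above. The option values (spark.cores.max=6, spark.executor.memory=6g) and the master URL are my reconstruction of our setup, not copied from our submit scripts, so treat this as a sketch:

  // Assumed per-job configuration: 6 cores per application, 6 GB per executor.
  val conf = new org.apache.spark.SparkConf()
    .setMaster("spark://10.233.17.48:7077")   // assumed master URL/port
    .set("spark.cores.max", "6")              // cap each application at 6 cores total
    .set("spark.executor.memory", "6g")       // 6 GB is reserved PER EXECUTOR, not per job

  // Back-of-the-envelope memory accounting for the two layouts described above:
  val memoryPerExecutorGb = 6
  val normalLayout   = Seq(3, 3)              // 2 executors, one per slave, 3 cores each
  val observedLayout = Seq(1, 1, 3, 1)        // 4 executors, as seen on the slaves
  println(s"normal:   ${normalLayout.sum} cores, ${normalLayout.length * memoryPerExecutorGb} GB")
  println(s"observed: ${observedLayout.sum} cores, ${observedLayout.length * memoryPerExecutorGb} GB")
  // normal:   6 cores, 12 GB
  // observed: 6 cores, 24 GB

So even though the long job still only uses its 6 cores, it has silently doubled its memory footprint, leaving ~1.4 GB free, which is not enough for the other job's 6 GB per worker.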
My wild guess is that for some reason 1 executor of the long job failed, so the job became 3 cores short and asked the scheduler for 3 more cores, and the scheduler distributed them evenly across the slaves: 2 cores + 1 core. But this distribution can't actually take effect until the short job finishes, because the short job holds the rest of the memory. This would explain the 3 + 1 on one slave, but it doesn't explain the 1 + 1 on the other.

Did anyone experience anything similar to this? Any ideas how to avoid it?

Thanks,
Mikhail