Subject: Re: Executors and Cores
From: Mich Talebzadeh
To: "Mail.com", "user @spark" <user@spark.apache.org>
Date: Mon, 16 May 2016 07:45:45 +0100

Hi Pradeep,

Resources allocated to each Spark app can be capped so that all apps get a balanced share. However, you really need to monitor each app.

One option would be to use the jmonitor package to look at resource usage (heap, CPU, memory, etc.) for each job.

In general you should not allocate too many resources to each job. Note that FIFO is the default scheduling mode.
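
As an aside, jobs within a single application are also scheduled FIFO by default; if that does not suit, the scheduler can be switched to FAIR through the standard spark.scheduler.mode property. A minimal sketch (the trailing dots stand for the rest of the submit line, as in the example below):

${SPARK_HOME}/bin/spark-submit \
    --conf spark.scheduler.mode=FAIR \
    …..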

If you are allocating resources yourself, you need to cap them:

${SPARK_HOME}/bin/spark-submit \
    --master local[2] \
    --driver-memory 4g \
    --num-executors=1 \
    --executor-memory=4G \
    --executor-cores=2 \
    …..
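
Note that, as far as I know, --num-executors only takes effect on YARN. On a standalone cluster the equivalent cap is spark.cores.max, which limits the total number of cores one application may take across the cluster. A sketch (the master URL below is just a placeholder):

${SPARK_HOME}/bin/spark-submit \
    --master spark://master-host:7077 \
    --conf spark.cores.max=4 \
    --executor-memory=4G \
    …..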

Don't over-allocate resources, as they will be wasted.


The Spark GUI on port 4040 can be useful, but it only displays the job that FIFO has picked up, so you won't see other jobs until the JVM using port 4040 completes or is killed.
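
If you want to check what that JVM is running without the browser, the UI also exposes a REST endpoint; a quick sketch, assuming the default port (the API has been there since Spark 1.4):

# list the application(s) behind the UI on port 4040
curl http://localhost:4040/api/v1/applications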


Start by identifying Spark jobs through jps. They will show up as SparkSubmit.
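
For example (jps ships with the JDK; -lm adds the full main class and program arguments):

# drivers launched via spark-submit appear as SparkSubmit
jps | grep SparkSubmit

# with main class and arguments, to tell the jobs apart
jps -lm | grep -i spark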


HTH

Dr Mich Talebzadeh

LinkedIn https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw

http://talebzadehmich.wordpress.com


On 15 May 2016 at 13:19, Mail.com <pradeep.misra@mail.com> wrote:

> Hi,
>
> I have seen multiple videos on Spark tuning which show how to determine the
> number of cores, the number of executors, and the memory size for a job.
>
> In all that I have seen, it seems each job has to be given the maximum
> resources allowed in the cluster.
>
> How do we factor in input size as well? If I am processing a 1 GB compressed
> file, I can live with, say, 10 executors rather than 21.
>
> Also, do we consider other jobs in the cluster that could be running? I may
> use only 20 GB out of the available 300 GB.
>
> Thanks,
> Pradeep