Subject: Re: spark multi tenancy
From: ayan guha
To: Steve Loughran
Cc: user@spark.apache.org, Dominik Fries
Date: Wed, 7 Oct 2015 21:06:11 +1100

Can queues also be used to separate workloads?

On 7 Oct 2015 20:34, "Steve Loughran" wrote:
>
> On 7 Oct 2015, at 09:26, Dominik Fries wrote:
> >
> > Hello Folks,
> >
> > We want to deploy several Spark projects and want to use a unique
> > project user for each of them. Only the project user should start the
> > Spark application and have the corresponding packages installed.
> >
> > Furthermore, a personal user who belongs to a specific project should
> > start a Spark application via the corresponding Spark project user as
> > a proxy. (Development)
> >
> > The application is currently running with IPython / PySpark. (HDP 2.3,
> > Spark 1.3.1)
> >
> > Is this possible, or what is the best practice for a Spark
> > multi-tenancy environment?
>
> Deploy on a kerberized YARN cluster and each application instance will
> be running as a different Unix user in the cluster, with the
> appropriate access to HDFS: isolated.
>
> The issue then becomes "do workloads clash with each other?". If you
> want to isolate dev & production, using node labels to keep dev work
> off the production nodes is the standard technique.
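On the queue question: YARN's capacity scheduler queues can indeed partition cluster resources between tenants. A minimal sketch of a capacity-scheduler.xml defining separate prod and dev queues (the queue names, capacity split, and the "projectuser" ACL entry are illustrative, not from this thread):

```xml
<configuration>
  <!-- Two top-level queues under root; names are illustrative -->
  <property>
    <name>yarn.scheduler.capacity.root.queues</name>
    <value>prod,dev</value>
  </property>
  <!-- 70% of cluster capacity reserved for production workloads -->
  <property>
    <name>yarn.scheduler.capacity.root.prod.capacity</name>
    <value>70</value>
  </property>
  <property>
    <name>yarn.scheduler.capacity.root.dev.capacity</name>
    <value>30</value>
  </property>
  <!-- Only the project user may submit to the prod queue -->
  <property>
    <name>yarn.scheduler.capacity.root.prod.acl_submit_applications</name>
    <value>projectuser</value>
  </property>
</configuration>
```

A Spark application is then pinned to a queue at submit time with `spark-submit --master yarn --queue dev ...`; for the proxy pattern described in the original question, `spark-submit --proxy-user` lets a personal user run as the project user on a kerberized cluster.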