Subject: Re: spark multi tenancy
From: ayan guha
To: Steve Loughran
Cc: user@spark.apache.org, Dominik Fries
Date: Wed, 7 Oct 2015 21:06:11 +1100

Can queues also be used to separate workloads?

On 7 Oct 2015 20:34, "Steve Loughran" wrote:
>
> On 7 Oct 2015, at 09:26, Dominik Fries wrote:
> >
> > Hello Folks,
> >
> > We want to deploy several Spark projects and want to use a unique
> > project user for each of them. Only the project user should start the
> > Spark application and have the corresponding packages installed.
> >
> > Furthermore, a personal user who belongs to a specific project should
> > start a Spark application via the corresponding Spark project user as
> > a proxy. (Development)
> >
> > The application is currently running with IPython / PySpark. (HDP 2.3,
> > Spark 1.3.1)
> >
> > Is this possible, or what is the best practice for a Spark
> > multi-tenancy environment?
>
> Deploy on a kerberized YARN cluster and each application instance will
> be running as a different Unix user in the cluster, with the
> appropriate access to HDFS: isolated.
>
> The issue then becomes "do workloads clash with each other?". If you
> want to isolate dev & production, using node labels to keep dev work
> off the production nodes is the standard technique.
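On the queue question: YARN's capacity scheduler queues can indeed partition cluster resources between tenants. A minimal sketch of a capacity-scheduler.xml defining separate prod and dev queues (the queue names, capacity split, and the "projectuser" ACL entry are illustrative, not from this thread):

```xml
<configuration>
  <!-- Two top-level queues under root; names are illustrative -->
  <property>
    <name>yarn.scheduler.capacity.root.queues</name>
    <value>prod,dev</value>
  </property>
  <!-- 70% of cluster capacity reserved for production workloads -->
  <property>
    <name>yarn.scheduler.capacity.root.prod.capacity</name>
    <value>70</value>
  </property>
  <property>
    <name>yarn.scheduler.capacity.root.dev.capacity</name>
    <value>30</value>
  </property>
  <!-- Only the project user may submit to the prod queue -->
  <property>
    <name>yarn.scheduler.capacity.root.prod.acl_submit_applications</name>
    <value>projectuser</value>
  </property>
</configuration>
```

A Spark application is then pinned to a queue at submit time with `spark-submit --master yarn --queue dev ...`; for the proxy pattern described in the original question, `spark-submit --proxy-user` lets a personal user run as the project user on a kerberized cluster.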