Mailing-List: contact user-help@hadoop.apache.org; run by ezmlm
Precedence: bulk
Reply-To: user@hadoop.apache.org
Received-SPF: pass (athena.apache.org: domain of sandy.ryza@cloudera.com
 designates 74.125.83.46 as permitted sender)
MIME-Version: 1.0
In-Reply-To: 
 <CAMq4vAG1U9pYqzn2384cx8Ke=hPnJN3BwQ72tUM74B43B0Jg9A@mail.gmail.com>
References: 
 <CAMq4vAG1U9pYqzn2384cx8Ke=hPnJN3BwQ72tUM74B43B0Jg9A@mail.gmail.com>
Date: Wed, 24 Apr 2013 23:41:44 -0700
Message-ID: 
 <CACBYxKJk1adhfOzsauyrCY1pjyNFNO0rKcz7g4KjFHf8K=GuUg@mail.gmail.com>
Subject: Re: Automatically mapping a job submitted by a particular user to a
 specific hadoop map-reduce queue
From: Sandy Ryza <sandy.ryza@cloudera.com>
To: user@hadoop.apache.org
Content-Type: multipart/alternative; boundary=089e016813103f026f04db29b6b8

--089e016813103f026f04db29b6b8
Content-Type: text/plain; charset=ISO-8859-1

Hi Sagar,

This capability currently does not exist in the fair scheduler (or other
schedulers, as far as I know), but a JIRA has been filed recently that
addresses a similar need. Would
https://issues.apache.org/jira/browse/MAPREDUCE-5132 work for what you're
trying to do? If not, would you mind filing a new JIRA for the
functionality you'd want?

-Sandy


On Wed, Apr 24, 2013 at 6:22 PM, Sagar Mehta <sagarmehta@gmail.com> wrote:

> Hi Guys,
>
> We have a general-purpose Hive cluster (about 200 nodes) that is used for
> various kinds of jobs:
>
>    - Production
>    - Experimental/Research
>    - Ad hoc queries
>
> We use the fair-share scheduler to schedule them, with three corresponding
> pools configured in the scheduler.
>
> *Here is what we want.*
>
> *A Hive query submitted by a user with username A should go to one of the
> pools above based on a predefined mapping. Where and how do we specify
> this mapping?*
>
> *We can do this manually by adding -Dmapred.job.queue.name="X" to a
> particular job run.*
>
> This puts the job on the MapReduce queue named "X", and the following
> configuration in the fair-share scheduler
>
>   <property>
>     <name>mapred.fairscheduler.poolnameproperty</name>
>     <value>mapred.job.queue.name</value>
>   </property>
>
> maps this to a pool named "X" in the fair-share scheduler.
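
To make that two-step lookup concrete, here is a minimal Python sketch, assuming the MRv1 fair scheduler simply reads the pool name from whichever job-configuration key mapred.fairscheduler.poolnameproperty names. The function resolve_pool, the fallback pool name "default", and user.name as the default value of poolnameproperty are assumptions for illustration, not Hadoop source; check the scheduler code shipped with your CDH version.

```python
from typing import Mapping

# Simplified model (an assumption, not Hadoop source) of how the MRv1 fair
# scheduler chooses a pool for a job.

DEFAULT_POOL = "default"  # assumed fallback when the pool key is missing


def resolve_pool(jobconf: Mapping[str, str],
                 scheduler_conf: Mapping[str, str]) -> str:
    """Return the pool name for a job under this simplified model."""
    # Which jobconf key holds the pool name; user.name (one pool per user)
    # is assumed to be the default when the property is not configured.
    pool_key = scheduler_conf.get("mapred.fairscheduler.poolnameproperty",
                                  "user.name")
    return jobconf.get(pool_key, DEFAULT_POOL).strip()


# Scheduler configured as in the message: pool name comes from the queue name.
SCHED = {"mapred.fairscheduler.poolnameproperty": "mapred.job.queue.name"}

if __name__ == "__main__":
    # Job run with -Dmapred.job.queue.name="X": lands in pool X
    print(resolve_pool({"user.name": "A", "mapred.job.queue.name": "X"}, SCHED))
    # Same user without the -D option: falls back to the default pool
    print(resolve_pool({"user.name": "A"}, SCHED))
    # Scheduler left at its assumed default: one pool per user, here "A"
    print(resolve_pool({"user.name": "A"}, {}))
```

In this model the username plays no part once poolnameproperty points at mapred.job.queue.name, which is why the user currently has to choose the queue.
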
>
> However, wearing our Hadoop developer/admin hats, we don't want the
> user/analyst to specify the queue, so that we can enforce a cluster-use
> policy.
>
> Based on the submitting user's username, we want to automatically select
> which Hadoop queue, and therefore which fair-share scheduler pool, the job
> goes to. I'm pretty sure this is a common use case and am wondering how to
> do this in Hadoop.
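
Since the scheduler itself offers no user-to-pool mapping (per the reply at the top), one possible stopgap is a client-side wrapper that looks the submitting user up in a predefined table and adds the same -Dmapred.job.queue.name option described above. This is a minimal Python sketch under stated assumptions: the usernames, pool names, fallback pool, and the helpers pool_for and queue_option are all hypothetical, and a real wrapper would take the login name from the environment (for example getpass.getuser()). It is a convenience, not enforcement: a user who passes their own -D option or bypasses the wrapper can still pick any pool.

```python
# Hypothetical user -> pool table. With poolnameproperty set to
# mapred.job.queue.name, the queue name doubles as the pool name.
USER_TO_POOL = {
    "etl_prod": "production",  # hypothetical production service account
    "alice": "research",       # hypothetical analyst
    "bob": "adhoc",            # hypothetical ad hoc user
}
FALLBACK_POOL = "adhoc"  # assumed pool for users missing from the table


def pool_for(user: str) -> str:
    """Look up the pool for a user, falling back to FALLBACK_POOL."""
    return USER_TO_POOL.get(user, FALLBACK_POOL)


def queue_option(user: str) -> str:
    """Build the -D option that selects the queue (and thus the pool)."""
    return "-Dmapred.job.queue.name=" + pool_for(user)


if __name__ == "__main__":
    # A wrapper script would prepend this option to the real job command.
    for user in ("etl_prod", "alice", "carol"):
        print(user, queue_option(user))
```
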
>
> Any help/insights/pointers would be greatly appreciated.
>
> Sagar
> PS: We are using Cloudera CDH3u2, and the user jobs are Hive queries.

--089e016813103f026f04db29b6b8--