hadoop-common-user mailing list archives

From Sagar Mehta <sagarme...@gmail.com>
Subject Re: Automatically mapping a job submitted by a particular user to a specific hadoop map-reduce queue
Date Fri, 26 Apr 2013 17:35:41 GMT
Hi Nitin,

Thanks for your reply.

Yes, this is exactly what we are doing: we ask each user to modify their
.hiverc, and then use ACLs [white-lists] configured in
mapred-queue-acls.xml to ensure people don't [or are not allowed to]
submit to the wrong queues.

As I said in one of the other threads, besides being a manual approach, it
also exposes the policy where user A is asked to modify his/her .hiverc to
submit jobs to queue X and user B is asked to modify his/her .hiverc to
submit jobs to queue Y potentially with different scheduling properties. We
want this to be more or less transparent to the user.
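
A minimal sketch of what a transparent mapping could look like if done as a
wrapper around the hive CLI, rather than in the scheduler itself. The user
names, queue names, and the helper functions below are hypothetical
placeholders, not our actual policy or an existing Hadoop/Hive feature; the
only real pieces assumed are the hive CLI's --hiveconf option and the
mapred.job.queue.name property.

```python
import getpass
import subprocess
import sys

# Hypothetical user-to-queue mapping; these names are placeholders only.
USER_TO_QUEUE = {
    "alice": "production",
    "bob": "research",
}
DEFAULT_QUEUE = "adhoc"  # assumed fallback queue for unmapped users


def queue_for(user):
    """Return the queue a user is mapped to, or the default queue."""
    return USER_TO_QUEUE.get(user, DEFAULT_QUEUE)


def hive_command(user, extra_args=()):
    """Build a hive CLI invocation that sets the job queue for this user."""
    queue_setting = "mapred.job.queue.name=" + queue_for(user)
    return ["hive", "--hiveconf", queue_setting] + list(extra_args)


if __name__ == "__main__":
    # Launch hive for the current OS user, passing through any extra arguments.
    sys.exit(subprocess.call(hive_command(getpass.getuser(), sys.argv[1:])))
```

A wrapper like this only hides the setting from the user; it does not stop
anyone from overriding the property in a session, so the queue ACLs would
still be needed for enforcement.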

We have a decent-sized cluster [200 nodes] with 30+ different

I think the JIRA that Sandy pointed out below is a good first step in that


On Thu, Apr 25, 2013 at 3:04 AM, Nitin Pawar <nitinpawar432@gmail.com> wrote:

> The current capacity scheduler controls which users can submit jobs
> to which queue, along with other related features.
> More of which you can read at
> http://hadoop.apache.org/docs/stable/capacity_scheduler.html
> but on the hive side, unless you set mapred.job.queue.name on the hive
> cli, jobs will be submitted to the default job queue.
> So basically what you would like to do is create a user, associate it with a
> queue on the scheduler, and ask the user to set that queue in their local
> .hiverc file.
> I am not sure if this can be part of hive's metastore, because one user
> can be allowed to submit jobs to multiple queues, and then the best way to
> handle it is by setting the property each time you open a session or via
> the .hiverc file.
> On Thu, Apr 25, 2013 at 12:11 PM, Sandy Ryza <sandy.ryza@cloudera.com> wrote:
>> Hi Sagar,
>> This capability currently does not exist in the fair scheduler (or other
>> schedulers, as far as I know), but a JIRA has been filed recently that
>> addresses a similar need.   Would
>> https://issues.apache.org/jira/browse/MAPREDUCE-5132 work for what
>> you're trying to do?  If not, would you mind filing a new JIRA for the
>> functionality you'd want?
>> -Sandy
>> On Wed, Apr 24, 2013 at 6:22 PM, Sagar Mehta <sagarmehta@gmail.com> wrote:
>>> Hi Guys,
>>> We have a general purpose Hive cluster [about 200 nodes] which is used
>>> for various jobs like
>>>    - Production
>>>    - Experimental/Research
>>>    - Adhoc queries
>>> We are using the fair-share scheduler to schedule them, and for this we
>>> have three corresponding pools in the scheduler.
>>> *Here is what we want.*
>>> *A hive query submitted by a user with user-name A should go to one of
>>> the pools above based on a pre-defined mapping. We are wondering where/how
>>> to specify this mapping?*
>>> *We can do this manually by adding -Dmapred.job.queue.name="X" on a
>>> particular job run.*
>>> This puts the job on the map-reduce queue named "X" and the following
>>> configuration in the fair-share scheduler
>>>   <property>
>>>     <name>mapred.fairscheduler.poolnameproperty</name>
>>>     <value>mapred.job.queue.name</value>
>>>   </property>
>>> maps this to a pool named "X" in the fair-share scheduler.
>>> However we [while wearing our Hadoop developer/admin hat] don't want the
>>> user/analyst to specify that so as to enforce some cluster-use policy.
>>> Based on his/her username we want to automatically select which hadoop
>>> queue and subsequently which fair-share scheduler pool, his/her job should
>>> go to. I'm pretty sure this is a common use-case and wondering how to do
>>> this in Hadoop.
>>> Any help/insights/pointers would be greatly appreciated.
>>> Sagar
>>> PS - Btw we are using Cloudera cdh3u2 and the user jobs are Hive queries.
> --
> Nitin Pawar
