hadoop-common-user mailing list archives

From Sagar Mehta <sagarme...@gmail.com>
Subject Re: Automatically mapping a job submitted by a particular user to a specific hadoop map-reduce queue
Date Fri, 26 Apr 2013 17:27:52 GMT
Hi Sandy,

Thanks for your prompt reply!!

The jira that you pointed out would make it easy for us to do the automatic
mapping and get closer to enforcing a policy automatically. Any
idea when it will be incorporated into cdh/hadoop releases, and whether it could
be back-ported to cdh3u2, which we are currently running in production?

Currently we are getting around this by passing -Dmapred.job.queue.name="X"
and then mapping the map-red job queue to a Fair-share scheduler
pool. We are using ACLs [more of a white-list], configured in
mapred-queue-acls.xml, to ensure people can only submit to the
right queue.
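
A minimal sketch of this workaround, assuming queues named "production",
"research" and "adhoc" and users "alice" and "bob" (all hypothetical; the
thread does not give the actual queue names or users). The property names are
those of the Hadoop 0.20/1.x line that cdh3 is based on, so they are worth
checking against the release in use:

```xml
<!-- mapred-site.xml (fragment): declare the queues and turn on ACL checks -->
<property>
  <name>mapred.queue.names</name>
  <!-- hypothetical queue names, one per scheduler pool -->
  <value>default,production,research,adhoc</value>
</property>
<property>
  <name>mapred.acls.enabled</name>
  <value>true</value>
</property>
<property>
  <!-- use the job's queue name as its fair-scheduler pool name -->
  <name>mapred.fairscheduler.poolnameproperty</name>
  <value>mapred.job.queue.name</value>
</property>

<!-- mapred-queue-acls.xml (fragment): white-list who may submit to a queue -->
<property>
  <name>mapred.queue.production.acl-submit-job</name>
  <!-- format: comma-separated users, then a space, then comma-separated groups -->
  <!-- "alice" and "bob" are hypothetical users -->
  <value>alice,bob</value>
</property>
```

With a setup along these lines, a job submitted with
-Dmapred.job.queue.name=production by a user who is not on the list should be
rejected at submission time.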

*Two limitations of this round-about approach are*

   1. It is manual
   2. It exposes the policy where user A is asked to submit jobs to queue X
   and user B is asked to submit jobs to queue Y [with different scheduler
   properties]. We want this to be completely transparent to the user of our
   cluster.

The jira above would be a great first step towards such automatic mapping!!

Cheers,
Sagar

On Wed, Apr 24, 2013 at 11:41 PM, Sandy Ryza <sandy.ryza@cloudera.com> wrote:

> Hi Sagar,
>
> This capability currently does not exist in the fair scheduler (or other
> schedulers, as far as I know), but a JIRA has been filed recently that
> addresses a similar need. Would
> https://issues.apache.org/jira/browse/MAPREDUCE-5132 work for what you're
> trying to do?  If not, would you mind filing a new JIRA for the
> functionality you'd want?
>
> -Sandy
>
>
> On Wed, Apr 24, 2013 at 6:22 PM, Sagar Mehta <sagarmehta@gmail.com> wrote:
>
>> Hi Guys,
>>
>> We have a general purpose Hive cluster [about 200 nodes] which is used
>> for various jobs like
>>
>>    - Production
>>    - Experimental/Research
>>    - Adhoc queries
>>
>> We are using the fair-share scheduler to schedule them, and for this we
>> have three corresponding pools in the scheduler.
>>
>> *Here is what we want.*
>>
>> *A hive query submitted by a user with user-name A should go to one of
>> the pools above based on a pre-defined mapping. We are wondering where/how
>> to specify this mapping?*
>>
>> *We can do this manually by adding -Dmapred.job.queue.name="X" on a
>> particular job run.*
>>
>> This puts the job on the map-reduce queue named "X" and the following
>> configuration in the fair-share scheduler
>>
>>   <property>
>>     <name>mapred.fairscheduler.poolnameproperty</name>
>>     <value>mapred.job.queue.name</value>
>>   </property>
>>
>> maps this to a pool named "X" in the fair-share scheduler.
>>
>> However we [while wearing our Hadoop developer/admin hat] don't want the
>> user/analyst to specify that, so that we can enforce some cluster-use policy.
>>
>> Based on his/her username we want to automatically select which hadoop
>> queue, and subsequently which fair-share scheduler pool, his/her job should
>> go to. I'm pretty sure this is a common use-case and am wondering how to do
>> this in Hadoop.
>>
>> Any help/insights/pointers would be greatly appreciated.
>>
>> Sagar
>> PS - Btw we are using Cloudera cdh3u2 and the user jobs are Hive queries.
