hive-user mailing list archives

From Rosanna Man <rosa...@auditude.com>
Subject Re: Using capacity scheduler
Date Mon, 02 May 2011 21:22:55 GMT
Hi Sreekanth,

When you mention setting the max task limit, do you mean executing

set mapred.capacity-scheduler.queue.<queue-name>.maximum-capacity = <a percentage> ?

Is it only available in Hadoop 0.21?

Thanks,
Rosanna

On 5/1/11 8:42 PM, "Sreekanth Ramakrishnan" <sreerama@yahoo-inc.com> wrote:

> 
> The design goal of CapacityScheduler is to maximize the utilization of cluster
> resources; it does not allocate a fair share among all the users present in
> the system.
> 
> The user limit determines the number of concurrent users who can use the slots
> in the queue. These limits are elastic in nature, though: because there is no
> preemption, new tasks are allotted slots only as those slots get freed up,
> until the user limit is met.
> 
> To meet your requirement, you could submit the large tasks to a queue that
> has a max task limit set, so your long-running jobs don't take up the whole
> of the cluster capacity, and submit shorter, smaller jobs to a fast-moving
> queue with something like a 10% user limit, which allows 10 concurrent users
> per queue.
> 
> The actual distribution of the capacity across longer/shorter jobs depends on
> your workload.
>  
> 
> On 4/30/11 1:14 AM, "Rosanna Man" <rosanna@auditude.com> wrote:
> 
>> Hi Sreekanth,
>> 
>> Thank you very much for your clarification. Setting the max task limits on
>> queues will work, but can we do something about the max user limit? Is it
>> pre-emptible as well? We are exploring the possibility of running the
>> queries as different users so that the capacity scheduler maximizes the use
>> of the resources.
>> 
>> Basically, our goal is to maximize the use of resources (mappers and
>> reducers) while giving short tasks a fair share while a big task is running.
>> How do you normally achieve that?
>> 
>> Thanks,
>> Rosanna
>> 
>> On 4/28/11 8:09 PM, "Sreekanth Ramakrishnan" <sreerama@yahoo-inc.com> wrote:
>> 
>>> Hi
>>> 
>>> Currently CapacityScheduler does not have pre-emption. So basically, when
>>> Job1 starts finishing and freeing up slots, Job2's tasks will start getting
>>> scheduled. One way to keep queue capacities from being elastic is to set
>>> max task limits on queues. That way your Job1 will never exceed the first
>>> queue's capacity.
>>>     
>>> 
>>> 
>>> 
>>> On 4/28/11 11:48 PM, "Rosanna Man" <rosanna@auditude.com> wrote:
>>> 
>>>> Hi all,
>>>> 
>>>> We are using the capacity scheduler to schedule resources among different
>>>> queues for one user (hadoop) only. We have set the queues to have an equal
>>>> share of the resources. However, when the first task starts in the first
>>>> queue and consumes all the resources, a second task that starts in the
>>>> second queue is starved of reducers until the first task finishes. A lot
>>>> of processing gets stuck while a large query is executing.
>>>> 
>>>> We are using 0.20.2 hive on Amazon AWS. We tried to use the Fair Scheduler
>>>> before, but it gives an error when the mapper produces no output (which is
>>>> fine in our use cases).
>>>> 
>>>> Can anyone give us some advice?
>>>> 
>>>> Thanks,
>>>> Rosanna
>> 
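
The behaviour discussed in this thread (no preemption, elastic queue capacity, and a max limit on the queue that runs the large job) can be illustrated with a toy model. The Python below is only a sketch under stated assumptions, not CapacityScheduler code: the function `simulate`, the queue names `big` and `small`, the slot counts, and the rule that free slots are offered to queues in name order are all invented for illustration. It also assumes one job per queue.

```python
from collections import Counter


def simulate(total_slots, jobs, caps=None, ticks=3):
    """Toy model of a slot scheduler with no preemption (hypothetical).

    jobs: list of (queue, arrival_tick, n_tasks, task_length_in_ticks),
          at most one job per queue in this sketch.
    caps: optional {queue: max concurrent slots}; a queue without an
          entry is uncapped, i.e. it may elastically use every slot.
    Returns one Counter per tick mapping queue -> slots in use.
    """
    caps = caps or {}
    pending = {}   # queue -> (tasks not yet started, task length)
    running = []   # one (queue, finish_tick) entry per occupied slot
    history = []
    for now in range(ticks):
        # Slots are released only when a task finishes; nothing is killed.
        running = [(q, end) for q, end in running if end > now]
        for queue, arrival, n_tasks, length in jobs:
            if arrival == now:
                pending[queue] = (n_tasks, length)
        # Offer free slots to queues in name order, honouring any cap.
        for queue in sorted(pending):
            n_tasks, length = pending[queue]
            used = sum(1 for q, _ in running if q == queue)
            free = total_slots - len(running)
            limit = caps.get(queue, total_slots) - used
            start = max(min(free, n_tasks, limit), 0)
            running += [(queue, now + length)] * start
            pending[queue] = (n_tasks - start, length)
        history.append(Counter(q for q, _ in running))
    return history


# A large job arrives first; a short job arrives one tick later.
jobs = [("big", 0, 100, 5), ("small", 1, 4, 1)]

elastic = simulate(10, jobs)                   # no cap on "big"
capped = simulate(10, jobs, caps={"big": 5})   # "big" limited to 5 slots

print(elastic[1]["small"])  # 0: every slot is still held by "big"
print(capped[1]["small"])   # 4: the cap left slots free for "small"
```

Without a cap, the short job gets nothing at tick 1 because every slot is held by the large job until its tasks finish. With the cap, half the slots stay free, so the short job starts all four of its tasks as soon as it arrives, at the cost of leaving slots idle when only the large job is present.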

