hive-user mailing list archives

From Rosanna Man <rosa...@auditude.com>
Subject Re: Using capacity scheduler
Date Fri, 29 Apr 2011 19:44:49 GMT
Hi Sreekanth,

Thank you very much for the clarification. Setting the max task limits on the
queues will work, but can we do something with the max user limit? Is it
pre-emptible as well? We are exploring the possibility of running the queries
as different users so that the capacity scheduler can maximize the use of
the resources.

Basically, our goal is to maximize the use of resources (mappers and reducers)
while still giving short tasks a fair share when a big task is running.
How do you normally achieve that?

Thanks,
Rosanna

On 4/28/11 8:09 PM, "Sreekanth Ramakrishnan" <sreerama@yahoo-inc.com> wrote:

> Hi
> 
> Currently CapacityScheduler does not have pre-emption. So basically, once
> Job1's tasks start finishing and freeing up slots, Job2's tasks will start
> getting scheduled. One way to keep queue capacities from being elastic is to
> set max task limits on the queues. That way Job1 will never exceed the first
> queue's capacity.
>
> On 4/28/11 11:48 PM, "Rosanna Man" <rosanna@auditude.com> wrote:
> 
>> Hi all,
>> 
>> We are using the capacity scheduler to share resources among different
>> queues for a single user (hadoop). We have set the queues to have equal
>> shares of the resources. However, when the first task starts in the first
>> queue and consumes all the resources, a second task started in the second
>> queue is starved of reducers until the first task finishes. A lot of
>> processing gets stuck while a large query is executing.
>> 
>> We are using 0.20.2 Hive on Amazon AWS. We tried the Fair Scheduler before,
>> but it gives an error when a mapper produces no output (which is fine in our
>> use cases).
>> 
>> Can anyone give us some advice?
>> 
>> Thanks,
>> Rosanna
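
The behaviour discussed in this thread (no pre-emption, elastic queue
capacity, and a hard limit on one queue) can be illustrated with a toy
simulation. This is only a sketch under simplifying assumptions and is not
CapacityScheduler code: the function name, its parameters, the discrete time
steps, and the single shared slot pool are all hypothetical.

```python
def first_slot_time(total_slots, max_slots_q1, job1_tasks, task_len, job2_arrival):
    """Return the time step at which job2 (queue 2) first obtains a slot.

    Toy model, not the real scheduler:
      - job1 (queue 1) is submitted at t=0 with `job1_tasks` tasks,
        each running for `task_len` time steps.
      - Running tasks are never pre-empted.
      - max_slots_q1 is a hard limit on the slots queue 1 may hold;
        None means the queue is elastic and may take every free slot.
      - job2 is served first whenever a slot is free, because its
        queue is below its share.
    """
    running = []          # remaining time of each running job1 task
    pending = job1_tasks  # job1 tasks not yet started
    t = 0
    while True:
        free = total_slots - len(running)
        if t >= job2_arrival and free > 0:
            return t
        # job1 grabs whatever it is allowed to take.
        limit = total_slots if max_slots_q1 is None else max_slots_q1
        start = max(0, min(pending, limit - len(running), free))
        running.extend([task_len] * start)
        pending -= start
        # Advance one time step and release slots of finished tasks.
        running = [r - 1 for r in running if r - 1 > 0]
        t += 1


# Elastic queue: job1 fills all 10 slots, so job2 waits for tasks to finish.
elastic = first_slot_time(10, None, job1_tasks=30, task_len=5, job2_arrival=1)
# Queue 1 limited to 5 slots: job2 finds a free slot as soon as it arrives.
capped = first_slot_time(10, 5, job1_tasks=30, task_len=5, job2_arrival=1)
print(elastic, capped)
```

In this model the limit trades utilisation for latency: with the limit in
place, half the slots sit idle whenever queue 2 has no work, which is the
tension Rosanna describes between maximizing resources and keeping short
tasks responsive.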

