From: Sreekanth Ramakrishnan
To: Rosanna Man, user@hive.apache.org
Date: Mon, 2 May 2011 09:12:04 +0530
Subject: Re: Using capacity scheduler

The design goal of the CapacityScheduler is to maximize the utilization of cluster resources; it does not fairly divide the share among all the users present in the system.

The user limit determines how many users can concurrently use the slots in a queue. However, these limits are elastic: there is no preemption, so new tasks are allotted slots only as existing slots get freed up, until the user limit is met.
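A minimal Python sketch of that behaviour, under a simplified model: slots are interchangeable, the user limit is a fixed slot count (5 of 10 slots here, as if two users share a queue), and a slot is only handed out when a running task finishes. The functions `assign_freed_slot` and `simulate` and the users `alice`/`bob` are hypothetical illustrations, not Hadoop code.

```python
def assign_freed_slot(running, pending, limit):
    """Hand one freed slot to a user who is below `limit` and still has pending tasks.

    Returns the chosen user, or None if nobody qualifies. A user already above
    the limit keeps its running tasks (no preemption); it just gets no new slots.
    """
    candidates = [
        user for user, waiting in pending.items()
        if waiting > 0 and running.get(user, 0) < limit
    ]
    if not candidates:
        return None
    # The limit is the same for everyone, so the user with the fewest running
    # tasks is the one furthest below it.
    user = min(candidates, key=lambda u: running.get(u, 0))
    running[user] = running.get(user, 0) + 1
    pending[user] -= 1
    return user


def simulate(total_slots=10, limit=5, finishes=6):
    """alice already holds every slot when bob arrives; track (alice, bob) running tasks."""
    running = {"alice": total_slots, "bob": 0}
    pending = {"alice": 20, "bob": 10}
    history = []
    for _ in range(finishes):
        running["alice"] -= 1  # one of alice's long tasks completes, freeing a slot
        assign_freed_slot(running, pending, limit)
        history.append((running["alice"], running["bob"]))
    return history


if __name__ == "__main__":
    print(simulate())
```

In this model bob reaches his 5-slot share only after five of alice's tasks have finished; nothing is taken away from alice early, which is what "elastic, no preemption" means in practice.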

For your requirement, you could submit the large jobs to a queue that has a max task limit set, so that your long-running jobs don't take up the whole cluster capacity, and submit the shorter, smaller jobs to a fast-moving queue with something like a 10% user limit, which allows 10 concurrent users per queue.
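The 10% figure corresponds to the queue's minimum user limit, which in 0.20-era releases is set per queue with `mapred.capacity-scheduler.queue.<queue-name>.minimum-user-limit-percent`. Going by the documented example for that property, the cap on any one user is an even split across the users with jobs in the queue, but never lower than the configured percentage. A small Python sketch of that arithmetic (the function names are hypothetical, and slot rounding is ignored):

```python
def per_user_limit_percent(active_users, minimum_user_limit_percent):
    """Largest share of the queue (in %) one user may hold while users compete.

    An even split across the users with jobs in the queue, floored at the
    configured minimum user limit.
    """
    if active_users < 1:
        raise ValueError("need at least one active user")
    return max(100.0 / active_users, float(minimum_user_limit_percent))


def concurrent_users_at_minimum(minimum_user_limit_percent):
    """How many users can each hold the minimum share at the same time."""
    return 100 // minimum_user_limit_percent


if __name__ == "__main__":
    for users in (1, 2, 4, 10, 20):
        print(users, per_user_limit_percent(users, 10))
    print(concurrent_users_at_minimum(10))
```

With a 10% limit the cap shrinks from 100% (one user) to 10% as users arrive and then stays there, so at most 10 users can hold their full share of the queue at once.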

The actual distribution of the capacity across longer and shorter jobs depends on your workload.
 

On 4/30/11 1:14 AM, "Rosanna Man" <rosanna@auditude.com> wrote:

Hi Sreekanth,

Thank you very much for your clarification. Setting max task limits on the queues will work, but can we do something with the max user limit? Is it also pre-emptible? We are exploring the possibility of running the queries as different users under the capacity scheduler to maximize the use of the resources.

Basically, our goal is to maximize the use of resources (mappers and reducers) while giving the short tasks a fair share when a big task is running. How do you normally achieve that?

Thanks,
Rosanna

On 4/28/11 8:09 PM, "Sreekanth Ramakrishnan" <sreerama@yahoo-inc.com> wrote:

Hi

Currently, the CapacityScheduler does not have preemption. So basically, Job2's tasks will start getting scheduled only when Job1's tasks start finishing and freeing up slots. One way to keep queue capacities from being elastic is to set max task limits on the queues. That way, Job1 will never exceed the first queue's capacity.
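The effect of such a cap can be illustrated with a toy Python model: two queues with equal capacity on a 10-slot cluster, fixed-length tasks, no preemption, and Job1 submitted to queue A one tick before Job2 goes to queue B. The slot counts, the rule of serving the queue furthest below its capacity, and the `queue_a_max` parameter are all assumptions for illustration; `queue_a_max` is a hypothetical stand-in for whatever max-task-limit setting your Hadoop version provides.

```python
def first_start_tick(queue_a_max=None, total_slots=10, task_ticks=5,
                     job1_tasks=100, job2_tasks=10, job2_submit_tick=1):
    """Tick at which Job2 (queue B) gets its first slot, or None if it never does.

    Toy model: two queues with equal capacity, tasks of fixed length, and no
    preemption, so a slot is only reassigned once the task in it finishes.
    `queue_a_max` is a hypothetical hard cap on queue A's running tasks.
    """
    capacity = {"A": total_slots / 2, "B": total_slots / 2}
    pending = {"A": job1_tasks, "B": 0}
    running = []  # (queue, tick at which the task finishes)
    for tick in range(job1_tasks * task_ticks + 1):
        # Finished tasks release their slots; running tasks are never killed.
        running = [(q, end) for q, end in running if end > tick]
        if tick == job2_submit_tick:
            pending["B"] = job2_tasks
        while len(running) < total_slots:
            used = {q: sum(1 for rq, _ in running if rq == q) for q in capacity}
            eligible = [
                q for q in capacity
                if pending[q] > 0
                and not (q == "A" and queue_a_max is not None and used["A"] >= queue_a_max)
            ]
            if not eligible:
                break
            # Serve the queue that is furthest below its capacity first.
            q = min(eligible, key=lambda name: used[name] / capacity[name])
            if q == "B":
                return tick
            running.append((q, tick + task_ticks))
            pending[q] -= 1
    return None


if __name__ == "__main__":
    print("elastic:", first_start_tick())
    print("capped: ", first_start_tick(queue_a_max=5))
```

Without a cap, Job1 grabs all ten slots at tick 0 and Job2 waits until tick 5 for them to drain; with `queue_a_max=5` half the cluster is never lent to queue A, so Job2 starts at tick 1.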
    



On 4/28/11 11:48 PM, "Rosanna Man" <rosanna@auditude.com> wrote:

Hi all,

We are using the capacity scheduler to schedule resources among different queues for only one user (hadoop). We have set the queues to have an equal share of the resources. However, when the first task starts in the first queue and consumes all the resources, the second task, started in the second queue, is starved of reducers until the first task finishes. A lot of processing gets stuck while a large query is executing.

We are using 0.20.2 Hive on Amazon AWS. We tried the Fair Scheduler before, but it gives an error when a mapper produces no output (which is fine in our use cases).

Can anyone give us some advice?

Thanks,
Rosanna