Subject: RE: Question on running simultaneous jobs
From: "Joydeep Sen Sarma" <jssarma@facebook.com>
To: hadoop-user@lucene.apache.org
Date: Wed, 9 Jan 2008 15:22:55 -0800

> that can run (per job) at any given time.

Not possible, AFAIK - but I would be happy to hear otherwise.

Priorities are a good substitute, though. There is no point needlessly restricting concurrency if there is nothing else to run. If there is something else more important to run, then in most cases assigning a higher priority to that other thing will make the right thing happen.

The exception is long-running tasks (usually reducers) that cannot be preempted. (Hadoop does not seem to use OS process priorities at all. I wonder whether process priorities could be used as a substitute for preemption.)

HOD is another solution you might want to look into - my understanding is that with HOD you can restrict the number of machines used by a job.
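For the priority route, here is a minimal sketch of what submission might look like - assuming the mapred.job.priority property is recognized by your release (I have not checked exactly when it appeared), and with a made-up class name:

    import org.apache.hadoop.mapred.JobClient;
    import org.apache.hadoop.mapred.JobConf;

    public class HighPriorityJob {
        public static void main(String[] args) throws Exception {
            JobConf conf = new JobConf(HighPriorityJob.class);
            conf.setJobName("important-job");
            // Assumption: recognized values are VERY_LOW, LOW, NORMAL,
            // HIGH and VERY_HIGH. The JobTracker hands out tasks from
            // higher-priority jobs first, but does not preempt tasks
            // that are already running.
            conf.set("mapred.job.priority", "HIGH");
            // ... set mapper/reducer classes and input/output paths as usual ...
            JobClient.runJob(conf);
        }
    }

Everything else can then run at the default NORMAL priority and pick up whatever slots are left over.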
________________________________

From: Xavier Stevens [mailto:Xavier.Stevens@fox.com]
Sent: Wed 1/9/2008 2:57 PM
To: hadoop-user@lucene.apache.org
Subject: RE: Question on running simultaneous jobs

This doesn't solve the issue, because it sets the total number of map/reduce tasks. When I set the total number of map tasks I get an ArrayIndexOutOfBoundsException from within Hadoop, I believe because of the size of the input dataset (around 90 million lines).

I think it is important to distinguish between setting the total number of map/reduce tasks and the number that can run (per job) at any given time. I would like to restrict only the latter, while letting Hadoop divide the data into chunks as it sees fit.

-----Original Message-----
From: Ted Dunning [mailto:tdunning@veoh.com]
Sent: Wednesday, January 09, 2008 1:50 PM
To: hadoop-user@lucene.apache.org
Subject: Re: Question on running simultaneous jobs

You may need to upgrade, but 0.15.1 does just fine with multiple jobs in the cluster.

Use conf.setNumMapTasks(int) and conf.setNumReduceTasks(int).

On 1/9/08 11:25 AM, "Xavier Stevens" wrote:

> Does Hadoop support running simultaneous jobs? If so, what parameters
> do I need to set in my job configuration? We basically want to give a
> job that takes a really long time half of the total resources of the
> cluster, so that other jobs don't queue up behind it.
>
> I am currently using Hadoop 0.14.2. I tried setting
> mapred.tasktracker.tasks.maximum to half of the maximum specified in
> mapred-default.xml. This shows the change on the web administration
> page for the job, but it has no effect on the actual number of tasks
> running.
>
> Thanks,
>
> Xavier
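For completeness, here is what those two calls look like in context - a sketch against the old mapred API with a made-up class name, and note that both set per-job totals rather than concurrency limits:

    import org.apache.hadoop.mapred.JobClient;
    import org.apache.hadoop.mapred.JobConf;

    public class SizedJob {
        public static void main(String[] args) throws Exception {
            JobConf conf = new JobConf(SizedJob.class);
            // Per-job totals, not a cap on simultaneous tasks:
            // setNumMapTasks() is only a hint (the input splits determine
            // the actual map count), while setNumReduceTasks() is honored
            // exactly.
            conf.setNumMapTasks(500);
            conf.setNumReduceTasks(50);
            // ... set mapper/reducer classes and input/output paths as usual ...
            JobClient.runJob(conf);
        }
    }

As for mapred.tasktracker.tasks.maximum: as far as I can tell it caps the task slots per TaskTracker across all jobs and is read by the daemon at startup, which would explain why setting it in a job configuration shows up on the job's web page but has no effect on how many tasks actually run.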