Date: Thu, 22 May 2008 19:16:00 -0700
Subject: Re: Can you run multiple simultaneous hadoop jobs?
From: Ted Dunning <tdunning@veoh.com>
To: <core-user@hadoop.apache.org>


I think that there is a conf parameter that sets the maximum number of map
invocations that will be used by your program.

Failing that, you can always set the number of splits to a small number, but
that is less likely to balance computation well.  Better to have a
significantly larger number of splits than map nodes.
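
To make the balancing point concrete, here is a rough sketch in Python.  It
is a toy model, not Hadoop's scheduler: the function name, the split costs,
and the count of four map slots are all made up for illustration.  Each split
simply goes to whichever slot frees up first, and the same total work is
scheduled twice, once as a few coarse splits and once as many small ones.

```python
import heapq


def makespan(split_costs, num_slots):
    """Finish time when each split runs on the slot that frees up first.

    Toy model only: split_costs are hypothetical per-split run times,
    and num_slots stands in for the number of available map slots.
    """
    slots = [0] * num_slots          # time at which each slot becomes free
    heapq.heapify(slots)
    for cost in split_costs:
        free_at = heapq.heappop(slots)           # earliest free slot
        heapq.heappush(slots, free_at + cost)    # slot is busy until then
    return max(slots)


# Same 120 units of work, cut two ways (made-up, deliberately uneven costs).
coarse = [12, 24, 36, 48]                     # one split per slot
fine = [2] * 6 + [4] * 6 + [6] * 6 + [8] * 6  # each coarse split cut in six

print(makespan(coarse, 4))  # 48: everything waits on the slowest split
print(makespan(fine, 4))    # 32: close to the ideal of 120 / 4 = 30
```

With one split per slot, the job takes as long as its slowest split.  With
many small splits, a slot that finishes early just picks up more work, so the
slots end at nearly the same time.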


On 5/22/08 6:58 PM, "Kayla Jay" <kaylais30@yahoo.com> wrote:

> By that, do you mean setting the # of mappers?
> 
> 
> ----- Original Message ----
> From: Ted Dunning <tdunning@veoh.com>
> To: core-user@hadoop.apache.org
> Sent: Thursday, May 22, 2008 5:19:32 PM
> Subject: Re: Can you run multiple simultaneous hadoop jobs?
> 
> 
> You definitely can run more than one job on a hadoop cluster.  But if one of
> the jobs asks to use all of the map or reduce nodes, then the other job will
> have to wait for some of the nodes to free up before proceeding.
> 
> Try limiting the number of map nodes and see how that changes matters.
> 
> 
> On 5/22/08 1:46 PM, "Kayla Jay" <kaylais30@yahoo.com> wrote:
> 
>> 
>> Hello.
>> 
>> I'm trying to figure out why I need to use HOD vs. trying to run multiple
>> jobs at the same time on the same set of resources.  Is it possible to run
>> multiple hadoop jobs at the same time on the same set of input data?  I
>> tried to run different jobs on the same set of data at the same time, but
>> it takes a while (way while) and almost appears as if it queues up and the
>> next job has to wait and so forth before completing.
>> 
>> So, I tried moving onto HOD.  It's not very apparent why one would want to
>> use HOD to run on different nodes at the same time for different jobs that
>> access the same set of input data.
>> 
>> Can anyone provide any feedback on running multiple jobs at the same time on
>> the same set of data?  HOD use?  Why would I have to run HOD and schedule
>> running multiple jobs at the same time on the same set of data, but within
>> their own set of resources in the cluster?
>> 
>> Thanks