hadoop-common-user mailing list archives

From Alan Gates <ga...@yahoo-inc.com>
Subject Re: Hadoop scheduling question
Date Fri, 05 Jun 2009 18:44:05 GMT
To add a little context, Pig uses Hadoop's JobControl to schedule its
jobs. Pig defines the dependencies between jobs in JobControl and
then submits the entire graph of jobs. So, when jobs are submitted
through JobControl, does Hadoop schedule them serially or in parallel
(assuming there are no dependencies between them)?


On Jun 5, 2009, at 10:50 AM, Kristi Morton wrote:

> Hi Pankil,
> Sorry about sending my question to the list twice; the first time
> I sent it, I had forgotten to subscribe. I resent it after
> subscribing, and your reply to the first email never reached my
> inbox. I saw your response in the list archives.
> So, to recap, you said:
> "We are not able to carry out all joins in a single job. We also
> tried our Hadoop code using Pig scripts and found that a new job is
> used for each join in a Pig script. So basically, I think it's a
> sequential process to handle these types of joins, where the output
> of one job is required as an input to the other one."
> I, too, have seen this sequential behavior with joins. However, it
> seems possible for two jobs to execute in parallel when both of
> their outputs are inputs to a subsequent job. Is this possible, or
> are all jobs scheduled sequentially?
> Thanks,
> Kristi
