hadoop-common-user mailing list archives

From Pankil Doshi <forpan...@gmail.com>
Subject Re: Hadoop scheduling question
Date Fri, 05 Jun 2009 04:19:20 GMT
Hello Kristi,

I am a Research Assistant at the University of Texas at Dallas. We are working
on RDF data, and our queries involve many joins, but we are not able to carry
out all of the joins in a single job. We also tried expressing our Hadoop code
as Pig scripts and found that each join in a Pig script runs as a new job. So
basically, I think handling these types of joins is a sequential process, where
the output of one job is required as the input to the next.
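To make the distinction concrete, here is a minimal sketch of a hypothetical dependency-driven scheduler. It is not Pig's or Hadoop's actual scheduling code, and the job names are made up. A job becomes eligible once every job producing its input has finished, so jobs are grouped into "waves". Chained joins, where each join consumes the previous join's output, always land in successive waves. Jobs with no dependency between them share a wave and could in principle run in parallel if the cluster had spare capacity.

```python
def schedule_waves(deps):
    """Group jobs into waves. Every job in a wave has all of its input
    jobs finished in earlier waves, so jobs within one wave have no
    dependency on each other (hypothetical model, not Pig's planner)."""
    unknown = set().union(*deps.values()) - deps.keys()
    if unknown:
        raise ValueError(f"unknown dependencies: {sorted(unknown)}")
    remaining = {job: set(parents) for job, parents in deps.items()}
    done = set()
    waves = []
    while remaining:
        # Jobs whose inputs have all been produced are ready to run.
        ready = sorted(j for j, parents in remaining.items() if parents <= done)
        if not ready:
            raise ValueError("cycle in job dependencies")
        waves.append(ready)
        done.update(ready)
        for j in ready:
            del remaining[j]
    return waves


# Hypothetical plan: two independent filter jobs feed one join, whose
# output is then joined again with a third input.
plan = {
    "filter_A": [],
    "filter_B": [],
    "load_C": [],
    "join_AB": ["filter_A", "filter_B"],
    "join_ABC": ["join_AB", "load_C"],
}

for i, wave in enumerate(schedule_waves(plan)):
    print(i, wave)
```

In this plan the two joins are forced into separate, sequential waves, while the three input jobs share the first wave. Whether a real Pig/Hadoop installation actually submits such independent jobs concurrently is exactly the question raised in the quoted message, and this sketch does not answer it.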

Do let me know what you think about my viewpoint.


On Thu, Jun 4, 2009 at 7:12 PM, Kristi Morton <kmorton@cs.washington.edu> wrote:

> Hi,
> I'm a Hadoop 17 user who is doing research with Prof. Magda Balazinska at
> the University of Washington on an improved progress indicator for Pig
> Latin.  We have a question regarding how Hadoop schedules Pig Latin queries
> with JOIN operators.  Does Hadoop schedule all MapReduce jobs in a script
> sequentially, or does it ever schedule two MapReduce jobs in parallel?  For
> example, if the output of two Map-Reduce jobs is later joined and each of
> these jobs only needs a subset of the cluster resources, would they be
> scheduled in parallel or in series?
> I apologize if I sent this to the wrong list, but please let me know which
> list is most appropriate for this type of question.
> Thanks,
> Kristi
