hadoop-common-user mailing list archives

From jason hadoop <jason.had...@gmail.com>
Subject Re: map side Vs. Reduce side join
Date Fri, 17 Jul 2009 06:01:16 GMT
I seem to be one of the map-side join champions. For jobs that fit that
pattern there is usually a 100x speed improvement compared to doing
reduce-side joins on real (large) datasets.
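
To make the difference concrete, here is a minimal sketch in plain Python
(not Hadoop code) of the two strategies. It assumes the map-side case means
both inputs are already sorted by the join key, so they can be merged in a
single streaming pass, while the reduce-side case tags every record, groups
by key (standing in for the shuffle), and joins inside each group. The
function names and sample records are made up for illustration.

```python
from collections import defaultdict


def map_side_merge_join(left, right):
    """Join two lists of (key, value) pairs that are ALREADY sorted by key.

    One streaming pass over both inputs; no grouping or re-sorting step.
    """
    out = []
    i = j = 0
    while i < len(left) and j < len(right):
        lk, rk = left[i][0], right[j][0]
        if lk < rk:
            i += 1
        elif lk > rk:
            j += 1
        else:
            # Find the run of records sharing this key on each side.
            i_end = i
            while i_end < len(left) and left[i_end][0] == lk:
                i_end += 1
            j_end = j
            while j_end < len(right) and right[j_end][0] == lk:
                j_end += 1
            # Emit the cross product of the two runs.
            for _, lv in left[i:i_end]:
                for _, rv in right[j:j_end]:
                    out.append((lk, lv, rv))
            i, j = i_end, j_end
    return out


def reduce_side_join(left, right):
    """Join two lists of (key, value) pairs in any order.

    Mimics a full map/reduce job: tag, group by key, join per group.
    """
    # "Map" phase: tag each record with the input it came from.
    tagged = [(k, ("L", v)) for k, v in left]
    tagged += [(k, ("R", v)) for k, v in right]

    # "Shuffle" phase: bring all records with the same key together.
    groups = defaultdict(list)
    for k, tagged_value in tagged:
        groups[k].append(tagged_value)

    # "Reduce" phase: pair up left and right values within each key.
    out = []
    for k in sorted(groups):
        left_vals = [v for tag, v in groups[k] if tag == "L"]
        right_vals = [v for tag, v in groups[k] if tag == "R"]
        for lv in left_vals:
            for rv in right_vals:
                out.append((k, lv, rv))
    return out


users = [(1, "ann"), (2, "bob"), (4, "dee")]
orders = [(1, "book"), (1, "pen"), (3, "cup"), (4, "ink")]

merged = map_side_merge_join(users, orders)
grouped = reduce_side_join(users, orders)
```

Both calls produce the same joined records here. The merge version only
works because the inputs arrive sorted on the key; the grouped version
accepts any order but pays for moving and grouping every record first.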

On Wed, Jul 15, 2009 at 12:05 PM, bonito perdo
<bonito.perdo@googlemail.com> wrote:

> Thank you for your responses. Really helpful.
> In case we want to compare the two join evaluation methods by means of
> cost, how can one approach it? By means of I/O cost or ?
> Can 'ganglia' metrics be used for this reason?
> Thank you.
>
> On Tue, Jul 14, 2009 at 11:12 PM, Owen O'Malley <owen.omalley@gmail.com>
> wrote:
>
> > Map-side join is almost always more efficient, but only handles some
> > cases. Reduce side joins always work, but require a complete map/reduce
> > job to get the join.
> >
> > -- Owen
> >
>



-- 
Pro Hadoop, a book to guide you from beginner to hadoop mastery,
http://www.amazon.com/dp/1430219424?tag=jewlerymall
www.prohadoopbook.com a community for Hadoop Professionals
