hive-user mailing list archives

From Keith Wiley <>
Subject Re: non-equality joins
Date Tue, 13 Mar 2012 17:28:28 GMT
It sounds like Matt has the right combination of expertise in both databases and MapReduce
to help you.  I'm bowing out, as I honestly don't know advanced database concepts at all.
In addition, Hive offers Hive-specific tools, such as the map-side joins Matt suggested,
which I'm too new to comment on.  As a matter of fact, I only started using Hive this week.

The short answer on MapReduce algorithms is that the individual computational units can't
communicate with each other (in fact, no mapper, and no single map() call, can communicate
with the others; likewise for reducers).  That's one of the major distinctions between MapReduce
and more general parallel processing frameworks like MPI.  This is the wrong mailing list to go
much deeper than that, however.
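To make that constraint concrete, here is a minimal single-process sketch in Python. It is not Hadoop's API; `run_mapreduce`, `word_count_mapper`, and `word_count_reducer` are hypothetical names used only for illustration. Each map call sees one record, each reduce call sees one key with its grouped values, and the only exchange of data between calls is the group-by-key shuffle that the framework itself performs.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, List, Tuple

# Hypothetical, single-process model of MapReduce (not Hadoop's API).


def run_mapreduce(
    records: Iterable[str],
    mapper: Callable[[str], List[Tuple[str, int]]],
    reducer: Callable[[str, List[int]], int],
) -> Dict[str, int]:
    # Map phase: each call sees exactly one record and shares no state
    # with any other call.
    intermediate: List[Tuple[str, int]] = []
    for record in records:
        intermediate.extend(mapper(record))

    # Shuffle phase: the framework, not the mappers, groups values by key.
    # This is the only point where output from different map calls meets.
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in intermediate:
        groups[key].append(value)

    # Reduce phase: each call sees one key and only that key's values.
    # Keys are visited in sorted order, loosely mirroring Hadoop's sort step.
    return {key: reducer(key, groups[key]) for key in sorted(groups)}


def word_count_mapper(line: str) -> List[Tuple[str, int]]:
    # Emit (word, 1) for every word; the mapper cannot see other lines.
    return [(word, 1) for word in line.split()]


def word_count_reducer(word: str, counts: List[int]) -> int:
    # Sum the partial counts that the shuffle grouped under this word.
    return sum(counts)


if __name__ == "__main__":
    print(run_mapreduce(["a b a", "b c"], word_count_mapper, word_count_reducer))
```

For the two sample lines this returns `{'a': 2, 'b': 2, 'c': 1}`. Roughly speaking, an algorithm fits the model when it can be phrased this way: independent per-record work that emits keys, followed by independent per-key aggregation.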

Thanks, Matt.

Best of luck, Mahsa.

On Mar 13, 2012, at 10:13, Tucker, Matt wrote:

> For theta joins, you’ll have to convert the query to an equi-join, and then filter
> for non-equality in the WHERE clause.  Depending upon the size of each table, you might consider
> looking at map-side joins, which will allow for doing non-equality filters during a join before
> it’s passed to the reducers.
> Matt Tucker
> From: mahsa mofidpoor [] 
> Sent: Tuesday, March 13, 2012 1:02 PM
> To:
> Subject: Re: non-equality joins
> Hi Keith,
> Do you know exactly how an algorithm should be structured to fit the MapReduce framework?
> Could you point me to some references?
> Thanks and Regards,
> Mahsa
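A minimal Python sketch of the two approaches Matt describes, assuming hypothetical `(key, value)` rows and an illustrative predicate `left.value < right.value`. It models only the data flow and is not how Hive implements joins. `equi_join_then_filter` corresponds to a query of the form `... JOIN b ON (a.key = b.key) WHERE a.value < b.value`, and `map_side_join` to the case where one table is small enough for every mapper to hold in memory.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Tuple

# Hypothetical rows: (join_key, value). A model of the idea, not Hive code.
Row = Tuple[str, int]
Predicate = Callable[[Row, Row], bool]


def equi_join_then_filter(left: List[Row], right: List[Row],
                          pred: Predicate) -> List[Tuple[Row, Row]]:
    # Reduce-side style: group both tables by the equality key (the shuffle),
    # build every equal-key pair, and only then apply the non-equality test,
    # which plays the role of the WHERE clause.
    left_by_key: Dict[str, List[Row]] = defaultdict(list)
    right_by_key: Dict[str, List[Row]] = defaultdict(list)
    for row in left:
        left_by_key[row[0]].append(row)
    for row in right:
        right_by_key[row[0]].append(row)

    joined = [
        (lrow, rrow)
        for key, lrows in left_by_key.items()
        for lrow in lrows
        for rrow in right_by_key.get(key, [])
    ]
    return [(lrow, rrow) for (lrow, rrow) in joined if pred(lrow, rrow)]


def map_side_join(large: List[Row], small: List[Row],
                  pred: Predicate) -> List[Tuple[Row, Row]]:
    # Map-side style: the small table is loaded into an in-memory hash table,
    # so each large-table row is matched and filtered on the spot; pairs that
    # fail the predicate are never emitted toward a reducer.
    small_by_key: Dict[str, List[Row]] = defaultdict(list)
    for row in small:
        small_by_key[row[0]].append(row)

    result: List[Tuple[Row, Row]] = []
    for lrow in large:
        for rrow in small_by_key.get(lrow[0], []):
            if pred(lrow, rrow):
                result.append((lrow, rrow))
    return result


def value_less_than(lrow: Row, rrow: Row) -> bool:
    # Illustrative non-equality condition: left.value < right.value.
    return lrow[1] < rrow[1]


if __name__ == "__main__":
    a = [("k1", 5), ("k1", 9), ("k2", 3)]
    b = [("k1", 7), ("k2", 1), ("k2", 4)]
    print(equi_join_then_filter(a, b, value_less_than))
    print(map_side_join(a, b, value_less_than))
```

For the sample rows both functions return `[(("k1", 5), ("k1", 7)), (("k2", 3), ("k2", 4))]`; what differs is where the filter runs. In the reduce-side version every equal-key pair is produced before being discarded, which is why a map-side join can pay off when one table fits in memory.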

Keith Wiley

"Luminous beings are we, not this crude matter."
                                           --  Yoda
