hive-user mailing list archives

From "Tucker, Matt" <Matt.Tuc...@disney.com>
Subject RE: non-equality joins
Date Tue, 13 Mar 2012 17:13:01 GMT
For theta joins, you'll have to convert the query to an equi-join and then filter on the
non-equality condition in the WHERE clause.  Depending on the size of each table, you might
also consider map-side joins, which let you apply non-equality filters during the join,
before anything is passed to the reducers.
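
A minimal sketch of the equi-join-plus-WHERE rewrite described above. The tables `orders` and `visits` and their columns are hypothetical (they are not from this thread), and Python's built-in sqlite3 module stands in for Hive only so that the two query shapes can be compared. The rewrite is only equivalent for inner joins; with an outer join, moving a condition from ON to WHERE changes the result.

```python
import sqlite3

# In-memory database with two hypothetical tables (illustration only).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (cust_id INTEGER, order_ts INTEGER);
    CREATE TABLE visits (cust_id INTEGER, visit_ts INTEGER);
    INSERT INTO orders VALUES (1, 100), (1, 300), (2, 200);
    INSERT INTO visits VALUES (1, 150), (1, 250), (2, 50), (3, 10);
""")

# Theta-join form: the inequality sits in the ON clause.
# SQLite accepts this; per the thread, Hive does not.
theta_sql = """
    SELECT o.cust_id, o.order_ts, v.visit_ts
    FROM orders o JOIN visits v
      ON o.cust_id = v.cust_id AND v.visit_ts < o.order_ts
    ORDER BY 1, 2, 3
"""

# Rewritten form: ON keeps only the equality condition,
# and the inequality moves to the WHERE clause.
rewritten_sql = """
    SELECT o.cust_id, o.order_ts, v.visit_ts
    FROM orders o JOIN visits v
      ON o.cust_id = v.cust_id
    WHERE v.visit_ts < o.order_ts
    ORDER BY 1, 2, 3
"""

theta_rows = conn.execute(theta_sql).fetchall()
rewritten_rows = conn.execute(rewritten_sql).fetchall()
print(rewritten_rows)
```

Both queries return the same rows here, so the second shape is the one to try in Hive when the join has at least one equality condition to key on.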

Matt Tucker

From: mahsa mofidpoor [mailto:mofidpoor@gmail.com]
Sent: Tuesday, March 13, 2012 1:02 PM
To: user@hive.apache.org
Subject: Re: non-equality joins


Hi Keith,

Do you know exactly what form an algorithm has to take in order to fit the MapReduce framework?
Could you point me to some references?

Thanks and Regards,
Mahsa



On Tue, Mar 13, 2012 at 12:49 PM, Keith Wiley <kwiley@keithwiley.com> wrote:
https://cwiki.apache.org/Hive/languagemanual-joins.html

"Hive does not support join conditions that are not equality conditions as it is very difficult
to express such conditions as a map/reduce job."

I admit, that isn't a very detailed answer, but it gives some indication of the reason for
the discrepancy between Hive and other databases.  Hive fundamentally operates on Hadoop,
namely on MapReduce (we all know this, I'm just reiterating the train of thought).  The problem
is that certain algorithms are exceedingly difficult to wedge into the MapReduce framework.

That is as detailed as my personal insight can get.  I've done a lot of MapReduce programming
in Hadoop but I'm not a database expert and I don't really understand the steps involved in
various kinds of table-joins, so I don't understand the particular ways in which certain database
operations do or do not fit into MapReduce...but presumably nonequality joins (whatever those
are :-D ) are particularly difficult to MapReduceify.
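
One way to picture why equality conditions fit and others do not is a toy reduce-side join. This is a rough sketch in plain Python, not Hive's or Hadoop's actual implementation; the function names and sample rows are made up for illustration. The mapper emits each row under its join key, the shuffle groups rows by that key, and each reducer call only ever sees rows that share one key.

```python
from collections import defaultdict

def map_phase(left_rows, right_rows):
    """Emit (join_key, (side, value)) pairs, tagging each row with its table."""
    for key, value in left_rows:
        yield key, ("L", value)
    for key, value in right_rows:
        yield key, ("R", value)

def shuffle(pairs):
    """Group pairs by key; stands in for the framework's partition/sort step."""
    groups = defaultdict(list)
    for key, tagged in pairs:
        groups[key].append(tagged)
    return groups

def reduce_phase(groups, predicate=lambda left, right: True):
    """Pair up left and right rows within each key group.

    The optional predicate is an extra filter applied after the equality
    match, which is where a non-equality condition can still be checked.
    """
    out = []
    for key in sorted(groups):
        lefts = [v for side, v in groups[key] if side == "L"]
        rights = [v for side, v in groups[key] if side == "R"]
        for left in lefts:
            for right in rights:
                if predicate(left, right):
                    out.append((key, left, right))
    return out

# Hypothetical sample data: (join_key, value) pairs.
left = [(1, 100), (1, 300), (2, 200)]
right = [(1, 150), (1, 250), (2, 50), (3, 10)]

joined = reduce_phase(shuffle(map_phase(left, right)),
                      predicate=lambda l, r: r < l)
print(joined)
```

A condition such as "left key < right key" has no single key to shuffle on, so matching rows would land in different groups and never meet in the same reducer call. Sending every row to one group would work, but that amounts to a cross product handled by a single reducer.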

Cheers!

On Mar 13, 2012, at 09:17 , mahsa mofidpoor wrote:

> Hello,
>
> Is there a reason behind not implementing non-equality joins in Hive? In other words,
> is there any usage for theta-join, if implemented?
>
> Thank you in advance for your response,
> Mahsa

________________________________________________________________________________
Keith Wiley     kwiley@keithwiley.com     keithwiley.com     music.keithwiley.com

"It's a fine line between meticulous and obsessive-compulsive and a slippery
rope between obsessive-compulsive and debilitatingly slow."
                                          --  Keith Wiley
________________________________________________________________________________

