From Ted Dunning <tdunn...@veoh.com>
Date Wed, 25 Jul 2007 16:19:14 GMT
Remember that you can do more than one map/reduce step.

Suppose that you want to implement something that looks like this:

Select f(x), g(y), z from table1 join table2 using (j1, j2) where z > 0

Also assume that table1 and table2 have lots of columns besides x, y and z.

You can implement this with a single map-reduce where the map step gets both
table1 and table2 as inputs.  The map step emits nothing for a record with
z <= 0; otherwise it emits (j1, j2) as the key and f(x), g(y), z (whichever
of these the record's table supplies) as the value.  The reduce function
will get records from table1 and table2 all mixed together, but grouped
according to the join key.  It can combine these into the desired output.
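
As a rough illustration of the step above, here is a minimal in-memory
sketch in Python rather than real Hadoop code.  It assumes x lives in
table1 and y, z live in table2 (the query does not say), and the bodies of
f and g, the "table1"/"table2" tags and the run_join driver are all
made up for the example.  The dictionary stands in for the sort/shuffle
phase.

```python
from collections import defaultdict

def f(x):
    return x * 2          # hypothetical per-row function

def g(y):
    return y.upper()      # hypothetical per-row function

def map_record(source, row):
    """Emit (join key, tagged value); emit nothing when z <= 0."""
    key = (row["j1"], row["j2"])
    if source == "table1":
        # Assumption: table1 supplies x only, so there is no z to test here.
        yield key, ("table1", f(row["x"]))
    elif row["z"] > 0:
        # Assumption: table2 supplies y and z; the WHERE filter applies here.
        yield key, ("table2", (g(row["y"]), row["z"]))

def reduce_group(key, values):
    """Pair every table1 value with every table2 value for one join key."""
    left = [v for tag, v in values if tag == "table1"]
    right = [v for tag, v in values if tag == "table2"]
    for fx in left:
        for gy, z in right:
            yield fx, gy, z

def run_join(table1, table2):
    """Stand-in for the framework: map, group by key, then reduce."""
    groups = defaultdict(list)
    for source, rows in (("table1", table1), ("table2", table2)):
        for row in rows:
            for key, value in map_record(source, row):
                groups[key].append(value)
    out = []
    for key in sorted(groups):
        out.extend(reduce_group(key, groups[key]))
    return out
```

Tagging each value with its source table is one way to let the reducer
tell the two kinds of record apart; a key with records from only one table
produces no output, which matches an inner join.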

If you add a "group by y, z" clause, then f has to be a function of a set of
values of x (like max or average, but you get to write it).  You should
change the map function so that the key is now (j1, j2, y, z) and the value
would be x, g(y), z.  Then change the reduce function to collect the values
of x and compute f(x) (and pass through g(y) and z).
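
A sketch of the "group by" variant, again in plain Python and only under
assumptions: the input rows here already carry j1, j2, x, y and z (for
example, the output of an earlier join step), f is taken to be max, and
the body of g and the run_group_by driver are invented for illustration.

```python
from collections import defaultdict

def f(xs):
    return max(xs)        # hypothetical aggregate over a set of x values

def g(y):
    return y.upper()      # hypothetical per-row function

def map_joined(row):
    """Key on (j1, j2, y, z); the value carries x, g(y), z."""
    if row["z"] > 0:
        key = (row["j1"], row["j2"], row["y"], row["z"])
        yield key, (row["x"], g(row["y"]), row["z"])

def reduce_group(key, values):
    """Collect the x values, compute f over them, pass g(y) and z through."""
    xs = [x for x, _, _ in values]
    _, gy, z = values[0]  # g(y) and z are identical within one group
    return f(xs), gy, z

def run_group_by(joined_rows):
    """Stand-in for the framework: map, group by key, then reduce."""
    groups = defaultdict(list)
    for row in joined_rows:
        for key, value in map_joined(row):
            groups[key].append(value)
    return [reduce_group(key, groups[key]) for key in sorted(groups)]
```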

Hope this helps.

The main point here is that the map output can be polymorphic (records from
table1 and table2 need not have the same shape), so you can use the sort
phase between map and reduce to do the join.

On 7/25/07 4:05 AM, "meda vijendharreddy" <medavijju@yahoo.co.in> wrote:

> Hi,
> Currently I want to simulate something like
> "SELECT FROM WHERE "
> FROM , if I have more than 2 tables, then I can do
> a join on those, and if it is a single table then I can
> achieve it easily.
>
> How can I achieve the WHERE condition functionality?
