hadoop-common-user mailing list archives

From Jagat Singh <jagatsi...@gmail.com>
Subject Re: Cartesian product in hadoop
Date Thu, 18 Apr 2013 09:58:05 GMT
Hi,

Have you had a look at the CROSS operator in Pig?

http://pig.apache.org/docs/r0.11.1/basic.html#cross

Thanks


On Thu, Apr 18, 2013 at 7:47 PM, zheyi rong <zheyi.rong@gmail.com> wrote:

> Dear all,
>
> I am writing to ask for ideas on how to compute a Cartesian product in
> Hadoop. Specifically, I have two datasets, each containing 20 million
> lines.
> I want to compute the Cartesian product of these two datasets, comparing
> lines pairwise.
>
> Most of the comparison results are discarded by a filter function (we do
> not output the whole Cartesian product, only a small part of it).
>
> I guess one good way is to pass one block from dataset1 and another block
> from dataset2 to a mapper, and let the mapper compute the product in
> memory to avoid I/O.
>
> Any suggestions?
> Thank you very much.
>
> Regards,
> Zheyi Rong
>
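
For what it is worth, the block-pairing idea described above can be sketched in plain Python, outside of Hadoop. This is only an illustration under assumptions: the names `blocks`, `filtered_cross`, `keep` and `block_size` are hypothetical, and the length-matching filter stands in for whatever comparison function is actually used. Each (left block, right block) pair corresponds to the unit of work one mapper would receive. Note that 20 million x 20 million lines is 4 x 10^14 comparisons, so the filter has to be cheap.

```python
from itertools import islice, product


def blocks(lines, size):
    """Yield successive lists of at most `size` items from an iterable."""
    it = iter(lines)
    while True:
        block = list(islice(it, size))
        if not block:
            return
        yield block


def filtered_cross(left, right, keep, block_size=2):
    """Pair every block of `left` with every block of `right` and
    yield only the (a, b) pairs accepted by the filter `keep`."""
    # The right-hand blocks are materialised once so they can be reused
    # for every left-hand block.
    right_blocks = list(blocks(right, block_size))
    for left_block in blocks(left, block_size):
        for right_block in right_blocks:
            # One (left_block, right_block) pair is what a single mapper
            # would hold in memory in the scheme described above.
            for a, b in product(left_block, right_block):
                if keep(a, b):
                    yield a, b


# Hypothetical filter: keep only pairs of lines with equal length.
pairs = list(
    filtered_cross(["a", "bb", "ccc"], ["x", "yy"],
                   lambda a, b: len(a) == len(b))
)
```

With a filter that accepts everything, the generator yields the full Cartesian product, so the blocking changes only how the work is partitioned, not the result.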
