hadoop-mapreduce-user mailing list archives

From Azuryy Yu <azury...@gmail.com>
Subject Re: Cartesian product in hadoop
Date Thu, 18 Apr 2013 10:21:56 GMT
Pig's CROSS is not suitable for his dataset; it is too large.

--Sent from my Sony mobile.
On Apr 18, 2013 5:58 PM, "Jagat Singh" <jagatsingh@gmail.com> wrote:

> Hi,
>
> Can you have a look at
>
> http://pig.apache.org/docs/r0.11.1/basic.html#cross
>
> Thanks
>
>
> On Thu, Apr 18, 2013 at 7:47 PM, zheyi rong <zheyi.rong@gmail.com> wrote:
>
>> Dear all,
>>
>> I am writing to ask for ideas on computing a Cartesian product in Hadoop.
>> Specifically, I have two datasets, each of which contains 20 million
>> lines.
>> I want to compute the Cartesian product of these two datasets, comparing
>> their lines pairwise.
>>
>> Most comparison results are discarded by a filter function (we do not
>> output the whole result of the Cartesian product, only a small part
>> of it).
>>
>> I guess one good way is to pass one block from dataset1 and one block
>> from dataset2 to each mapper, and let the mapper compute the product of
>> the two blocks in memory to avoid extra I/O.
>>
>> Any suggestions?
>> Thank you very much.
>>
>> Regards,
>> Zheyi Rong
>>
>
>
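
The block-pair idea in the original question can be sketched roughly as
follows. This is plain Python, not Hadoop code, and only a minimal sketch:
all function names (split_into_blocks, compare_block_pair, and so on) and
the block size are hypothetical, and the loop over block pairs merely
simulates what would be one map task per pair in a real job.

```python
from itertools import product


def split_into_blocks(lines, block_size):
    """Split a list of lines into consecutive blocks of at most block_size lines."""
    return [lines[i:i + block_size] for i in range(0, len(lines), block_size)]


def compare_block_pair(block_a, block_b, keep):
    """Compute the in-memory product of two blocks, keeping only accepted pairs.

    `keep` is a hypothetical filter function: keep(a, b) -> bool.
    """
    return [(a, b) for a in block_a for b in block_b if keep(a, b)]


def filtered_cartesian(lines_a, lines_b, block_size, keep):
    """Filtered Cartesian product of two datasets, processed block pair by block pair."""
    blocks_a = split_into_blocks(lines_a, block_size)
    blocks_b = split_into_blocks(lines_b, block_size)
    results = []
    # Each (i, j) pair is an independent unit of work; in a real job it
    # would be assigned to one mapper (assumption, simulated here by a loop).
    for i, j in product(range(len(blocks_a)), range(len(blocks_b))):
        results.extend(compare_block_pair(blocks_a[i], blocks_b[j], keep))
    return results


def num_block_pairs(n_a, n_b, block_size):
    """Number of block pairs (units of work) for datasets of n_a and n_b lines."""
    def blocks(n):
        return -(-n // block_size)  # ceiling division
    return blocks(n_a) * blocks(n_b)
```

With 20 million lines per dataset and an assumed block size of 100,000
lines, num_block_pairs(20_000_000, 20_000_000, 100_000) gives 200 x 200 =
40,000 block pairs, each covering 10^10 of the 4 x 10^14 comparisons in
total. Only the pairs that pass the filter are written out, which matches
the requirement that most of the product is discarded.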
