hadoop-user mailing list archives

From Kai Voigt...@123.org>
Subject Re: Programming Question / Joining Dataset
Date Wed, 26 Sep 2012 13:42:06 GMT
The design pattern for this is called "Reduce-side Join". Search for it and you will find
plenty of details.
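To make the pattern concrete, here is a minimal sketch that simulates the reduce-side join data flow in plain Python. It does not use the Hadoop API, and every function name is hypothetical. In a real job, each input would get its own mapper (for example via Hadoop's MultipleInputs). Both mappers emit the shared key and wrap the value in one common "tagged" value type that records which dataset it came from. A custom Writable, or a GenericWritable subclass holding either ADT or BDT, is one way to do this. That shared wrapper is what lets ADT and BDT reach the same reducer call. Here the tag is simply the string "A" or "B".

```python
from collections import defaultdict
from itertools import product


# Hypothetical mapper for dataset A: emit the join key and tag the value with "A".
def map_a(record):
    key, payload = record
    yield key, ("A", payload)


# Hypothetical mapper for dataset B: same key, value tagged with "B".
def map_b(record):
    key, payload = record
    yield key, ("B", payload)


# Stand-in for the shuffle phase: group all tagged values by key.
def shuffle(pairs):
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


# Reducer: split the tagged values back into the two sides by tag and
# emit every (A, B) combination for this key (an inner join).
def reduce_join(key, tagged_values):
    a_side = [v for tag, v in tagged_values if tag == "A"]
    b_side = [v for tag, v in tagged_values if tag == "B"]
    for a, b in product(a_side, b_side):
        yield key, (a, b)


# Drive the whole simulated job: map both inputs, shuffle, reduce per key.
def reduce_side_join(dataset_a, dataset_b):
    mapped = [kv for r in dataset_a for kv in map_a(r)]
    mapped += [kv for r in dataset_b for kv in map_b(r)]
    joined = []
    for key, values in sorted(shuffle(mapped).items()):
        joined.extend(reduce_join(key, values))
    return joined


if __name__ == "__main__":
    a = [(1, "a1"), (2, "a2"), (2, "a2b")]
    b = [(2, "b2"), (3, "b3")]
    print(reduce_side_join(a, b))
    # -> [(2, ('a2', 'b2')), (2, ('a2b', 'b2'))]
```

This reducer holds both sides for a key in memory. A common refinement is a secondary sort on the tag, so that all values from one dataset arrive first and only that side has to be buffered.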

Kai

On 26.09.2012 at 15:39, "Oliver B. Fischer" <mailsink@swe-blog.net> wrote:

> Yes, I know Hive and Pig. Both are suitable for my problems, but before starting with
> one of them I would simply like to know how to do it with pure MR. ;-)
> 
> Bye,
> 
> Oliver
> 
> On 09/26/2012 03:36 PM, bharath vissapragada wrote:
>> Have you seen Hive [1]? It can join datasets using MapReduce. You
>> can also provide your own SerDes to read your file format (avoiding
>> pre-processing) and create your own data types (e.g. maps of
>> maps, arrays, etc.).
>> 
>> [1] https://cwiki.apache.org/Hive/home.html
>> 
>> On Wed, Sep 26, 2012 at 6:49 PM, Oliver B. Fischer
>> <mailsink@swe-blog.net> wrote:
>> 
>>    Hi all,
>> 
>>    I have to join two large datasets A and B. I preprocess both datasets
>>    by parsing the source text files and creating custom datatypes ADT
>>    and BDT out of them.
>> 
>>    Now I have to join these data. Both datasets A' and B' already
>>    have the same datatype as key. But how can I pass both custom
>>    datatypes ADT and BDT to the same reducer instance for joining?
>> 
>>    Bye,
>> 
>>    Oliver
>> 
>> 
>> 
>> 
>> --
>> Regards,
>> Bharath .V
>> w: http://researchweb.iiit.ac.in/~bharath.v
> 

-- 
Kai Voigt
k@123.org

