hadoop-hdfs-user mailing list archives

From bharath vissapragada <bharathvissapragada1...@gmail.com>
Subject Re: Programming Question / Joining Dataset
Date Wed, 26 Sep 2012 13:36:28 GMT
Have you seen Hive [1]? It can join datasets over MapReduce. You can also
provide custom SerDes to read your file format (which avoids the
pre-processing step) and create your own data types (e.g. maps of maps,
arrays, etc.).

[1] https://cwiki.apache.org/Hive/home.html
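
If you would rather stay in plain MapReduce, the usual answer to the
reducer question is a reduce-side join: tag every value with its source
so that both types can travel under the same key. The following is only
a minimal in-memory sketch of that idea in Python, not Hadoop code. The
classes ADT and BDT are hypothetical stand-ins for the custom datatypes
in the question, and shuffle() merely imitates the grouping the
framework does between map and reduce.

```python
from collections import defaultdict
from dataclasses import dataclass


# Hypothetical stand-ins for the custom datatypes from the question.
@dataclass(frozen=True)
class ADT:
    name: str


@dataclass(frozen=True)
class BDT:
    amount: int


def map_a(key, value):
    # Tag each record with its source so the reducer can tell the types apart.
    yield key, ("A", value)


def map_b(key, value):
    yield key, ("B", value)


def shuffle(pairs):
    # Imitates the framework's shuffle: collect all tagged values per key.
    groups = defaultdict(list)
    for key, tagged in pairs:
        groups[key].append(tagged)
    return groups


def reduce_join(key, tagged_values):
    # Split the values by tag, then emit every (A, B) pair for this key.
    a_side = [v for tag, v in tagged_values if tag == "A"]
    b_side = [v for tag, v in tagged_values if tag == "B"]
    for a in a_side:
        for b in b_side:
            yield key, a, b


def join(dataset_a, dataset_b):
    # Run both mappers, group by key, and reduce each group (inner join).
    pairs = []
    for k, v in dataset_a:
        pairs.extend(map_a(k, v))
    for k, v in dataset_b:
        pairs.extend(map_b(k, v))
    groups = shuffle(pairs)
    result = []
    for key in sorted(groups):
        result.extend(reduce_join(key, groups[key]))
    return result
```

In a real Hadoop job the tag would typically live in a wrapper value
type that can hold either an ADT or a BDT (a GenericWritable subclass is
one option), with MultipleInputs assigning one mapper per input path.
Whether that fits your job is an assumption on my part.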

On Wed, Sep 26, 2012 at 6:49 PM, Oliver B. Fischer <mailsink@swe-blog.net> wrote:

> Hi all,
> I have to join two large datasets A and B. I preprocess both datasets by
> parsing the source text files and creating custom datatypes ADT and BDT out
> of them.
> Now I have to join these data. Both datasets A' and B' already have the
> same datatype as key. But how can I pass both custom datatypes ADT and BDT
> to the same reducer instance for joining?
> Bye,
> Oliver

Bharath .V
