hive-user mailing list archives

From Suraj Nayak <snay...@gmail.com>
Subject Re: Reading 2 table data in MapReduce for Performing Join
Date Wed, 18 Mar 2015 17:30:27 GMT
Hi All,

The patch from https://issues.apache.org/jira/browse/HIVE-4997 helped!
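
For anyone landing on this thread later, here is a minimal sketch of the reduce-side join logic described in the quoted message: tag each record with its source table, group by key, and pair up records for keys present in both tables. It is plain Java with no Hadoop or HCatalog dependency, so it does not show the HIVE-4997 API. The class name, the "A"/"B" tags, and the two-column row layout are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class ReduceSideJoinSketch {

    /** A value tagged with the table it came from (hypothetical helper type). */
    static final class Tagged {
        final String table;
        final String value;
        Tagged(String table, String value) {
            this.table = table;
            this.value = value;
        }
    }

    /** "Map" step: emit (key, tagged value) for every row of one table. */
    static void map(String table, List<String[]> rows, Map<String, List<Tagged>> shuffle) {
        for (String[] row : rows) {
            // Assumed layout: row[0] is the join key, row[1] is the payload.
            shuffle.computeIfAbsent(row[0], k -> new ArrayList<>())
                   .add(new Tagged(table, row[1]));
        }
    }

    /** "Reduce" step: for each key, pair every table-A value with every table-B value. */
    static List<String> reduce(Map<String, List<Tagged>> shuffle) {
        List<String> out = new ArrayList<>();
        for (Map.Entry<String, List<Tagged>> e : shuffle.entrySet()) {
            List<String> fromA = new ArrayList<>();
            List<String> fromB = new ArrayList<>();
            for (Tagged t : e.getValue()) {
                if (t.table.equals("A")) fromA.add(t.value); else fromB.add(t.value);
            }
            // Keys missing from either table produce no output (inner join).
            for (String a : fromA) {
                for (String b : fromB) {
                    out.add(e.getKey() + "\t" + a + "\t" + b);
                }
            }
        }
        return out;
    }

    /** Runs both steps in memory; the TreeMap stands in for the sorted shuffle. */
    public static List<String> join(List<String[]> tableA, List<String[]> tableB) {
        Map<String, List<Tagged>> shuffle = new TreeMap<>();
        map("A", tableA, shuffle);
        map("B", tableB, shuffle);
        return reduce(shuffle);
    }

    public static void main(String[] args) {
        List<String[]> a = Arrays.asList(new String[]{"k1", "a1"}, new String[]{"k2", "a2"});
        List<String[]> b = Arrays.asList(new String[]{"k1", "b1"}, new String[]{"k3", "b3"});
        for (String line : join(a, b)) {
            System.out.println(line);
        }
    }
}
```

In a real job the map step would live in a Mapper that learns which table its split belongs to, and the grouping would be done by the framework's shuffle rather than an in-memory map.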

On Tue, Mar 17, 2015 at 1:05 AM, Suraj Nayak <snayakm@gmail.com> wrote:

> Hi,
>
> I tried reading data via HCatalog for 1 Hive table in MapReduce using
> something similar to
> https://cwiki.apache.org/confluence/display/Hive/HCatalog+InputOutput#HCatalogInputOutput-RunningMapReducewithHCatalog.
> I was able to read successfully.
>
> Now I am trying to read 2 tables, as the requirement is to join them. I
> did not find an API similar to *FileInputFormat.addInputPaths* in
> *HCatInputFormat*. What is the equivalent in HCat?
>
> I had previously performed a join using FileInputFormat on HDFS (by getting
> the split information in the mapper). This article
> (http://www.codingjunkie.com/mapreduce-reduce-joins/) helped me code the join.
> Can someone suggest how I can perform a join operation using HCatalog?
>
> Briefly, the aim is to
>
>    - Read 2 tables (with similar schemas).
>    - If a key exists in both tables, send its records to the same reducer.
>    - Do some processing on the records in the reducer.
>    - Save the output into a file/Hive table.
>
> *P.S.: The reason for using MapReduce to perform the join is a complex
> requirement that can't be solved via Hive/Pig directly.*
>
> Any help will be greatly appreciated :)
>
> --
> Thanks
> Suraj Nayak M
>



-- 
Thanks
Suraj Nayak M
