hive-user mailing list archives

From Suraj Nayak <snay...@gmail.com>
Subject Re: Reading 2 table data in MapReduce for Performing Join
Date Thu, 19 Mar 2015 16:17:58 GMT
Hi All,

I was able to integrate HCatMultipleInputs from the patch successfully for
tables created as TEXTFILE. But I get an error when I read a table created
with the ORC file format. The error is below:

15/03/19 10:51:32 INFO mapreduce.Job: Task Id :
attempt_1425012118520_9756_m_000000_0, Status : FAILED
Error: java.lang.ClassCastException: org.apache.hadoop.io.NullWritable
cannot be cast to org.apache.hadoop.io.LongWritable
    at com.abccompany.mapreduce.MyMapper.map(MyMapper.java:15)
    at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:145)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:340)
    at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:167)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1557)
    at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:162)
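
I suspect the cast fails because my mapper declares LongWritable as the
input key type (which happened to work for the TEXTFILE table), while the
ORC-backed HCatInputFormat hands the mapper NullWritable keys. A minimal
sketch of the mapper signature I plan to try, assuming the value type is
HCatRecord (the class name and field position below are only placeholders):

import java.io.IOException;

import org.apache.hadoop.io.Text;
import org.apache.hadoop.io.WritableComparable;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hive.hcatalog.data.HCatRecord; // org.apache.hcatalog.data on older releases

// Declare the key as WritableComparable so both LongWritable (TEXTFILE)
// and NullWritable (ORC) keys are accepted.
public class MyMapper extends Mapper<WritableComparable, HCatRecord, Text, Text> {
    @Override
    protected void map(WritableComparable key, HCatRecord value, Context context)
            throws IOException, InterruptedException {
        // Field position 0 is a placeholder for the join key column.
        Object joinKey = value.get(0);
        context.write(new Text(joinKey == null ? "" : joinKey.toString()),
                      new Text(value.toString()));
    }
}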


Can anyone help?

Thanks in advance!

On Wed, Mar 18, 2015 at 11:00 PM, Suraj Nayak <snayakm@gmail.com> wrote:

> Hi All,
>
> The patch in https://issues.apache.org/jira/browse/HIVE-4997 helped!
>
>
> On Tue, Mar 17, 2015 at 1:05 AM, Suraj Nayak <snayakm@gmail.com> wrote:
>
>> Hi,
>>
>> I tried reading data from 1 Hive table via HCatalog in MapReduce, using
>> something similar to
>> https://cwiki.apache.org/confluence/display/Hive/HCatalog+InputOutput#HCatalogInputOutput-RunningMapReducewithHCatalog,
>> and was able to read it successfully.
>>
>> Now I am trying to read 2 tables, as the requirement is to join them. I
>> did not find an API similar to *FileInputFormat.addInputPaths* in
>> *HCatInputFormat*. What is the equivalent in HCat?
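>>
>> For a single table, my driver follows the wiki example and boils down to
>> roughly the sketch below (the db/table names and MyMapper are placeholders;
>> the package may be org.apache.hcatalog.* on older releases). I could not
>> find a way to call setInput twice for two tables:
>>
>> import org.apache.hadoop.conf.Configuration;
>> import org.apache.hadoop.fs.Path;
>> import org.apache.hadoop.io.Text;
>> import org.apache.hadoop.mapreduce.Job;
>> import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
>> import org.apache.hive.hcatalog.mapreduce.HCatInputFormat;
>>
>> public class SingleTableDriver {
>>     public static void main(String[] args) throws Exception {
>>         Configuration conf = new Configuration();
>>         Job job = Job.getInstance(conf, "hcat-single-table-read");
>>         job.setJarByClass(SingleTableDriver.class);
>>
>>         // Reads one table only; there is no addInputPaths-style variant here.
>>         HCatInputFormat.setInput(job, "default", "my_table");
>>         job.setInputFormatClass(HCatInputFormat.class);
>>
>>         job.setMapperClass(MyMapper.class);
>>         job.setOutputKeyClass(Text.class);
>>         job.setOutputValueClass(Text.class);
>>
>>         FileOutputFormat.setOutputPath(job, new Path(args[0]));
>>         System.exit(job.waitForCompletion(true) ? 0 : 1);
>>     }
>> }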
>>
>> I had performed a join using FileInputFormat on HDFS (by getting split
>> information in the mapper). This article (
>> http://www.codingjunkie.com/mapreduce-reduce-joins) helped me code the
>> join. Can someone suggest how I can perform the join using HCatalog?
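>>
>> The tagging in that HDFS mapper looked roughly like this (the path check,
>> tags, and tab-delimited layout are placeholders), assuming both input
>> directories were added with FileInputFormat.addInputPaths:
>>
>> import java.io.IOException;
>>
>> import org.apache.hadoop.io.LongWritable;
>> import org.apache.hadoop.io.Text;
>> import org.apache.hadoop.mapreduce.Mapper;
>> import org.apache.hadoop.mapreduce.lib.input.FileSplit;
>>
>> public class TaggingMapper extends Mapper<LongWritable, Text, Text, Text> {
>>     @Override
>>     protected void map(LongWritable key, Text value, Context context)
>>             throws IOException, InterruptedException {
>>         // Tag each record with its source table, derived from the split path.
>>         String path = ((FileSplit) context.getInputSplit()).getPath().toString();
>>         String tag = path.contains("/table_a/") ? "A" : "B";
>>         String[] fields = value.toString().split("\t", -1);
>>         // fields[0] as the join key is a placeholder.
>>         context.write(new Text(fields[0]), new Text(tag + "\t" + value.toString()));
>>     }
>> }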
>>
>> Briefly, the aim is to
>>
>>    - Read 2 tables (with almost identical schemas)
>>    - If a key exists in both tables, send its records to the same reducer
>>    (a rough reducer sketch follows this list).
>>    - Do some processing on the records in the reducer.
>>    - Save the output into a file/Hive table.
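>>
>> The reduce side I have in mind looks roughly like the sketch below (the
>> tags "A"/"B" match the tagging mapper above and the processing step is just
>> a placeholder); whether the records come from HDFS files or HCatalog, this
>> part should stay the same:
>>
>> import java.io.IOException;
>> import java.util.ArrayList;
>> import java.util.List;
>>
>> import org.apache.hadoop.io.Text;
>> import org.apache.hadoop.mapreduce.Reducer;
>>
>> public class JoinReducer extends Reducer<Text, Text, Text, Text> {
>>     @Override
>>     protected void reduce(Text key, Iterable<Text> values, Context context)
>>             throws IOException, InterruptedException {
>>         List<String> tableA = new ArrayList<String>();
>>         List<String> tableB = new ArrayList<String>();
>>         // Split the tagged records back into their source tables.
>>         for (Text value : values) {
>>             String[] parts = value.toString().split("\t", 2);
>>             if ("A".equals(parts[0])) {
>>                 tableA.add(parts[1]);
>>             } else {
>>                 tableB.add(parts[1]);
>>             }
>>         }
>>         // Only keys present in both tables are joined; the "processing"
>>         // here is plain concatenation as a placeholder.
>>         if (!tableA.isEmpty() && !tableB.isEmpty()) {
>>             for (String a : tableA) {
>>                 for (String b : tableB) {
>>                     context.write(key, new Text(a + "\t" + b));
>>                 }
>>             }
>>         }
>>     }
>> }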
>>
>> *P.S.: The reason for using MapReduce to perform the join is a complex
>> requirement which can't be solved via Hive/Pig directly.*
>>
>> Any help will be greatly appreciated :)
>>
>> --
>> Thanks
>> Suraj Nayak M
>>
>
>
>
> --
> Thanks
> Suraj Nayak M
>



-- 
Thanks
Suraj Nayak M
