hive-user mailing list archives

From Raihan Jamal <>
Subject Custom Mapper and Reducer vs HiveQL in terms of Performance
Date Mon, 09 Jul 2012 22:37:23 GMT
*Problem Statement:*

I need to compare two tables, Table1 and Table2, which both store the same
kind of data. Table1 is the main table, so Table2 must be compared against
it. After the comparison I need to produce a report of any discrepancies
found in Table2. Both tables hold a lot of data, at roughly terabyte scale.
Currently I have written HiveQL to do the comparison and return the results.

So my question is: which is better in terms of PERFORMANCE, writing a CUSTOM
MAPPER and REDUCER to do this kind of job, or will the HiveQL that I wrote be
fine, given that I will be joining these two tables on millions of records?
As far as I know, HiveQL internally (behind the scenes) generates optimized
map-reduce jobs, submits them for execution, and returns the results.
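
To make the comparison concrete, here is a minimal in-memory sketch of what a
hand-written reduce-side comparison would do. It is only an illustration, not
my actual job: the function names, the "table1"/"table2" tags, the discrepancy
labels, and the assumption of at most one row per key per table are all
hypothetical. As far as I understand, a plain Hive join usually compiles to a
job of roughly this shape (mappers tag rows by source table, reducers see both
sides for each key), so a custom version would mostly be redoing the same work.

```python
from collections import defaultdict


def map_phase(table_name, rows):
    """Mapper: tag each (key, value) record with the table it came from."""
    for key, value in rows:
        yield key, (table_name, value)


def shuffle(mapped):
    """Shuffle: group tagged records by join key, as the framework would."""
    groups = defaultdict(list)
    for key, tagged in mapped:
        groups[key].append(tagged)
    return groups


def reduce_phase(key, tagged_values):
    """Reducer: compare Table2's value against Table1's for a single key."""
    # Assumption: at most one row per key in each table.
    by_table = dict(tagged_values)
    if "table1" not in by_table:
        return (key, "missing_in_table1")
    if "table2" not in by_table:
        return (key, "missing_in_table2")
    if by_table["table1"] != by_table["table2"]:
        return (key, "value_mismatch")
    return None  # rows agree, nothing to report


def compare(table1_rows, table2_rows):
    """Run map, shuffle, and reduce in memory and collect discrepancies."""
    mapped = list(map_phase("table1", table1_rows))
    mapped += list(map_phase("table2", table2_rows))
    report = []
    for key, tagged in sorted(shuffle(mapped).items()):
        result = reduce_phase(key, tagged)
        if result is not None:
            report.append(result)
    return report
```

Calling `compare` with two small lists of `(key, value)` pairs returns the
keys that differ, each with a label saying whether the row is missing from one
side or its values disagree.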

*Raihan Jamal*
