hive-user mailing list archives

From Esteban Gutierrez <>
Subject Re: Custom Mapper and Reducer vs HiveQL in terms of Performance
Date Fri, 13 Jul 2012 00:57:06 GMT

There is no need to implement a custom mapper or reducer. If you are
experiencing performance issues, you might consider using bucketed
tables and doing a bucketed map join or a sort-merge bucket (SMB) map join.
A good example of join performance can be found in this slide from Facebook:
Basically, you need to choose a good join strategy depending on your data.
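
To make the idea concrete, here is a rough Python sketch of why bucketing helps. It is only an illustration of the principle, not how Hive implements it: the bucket count, function names and sample rows are all made up. In Hive itself this corresponds to tables declared with CLUSTERED BY (key) INTO n BUCKETS and a join run with hive.optimize.bucketmapjoin enabled. Because both tables are hashed on the join key into the same number of buckets, each bucket of Table1 only ever has to be compared with the matching bucket of Table2.

```python
from collections import defaultdict

NUM_BUCKETS = 4  # hypothetical; both tables must use the same bucket count


def bucketize(rows, num_buckets=NUM_BUCKETS):
    """Group (key, value) rows into buckets by hashing the join key."""
    buckets = defaultdict(dict)
    for key, value in rows:
        buckets[hash(key) % num_buckets][key] = value
    return buckets


def bucketed_compare(table1, table2, num_buckets=NUM_BUCKETS):
    """Report rows of Table1 (the main table) that Table2 lacks or contradicts.

    Only bucket i of Table1 is compared with bucket i of Table2, so each
    step needs just one small bucket in memory instead of a whole table.
    """
    b1 = bucketize(table1, num_buckets)
    b2 = bucketize(table2, num_buckets)
    report = []
    for b in range(num_buckets):
        main, other = b1.get(b, {}), b2.get(b, {})
        for key, value in main.items():
            if key not in other:
                report.append((key, "missing in Table2"))
            elif other[key] != value:
                report.append((key, "value mismatch"))
    return sorted(report)


# Made-up sample data: (join key, payload)
table1 = [(1, "a"), (2, "b"), (3, "c"), (5, "e")]
table2 = [(1, "a"), (2, "x"), (5, "e"), (9, "z")]
discrepancies = bucketed_compare(table1, table2)
```

Note that this sketch treats Table1 as the reference, so a key that exists only in Table2 (9 in the sample) is not reported. If you need those as well, you would run the comparison in both directions, which in HiveQL would be a full outer join.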


Cloudera, Inc.

On Thu, Jul 12, 2012 at 2:18 PM, Raihan Jamal <> wrote:

> Sending it again, as I haven't got any reply on this. Any personal
> experience will be appreciated.
> *Raihan Jamal*
> On Mon, Jul 9, 2012 at 3:37 PM, Raihan Jamal <> wrote:
>> *Problem Statement:*
>> I need to compare two tables, Table1 and Table2, which both store the same
>> thing. Table1 is the main table, so Table2 has to be compared against it.
>> After comparing, I need to produce a report of the discrepancies found in
>> Table2. Both tables hold a lot of data, on the TB scale. Currently I have
>> written HiveQL to do the comparison and get the data back.
>> So my question is: which is better in terms of PERFORMANCE, writing a CUSTOM
>> MAPPER and REDUCER to do this kind of job, or will the HiveQL that I wrote
>> be fine, given that I will be joining these two tables on millions of
>> records? As far as I know, HiveQL internally (behind the scenes) generates
>> optimized map-reduce jobs, submits them for execution, and gets back the
>> results.
>> *Raihan Jamal*
