hive-user mailing list archives

From Qing Yan <qing...@gmail.com>
Subject Re: Combine() optimization
Date Fri, 27 Feb 2009 01:58:03 GMT
Got it.

Does map-side aggregation have any special requirements for the dataset?
E.g., the number of unique GROUP BY keys could be too big to hold
in memory. Will it still work?

On Fri, Feb 27, 2009 at 5:50 AM, Zheng Shao <zshao9@gmail.com> wrote:

> Hi Qing,
>
> We did think about the Combiner when we started Hive. However, earlier
> discussions led us to believe that hash-based aggregation inside the mapper
> would be as competitive as using a combiner in most cases.
>
> To enable map-side aggregation, we just need to run the following
> before running the Hive query:
> set hive.map.aggr=true;
>
> Zheng
>
>
> On Thu, Feb 26, 2009 at 6:03 AM, Raghu Murthy <raghu@facebook.com> wrote:
>
>> Right now Hive does not exploit the combiner. But hash-based map-side
>> aggregation in Hive (controlled by hints) provides a similar optimization.
>> Using the combiner in addition to map-side aggregation should improve the
>> performance even more if the combiner can further aggregate the partial
>> aggregates generated from the mapper.
>>
>>
>> On 2/26/09 5:57 AM, "Qing Yan" <qingyan@gmail.com> wrote:
>>
>> > Is there any way/plan for Hive to take advantage of M/R's combine()
>> > phase? There could be either rules embedded in the query optimizer or
>> > hints passed by the user...
>> > GROUP BY should benefit from this a lot.
>> >
>> > Any comment?
>> >
>> >
>> >
>>
>>
>
>
> --
> Yours,
> Zheng
>
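
For reference, here is a minimal sketch of what hash-based aggregation
inside a mapper can look like when the keys do not all fit in memory.
It is an illustration in Python, not Hive's actual implementation, and
the thread above does not confirm how Hive handles this case. The names
map_side_aggregate, reduce_side and max_keys are hypothetical. The
assumption is that the mapper emits its partial aggregates and clears
the hash table once it holds max_keys distinct keys, and that the
reducer merges partials that share a key.

```python
from collections import defaultdict


def map_side_aggregate(rows, max_keys):
    """Yield (key, partial_sum) pairs from one mapper's input.

    Assumption for this sketch: the hash table holds at most max_keys
    distinct keys and is flushed (emitted and cleared) when a new key
    arrives while it is full.
    """
    table = {}
    for key, value in rows:
        if key not in table and len(table) >= max_keys:
            # Table is full: emit the partial aggregates and start over.
            yield from table.items()
            table = {}
        table[key] = table.get(key, 0) + value
    # Emit whatever is left once the input is exhausted.
    yield from table.items()


def reduce_side(partials):
    """Merge partial sums that share a key, as a reducer would."""
    totals = defaultdict(int)
    for key, value in partials:
        totals[key] += value
    return dict(totals)


rows = [("a", 1), ("b", 2), ("a", 3), ("c", 4), ("a", 5), ("b", 6)]
partials = list(map_side_aggregate(rows, max_keys=2))
totals = reduce_side(partials)
```

Under this assumption the result stays correct for any max_keys >= 1.
A smaller table only means more partial rows are sent to the reducer,
so the benefit shrinks as the number of distinct keys grows.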
