hive-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Huan Li <yrlih...@gmail.com>
Subject Re: UDAF and group by
Date Mon, 05 Sep 2011 03:10:27 GMT
Setting hive.map.aggr false will reduce the chance of terminatePartial() and
merge() being called. Though I don't think it will eliminate the
possibility. If your data is large, it's still possible that a group of data
is processed by multiple reducers and those two methods are needed.

If you need to process records in each group in a single method, you can
first use collect_set to collect your group data and process them in a UDF.

2011/9/4 Koert Kuipers <koert@tresata.com>

> Hey, my question wasn't very clear. I have a UDAF that I apply per group.
> The UDAF does not support terminatePartial() and merge(). So to do this i
> run:
>
> set hive.map.aggr=false;
> select myUdf(col1, col2) from table group by col3;
>
> Now this seems to work. But are my assumptions correct that this will never
> call terminatePartial() or merge()?
> Thanks Koert
>
>
> On Thu, Sep 1, 2011 at 11:59 PM, Huan Li <yrlihuan@gmail.com> wrote:
>
>> Koert, Not sure what you mean by "results can be merged between groups".
>> UDAF should be used to aggregated records by group. Why need to merge
>> between groups?
>>
>> Can you give some examples of what kind of query you'd like to run?
>>
>>
>> 2011/8/30 Koert Kuipers <koert@tresata.com>
>>
>>> If i run my own UDAF with group by, can i be sure that a single UDAF
>>> instance initialized once will process all members in a group? Or should i
>>> code so as to take into account the situation where even within a group
>>> multiple UDAFs could run, and i would have to deal with terminatePartial()
>>> and merge() even within a group?
>>> My problem is that my results within a group are not easily merged, but
>>> between groups they are.
>>>
>>
>>
>

Mime
View raw message