hadoop-common-user mailing list archives

From Harsh J <qwertyman...@gmail.com>
Subject Re: How can i realize the “count(distinct )” function in hive ?
Date Mon, 13 Dec 2010 16:26:48 GMT
You don't really need to store all the incoming keys. If the input arrives
sorted, you can compare each value with the previous one and increment the
count only when it changes. If you do this on the reduce side, the input is
already sorted and grouped by key, so a non-distinct key simply arrives with
more than one value; all you need to do is count the reduce calls, as the
grouping does the rest. Just a suggestion to avoid possible memory issues.
Correct me if I am wrong, please.
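
A minimal sketch of the idea above, in plain Python rather than actual
Hadoop or Hive code. The function names are made up for illustration, and
the sorted list stands in for what the shuffle phase would hand to a
reducer. The first function compares each value with the previous one; the
second mimics counting reduce calls by counting one group per distinct key.
Neither keeps a set of seen keys in memory.

```python
from itertools import groupby


def count_distinct_sorted(values):
    """Count distinct items in an already-sorted iterable.

    Only the previous value is remembered, so memory use stays constant
    no matter how many keys there are.
    """
    count = 0
    first = True
    previous = None
    for value in values:
        # A new distinct value starts whenever the current value
        # differs from the one seen just before it.
        if first or value != previous:
            count += 1
            previous = value
            first = False
    return count


def count_distinct_by_groups(sorted_keys):
    """Mimic the reduce side: each distinct key forms one group
    (one hypothetical reduce call), so counting groups counts keys."""
    return sum(1 for _key, _group in groupby(sorted_keys))


if __name__ == "__main__":
    keys = sorted(["b", "a", "b", "c", "a"])  # stand-in for shuffle/sort output
    print(count_distinct_sorted(keys))
    print(count_distinct_by_groups(keys))
```

Both functions assume the input really is sorted; on unsorted input they
would count runs of equal values, not distinct values.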

On Mon, Dec 13, 2010 at 5:36 PM, 1983 ddi <ddi6666@gmail.com> wrote:
> But I am confused about how to write the UDAF class. Could anybody help
> me? Thanks a lot if there is an example.

About UDFs, read this developer article at Bizo that covers it well
enough: http://dev.bizo.com/2009/06/custom-udfs-and-hive.html

-- 
Harsh J
www.harshj.com
