hadoop-hive-dev mailing list archives

From "Min Zhou (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HIVE-607) Create statistical UDFs.
Date Wed, 29 Jul 2009 06:50:14 GMT

    [ https://issues.apache.org/jira/browse/HIVE-607?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12736475#action_12736475 ]

Min Zhou commented on HIVE-607:

Sorry, there were some typos.

I've implemented group_cat() in a rush and found two things that are difficult to solve:
1. group_cat() has an internal ORDER BY clause, and we currently can't implement such
an aggregation in Hive.
2. When the strings to be concatenated for a group are too large, in other words when data skew
appears, there is often not enough memory to store such a big result.
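
To make the two problems concrete, here is a minimal sketch in plain Python, not Hive code. The class name GroupConcatState, the iterate/merge/terminate method names, and the max_bytes limit are all hypothetical and chosen only for illustration. It shows that the internal ordering forces every value of a group to be buffered until the end, so a skewed key can exhaust memory.

```python
class GroupConcatState:
    """Hypothetical partial-aggregation state for a single group key."""

    def __init__(self, max_bytes=1024):
        self.values = []            # all values are buffered until terminate()
        self.size = 0               # running size of the buffer, in bytes
        self.max_bytes = max_bytes  # assumed memory budget for one group

    def iterate(self, value):
        """Add one row's value to the buffer."""
        self.size += len(value.encode("utf-8"))
        if self.size > self.max_bytes:
            # Problem 2: a skewed key no longer fits in memory.
            raise MemoryError("group too large to buffer: %d bytes" % self.size)
        self.values.append(value)

    def merge(self, other):
        """Combine a partial result produced by another task."""
        for value in other.values:
            self.iterate(value)

    def terminate(self, separator=","):
        """Problem 1: the order can only be applied once all values are known."""
        return separator.join(sorted(self.values))


if __name__ == "__main__":
    left, right = GroupConcatState(), GroupConcatState()
    left.iterate("b")
    left.iterate("c")
    right.iterate("a")
    left.merge(right)
    print(left.terminate())
```

Partial results arrive in arbitrary order, so the sort has to wait until terminate(), and nothing can be emitted or discarded before then.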

> Create statistical UDFs.
> ------------------------
>                 Key: HIVE-607
>                 URL: https://issues.apache.org/jira/browse/HIVE-607
>             Project: Hadoop Hive
>          Issue Type: New Feature
>          Components: Query Processor
>            Reporter: S. Alex Smith
>            Assignee: Emil Ibrishimov
>            Priority: Minor
>             Fix For: 0.4.0
>         Attachments: HIVE-607.1.patch, UDAFStddev.java
> Create UDFs replicating:
> STD() 	Return the population standard deviation
> STDDEV_POP()(v5.0.3) 	Return the population standard deviation
> STDDEV_SAMP()(v5.0.3) 	Return the sample standard deviation
> STDDEV() 	Return the population standard deviation
> SUM() 	Return the sum
> VAR_POP()(v5.0.3) 	Return the population standard variance
> VAR_SAMP()(v5.0.3) 	Return the sample variance
> VARIANCE()(v4.1) 	Return the population standard variance
> as found at http://dev.mysql.com/doc/refman/5.0/en/group-by-functions.html.
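
For reference, the difference between the population and sample variants listed above can be sketched in plain Python. This is only an illustration, not the code in the attached patch: the class name VarianceState and its methods are hypothetical, and the merge step uses the well-known pairwise update for count, mean, and sum of squared deviations so that partial results from different tasks can be combined. Returning None for too few rows is an assumption of this sketch.

```python
import math


class VarianceState:
    """Hypothetical mergeable state: count, mean, sum of squared deviations."""

    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0

    def iterate(self, x):
        """Welford's single-pass update for one value."""
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    def merge(self, other):
        """Combine two partial states without revisiting the rows."""
        if other.n == 0:
            return
        if self.n == 0:
            self.n, self.mean, self.m2 = other.n, other.mean, other.m2
            return
        n = self.n + other.n
        delta = other.mean - self.mean
        self.m2 += other.m2 + delta * delta * self.n * other.n / n
        self.mean += delta * other.n / n
        self.n = n

    def var_pop(self):
        """Population variance: divide by n."""
        return self.m2 / self.n if self.n > 0 else None

    def var_samp(self):
        """Sample variance: divide by n - 1."""
        return self.m2 / (self.n - 1) if self.n > 1 else None

    def stddev_pop(self):
        v = self.var_pop()
        return math.sqrt(v) if v is not None else None

    def stddev_samp(self):
        v = self.var_samp()
        return math.sqrt(v) if v is not None else None


if __name__ == "__main__":
    a, b = VarianceState(), VarianceState()
    for x in [2, 4, 4, 4]:
        a.iterate(x)
    for x in [5, 5, 7, 9]:
        b.iterate(x)
    a.merge(b)
    print(a.var_pop(), a.var_samp(), a.stddev_pop())
```

Unlike group_cat(), this state has a fixed size per group, so neither ordering nor data skew is a concern for these functions.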

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
