hive-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Jeff Hammerbacher <ham...@cloudera.com>
Subject Re: Running and Sliding on Aggregates
Date Wed, 17 Nov 2010 01:02:59 GMT
Hey Michael,

I unfortunately can't tell you how to get around this limitation, but I can
point you to the JIRA issue that attempts to capture the requirements:
https://issues.apache.org/jira/browse/HIVE-896. If you could describe your
use case there, we can capture it for posterity in hope that some future
Hive engineer will find time to crank out a patch.

Thanks,
Jeff

On Tue, Nov 16, 2010 at 12:49 PM, Michael Roessler <
michael.roessler@keyevent.com> wrote:

> I suffer from "sql-itis", especially the SQL:2003 variant. I can barely get
> a data-related programming line out of my fingers that doesn't include a
> "partition by" or an "over order by".
>
> Many of the data sets I analyze can be simplified for comprehension by
> thinking of them as having only three columns: a descriptive columns
> (ENTITY_NAME), a date column (20101611), and a numeric or double value
> column (456.789). A time series. Much of what I do is partition the
> descriptive column, order by the date column, and apply a moving aggregate
> (like average) over a sliding window of the value column.
>
> Example:
>
> select entity_name, value_date, avg(value) over (partition by entity_name
> order by value_date rows between 20 preceding and current row) as
> twenty_period_moving_average
> from table_with_a_billion_rows
>
> I am aware that HQL does not currently include this type of SQL analytic
> function. What are my options, if any, to accomplish the above with Hive?
> I'm looking at UDAF, but I haven't yet grokked.
>
> Thanks,
>
> Michael
>
> Michael Roessler, CFA
> michael.roessler@keyevent.com
>
>

Mime
View raw message