cassandra-commits mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From " Brian Hess (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (CASSANDRA-9778) CQL support for time series aggregation
Date Tue, 19 Jan 2016 20:52:40 GMT

    [ https://issues.apache.org/jira/browse/CASSANDRA-9778?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15107407#comment-15107407
] 

 Brian Hess commented on CASSANDRA-9778:
----------------------------------------

[~blerer] - that really isn't an example of window functions (A/K/A window aggregates, window
aggregate functions, etc).  That's really an example of a grouped aggregate with time functions
(Floor, Minute, Hour, etc).  The cardinality of the output of this query is that the number
of rows equals the number of groups.  Whereas, with window functions the cardinality of the
output is that the number of rows equals the number of rows of input.

Let me simplify your trades example to daily stock prices with a schema of (symbol TEXT, transDate
DATE, closePrice DOUBLE).  For each stock you'd like the sliding 3-day average of the stock
closing prices.  You would do that with the following SQL-99 syntax:
SELECT symbol, transDate, closePrice, Avg(closePrice) OVER (PARTITION BY symbol ORDER BY closePrice
ROWS BETWEEN 2 PRECEDING AND CURRENT ROW) AS threeDayAverage FROM stocks WHERE symbol = 'XYZ';

Here, each day will have a "window" of rows that stretches from 2 rows before (if they exist)
to the current row, and the value is the average of the three closePrice values.  Thus, there
is an output for every row of input.  For Thursday's threeDayAverage for stock XYZ we will
use the closePrice from Tuesday, Wednesday, and Thursday.  For Friday's threeDayAverage for
stock XYZ we will use the closePrice from Wednesday, Thursday, and Friday.  And so on.

For what it's worth, this is not hard to do if there is a partition key predicate.  We are
simply doing a pass over the rows to return to the client and rolling things up.  It is possible
we need to sort the data depending on the ORDER BY clause, but otherwise the aggregation is
a simple rollup.  It should be noted that SQL allows for very flexible window specifications
that can cause trouble, such as
"OVER (PARTITION BY symbol ORDER BY transDate ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING)"
which would go from the current row to the end of the partition.  That can be a tricky case.
 SQL99 also supports RANGE window specifications in addition to ROW specifications.  That
can also be tricky.

That said, window functions would be a nice addition (especially with a partition key predicate).


> CQL support for time series aggregation
> ---------------------------------------
>
>                 Key: CASSANDRA-9778
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-9778
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: CQL
>            Reporter: Jonathan Ellis
>            Assignee: Benjamin Lerer
>             Fix For: 3.x
>
>
> Along with MV (CASSANDRA-6477), time series aggregation or "rollups" are a common design
pattern in cassandra applications.  I'd like to add CQL support for this along these lines:
> {code}
> CREATE MATERIALIZED VIEW stocks_by_hour AS
> SELECT exchange, day, day_time(1h) AS hour, symbol, avg(price), sum(volume)
> FROM stocks
> GROUP BY exchange, day, symbol, hour
> PRIMARY KEY  ((exchange, day), hour, symbol);
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Mime
View raw message