drill-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Abdel Hakim Deneche <adene...@maprtech.com>
Subject how should window function works
Date Thu, 11 Dec 2014 23:52:19 GMT
as part of DRILL-1487 <https://issues.apache.org/jira/browse/DRILL-1487>, I
did some research about sql window functions. I am posting here what I've
found so far, to make sure I am getting everything right, and to discuss
how we can fix the current implementation.

a window function performs a calculation across a set of table rows that
are somehow related to the current row. It's similar to aggregate
functions, but doesn't cause rows to be grouped into a single output row.
The over clause determines exactly how the rows of the query are split up
for processing by the window function.

For each row, there is a set of rows within it's partition called it's
"window frame". Many (but not all) window functions act only on the rows of
the window frame, rather than of the whole partition. All aggregate
functions act on the rows of the window frame.

the over clause can contain a "partition by" clause, an "order by" clause
and a frame clause. When "partition by" is omitted there is one single
partition that contains all rows. If "order by" is supplied then the frame
consists of all rows from the start of the partition up through the current
row, plus any following rows that are equal to the current row according to
the "order by" clause. When "order by" is omitted, the default frame
consists of all rows in the partition. The frame clause specifies the set
of rows constituting the window frame.

DRILL-1487 deals with a specific case: which rows belong to the window
frame when an "order by" clause is available. But there are several other
problems related to window functions in Drill:

1/ omitting "partition by", "order by" or both throws an exception.

2/ using a different field in "partition by" and "order by" doesn't produce
the expected results. For example, the following query should display the
rows sorted by position_id and salary, but the result are only sorted by

SELECT employee_id, position_id, salary, AVG(salary) OVER(PARTITION BY
position_id ORDER BY salary) FROM cp.`employee.json` LIMIT 20;

3/ frame clauses are ignored, for example the following query doesn't work
as expected ("ROWS CURRENT ROW" has no effect):

SELECT employee_id, position_id, salary, AVG(salary) OVER(PARTITION BY
position_id ORDER BY position_id ROWS CURRENT ROW) FROM cp.`employee.json`

4/ starting from Drill 0.7.0 window functions throw an exception (see
DRILL-1844 <https://issues.apache.org/jira/browse/DRILL-1844>)

  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message