hive-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Harish Butani (JIRA)" <>
Subject [jira] [Commented] (HIVE-896) Add LEAD/LAG/FIRST/LAST analytical windowing functions to Hive.
Date Sat, 19 Jan 2013 17:42:15 GMT


Harish Butani commented on HIVE-896:

This is only true as long as you have only one OVER clause, right? As soon as you add the
ability to have separate OVER clauses partitioning by different keys (which users will want
very soon) you lose this identity. Even if you decide to retain this I would argue that the
standard PARTITION BY/ORDER BY syntax should be accepted as well. HQL already has enough one
off syntax that makes life hard for people coming from more standard SQL. It should not be

I am agnostic about the second point. We can support Partition/Order or Distribute/Sort or

Regarding the first point, no it applies beyond having the same partitioning. If you say something

select p_mfgr, p_name,
 sum(p_size) over (distribute by p_mfgr sort by p_name rows between unbounded preceding and
current row)
from part
distribute by p_mfgr
sort by p_name;

This is allowed in the language; if we don't relate windowing to the distribute/sort, this
means do the Windowing and then do the distribute/sort. I doubt this would ever be what the
user intended. So we propose to associate the distribute/sort with windowing; and use it at
as the 'default' partitioning specification. So we allow a Query to be specified this way:

select p_mfgr, p_name,
sum(p_size) over (rows between unbounded preceding and current row)
from part
distribute by p_mfgr
sort by p_name;

The concept of inheriting the default partitioning still works even when we allow different
partitioning specifications. So in the future this will be how you specify multiple ordering:

select p_mfgr, p_name,
 sum(p_size) over ( rows between unbounded preceding and current row),
 sum(p_retailprice) (sort by p_size rows between unbounded preceding and current row)
from part
distribute by p_mfgr
sort by p_name;

Similarly you can have multiple distribute specs.

This is part of where I was going with my earlier question on why a windowing function would
ever return a partition. I am becoming less convinced that it makes sense to combine windowing
and partition functions. While they both take partitions as inputs they return different things.
Partition functions return partitions and windowing functions return a single value. As you
point out here the partition functions will also not be interested in the range limiting features
of windowing functions. But taking advantage of this in windowing functions will be very important
for performance optimizations, I suspect. At the very least it seems like partitioning functions
and windowing functions should be presented as separate entities to users and UDF writers,
even if for now Hive shares some of the framework for handling them underneath. This way in
the future optimizations and new features can be added in a way that is advantageous for each.

There are several points here.

a. Windowing doesn't return a single value. The output of applying a WindowFunction on a Partition
is a Column with the same number of rows as the partition.
b. The combined output of all the WIndow Functions in a Statement is a Partition that combines
output from the individual Wdw Functions.

Now let me say something about the seperation of Windowing and PTF functionality. There are
4 areas:

a. The Language level: there is no intersection. The user will not see the connection. One
is used as a UDAF; the other whereever tables can appear.

b. The Ifc/Function writer level: there is no intersection. There is no 'Window Function';
a UDAF writer can continue to write UDAFs. They automatically become available in Window expressions.
Table Functions are written using the TableFunctionResolver and Evaluator ifcs. This is very
different from writing a UDAF. (We have setup a functions branch; you will see some egs of
TblFuncs, past the pedantic Noop and NoopWithMap; also we were hoping to add NPath into the
first patch).

c. The Query Specification Level: Here things get a little messy. First let me describe the
situation, and then how it is relatively eay to fix. We have extended the QueryBlock(QB) to
have the following information(per destination):
- a map of PTF invocations. These are any PTF invocations that appear in the from clause.
Roughly equivalent to how SubQuery information is captured in the QB. This information is
held in an instance of PTFSpec; which captures all the details of the PTF invocation.
-  a destination may also have a PTFSpec attached which represents the Windowing processing
associated with this QB destination.
Here is where things need correction. Our implementation details are leaking into the Specification
classes. Since today we use the PTFOp to execute windowing we are capturing the windowing
clauses information in a PTFSpec. But it is relatively easy to correct this. We can have a
different set of classes to capture the Window processing.

c.2. The other place where the implemenation is leaking is how we try to optimize Windowing
processing when it is combined  with a PTF invocation. (this may be too much detail; the gist
of this point is that this too is easily fixable; if not interested skip to point d.) 

When we see that the from clause is only a PTF invocation then we associate the windowing
clauses with its PTFSpec; so as to treat both things as one PTF Chain. Once in a chain we
use our PTF Chain breaking logic to decide whether Windowing can be done in the same PTFOp
for we need to break them. Again this is relatively easy to fix; for now we remove the logic
that trys to associate a Windowing processing with an existing PTFSpec on the QB. This keeps
the translation clean; the decision to combine can be pushed off to a later stage.

d. The execution of Windowing: think of PTFOperator as an implementaion for Windowing. With
the changes above, it will be easy to choose other available implementaions in the future.

Finally the execution of Windowing by the PTFOp has some uses:
- the handling of value based ranges: it is more work to predict the boundary of the window;
and sometimes it may just make sense to keep the whole partition. For e.g.:

select  p_mfgr,p_name, p_size,  
sum(p_size) as s2 over (range between p_size 5 less and current row), 
from part 
distribute by p_mfgr 
sort by p_mfgr, p_name;

The tradeoff is doing the window calculation for each row to decide how much to keep around
vs just keeping the whole thing(spilled to disk if needed)

The support for multiple ordering, with the same partition is possible by doing just one shuffle
and then doing a sort of the PersistentPartitionedList backing the Partition. This is much
easier to support then to invoke multiple MR jobs and assemble the output into a final Partition.

> Add LEAD/LAG/FIRST/LAST analytical windowing functions to Hive.
> ---------------------------------------------------------------
>                 Key: HIVE-896
>                 URL:
>             Project: Hive
>          Issue Type: New Feature
>          Components: OLAP, UDF
>            Reporter: Amr Awadallah
>            Priority: Minor
>         Attachments: HIVE-896.1.patch.txt
> Windowing functions are very useful for click stream processing and similar time-series/sliding-window
> More details at:
> -- amr

This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see:

View raw message