hive-dev mailing list archives

From "Prajakta Kalmegh (JIRA)" <>
Subject [jira] [Commented] (HIVE-896) Add LEAD/LAG/FIRST/LAST analytical windowing functions to Hive.
Date Thu, 31 Jan 2013 17:39:13 GMT


Prajakta Kalmegh commented on HIVE-896:

This is not exactly a bug. In the existing trunk, the ExtractOperator is followed by a FileSinkOperator
and hence does not have this problem. For queries like the one below:

select p1.p_mfgr, p1.p_name
from part p1 join part p2 on p1.p_partkey = p2.p_partkey 
distribute by p1.p_mfgr 
sort by p1.p_name;

a SelectOperator after the JoinOperator solves this problem by filtering out the virtual columns (VCs)
and setting up a correct RowResolver (RR) for the ReduceSinkOperator. We cannot insert a SelectOperator
in our case because the PTF chain is a black box to us.
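As a rough illustration of what that SelectOperator achieves, here is a minimal Python sketch under a toy row model. `Column` and `project_without_virtual` are hypothetical stand-ins, not Hive's RowResolver or SelectOperator APIs. The sketch drops the virtual columns from a join's output schema and projects the row to match, so the new schema and the row line up position by position. BLOCK__OFFSET__INSIDE__FILE is one of Hive's virtual columns; the sample values are invented.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Column:
    """A hypothetical schema entry, standing in for one RowResolver column."""
    alias: str
    name: str
    type_name: str
    is_virtual: bool = False


def project_without_virtual(
    schema: List[Column], row: Tuple
) -> Tuple[List[Column], Tuple]:
    """Keep only the non-virtual columns, in order, and project the row to match,
    so the returned schema describes the returned row position by position."""
    keep = [i for i, col in enumerate(schema) if not col.is_virtual]
    return [schema[i] for i in keep], tuple(row[i] for i in keep)


# Hypothetical output of `part p1 join part p2`; one virtual column rides along.
join_schema = [
    Column("p1", "p_mfgr", "string"),
    Column("p1", "p_name", "string"),
    Column("p1", "BLOCK__OFFSET__INSIDE__FILE", "bigint", is_virtual=True),
    Column("p2", "p_partkey", "int"),
]
join_row = ("Manufacturer#1", "almond antique", 4096, 17)

schema, row = project_without_virtual(join_schema, join_row)
print([c.name for c in schema])  # ['p_mfgr', 'p_name', 'p_partkey']
print(row)                        # ('Manufacturer#1', 'almond antique', 17)
```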

In queries with a PTFOperator, we use the RowResolver of the ExtractOperator to construct
ExprNodeDescs during translation. The problem is this: if we do not filter the VCs out of
the ExtractOperator's RowResolver and use them during translation, ColumnPrunerTableScanProc adds
these VCs to the newVirtualCols list. This leaves a non-empty virtualCols list on the TableScanDesc.
At runtime, the MapOperator then sets its 'hasVC' boolean to true, which eventually results in a
ClassCastException in the ReduceSinkOperator during row evaluation. The problem occurs specifically
for queries that combine a join with a PTF (we can walk through some examples offline to explain why
it does not occur for queries with a PTF and no join). So, currently, we filter out the VCs
and set up a new RowResolver for the ExtractOperator during translation, so that the columns
at runtime match those used during translation.
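To make the "columns at runtime must match those during translation" requirement concrete, here is a minimal Python sketch under a toy model. `Column`, `filter_virtual` and `compile_column_ref` are hypothetical names, not Hive's RowResolver, ExprNodeDesc or ObjectInspector APIs. A column reference is resolved to a position at "translation" time and read from a row at "run" time, and a `TypeError` plays the role of the ClassCastException. The toy picks one direction of mismatch (the translation-time schema holds a virtual column the runtime row lacks) purely to show how a positional mismatch turns into a type error; it does not model the actual path through ColumnPrunerTableScanProc and MapOperator.

```python
from dataclasses import dataclass
from typing import Any, Callable, List, Sequence


@dataclass(frozen=True)
class Column:
    """A hypothetical schema entry: a name, the expected Python type, a VC flag."""
    name: str
    py_type: type
    is_virtual: bool = False


def filter_virtual(schema: List[Column]) -> List[Column]:
    """Build a new schema without virtual columns (analogue of the new RowResolver)."""
    return [c for c in schema if not c.is_virtual]


def compile_column_ref(schema: List[Column], name: str) -> Callable[[Sequence[Any]], Any]:
    """Resolve `name` to a position now; return an evaluator that reads it later."""
    idx = next((i for i, c in enumerate(schema) if c.name == name), None)
    if idx is None:
        raise KeyError(name)
    expected = schema[idx].py_type

    def evaluate(row: Sequence[Any]) -> Any:
        value = row[idx]
        if not isinstance(value, expected):
            # Stand-in for the ClassCastException raised during row evaluation.
            raise TypeError(f"expected {expected.__name__} at position {idx}, "
                            f"got {type(value).__name__}")
        return value

    return evaluate


translation_schema = [
    Column("p_mfgr", str),
    Column("BLOCK__OFFSET__INSIDE__FILE", int, is_virtual=True),
    Column("p_name", str),
    Column("p_partkey", int),
]
runtime_row = ("Manufacturer#1", "almond antique", 17)  # no VC value in this toy row

unfiltered = compile_column_ref(translation_schema, "p_name")
try:
    unfiltered(runtime_row)
except TypeError as err:
    print("mismatch:", err)  # mismatch: expected str at position 2, got int

filtered = compile_column_ref(filter_virtual(translation_schema), "p_name")
print(filtered(runtime_row))  # almond antique
```

The point the sketch mirrors is that the fix is applied to the schema used at translation time, so the runtime row itself does not need to change.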

> Add LEAD/LAG/FIRST/LAST analytical windowing functions to Hive.
> ---------------------------------------------------------------
>                 Key: HIVE-896
>                 URL:
>             Project: Hive
>          Issue Type: New Feature
>          Components: OLAP, UDF
>            Reporter: Amr Awadallah
>            Priority: Minor
>         Attachments: DataStructs.pdf, HIVE-896.1.patch.txt, Hive-896.2.patch.txt
> Windowing functions are very useful for click stream processing and similar time-series/sliding-window
> More details at:
> -- amr

This message is automatically generated by JIRA.
