hive-dev mailing list archives

From "Prajakta Kalmegh (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HIVE-896) Add LEAD/LAG/FIRST/LAST analytical windowing functions to Hive.
Date Thu, 31 Jan 2013 17:39:13 GMT

    [ https://issues.apache.org/jira/browse/HIVE-896?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13567856#comment-13567856 ]

Prajakta Kalmegh commented on HIVE-896:
---------------------------------------

This is not exactly a bug. In the existing trunk, the ExtractOperator is followed by a FileSinkOperator
and hence does not have this problem. For queries like the one below:

select p1.p_mfgr, p1.p_name, 
p1.p_size 
from part p1 join part p2 on p1.p_partkey = p2.p_partkey 
distribute by p1.p_mfgr 
sort by p1.p_name;

a SelectOperator after the JoinOperator solves this problem by filtering out the virtual columns
(VCs) and setting up a correct RowResolver (RR) for the ReduceSinkOperator. We cannot insert a
SelectOperator in our case because the PTF chain is a black box to us.

In queries with the PTFOperator, we use the RowResolver of the ExtractOperator to construct
ExprNodeDescs during translation. The problem is this: if we do not filter out the VCs from
the ExtractOperator and instead use them during translation, the ColumnPrunerTableScanProc adds
these VCs to the newVirtualCols list. As a result, virtualCols on TableScanDesc is non-empty.
At runtime, the MapOperator then sets its 'hasVC' boolean to true, which eventually results in
a ClassCastException in ReduceSinkOperator during row evaluation. This problem occurs
specifically for queries that combine a join with a PTF (we can walk through some examples
offline to explain why it is not a problem for queries with a PTF and no join). So currently,
we filter out the VCs and set up a new RowResolver for the ExtractOperator during translation,
so that the columns at runtime match those seen during translation.
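
As a rough illustration of the filtering step described above, here is a minimal standalone
Java sketch. It is not the code from the patch and does not use Hive's RowResolver API: the
class VirtualColumnFilter, the method filterVirtualColumns, and the use of plain strings for
columns are all hypothetical, and the set of virtual column names is an assumption made for
the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical helper, for illustration only; not part of Hive.
final class VirtualColumnFilter {

    // Assumption: the virtual column names to drop. A real implementation
    // would take these from Hive's own virtual column registry.
    static final Set<String> VIRTUAL_COLUMNS = new HashSet<>(Arrays.asList(
            "INPUT__FILE__NAME",
            "BLOCK__OFFSET__INSIDE__FILE"));

    // Returns a new list holding only the non-virtual columns, in their
    // original order. The input list is not modified.
    static List<String> filterVirtualColumns(List<String> columns) {
        List<String> kept = new ArrayList<>();
        for (String column : columns) {
            // Compare case-insensitively, since column names may be lower-cased.
            if (!VIRTUAL_COLUMNS.contains(column.toUpperCase(Locale.ROOT))) {
                kept.add(column);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // Columns as they might be visible after the join, VCs included.
        List<String> visible = Arrays.asList(
                "p_mfgr", "INPUT__FILE__NAME", "p_name",
                "block__offset__inside__file", "p_size");
        System.out.println(filterVirtualColumns(visible));
    }
}
```

The filtered list stands in for the column set of the new RowResolver: only the columns that
remain after filtering would be used to construct expressions during translation.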
                
> Add LEAD/LAG/FIRST/LAST analytical windowing functions to Hive.
> ---------------------------------------------------------------
>
>                 Key: HIVE-896
>                 URL: https://issues.apache.org/jira/browse/HIVE-896
>             Project: Hive
>          Issue Type: New Feature
>          Components: OLAP, UDF
>            Reporter: Amr Awadallah
>            Priority: Minor
>         Attachments: DataStructs.pdf, HIVE-896.1.patch.txt, Hive-896.2.patch.txt
>
>
> Windowing functions are very useful for click stream processing and similar time-series/sliding-window analytics.
> More details at:
> http://download-west.oracle.com/docs/cd/B13789_01/server.101/b10736/analysis.htm#i1006709
> http://download-west.oracle.com/docs/cd/B13789_01/server.101/b10736/analysis.htm#i1007059
> http://download-west.oracle.com/docs/cd/B13789_01/server.101/b10736/analysis.htm#i1007032
> -- amr

