hive-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Ashutosh Chauhan (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HIVE-4080) Add Lead & Lag UDAFs
Date Wed, 27 Feb 2013 08:23:12 GMT

    [ https://issues.apache.org/jira/browse/HIVE-4080?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13588116#comment-13588116
] 

Ashutosh Chauhan commented on HIVE-4080:
----------------------------------------

bq.  Support for this feature will probably be removed. Causes ambiguities when Query contains
different partition clauses. 
Do you mean feature which this patch is introducing (ability to have lead function independent
of UDAFs in select expr) will be removed? Consider following query:
{noformat}
 select p_mfgr, p_retailprice,
lead(p_retailprice,1) as l1 over (partition by p_mfgr order by p_name),
lead(p_retailprice,1, p_retailprice) as l2 over (partition by p_size order by p_name),
p_retailprice - lead(p_retailprice,1) 
from part;
{noformat}
My guess is ambiguity you are referring to is once we start supporting different partitioning
in same query than last lead() in above query becomes ambiguous as to which partitioning function
it is refering to. But my understanding is sql standard says that lead and lag function must
always be associated with over clause. So, above query is illegal in standard sql. It must
be written as:
{noformat}
 select p_mfgr, p_retailprice,
lead(p_retailprice,1) as l1 over (partition by p_mfgr order by p_name),
lead(p_retailprice,1, p_retailprice) as l2 over (partition by p_size order by p_name),
p_retailprice - lead(p_retailprice,1)as l3 over (partition by p_size order by p_name) 
from part;
{noformat} 
Now we have this concept of default partitioning which would have made first query legal if
partitioning scheme was identical for l1 and l2. I think long term:
* We should keep functionality introduced in this patch to stay compliant.
* Associate default partitioning with windowing function only if there is no ambiguity (i.e.,
there is only one partitioning clause in query).
* Raise error if user doesn't specify partitioning and there are more than one partitioning
scheme to choose from.
Same argument stands for when lead/lag functions are used as arguments with UDAFs. Make sense
?

Further, I think this concept of default partitioning is only extra convenience we are offering
to hive users which is non-standard. If it turns out its burdensome to support this I am fine
with removing it and always requiring user to specify over clause.
                
> Add Lead & Lag UDAFs
> --------------------
>
>                 Key: HIVE-4080
>                 URL: https://issues.apache.org/jira/browse/HIVE-4080
>             Project: Hive
>          Issue Type: Bug
>          Components: PTF-Windowing
>            Reporter: Harish Butani
>            Assignee: Harish Butani
>         Attachments: HIVE-4080.1.patch.txt, HIVE-4080.D8961.1.patch
>
>
> Currently we support Lead/Lag as navigation UDFs usable with Windowing.
> To be standard compliant we need to support Lead & Lag UDAFs.
> Will continue to support Lead/Lag UDFs as arguments to UDAFs when Windowing is in play.

> Currently allow Lead/Lag expressions to appear in SelectLists even when they are not
arguments to UDAFs. Support for this feature will probably be removed. Causes ambiguities
when Query contains different partition clauses. Will provide more details with associated
Jira to remove this feature.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

Mime
View raw message