Mailing-List: contact dev-help@hive.apache.org; run by ezmlm
Precedence: bulk
Reply-To: dev@hive.apache.org
Date: Thu, 7 Mar 2013 19:59:11 +0000 (UTC)
From: "Ashutosh Chauhan (JIRA)" <jira@apache.org>
To: hive-dev@hadoop.apache.org
Message-ID: <JIRA.12634971.1362239161988.402476.1362686351883@arcas>
In-Reply-To: <JIRA.12634971.1362239161988@arcas>
References: <JIRA.12634971.1362239161988@arcas>
Subject: [jira] [Commented] (HIVE-4108) Allow over() clause to contain an
 order by with no partition by
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit


    [ https://issues.apache.org/jira/browse/HIVE-4108?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13596280#comment-13596280 ] 

Ashutosh Chauhan commented on HIVE-4108:
----------------------------------------

I think we should remove the concept of inference from the query-level distribute/sort.
For your first query, I would read it as follows: the user first intends to partition on a constant over the full table (the first MR job) and then wants a second partitioning on x (the second MR job), which, as you pointed out, differs from the current behavior.
For your second query, my reading is the same as for the first, which again deviates from the implementation.
For the third query, the same ambiguity applies.

So, in all three cases the current behavior differs from what I would have expected. Automatic inference is nasty; IMO we should drop it altogether. Distribute/Sort, if present in a query, shouldn't affect any over() clause specified in that query. Whenever they are present, that just implies the user wants another MR job using that spec (which was the behavior in Hive before this work).
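
To make that reading concrete, here is a minimal Python sketch (not Hive code) of what avg(p_retailprice) over(order by p_name) would compute if the over() clause is evaluated on its own: the whole table is treated as a single partition, ordered by p_name. The function name, the sample rows, and the default frame (RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW, as in standard SQL) are assumptions made for illustration only.

```python
from itertools import groupby


def avg_over_order_by(rows, order_key, value_key):
    """Model avg(value) over (order by order_key) with no partition by.

    The whole input is one partition. Assumes the standard SQL default frame
    (RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW), so rows that tie on
    the order key share the same result.
    """
    ordered = sorted(rows, key=lambda r: r[order_key])
    out, total, count = [], 0.0, 0
    for _, peers in groupby(ordered, key=lambda r: r[order_key]):
        peers = list(peers)
        # All peers of the current row fall inside a RANGE frame.
        total += sum(r[value_key] for r in peers)
        count += len(peers)
        out.extend((r[order_key], r[value_key], total / count) for r in peers)
    return out


# Hypothetical sample rows standing in for the "part" table.
part = [
    {"p_name": "bolt", "p_retailprice": 10.0},
    {"p_name": "anchor", "p_retailprice": 20.0},
    {"p_name": "clamp", "p_retailprice": 30.0},
]
result = avg_over_order_by(part, "p_name", "p_retailprice")
for row in result:
    print(row)
# ('anchor', 20.0, 20.0)
# ('bolt', 10.0, 15.0)
# ('clamp', 30.0, 20.0)
```

Under this reading, a query-level distribute/sort never feeds into the function above; it would only describe a separate shuffle of the query's output.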
                
> Allow over() clause to contain an order by with no partition by
> ---------------------------------------------------------------
>
>                 Key: HIVE-4108
>                 URL: https://issues.apache.org/jira/browse/HIVE-4108
>             Project: Hive
>          Issue Type: Bug
>          Components: PTF-Windowing
>            Reporter: Brock Noland
>
> HIVE-4073 allows over() to be called with no partition by and no order by. We should allow only an order by.
> From the review of HIVE-4073:
> Ashutosh
> {noformat}
> Can you also add following test. This should also work.
> select p_name, p_retailprice,
> avg(p_retailprice) over(order by p_name)
> from part
> partition by p_name;
> {noformat}
> Harish
> {noformat}
> This test will not work (:
> The grammar needs to be changed so:
> partitioningSpec
> @init { msgs.push("partitioningSpec clause"); }
> @after { msgs.pop(); } 
> :
> partitionByClause orderByClause? -> ^(TOK_PARTITIONINGSPEC partitionByClause orderByClause?) |
> orderByClause -> ^(TOK_PARTITIONINGSPEC orderByClause) |
> distributeByClause sortByClause? -> ^(TOK_PARTITIONINGSPEC distributeByClause sortByClause?) |
> sortByClause -> ^(TOK_PARTITIONINGSPEC sortByClause) |
> clusterByClause -> ^(TOK_PARTITIONINGSPEC clusterByClause)
> ;
> And the SemanticAnalyzer::processPTFPartitionSpec has to handle this shape of the AST Tree. The PTFTranslator also needs changes. Do this as another Jira
> {noformat}
