Return-Path:
X-Original-To: apmail-hive-dev-archive@www.apache.org
Delivered-To: apmail-hive-dev-archive@www.apache.org
Received: from mail.apache.org (hermes.apache.org [140.211.11.3]) by minotaur.apache.org (Postfix) with SMTP id B1470EB9D for ; Thu, 7 Mar 2013 19:59:12 +0000 (UTC)
Received: (qmail 28207 invoked by uid 500); 7 Mar 2013 19:59:12 -0000
Delivered-To: apmail-hive-dev-archive@hive.apache.org
Received: (qmail 28146 invoked by uid 500); 7 Mar 2013 19:59:12 -0000
Mailing-List: contact dev-help@hive.apache.org; run by ezmlm
Precedence: bulk
List-Help:
List-Unsubscribe:
List-Post:
List-Id:
Reply-To: dev@hive.apache.org
Delivered-To: mailing list dev@hive.apache.org
Received: (qmail 28136 invoked by uid 500); 7 Mar 2013 19:59:11 -0000
Delivered-To: apmail-hadoop-hive-dev@hadoop.apache.org
Received: (qmail 28132 invoked by uid 99); 7 Mar 2013 19:59:11 -0000
Received: from arcas.apache.org (HELO arcas.apache.org) (140.211.11.28) by apache.org (qpsmtpd/0.29) with ESMTP; Thu, 07 Mar 2013 19:59:11 +0000
Date: Thu, 7 Mar 2013 19:59:11 +0000 (UTC)
From: "Ashutosh Chauhan (JIRA)"
To: hive-dev@hadoop.apache.org
Message-ID:
In-Reply-To:
References:
Subject: [jira] [Commented] (HIVE-4108) Allow over() clause to contain an order by with no partition by
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit
X-JIRA-FingerPrint: 30527f35849b9dde25b450d4833f0394

    [ https://issues.apache.org/jira/browse/HIVE-4108?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13596280#comment-13596280 ]

Ashutosh Chauhan commented on HIVE-4108:
----------------------------------------

I think we should remove the concept of inference from query-level distribute/sort. For your first query, I would read it as: the user first intends to partition on a constant using the full table (which will be the first MR job) and then wants a second partitioning on x (the 2nd MR job), which, as you pointed out, is different from the current behavior.
For your second query, my read would be the same as for the first, which again deviates from the implementation. For the third query, the same ambiguity applies. So, in all 3 cases the current behavior is different from what I would have expected. Automatic inference is nasty. IMO we should drop it altogether. Distribute/Sort, if present in the query, shouldn't impact any over() clause specified in the query. Whenever they are present, that will just imply the user wants another MR job using that spec (which was the behavior in HIVE before this work).

> Allow over() clause to contain an order by with no partition by
> ---------------------------------------------------------------
>
>                 Key: HIVE-4108
>                 URL: https://issues.apache.org/jira/browse/HIVE-4108
>             Project: Hive
>          Issue Type: Bug
>          Components: PTF-Windowing
>            Reporter: Brock Noland
>
> HIVE-4073 allows over() to be called with no partition by and no order by. We should allow only an order by.
> From the review of HIVE-4073:
> Ashutosh
> {noformat}
> Can you also add following test. This should also work.
> select p_name, p_retailprice,
> avg(p_retailprice) over(order by p_name)
> from part
> partition by p_name;
> {noformat}
> Harish
> {noformat}
> This test will not work (:
> The grammar needs to be changed so:
> partitioningSpec
> @init { msgs.push("partitioningSpec clause"); }
> @after { msgs.pop(); }
> :
>    partitionByClause orderByClause? -> ^(TOK_PARTITIONINGSPEC partitionByClause orderByClause?) |
>    orderByClause -> ^(TOK_PARTITIONINGSPEC orderByClause) |
>    distributeByClause sortByClause? -> ^(TOK_PARTITIONINGSPEC distributeByClause sortByClause?) |
>    sortByClause? -> ^(TOK_PARTITIONINGSPEC sortByClause) |
>    clusterByClause -> ^(TOK_PARTITIONINGSPEC clusterByClause)
>    ;
> And the SemanticAnalyzer::processPTFPartitionSpec has to handle this shape of the AST Tree. The PTFTranslator also needs changes. Do this as another Jira
> {noformat}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
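
For readers of the archive, here is a rough sketch of what `avg(p_retailprice) over(order by p_name)` with no partition by would be expected to compute. It is not Hive code: it assumes the whole input is treated as a single partition and that the standard SQL default frame applies (rows with equal order keys are peers and share one result). The helper name `running_avg_over_order_by` and the sample rows are hypothetical and only illustrate the idea.

```python
from itertools import groupby


def running_avg_over_order_by(rows, order_key, value_key):
    """Emulate avg(value) over (order by key) with no partition by.

    Assumption: the whole input is one partition and rows with the same
    order key (peers) receive the same running average.
    """
    ordered = sorted(rows, key=lambda r: r[order_key])
    out = []
    total = 0.0
    count = 0
    # Walk the ordered rows one peer group at a time.
    for _, peers in groupby(ordered, key=lambda r: r[order_key]):
        peers = list(peers)
        total += sum(r[value_key] for r in peers)
        count += len(peers)
        avg = total / count
        for r in peers:
            out.append((r[order_key], r[value_key], avg))
    return out


# Hypothetical sample rows standing in for the "part" table.
part = [
    {"p_name": "bolt", "p_retailprice": 30.0},
    {"p_name": "anvil", "p_retailprice": 10.0},
    {"p_name": "crank", "p_retailprice": 20.0},
]

result = running_avg_over_order_by(part, "p_name", "p_retailprice")
for row in result:
    print(row)
```

Under the proposal in the comment above, a query-level distribute/sort would not change this per-row result; it would only describe a separate MR job.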