Return-Path: X-Original-To: archive-asf-public-internal@cust-asf2.ponee.io Delivered-To: archive-asf-public-internal@cust-asf2.ponee.io Received: from cust-asf.ponee.io (cust-asf.ponee.io [163.172.22.183]) by cust-asf2.ponee.io (Postfix) with ESMTP id 92F1C2009D9 for ; Thu, 19 May 2016 19:27:38 +0200 (CEST) Received: by cust-asf.ponee.io (Postfix) id 8FF3A160A00; Thu, 19 May 2016 17:27:38 +0000 (UTC) Delivered-To: archive-asf-public@cust-asf.ponee.io Received: from mail.apache.org (hermes.apache.org [140.211.11.3]) by cust-asf.ponee.io (Postfix) with SMTP id D78A71609AE for ; Thu, 19 May 2016 19:27:37 +0200 (CEST) Received: (qmail 72741 invoked by uid 500); 19 May 2016 17:27:36 -0000 Mailing-List: contact dev-help@hive.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Reply-To: dev@hive.apache.org Delivered-To: mailing list dev@hive.apache.org Received: (qmail 72725 invoked by uid 99); 19 May 2016 17:27:36 -0000 Received: from reviews-vm.apache.org (HELO reviews.apache.org) (140.211.11.40) by apache.org (qpsmtpd/0.29) with ESMTP; Thu, 19 May 2016 17:27:36 +0000 Received: from reviews.apache.org (localhost [127.0.0.1]) by reviews.apache.org (Postfix) with ESMTP id 3C1902C14F2; Thu, 19 May 2016 17:27:36 +0000 (UTC) Content-Type: multipart/alternative; boundary="===============7366652986522167493==" MIME-Version: 1.0 Subject: Re: Review Request 47554: HIVE-13750 From: Ashutosh Chauhan To: Ashutosh Chauhan Cc: hive , =?utf-8?q?Jes=C3=BAs_Camacho_Rodr=C3=ADguez?= Date: Thu, 19 May 2016 17:27:36 -0000 Message-ID: <20160519172736.1677.13334@reviews.apache.org> X-ReviewBoard-URL: https://reviews.apache.org/ Auto-Submitted: auto-generated Sender: Ashutosh Chauhan X-ReviewGroup: hive X-Auto-Response-Suppress: DR, RN, OOF, AutoReply X-ReviewRequest-URL: https://reviews.apache.org/r/47554/ X-Sender: Ashutosh Chauhan References: <20160519000645.1677.23480@reviews.apache.org> In-Reply-To: <20160519000645.1677.23480@reviews.apache.org> Reply-To: Ashutosh Chauhan X-ReviewRequest-Repository: hive-git archived-at: Thu, 19 May 2016 17:27:38 -0000 --===============7366652986522167493== MIME-Version: 1.0 Content-Type: text/plain; charset="utf-8" Content-Transfer-Encoding: 8bit > On May 19, 2016, 12:06 a.m., Ashutosh Chauhan wrote: > > ql/src/java/org/apache/hadoop/hive/ql/optimizer/correlation/ReduceSinkDeDuplication.java, line 556 > > > > > > Can you also add comment why this is valid only for RS added by SPDO and not in general? > > Jesús Camacho Rodríguez wrote: > I was thinking further about this, and indeed I made many assumptions thinking that SPDO will not add the sort columns at the end of the additional RS keys, so then we need to make sure the ordering is correct... but this is not true, as it will add them. > > I think this optimization can be done general, as ordering before ordering again is a noop. I have changed the code accordingly, I would like your opinion. Am I missing something? Let's see what QA returns... > > Ashutosh Chauhan wrote: > yeah.. thats what I think.. this is not specific to SPDO at logical layer. > Thing I am not 100% sure about is in FileSink the way SPDO is implemented I *think* it assumes all rows for a particular key come in sorted and in one batch. Now, when we merge and sort by two keys and if there is a case that FS was expecting all rows sorted on second key only, now will get them sorted on 2 keys and it may have to close ORC writer and then reopen again, as oppose to write it once and never opening it again. > > Jesús Camacho Rodríguez wrote: > I am trying to understand this part... The way the optimization works, we will keep the second RS with its keys, we just remove the first RS... Thus, this should not be a problem? Or maybe I am not understanding it correctly > > Ashutosh Chauhan wrote: > I think you are correct. Since relevant FS is active in last reducer containing RS which we is not changing, we are not changing any assumptions for FS. > > Ashutosh Chauhan wrote: > On a second thought I think there is more to it. e.g. we are transforming RS (a) followed by RS (a,b) to RS(a,b) Than this is not valid transformation in general, only if second RS is introduced by SPDO. e.g, if these two RS are part of two GBYs (or a join & GBy), than its invalid because partitioning on (a,b) != partitioning on (a) > > Jesús Camacho Rodríguez wrote: > But aggressiveDedup is really restrictive: only pRS-SEL*-cRS, thus we would bail out in those specific cases. Makes sense. Thanks for the explaination! - Ashutosh ----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/47554/#review133844 ----------------------------------------------------------- On May 19, 2016, 10:49 a.m., Jesús Camacho Rodríguez wrote: > > ----------------------------------------------------------- > This is an automatically generated e-mail. To reply, visit: > https://reviews.apache.org/r/47554/ > ----------------------------------------------------------- > > (Updated May 19, 2016, 10:49 a.m.) > > > Review request for hive and Ashutosh Chauhan. > > > Bugs: HIVE-13750 > https://issues.apache.org/jira/browse/HIVE-13750 > > > Repository: hive-git > > > Description > ------- > > HIVE-13750 > > > Diffs > ----- > > ql/src/java/org/apache/hadoop/hive/ql/optimizer/SortedDynPartitionOptimizer.java 010c89ed978296709b052cc7bc80256a27658e2b > ql/src/java/org/apache/hadoop/hive/ql/optimizer/correlation/ReduceSinkDeDuplication.java 733620b84657a21829248afe72ab16ad9692f37e > ql/src/test/results/clientpositive/dynpart_sort_opt_vectorization.q.out d03bfe422743d9a5a6b85f9a6198e1e27024f129 > ql/src/test/results/clientpositive/dynpart_sort_optimization.q.out dec872ab0eef54bd92d5c2bc068e2805cc14e272 > ql/src/test/results/clientpositive/dynpart_sort_optimization_acid.q.out 832580325873dee741ba86239ee571873994a808 > ql/src/test/results/clientpositive/reducesink_dedup.q.out b89df52f965385b85894757896eee487b29c52ae > ql/src/test/results/clientpositive/tez/dynpart_sort_opt_vectorization.q.out a90e3f63b4646cf0ade9785a501ebd1a6b2a3406 > ql/src/test/results/clientpositive/tez/dynpart_sort_optimization.q.out 723e8192f2735059005fc3c5c96732a2c4be49c1 > > Diff: https://reviews.apache.org/r/47554/diff/ > > > Testing > ------- > > > Thanks, > > Jesús Camacho Rodríguez > > --===============7366652986522167493==--