Return-Path: X-Original-To: apmail-hive-dev-archive@www.apache.org Delivered-To: apmail-hive-dev-archive@www.apache.org Received: from mail.apache.org (hermes.apache.org [140.211.11.3]) by minotaur.apache.org (Postfix) with SMTP id 0B895F432 for ; Tue, 23 Apr 2013 05:49:25 +0000 (UTC) Received: (qmail 46330 invoked by uid 500); 23 Apr 2013 05:49:24 -0000 Delivered-To: apmail-hive-dev-archive@hive.apache.org Received: (qmail 45925 invoked by uid 500); 23 Apr 2013 05:49:19 -0000 Mailing-List: contact dev-help@hive.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Reply-To: dev@hive.apache.org Delivered-To: mailing list dev@hive.apache.org Received: (qmail 45855 invoked by uid 500); 23 Apr 2013 05:49:17 -0000 Delivered-To: apmail-hadoop-hive-dev@hadoop.apache.org Received: (qmail 45828 invoked by uid 99); 23 Apr 2013 05:49:17 -0000 Received: from arcas.apache.org (HELO arcas.apache.org) (140.211.11.28) by apache.org (qpsmtpd/0.29) with ESMTP; Tue, 23 Apr 2013 05:49:17 +0000 Date: Tue, 23 Apr 2013 05:49:16 +0000 (UTC) From: "Phabricator (JIRA)" To: hive-dev@hadoop.apache.org Message-ID: In-Reply-To: References: Subject: [jira] [Commented] (HIVE-4377) Add more comment to https://reviews.facebook.net/D1209 (HIVE-2340) MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: quoted-printable X-JIRA-FingerPrint: 30527f35849b9dde25b450d4833f0394 [ https://issues.apache.org/jira/browse/HIVE-4377?page=3Dcom.atlassian.= jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=3D13638= 808#comment-13638808 ]=20 Phabricator commented on HIVE-4377: ----------------------------------- navis has commented on the revision "HIVE-4377 [jira] Add more comment to h= ttps://reviews.facebook.net/D1209 (HIVE-2340)". INLINE COMMENTS ql/src/java/org/apache/hadoop/hive/ql/optimizer/ReduceSinkDeDuplication.j= ava:209 It was implemented as your suggestion at first but it was very conf= using with many redundant codes(There are seven possible cases sharing comm= on rule). But if you prefer, I'll update patch. ql/src/java/org/apache/hadoop/hive/ql/optimizer/ReduceSinkDeDuplication.j= ava:251 processOrderBy? ql/src/java/org/apache/hadoop/hive/ql/optimizer/ReduceSinkDeDuplication.j= ava:254 ProcessGroupBy? ql/src/java/org/apache/hadoop/hive/ql/optimizer/ReduceSinkDeDuplication.j= ava:264 Will be moved to JoinReducerProc. ql/src/java/org/apache/hadoop/hive/ql/optimizer/ReduceSinkDeDuplication.j= ava:512 I was thinking of OperatorUtils or someting. Methods like this woul= d be made continuously. ql/src/java/org/apache/hadoop/hive/ql/optimizer/ReduceSinkDeDuplication.j= ava:685 can be null if there exists operator like ScriptOperator between tw= o RSs. ql/src/java/org/apache/hadoop/hive/ql/optimizer/ReduceSinkDeDuplication.j= ava:690 If there is difference in key/partition/sort-order in common part o= f two RSs, it's not possible to merge. I'll add comment for that. ql/src/java/org/apache/hadoop/hive/ql/optimizer/ReduceSinkDeDuplication.j= ava:395 I'll try. REVISION DETAIL https://reviews.facebook.net/D10377 To: JIRA, navis Cc: njain =20 > Add more comment to https://reviews.facebook.net/D1209 (HIVE-2340) > ------------------------------------------------------------------ > > Key: HIVE-4377 > URL: https://issues.apache.org/jira/browse/HIVE-4377 > Project: Hive > Issue Type: Bug > Components: Query Processor > Reporter: Gang Tim Liu > Assignee: Navis > Attachments: HIVE-4377.D10377.1.patch > > > thanks a lot for addressing optimization in HIVE-2340. Awesome! > Since we are developing at a very fast pace, it would be really useful to > think about maintainability and testing of the large codebase. Highlights= which are applicable for D1209: > 1. Javadoc for all public/private functions, except for > setters/getters. For any complex function, clear examples (input/output) > would really help. > 2. Specially, for query optimizations, it might be a good idea to have > a simple working query at the top, and the expected changes. For e.g.. > The operator tree for that query at each step, or a detailed explanation > at the top. > 3. If possible, the test name (.q file) where the function is being > invoked, or the query which would potentially test that scenario, if it > is a query processor change. > 4. Comments in each test (.q file)=C2=AD that should include the jira > number, what is it trying to test. Assumptions about each query. > 5. Reduce the output for each test =C2=AD whenever query is outputting= more > than 10 results, it should have a reason. Otherwise, each query result > should be bounded by 10 rows. > thanks a lot -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrato= rs For more information on JIRA, see: http://www.atlassian.com/software/jira