Return-Path:
X-Original-To: apmail-hive-dev-archive@www.apache.org
Delivered-To: apmail-hive-dev-archive@www.apache.org
Received: from mail.apache.org (hermes.apache.org [140.211.11.3])
	by minotaur.apache.org (Postfix) with SMTP id 68F14119DD
	for ; Fri, 12 Sep 2014 04:54:34 +0000 (UTC)
Received: (qmail 56034 invoked by uid 500); 12 Sep 2014 04:54:34 -0000
Delivered-To: apmail-hive-dev-archive@hive.apache.org
Received: (qmail 55969 invoked by uid 500); 12 Sep 2014 04:54:33 -0000
Mailing-List: contact dev-help@hive.apache.org; run by ezmlm
Precedence: bulk
List-Help:
List-Unsubscribe:
List-Post:
List-Id:
Reply-To: dev@hive.apache.org
Delivered-To: mailing list dev@hive.apache.org
Received: (qmail 55958 invoked by uid 500); 12 Sep 2014 04:54:33 -0000
Delivered-To: apmail-hadoop-hive-dev@hadoop.apache.org
Received: (qmail 55955 invoked by uid 99); 12 Sep 2014 04:54:33 -0000
Received: from arcas.apache.org (HELO arcas.apache.org) (140.211.11.28)
	by apache.org (qpsmtpd/0.29) with ESMTP; Fri, 12 Sep 2014 04:54:33 +0000
Date: Fri, 12 Sep 2014 04:54:33 +0000 (UTC)
From: "Hive QA (JIRA)"
To: hive-dev@hadoop.apache.org
Message-ID:
In-Reply-To:
References:
Subject: [jira] [Commented] (HIVE-7156) Group-By operator stat-annotation only uses distinct approx to generate rollups
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit
X-JIRA-FingerPrint: 30527f35849b9dde25b450d4833f0394

    [ https://issues.apache.org/jira/browse/HIVE-7156?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14131099#comment-14131099 ]

Hive QA commented on HIVE-7156:
-------------------------------

{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12668171/HIVE-7156.3.patch

{color:red}ERROR:{color} -1 due to 2 failed/errored test(s), 6198 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_dynamic_partition_pruning
org.apache.hive.service.TestHS2ImpersonationWithRemoteMS.testImpersonation
{noformat}

Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/752/testReport
Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/752/console
Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-752/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 2 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12668171

> Group-By operator stat-annotation only uses distinct approx to generate rollups
> -------------------------------------------------------------------------------
>
>                 Key: HIVE-7156
>                 URL: https://issues.apache.org/jira/browse/HIVE-7156
>             Project: Hive
>          Issue Type: Sub-task
>    Affects Versions: 0.14.0
>            Reporter: Gopal V
>            Assignee: Prasanth J
>         Attachments: HIVE-7156.1.patch, HIVE-7156.2.patch, HIVE-7156.3.patch
>
>
> The stats annotation for a group-by only annotates the reduce-side row-count with the distinct values.
> The map-side gets the row-count as the rows output instead of distinct * parallelism, while the reducer side gets the correct parallelism.
> {code}
> hive> explain select distinct L_SHIPDATE from lineitem;
>       Vertices:
>         Map 1
>             Map Operator Tree:
>                 TableScan
>                   alias: lineitem
>                   Statistics: Num rows: 5999989709 Data size: 4745677733354 Basic stats: COMPLETE Column stats: COMPLETE
>                   Select Operator
>                     expressions: l_shipdate (type: string)
>                     outputColumnNames: l_shipdate
>                     Statistics: Num rows: 5999989709 Data size: 4745677733354 Basic stats: COMPLETE Column stats: COMPLETE
>                     Group By Operator
>                       keys: l_shipdate (type: string)
>                       mode: hash
>                       outputColumnNames: _col0
>                       Statistics: Num rows: 5999989709 Data size: 563999032646 Basic stats: COMPLETE Column stats: COMPLETE
>                       Reduce Output Operator
>                         key expressions: _col0 (type: string)
>                         sort order: +
>                         Map-reduce partition columns: _col0 (type: string)
>                         Statistics: Num rows: 5999989709 Data size: 563999032646 Basic stats: COMPLETE Column stats: COMPLETE
>             Execution mode: vectorized
>         Reducer 2
>             Reduce Operator Tree:
>               Group By Operator
>                 keys: KEY._col0 (type: string)
>                 mode: mergepartial
>                 outputColumnNames: _col0
>                 Statistics: Num rows: 1955 Data size: 183770 Basic stats: COMPLETE Column stats: COMPLETE
>                 Select Operator
>                   expressions: _col0 (type: string)
>                   outputColumnNames: _col0
>                   Statistics: Num rows: 1955 Data size: 183770 Basic stats: COMPLETE Column stats: COMPLETE
> {code}

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
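
For readers following the numbers in the plan above, here is a minimal sketch of the arithmetic the issue description asks for on the map side, "distinct * parallelism". It is not taken from any HIVE-7156 patch. The class name, the method names, the cap at the input row count, and the parallelism value of 1000 are all assumptions made for illustration; only the row count (5999989709) and the distinct count (1955) come from the plan.

```java
// Illustrative sketch only; not Hive's StatsRulesProcFactory or any code from the patch.
public class GroupByRowEstimate {

    // Map-side (mode: hash) estimate. Each of the `parallelism` map tasks can
    // emit at most `ndv` distinct keys, so the upper bound is ndv * parallelism.
    // Capping at inputRows is an assumption: a group-by cannot emit more rows
    // than it reads.
    static long mapSideRows(long inputRows, long ndv, int parallelism) {
        if (inputRows < 0 || ndv < 0 || parallelism < 1) {
            throw new IllegalArgumentException("negative count or parallelism < 1");
        }
        // Compare by division first so ndv * parallelism can never overflow.
        if (ndv > inputRows / parallelism) {
            return inputRows;
        }
        return ndv * parallelism;
    }

    // Reduce-side (mode: mergepartial) estimate: one row per distinct key,
    // capped at the number of rows that reach the reducers.
    static long reduceSideRows(long rowsIn, long ndv) {
        return Math.min(rowsIn, ndv);
    }

    public static void main(String[] args) {
        long inputRows = 5999989709L; // TableScan "Num rows" from the plan
        long ndv = 1955L;             // reduce-side "Num rows" from the plan
        int parallelism = 1000;       // hypothetical number of map tasks

        long mapRows = mapSideRows(inputRows, ndv, parallelism);
        System.out.println(mapRows);                      // prints 1955000
        System.out.println(reduceSideRows(mapRows, ndv)); // prints 1955
    }
}
```

Under these assumptions the map-side Group By Operator and Reduce Output Operator would be annotated with 1955000 rows rather than 5999989709, while the reduce-side figure of 1955 stays as it is in the plan.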