Mailing-List: contact dev-help@hive.apache.org; run by ezmlm
Precedence: bulk
Reply-To: dev@hive.apache.org
Date: Fri, 12 Sep 2014 04:54:33 +0000 (UTC)
From: "Hive QA (JIRA)" <jira@apache.org>
To: hive-dev@hadoop.apache.org
Message-ID: <JIRA.12717707.1401502681000.16112.1410497673805@Atlassian.JIRA>
In-Reply-To: <JIRA.12717707.1401502681000@Atlassian.JIRA>
References: <JIRA.12717707.1401502681000@Atlassian.JIRA>
 <JIRA.12717707.1401502681929@arcas>
Subject: [jira] [Commented] (HIVE-7156) Group-By operator stat-annotation
 only uses distinct approx to generate rollups
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit


    [ https://issues.apache.org/jira/browse/HIVE-7156?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14131099#comment-14131099 ] 

Hive QA commented on HIVE-7156:
-------------------------------


{color:red}Overall{color}: -1 at least one test failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12668171/HIVE-7156.3.patch

{color:red}ERROR:{color} -1 due to 2 failed/errored test(s), 6198 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_dynamic_partition_pruning
org.apache.hive.service.TestHS2ImpersonationWithRemoteMS.testImpersonation
{noformat}

Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/752/testReport
Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/752/console
Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-752/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 2 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12668171

> Group-By operator stat-annotation only uses distinct approx to generate rollups
> -------------------------------------------------------------------------------
>
>                 Key: HIVE-7156
>                 URL: https://issues.apache.org/jira/browse/HIVE-7156
>             Project: Hive
>          Issue Type: Sub-task
>    Affects Versions: 0.14.0
>            Reporter: Gopal V
>            Assignee: Prasanth J
>         Attachments: HIVE-7156.1.patch, HIVE-7156.2.patch, HIVE-7156.3.patch
>
>
> The stats annotation for a group-by applies the distinct-value estimate only to the reduce-side row count.
> On the map side, the Group By operator's output row count is left equal to its input row count instead of being estimated as distinct * parallelism, while the reduce side is annotated correctly.
> {code}
> hive> explain select distinct L_SHIPDATE from lineitem;
>       Vertices:
>         Map 1 
>             Map Operator Tree:
>                 TableScan
>                   alias: lineitem
>                   Statistics: Num rows: 5999989709 Data size: 4745677733354 Basic stats: COMPLETE Column stats: COMPLETE
>                   Select Operator
>                     expressions: l_shipdate (type: string)
>                     outputColumnNames: l_shipdate
>                     Statistics: Num rows: 5999989709 Data size: 4745677733354 Basic stats: COMPLETE Column stats: COMPLETE
>                     Group By Operator
>                       keys: l_shipdate (type: string)
>                       mode: hash
>                       outputColumnNames: _col0
>                       Statistics: Num rows: 5999989709 Data size: 563999032646 Basic stats: COMPLETE Column stats: COMPLETE
>                       Reduce Output Operator
>                         key expressions: _col0 (type: string)
>                         sort order: +
>                         Map-reduce partition columns: _col0 (type: string)
>                         Statistics: Num rows: 5999989709 Data size: 563999032646 Basic stats: COMPLETE Column stats: COMPLETE
>             Execution mode: vectorized
>         Reducer 2 
>             Reduce Operator Tree:
>               Group By Operator
>                 keys: KEY._col0 (type: string)
>                 mode: mergepartial
>                 outputColumnNames: _col0
>                 Statistics: Num rows: 1955 Data size: 183770 Basic stats: COMPLETE Column stats: COMPLETE
>                 Select Operator
>                   expressions: _col0 (type: string)
>                   outputColumnNames: _col0
>                   Statistics: Num rows: 1955 Data size: 183770 Basic stats: COMPLETE Column stats: COMPLETE
> {code}
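
To make the "distinct * parallelism" estimate from the description concrete, here is a minimal sketch in Python. It is not the code in the attached patch. The function names are hypothetical, the parallelism of 1000 map tasks is an assumed value that does not appear in the plan, and capping the estimate at the input row count is an added assumption. The row count (5999989709) and distinct count (1955) are taken from the plan above.

```python
def estimate_map_side_groupby_rows(input_rows, distinct_values, parallelism):
    """Rough upper bound on rows emitted by a map-side hash group-by.

    Each of the `parallelism` map tasks can emit at most `distinct_values`
    keys. The cap at `input_rows` is an assumption of this sketch: a
    group-by cannot emit more rows than it reads.
    """
    if input_rows < 0 or distinct_values < 0 or parallelism < 1:
        raise ValueError("row counts must be >= 0 and parallelism >= 1")
    return min(input_rows, distinct_values * parallelism)


def estimate_reduce_side_groupby_rows(input_rows, distinct_values):
    """After the shuffle, each distinct key is emitted exactly once."""
    if input_rows < 0 or distinct_values < 0:
        raise ValueError("row counts must be >= 0")
    return min(input_rows, distinct_values)


if __name__ == "__main__":
    rows = 5999989709      # TableScan row count from the plan above
    ndv = 1955             # distinct l_shipdate values (reduce-side row count)
    tasks = 1000           # hypothetical number of map tasks, not in the plan

    print(estimate_map_side_groupby_rows(rows, ndv, tasks))  # 1955000
    print(estimate_reduce_side_groupby_rows(rows, ndv))      # 1955
```

Under these assumed numbers the map-side Group By would be annotated with about 1.96 million rows rather than the 5999989709 shown in the plan.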

