hive-dev mailing list archives

From "Hive QA (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HIVE-9025) join38.q (without map join) produces incorrect result when testing with multiple reducers
Date Sun, 07 Dec 2014 11:04:12 GMT

    [ https://issues.apache.org/jira/browse/HIVE-9025?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14237113#comment-14237113 ]

Hive QA commented on HIVE-9025:
-------------------------------



{color:red}Overall{color}: -1 at least one test failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12685602/HIVE-9025.patch

{color:red}ERROR:{color} -1 due to 7 failed/errored test(s), 6696 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_vector_decimal_aggregate
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_dynamic_partition_pruning
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_dynamic_partition_pruning_2
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_join_nullsafe
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_optimize_nullscan
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_vector_decimal_mapjoin
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_vectorized_dynamic_partition_pruning
{noformat}

Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/1987/testReport
Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/1987/console
Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-1987/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 7 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12685602 - PreCommit-HIVE-TRUNK-Build

> join38.q (without map join) produces incorrect result when testing with multiple reducers
> -----------------------------------------------------------------------------------------
>
>                 Key: HIVE-9025
>                 URL: https://issues.apache.org/jira/browse/HIVE-9025
>             Project: Hive
>          Issue Type: Bug
>            Reporter: Chao
>            Assignee: Ted Xu
>            Priority: Blocker
>         Attachments: HIVE-9025.patch
>
>
> I have this query from a modified version of {{join38.q}}, which does NOT use map join:
> {code}
> FROM src a JOIN tmp b ON (a.key = b.col11)
> SELECT a.value, b.col5, count(1) as count
> where b.col11 = 111
> group by a.value, b.col5;
> {code}
> If I set {{mapred.reduce.tasks}} to 1, the result is correct. But if I set it to
> a larger number (3, for instance), the result is
> {noformat}
> val_111	105	1
> {noformat}
> which is wrong.
> I think the issue is that, in this case, ConstantPropagationProcFactory overwrites
> the partition cols of the reduce sink desc with an empty list. Later, in
> ReduceSinkOperator#computeHashCode, since partitionEval has length 0, a random number
> is used as the hash code for each separate row. As a result, rows with the same key
> are distributed to different reducers, which leads to an incorrect result.
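
The effect described above can be reproduced in isolation with a small, self-contained simulation. This is only a sketch of the mechanism as the reporter describes it, not Hive's actual code: the class PartitionSketch, its methods, the sample rows, and the seed are all hypothetical, and the per-row random hash is modelled on the description of ReduceSinkOperator#computeHashCode in the report rather than copied from it.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.Random;
import java.util.TreeMap;

// Hypothetical illustration; none of these names exist in Hive.
public class PartitionSketch {

    // Picks a reducer for one row from its partition columns.
    static int reducerFor(List<String> partitionCols, Random random, int numReducers) {
        int hash = partitionCols.isEmpty()
                ? random.nextInt()           // empty list: a fresh random value per row
                : partitionCols.hashCode();  // otherwise: depends only on the key values
        return Math.floorMod(hash, numReducers);
    }

    // Returns reducer index -> number of rows it received (its local count(1)).
    static Map<Integer, Integer> countsPerReducer(
            List<List<String>> rows, boolean keepPartitionCols, int numReducers, long seed) {
        Random random = new Random(seed);
        Map<Integer, Integer> counts = new TreeMap<>();
        for (List<String> row : rows) {
            List<String> cols = keepPartitionCols ? row : Collections.<String>emptyList();
            counts.merge(reducerFor(cols, random, numReducers), 1, Integer::sum);
        }
        return counts;
    }

    // Builds n identical rows that all belong to the same (a.value, b.col5) group.
    static List<List<String>> sampleRows(int n) {
        List<List<String>> rows = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            rows.add(Arrays.asList("val_111", "105"));
        }
        return rows;
    }

    public static void main(String[] args) {
        List<List<String>> rows = sampleRows(1000);
        // With partition columns intact, one reducer sees the whole group.
        System.out.println("keyed:  " + countsPerReducer(rows, true, 3, 42L));
        // With the columns emptied, the group is split across reducers,
        // so each reducer would emit its own partial count for the same key.
        System.out.println("random: " + countsPerReducer(rows, false, 3, 42L));
    }
}
```

In the keyed case a single reducer receives every row of the group, so it can emit one correct count. In the random case the same group is split, and each reducer only sees a fraction of the rows for that key, which is consistent with the wrong count in the report.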



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
