hive-dev mailing list archives

From "Hive QA (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HIVE-4867) Deduplicate columns appearing in both the key list and value list of ReduceSinkOperator
Date Thu, 05 Jun 2014 07:47:03 GMT

    [ https://issues.apache.org/jira/browse/HIVE-4867?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14018563#comment-14018563 ]

Hive QA commented on HIVE-4867:
-------------------------------



{color:red}Overall{color}: -1 at least one test failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12648427/HIVE-4867.5.patch.txt

{color:red}ERROR:{color} -1 due to 43 failed/errored test(s), 5510 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_authorization_explain
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join17
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join20
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join21
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join22
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join28
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join29
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join30
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join31
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join8
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join_filters
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join_nulls
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join_without_localtask
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_bucketmapjoin1
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_bucketmapjoin2
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_bucketmapjoin3
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_bucketmapjoin4
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_bucketmapjoin5
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_correlationoptimizer4
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_cross_product_check_2
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_explain_rearrange
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_join_filters_overlap
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_join_reorder4
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_mapjoin_filter_on_outerjoin
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_mapjoin_subquery2
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_mapjoin_test_outer
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_multiMapJoin1
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_multi_join_union
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_reduce_deduplicate_exclude_join
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_sample8
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_stats11
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_transform_ppr1
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_transform_ppr2
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_union_ppr
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_vectorization_part
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_vectorized_nested_mapjoin
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_root_dir_external_table
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_authorization_ctas
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_sortmerge_mapjoin_mismatch_1
org.apache.hadoop.hive.ql.exec.tez.TestTezTask.testSubmit
org.apache.hive.hcatalog.pig.TestOrcHCatPigStorer.testWriteDecimal
org.apache.hive.hcatalog.pig.TestOrcHCatPigStorer.testWriteDecimalX
org.apache.hive.hcatalog.pig.TestOrcHCatPigStorer.testWriteDecimalXY
{noformat}

Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-Build/390/testReport
Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-Build/390/console
Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-Build-390/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 43 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12648427

> Deduplicate columns appearing in both the key list and value list of ReduceSinkOperator
> ---------------------------------------------------------------------------------------
>
>                 Key: HIVE-4867
>                 URL: https://issues.apache.org/jira/browse/HIVE-4867
>             Project: Hive
>          Issue Type: Improvement
>            Reporter: Yin Huai
>            Assignee: Navis
>         Attachments: HIVE-4867.1.patch.txt, HIVE-4867.2.patch.txt, HIVE-4867.3.patch.txt, HIVE-4867.4.patch.txt, HIVE-4867.5.patch.txt, source_only.txt
>
>
> A ReduceSinkOperator emits data as keys and values. Currently, a column
> may appear in both the key list and the value list, which results in unnecessary shuffling overhead.
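The deduplication this issue asks for can be illustrated with a minimal Java sketch. It is hypothetical and is not the HIVE-4867 patch or Hive's ReduceSinkOperator API: the class KeyValueDedup, its Plan and Ref records, and the idea of recording, for each original value column, whether the reducer should read it from the key row or from the shipped value row are assumptions made for illustration. It needs Java 16+ for records.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch, NOT the HIVE-4867 patch or Hive's ReduceSinkOperator API:
// value columns that already appear in the key list are not shuffled again, and
// the reducer is told where to find each original value column.
final class KeyValueDedup {

    /** Where an original value column can be read from after deduplication. */
    record Ref(boolean fromKey, int index) { }

    /** Columns still shuffled as values, plus one Ref per original value column. */
    record Plan(List<String> shippedValues, List<Ref> valueRefs) { }

    /** Map side: keep only value columns that are not already part of the key. */
    static Plan plan(List<String> keyCols, List<String> valueCols) {
        List<String> shipped = new ArrayList<>();
        List<Ref> refs = new ArrayList<>();
        for (String col : valueCols) {
            int k = keyCols.indexOf(col);
            if (k >= 0) {
                refs.add(new Ref(true, k));                // already shuffled in the key
            } else {
                refs.add(new Ref(false, shipped.size()));  // position in the shipped values
                shipped.add(col);
            }
        }
        return new Plan(shipped, refs);
    }

    /** Reduce side: rebuild the original value row from the key row and the shipped values. */
    static List<Object> rebuild(Plan plan, List<Object> keyRow, List<Object> shippedRow) {
        List<Object> row = new ArrayList<>();
        for (Ref r : plan.valueRefs()) {
            row.add(r.fromKey() ? keyRow.get(r.index()) : shippedRow.get(r.index()));
        }
        return row;
    }

    public static void main(String[] args) {
        // Shape of the example in this issue: key [_col0], value [_col0].
        Plan p = plan(List.of("_col0"), List.of("_col0"));
        System.out.println(p.shippedValues());                      // []
        System.out.println(rebuild(p, List.of(42), List.of()));     // [42]

        // Mixed case: only column b still needs to be shuffled as a value.
        Plan q = plan(List.of("a"), List.of("a", "b"));
        System.out.println(q.shippedValues());                      // [b]
        System.out.println(rebuild(q, List.of(1), List.of("x")));   // [1, x]
    }
}
```

With key [a] and values [a, b], only b is shuffled as a value and the reducer reads a back from the key. A real implementation would also have to rewrite the reduce-side operators that refer to value columns; that part is not shown here.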

> Example:
> Consider the following query:
> {code:sql}
> explain select ss_ticket_number from store_sales cluster by ss_ticket_number;
> {code}
> The resulting plan is:
> {code}
> STAGE DEPENDENCIES:
>   Stage-1 is a root stage
>   Stage-0 is a root stage
> STAGE PLANS:
>   Stage: Stage-1
>     Map Reduce
>       Alias -> Map Operator Tree:
>         store_sales 
>           TableScan
>             alias: store_sales
>             Select Operator
>               expressions:
>                     expr: ss_ticket_number
>                     type: int
>               outputColumnNames: _col0
>               Reduce Output Operator
>                 key expressions:
>                       expr: _col0
>                       type: int
>                 sort order: +
>                 Map-reduce partition columns:
>                       expr: _col0
>                       type: int
>                 tag: -1
>                 value expressions:
>                       expr: _col0
>                       type: int
>       Reduce Operator Tree:
>         Extract
>           File Output Operator
>             compressed: false
>             GlobalTableId: 0
>             table:
>                 input format: org.apache.hadoop.mapred.TextInputFormat
>                 output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
>   Stage: Stage-0
>     Fetch Operator
>       limit: -1
> {code}
> The column 'ss_ticket_number' appears in both the key list and the value list of the ReduceSinkOperator.
> The type of ss_ticket_number is int. In this case, BinarySortableSerDe adds 1 extra byte
> for every int in the key, and LazyBinarySerDe also adds overhead when recording the length
> of an int. As a rough estimate, each such int accounts for about 10 bytes of data emitted
> from the Map phase.
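The rough 10-byte figure can be checked with back-of-the-envelope arithmetic. The Java sketch below is an estimate under stated assumptions, not code from Hive: it assumes BinarySortableSerDe stores a non-null int key as a 1-byte null marker plus 4 data bytes, and that LazyBinarySerDe stores a one-column value as 1 null-flag byte plus a variable-length int encoded like Hadoop's WritableUtils.writeVInt. Record framing, tags, and compression are ignored. The class name IntShuffleEstimate is hypothetical.

```java
// Back-of-the-envelope estimate, NOT taken from Hive source. Assumptions:
//  - BinarySortableSerDe: non-null int key = 1 null-marker byte + 4 data bytes.
//  - LazyBinarySerDe: one-column value = 1 null-flag byte + a variable-length
//    int encoded like Hadoop's WritableUtils.writeVInt.
// Record framing, tags and compression are ignored.
final class IntShuffleEstimate {

    static final int KEY_BYTES = 1 + Integer.BYTES;  // same for every int value

    /** Encoded size of v in the Hadoop VInt/VLong style (1 to 5 bytes for an int). */
    static int vIntSize(long v) {
        if (v >= -112 && v <= 127) {
            return 1;                      // small values fit in a single byte
        }
        if (v < 0) {
            v = ~v;                        // negatives are stored as their one's complement
        }
        int dataBits = Long.SIZE - Long.numberOfLeadingZeros(v);
        return (dataBits + 7) / 8 + 1;     // 1 length byte + payload bytes
    }

    /** Value side: 1 null-flag byte for the single-field struct + the VInt. */
    static int valueBytes(int v) {
        return 1 + vIntSize(v);
    }

    public static void main(String[] args) {
        int[] samples = {7, 1_000, 240_000, 2_000_000_000};
        for (int v : samples) {
            int value = valueBytes(v);
            System.out.printf("v=%d key=%d value=%d total=%d%n",
                    v, KEY_BYTES, value, KEY_BYTES + value);
        }
    }
}
```

For these sample values it prints totals of 7, 9, 10 and 11 bytes per row, consistent with the rough 10-byte estimate in the description; under these assumptions the value-side bytes are roughly what deduplicating the column would avoid shuffling.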



--
This message was sent by Atlassian JIRA
(v6.2#6252)
