Mailing-List: contact issues-help@hive.apache.org; run by ezmlm
Precedence: bulk
Reply-To: dev@hive.apache.org
Date: Tue, 6 Oct 2015 13:55:26 +0000 (UTC)
From: "Hive QA (JIRA)" <jira@apache.org>
To: issues@hive.apache.org
Message-ID: <JIRA.12775170.1423956696000.35971.1444139726871@Atlassian.JIRA>
In-Reply-To: <JIRA.12775170.1423956696000@Atlassian.JIRA>
References: <JIRA.12775170.1423956696000@Atlassian.JIRA>
 <JIRA.12775170.1423956696401@arcas>
Subject: [jira] [Commented] (HIVE-9695) Redundant filter operator in reducer
 Vertex when CBO is disabled
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit


    [ https://issues.apache.org/jira/browse/HIVE-9695?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14945051#comment-14945051 ] 

Hive QA commented on HIVE-9695:
-------------------------------


{color:red}Overall{color}: -1 at least one test failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12765164/HIVE-9695.01.patch

{color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 4 failed/errored test(s), 9639 tests executed
*Failed tests:*
{noformat}
TestCliDriver-udf_bitmap_empty.q-multigroupby_singlemr.q-quotedid_basic.q-and-12-more - did not produce a TEST-*.xml file
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_udaf_histogram_numeric
org.apache.hive.hcatalog.api.TestHCatClient.testTableSchemaPropagation
org.apache.hive.jdbc.TestSSL.testSSLVersion
{noformat}

Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/5546/testReport
Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/5546/console
Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-5546/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 4 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12765164 - PreCommit-HIVE-TRUNK-Build

> Redundant filter operator in reducer Vertex when CBO is disabled
> ----------------------------------------------------------------
>
>                 Key: HIVE-9695
>                 URL: https://issues.apache.org/jira/browse/HIVE-9695
>             Project: Hive
>          Issue Type: Improvement
>          Components: Physical Optimizer
>    Affects Versions: 2.0.0
>            Reporter: Mostafa Mokhtar
>            Assignee: Jesus Camacho Rodriguez
>         Attachments: HIVE-9695.01.patch, HIVE-9695.patch
>
>
> There is a redundant filter operator in reducer Vertex when CBO is disabled.
> Query 
> {code}
> select 
>         ss_item_sk, ss_ticket_number, ss_store_sk
>     from
>         store_sales a, store_returns b, store
>     where
>         a.ss_item_sk = b.sr_item_sk
>             and a.ss_ticket_number = b.sr_ticket_number 
>             and ss_sold_date_sk between 2450816 and 2451500
>             and sr_returned_date_sk between 2450816 and 2451500
>             and s_store_sk = ss_store_sk;
> {code}
> Plan snippet 
> {code}
>                   Statistics: Num rows: 57439344 Data size: 1838059008 Basic stats: COMPLETE Column stats: COMPLETE
>                   Filter Operator
>                     predicate: (((((_col1 = _col27) and (_col8 = _col34)) and _col22 BETWEEN 2450816 AND 2451500) and _col45 BETWEEN 2450816 AND 2451500) and (_col49 = _col6)) (type: boolean)
> {code}
> Full plan with CBO disabled
> {code}
> STAGE DEPENDENCIES:
>   Stage-1 is a root stage
>   Stage-0 depends on stages: Stage-1
> STAGE PLANS:
>   Stage: Stage-1
>     Tez
>       Edges:
>         Reducer 2 <- Map 1 (SIMPLE_EDGE), Map 3 (BROADCAST_EDGE), Map 4 (SIMPLE_EDGE)
>       DagName: mmokhtar_20150214182626_ad6820c7-b667-4652-ab25-cb60deed1a6d:13
>       Vertices:
>         Map 1
>             Map Operator Tree:
>                 TableScan
>                   alias: b
>                   filterExpr: ((sr_item_sk is not null and sr_ticket_number is not null) and sr_returned_date_sk BETWEEN 2450816 AND 2451500) (type: boolean)
>                   Statistics: Num rows: 2370038095 Data size: 170506118656 Basic stats: COMPLETE Column stats: COMPLETE
>                   Filter Operator
>                     predicate: (sr_item_sk is not null and sr_ticket_number is not null) (type: boolean)
>                     Statistics: Num rows: 706893063 Data size: 6498502768 Basic stats: COMPLETE Column stats: COMPLETE
>                     Reduce Output Operator
>                       key expressions: sr_item_sk (type: int), sr_ticket_number (type: int)
>                       sort order: ++
>                       Map-reduce partition columns: sr_item_sk (type: int), sr_ticket_number (type: int)
>                       Statistics: Num rows: 706893063 Data size: 6498502768 Basic stats: COMPLETE Column stats: COMPLETE
>                       value expressions: sr_returned_date_sk (type: int)
>             Execution mode: vectorized
>         Map 3
>             Map Operator Tree:
>                 TableScan
>                   alias: store
>                   filterExpr: s_store_sk is not null (type: boolean)
>                   Statistics: Num rows: 1704 Data size: 3256276 Basic stats: COMPLETE Column stats: COMPLETE
>                   Filter Operator
>                     predicate: s_store_sk is not null (type: boolean)
>                     Statistics: Num rows: 1704 Data size: 6816 Basic stats: COMPLETE Column stats: COMPLETE
>                     Reduce Output Operator
>                       key expressions: s_store_sk (type: int)
>                       sort order: +
>                       Map-reduce partition columns: s_store_sk (type: int)
>                       Statistics: Num rows: 1704 Data size: 6816 Basic stats: COMPLETE Column stats: COMPLETE
>             Execution mode: vectorized
>         Map 4
>             Map Operator Tree:
>                 TableScan
>                   alias: a
>                   filterExpr: (((ss_item_sk is not null and ss_ticket_number is not null) and ss_store_sk is not null) and ss_sold_date_sk BETWEEN 2450816 AND 2451500) (type: boolean)
>                   Statistics: Num rows: 28878719387 Data size: 2405805439460 Basic stats: COMPLETE Column stats: COMPLETE
>                   Filter Operator
>                     predicate: ((ss_item_sk is not null and ss_ticket_number is not null) and ss_store_sk is not null) (type: boolean)
>                     Statistics: Num rows: 8405840828 Data size: 110101408700 Basic stats: COMPLETE Column stats: COMPLETE
>                     Reduce Output Operator
>                       key expressions: ss_item_sk (type: int), ss_ticket_number (type: int)
>                       sort order: ++
>                       Map-reduce partition columns: ss_item_sk (type: int), ss_ticket_number (type: int)
>                       Statistics: Num rows: 8405840828 Data size: 110101408700 Basic stats: COMPLETE Column stats: COMPLETE
>                       value expressions: ss_store_sk (type: int), ss_sold_date_sk (type: int)
>             Execution mode: vectorized
>         Reducer 2
>             Reduce Operator Tree:
>               Merge Join Operator
>                 condition map:
>                      Inner Join 0 to 1
>                 condition expressions:
>                   0 {KEY.reducesinkkey0} {VALUE._col5} {KEY.reducesinkkey1} {VALUE._col20}
>                   1 {KEY.reducesinkkey0} {KEY.reducesinkkey1} {VALUE._col17}
>                 outputColumnNames: _col1, _col6, _col8, _col22, _col27, _col34, _col45
>                 Statistics: Num rows: 57439343 Data size: 1148786860 Basic stats: COMPLETE Column stats: COMPLETE
>                 Map Join Operator
>                   condition map:
>                        Inner Join 0 to 1
>                   condition expressions:
>                     0 {_col1} {_col6} {_col8} {_col22} {_col27} {_col34} {_col45}
>                     1 {s_store_sk}
>                   keys:
>                     0 _col6 (type: int)
>                     1 s_store_sk (type: int)
>                   outputColumnNames: _col1, _col6, _col8, _col22, _col27, _col34, _col45, _col49
>                   input vertices:
>                     1 Map 3
>                   Statistics: Num rows: 57439344 Data size: 1838059008 Basic stats: COMPLETE Column stats: COMPLETE
>                   Filter Operator
>                     predicate: (((((_col1 = _col27) and (_col8 = _col34)) and _col22 BETWEEN 2450816 AND 2451500) and _col45 BETWEEN 2450816 AND 2451500) and (_col49 = _col6)) (type: boolean)
>                     Statistics: Num rows: 1794979 Data size: 57439328 Basic stats: COMPLETE Column stats: COMPLETE
>                     Select Operator
>                       expressions: _col1 (type: int), _col8 (type: int), _col6 (type: int)
>                       outputColumnNames: _col0, _col1, _col2
>                       Statistics: Num rows: 1794979 Data size: 21539748 Basic stats: COMPLETE Column stats: COMPLETE
>                       File Output Operator
>                         compressed: false
>                         Statistics: Num rows: 1794979 Data size: 21539748 Basic stats: COMPLETE Column stats: COMPLETE
>                         table:
>                             input format: org.apache.hadoop.mapred.TextInputFormat
>                             output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
>                             serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
>   Stage: Stage-0
>     Fetch Operator
>       limit: -1
>       Processor Tree:
>         ListSink
> {code}
> Full plan with CBO enabled
> {code}
> STAGE DEPENDENCIES:
>   Stage-1 is a root stage
>   Stage-0 depends on stages: Stage-1
> STAGE PLANS:
>   Stage: Stage-1
>     Tez
>       Edges:
>         Map 4 <- Map 1 (BROADCAST_EDGE)
>         Reducer 3 <- Map 2 (SIMPLE_EDGE), Map 4 (SIMPLE_EDGE)
>       DagName: mmokhtar_20150214182525_63a9838f-db9f-40e9-8ae1-77c77143dccf:12
>       Vertices:
>         Map 1
>             Map Operator Tree:
>                 TableScan
>                   alias: store
>                   filterExpr: s_store_sk is not null (type: boolean)
>                   Statistics: Num rows: 1704 Data size: 3256276 Basic stats: COMPLETE Column stats: COMPLETE
>                   Filter Operator
>                     predicate: s_store_sk is not null (type: boolean)
>                     Statistics: Num rows: 1704 Data size: 6816 Basic stats: COMPLETE Column stats: COMPLETE
>                     Select Operator
>                       expressions: s_store_sk (type: int)
>                       outputColumnNames: _col0
>                       Statistics: Num rows: 1704 Data size: 6816 Basic stats: COMPLETE Column stats: COMPLETE
>                       Reduce Output Operator
>                         key expressions: _col0 (type: int)
>                         sort order: +
>                         Map-reduce partition columns: _col0 (type: int)
>                         Statistics: Num rows: 1704 Data size: 6816 Basic stats: COMPLETE Column stats: COMPLETE
>             Execution mode: vectorized
>         Map 2
>             Map Operator Tree:
>                 TableScan
>                   alias: b
>                   filterExpr: (sr_item_sk is not null and sr_ticket_number is not null) (type: boolean)
>                   Statistics: Num rows: 2370038095 Data size: 170506118656 Basic stats: COMPLETE Column stats: COMPLETE
>                   Filter Operator
>                     predicate: (sr_item_sk is not null and sr_ticket_number is not null) (type: boolean)
>                     Statistics: Num rows: 706893063 Data size: 3670930516 Basic stats: COMPLETE Column stats: COMPLETE
>                     Select Operator
>                       expressions: sr_item_sk (type: int), sr_ticket_number (type: int)
>                       outputColumnNames: _col0, _col1
>                       Statistics: Num rows: 706893063 Data size: 3670930516 Basic stats: COMPLETE Column stats: COMPLETE
>                       Reduce Output Operator
>                         key expressions: _col0 (type: int), _col1 (type: int)
>                         sort order: ++
>                         Map-reduce partition columns: _col0 (type: int), _col1 (type: int)
>                         Statistics: Num rows: 706893063 Data size: 3670930516 Basic stats: COMPLETE Column stats: COMPLETE
>             Execution mode: vectorized
>         Map 4
>             Map Operator Tree:
>                 TableScan
>                   alias: a
>                   filterExpr: ((ss_store_sk is not null and ss_item_sk is not null) and ss_ticket_number is not null) (type: boolean)
>                   Statistics: Num rows: 28878719387 Data size: 2405805439460 Basic stats: COMPLETE Column stats: COMPLETE
>                   Filter Operator
>                     predicate: ((ss_store_sk is not null and ss_item_sk is not null) and ss_ticket_number is not null) (type: boolean)
>                     Statistics: Num rows: 8405840828 Data size: 76478045388 Basic stats: COMPLETE Column stats: COMPLETE
>                     Select Operator
>                       expressions: ss_item_sk (type: int), ss_store_sk (type: int), ss_ticket_number (type: int)
>                       outputColumnNames: _col0, _col1, _col2
>                       Statistics: Num rows: 8405840828 Data size: 76478045388 Basic stats: COMPLETE Column stats: COMPLETE
>                       Map Join Operator
>                         condition map:
>                              Inner Join 0 to 1
>                         condition expressions:
>                           0 {_col0} {_col1} {_col2}
>                           1
>                         keys:
>                           0 _col1 (type: int)
>                           1 _col0 (type: int)
>                         outputColumnNames: _col0, _col1, _col2
>                         input vertices:
>                           1 Map 1
>                         Statistics: Num rows: 8405840896 Data size: 100870090752 Basic stats: COMPLETE Column stats: COMPLETE
>                         Reduce Output Operator
>                           key expressions: _col0 (type: int), _col2 (type: int)
>                           sort order: ++
>                           Map-reduce partition columns: _col0 (type: int), _col2 (type: int)
>                           Statistics: Num rows: 8405840896 Data size: 100870090752 Basic stats: COMPLETE Column stats: COMPLETE
>                           value expressions: _col1 (type: int)
>             Execution mode: vectorized
>         Reducer 3
>             Reduce Operator Tree:
>               Merge Join Operator
>                 condition map:
>                      Inner Join 0 to 1
>                 condition expressions:
>                   0 {KEY.reducesinkkey0} {VALUE._col0} {KEY.reducesinkkey1}
>                   1
>                 outputColumnNames: _col0, _col1, _col2
>                 Statistics: Num rows: 75912751 Data size: 910953012 Basic stats: COMPLETE Column stats: COMPLETE
>                 Select Operator
>                   expressions: _col0 (type: int), _col2 (type: int), _col1 (type: int)
>                   outputColumnNames: _col0, _col1, _col2
>                   Statistics: Num rows: 75912751 Data size: 910953012 Basic stats: COMPLETE Column stats: COMPLETE
>                   File Output Operator
>                     compressed: false
>                     Statistics: Num rows: 75912751 Data size: 910953012 Basic stats: COMPLETE Column stats: COMPLETE
>                     table:
>                         input format: org.apache.hadoop.mapred.TextInputFormat
>                         output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
>                         serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
>   Stage: Stage-0
>     Fetch Operator
>       limit: -1
>       Processor Tree:
>         ListSink
> {code}
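>
> A minimal sketch of how the two plans above might be regenerated for comparison, assuming a Tez session against the same tables. The report does not list the session settings that were used, so the two set statements are an assumption; hive.cbo.enable is the standard Hive property that switches the cost-based optimizer on or off.
> {code}
> -- Assumed session settings (not stated in the report).
> set hive.execution.engine=tez;
> -- Use false for the CBO-disabled plan, true for the CBO-enabled plan.
> set hive.cbo.enable=false;
> explain
> select
>     ss_item_sk, ss_ticket_number, ss_store_sk
> from
>     store_sales a, store_returns b, store
> where
>     a.ss_item_sk = b.sr_item_sk
>     and a.ss_ticket_number = b.sr_ticket_number
>     and ss_sold_date_sk between 2450816 and 2451500
>     and sr_returned_date_sk between 2450816 and 2451500
>     and s_store_sk = ss_store_sk;
> {code}
> In the CBO-disabled plan above, each conjunct of the Reducer 2 Filter Operator predicate appears to be enforced upstream already: the two equalities on item and ticket number are the Merge Join keys, the store equality is the Map Join key, and both BETWEEN ranges are present in the TableScan filterExpr of Map 4 and Map 1. That is what makes the reducer-side filter redundant.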

