hive-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Mostafa Mokhtar (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (HIVE-9368) Physical optimizer : Join order in Explain is different from join order provided by Calcite
Date Wed, 14 Jan 2015 00:41:34 GMT

     [ https://issues.apache.org/jira/browse/HIVE-9368?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Mostafa Mokhtar updated HIVE-9368:
----------------------------------
    Attachment: explain_fetch_column_stats_on.txt
                explain_fetch_column_stats_off.txt

> Physical optimizer : Join order in Explain is different from join order provided by Calcite
> -------------------------------------------------------------------------------------------
>
>                 Key: HIVE-9368
>                 URL: https://issues.apache.org/jira/browse/HIVE-9368
>             Project: Hive
>          Issue Type: Bug
>          Components: Physical Optimizer
>    Affects Versions: 0.14.0
>            Reporter: Mostafa Mokhtar
>            Assignee: Vikram Dixit K
>             Fix For: 0.15.0
>
>         Attachments: explain_fetch_column_stats_off.txt, explain_fetch_column_stats_on.txt
>
>
> Join order in explain is different from that provided by Calcite, this was observed during
the Fidelity POC. 
> Logical plan from Calcite :
> {code}
> 2015-01-13 18:54:42,892 DEBUG [main]: parse.CalcitePlanner (CalcitePlanner.java:apply(743))
- Plan After Join Reordering:
> HiveProject(scale=[$0], time_key_num=[$1], dataset_code=[$2], cost_center_lvl1_id=[$3],
cost_pool_lvl6_id=[$4], lvl5_id=[$5], view_lvl1_id=[$6], from_lvl1_id=[$7], plan_id=[$8],
client_id=[$9], lob_id=[$10], product_id=[$11], fprs_lvl5_id=[$12], ssn_id=[$13], account_id=[$14],
mtd_balance=[$15]): rowcount = 2.53152774E8, cumulative cost = {3.057177094767754E9 rows,
0.0 cpu, 0.0 io}, id = 636
>   HiveAggregate(group=[{0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14}], agg#0=[SUM($15)]):
rowcount = 2.53152774E8, cumulative cost = {3.057177094767754E9 rows, 0.0 cpu, 0.0 io}, id
= 634
>     HiveProject($f0=[$0], $f1=[$1], $f2=[$2], $f3=[$3], $f4=[$4], $f5=[$24], $f6=[$6],
$f7=[$7], $f8=[$8], $f9=[$9], $f10=[$10], $f11=[$11], $f12=[$21], $f13=[$18], $f14=[$19],
$f15=[*($13, $20)]): rowcount = 3.401053197411791E11, cumulative cost = {3.057177094767754E9
rows, 0.0 cpu, 0.0 io}, id = 632
>       HiveProject(scale=[$7], time_key_num=[$8], dataset_code=[$9], cost_center_lvl1_id=[$10],
cost_pool_lvl6_id=[$11], activity_id=[$12], view_lvl1_id=[$13], from_lvl1_id=[$14], plan_id=[$15],
client_id=[$16], lob_id=[$17], product_id=[$18], fprs_id=[$19], mtd_balance=[$20], time_key_num0=[$0],
activity_id0=[$1], plan_id0=[$2], fprs_id0=[$3], ssn_id=[$4], account_id=[$5], driver_pct=[$6],
lvl5_id=[$25], current_ind=[$26], fprs_id1=[$27], lvl5_id0=[$21], rollup_key=[$22], current_ind0=[$23],
activity_id1=[$24]): rowcount = 3.401053197411791E11, cumulative cost = {3.057177094767754E9
rows, 0.0 cpu, 0.0 io}, id = 692
>         HiveJoin(condition=[AND(AND(AND(=($8, $0), =($15, $2)), =($19, $3)), =($12, $1))],
joinType=[inner]): rowcount = 3.401053197411791E11, cumulative cost = {3.057177094767754E9
rows, 0.0 cpu, 0.0 io}, id = 690
>           HiveProject(time_key_num=[$0], activity_id=[$1], plan_id=[$2], fprs_id=[$3],
ssn_id=[$4], account_id=[$5], driver_pct=[$6]): rowcount = 2.926396239E9, cumulative cost
= {0.0 rows, 0.0 cpu, 0.0 io}, id = 590
>             HiveTableScan(table=[[fidelity.fcap_drivers_part_exp_inter_bucket_256]]):
rowcount = 2.926396239E9, cumulative cost = {0}, id = 465
>           HiveJoin(condition=[=($12, $20)], joinType=[inner]): rowcount = 1.0871372980143067E8,
cumulative cost = {2.2067125966323376E7 rows, 0.0 cpu, 0.0 io}, id = 688
>             HiveJoin(condition=[=($5, $17)], joinType=[inner]): rowcount = 1.4392118216323378E7,
cumulative cost = {6880237.75 rows, 0.0 cpu, 0.0 io}, id = 653
>               HiveProject(scale=[$0], time_key_num=[$1], dataset_code=[$2], cost_center_lvl1_id=[$3],
cost_pool_lvl6_id=[$4], activity_id=[$5], view_lvl1_id=[$6], from_lvl1_id=[$7], plan_id=[$8],
client_id=[$9], lob_id=[$10], product_id=[$11], fprs_id=[$12], mtd_balance=[$14]): rowcount
= 6870067.0, cumulative cost = {0.0 rows, 0.0 cpu, 0.0 io}, id = 587
>                 HiveTableScan(table=[[fidelity.fcap_agg_prod_exp_nofund_decimal]]): rowcount
= 6870067.0, cumulative cost = {0}, id = 464
>               HiveProject(lvl5_id=[$36], rollup_key=[$48], current_ind=[$51], activity_id=[$60]):
rowcount = 10170.75, cumulative cost = {0.0 rows, 0.0 cpu, 0.0 io}, id = 628
>                 HiveFilter(condition=[AND(=($51, 'Y'), =($48, 'TOTACT'))]): rowcount
= 10170.75, cumulative cost = {0.0 rows, 0.0 cpu, 0.0 io}, id = 626
>                   HiveTableScan(table=[[fidelity.fobi_activity_dim_mv]]): rowcount =
40683.0, cumulative cost = {0}, id = 467
>             HiveProject(lvl5_id=[$36], current_ind=[$51], fprs_id=[$58]): rowcount =
794770.0, cumulative cost = {0.0 rows, 0.0 cpu, 0.0 io}, id = 622
>               HiveFilter(condition=[=($51, 'Y')]): rowcount = 794770.0, cumulative cost
= {0.0 rows, 0.0 cpu, 0.0 io}, id = 620
>                 HiveTableScan(table=[[fidelity.fobi_fprs_dim_mv_orc]]): rowcount = 1589540.0,
cumulative cost = {0}, id = 466
> {code}
> Plan #1 with Fetch column stats on 
> {code}
> OK
> STAGE DEPENDENCIES:
>   Stage-1 is a root stage
>   Stage-2 depends on stages: Stage-1
>   Stage-0 depends on stages: Stage-2
>   Stage-3 depends on stages: Stage-0
> STAGE PLANS:
>   Stage: Stage-1
>     Tez
>       Edges:
>         Map 4 <- Map 5 (BROADCAST_EDGE), Map 6 (BROADCAST_EDGE)
>         Reducer 2 <- Map 1 (SIMPLE_EDGE), Map 4 (SIMPLE_EDGE)
>         Reducer 3 <- Reducer 2 (SIMPLE_EDGE)
>       DagName: mmokhtar_20150113185454_d7ce6ecf-2d50-45ed-8a88-6283bb091b0e:3
>       Vertices:
>         Map 1
>             Map Operator Tree:
>                 TableScan
>                   alias: driver
>                   filterExpr: (((time_key_num is not null and plan_id is not null) and
fprs_id is not null) and activity_id is not null) (type: boolean)
>                   Statistics: Num rows: 2926396239 Data size: 468223398240 Basic stats:
COMPLETE Column stats: COMPLETE
>                   Filter Operator
>                     predicate: (((time_key_num is not null and plan_id is not null) and
fprs_id is not null) and activity_id is not null) (type: boolean)
>                     Statistics: Num rows: 2926396239 Data size: 468223398240 Basic stats:
COMPLETE Column stats: COMPLETE
>                     Select Operator
>                       expressions: time_key_num (type: bigint), activity_id (type: bigint),
plan_id (type: bigint), fprs_id (type: bigint), ssn_id (type: bigint), account_id (type: bigint),
driver_pct (type: decimal(28,12))
>                       outputColumnNames: _col0, _col1, _col2, _col3, _col4, _col5, _col6
>                       Statistics: Num rows: 2926396239 Data size: 468223398240 Basic
stats: COMPLETE Column stats: COMPLETE
>                       Reduce Output Operator
>                         key expressions: _col0 (type: bigint), _col2 (type: bigint),
_col3 (type: bigint), _col1 (type: bigint)
>                         sort order: ++++
>                         Map-reduce partition columns: _col0 (type: bigint), _col2 (type:
bigint), _col3 (type: bigint), _col1 (type: bigint)
>                         Statistics: Num rows: 2926396239 Data size: 468223398240 Basic
stats: COMPLETE Column stats: COMPLETE
>                         value expressions: _col4 (type: bigint), _col5 (type: bigint),
_col6 (type: decimal(28,12))
>             Execution mode: vectorized
>         Map 4
>             Map Operator Tree:
>                 TableScan
>                   alias: balance
>                   filterExpr: (((activity_id is not null and fprs_id is not null) and
time_key_num is not null) and plan_id is not null) (type: boolean)
>                   Statistics: Num rows: 6870067 Data size: 2102240502 Basic stats: COMPLETE
Column stats: COMPLETE
>                   Filter Operator
>                     predicate: (((activity_id is not null and fprs_id is not null) and
time_key_num is not null) and plan_id is not null) (type: boolean)
>                     Statistics: Num rows: 6870067 Data size: 1483934472 Basic stats:
COMPLETE Column stats: COMPLETE
>                     Select Operator
>                       expressions: scale (type: bigint), time_key_num (type: bigint),
dataset_code (type: bigint), cost_center_lvl1_id (type: bigint), cost_pool_lvl6_id (type:
bigint), activity_id (type: bigint), view_lvl1_id (type: bigint), from_lvl1_id (type: bigint),
plan_id (type: bigint), client_id (type: bigint), lob_id (type: bigint), product_id (type:
bigint), fprs_id (type: bigint), mtd_balance (type: decimal(28,12))
>                       outputColumnNames: _col0, _col1, _col2, _col3, _col4, _col5, _col6,
_col7, _col8, _col9, _col10, _col11, _col12, _col13
>                       Statistics: Num rows: 6870067 Data size: 1483934472 Basic stats:
COMPLETE Column stats: COMPLETE
>                       Map Join Operator
>                         condition map:
>                              Inner Join 0 to 1
>                         keys:
>                           0 _col5 (type: bigint)
>                           1 _col3 (type: bigint)
>                         outputColumnNames: _col0, _col1, _col2, _col3, _col4, _col5,
_col6, _col7, _col8, _col9, _col10, _col11, _col12, _col13, _col14
>                         input vertices:
>                           1 Map 5
>                         Statistics: Num rows: 57555202 Data size: 17151450196 Basic stats:
COMPLETE Column stats: COMPLETE
>                         Map Join Operator
>                           condition map:
>                                Inner Join 0 to 1
>                           keys:
>                             0 _col12 (type: bigint)
>                             1 _col2 (type: bigint)
>                           outputColumnNames: _col0, _col1, _col2, _col3, _col4, _col5,
_col6, _col7, _col8, _col9, _col10, _col11, _col12, _col13, _col14, _col18
>                           input vertices:
>                             1 Map 6
>                           Statistics: Num rows: 869509350 Data size: 266069861100 Basic
stats: COMPLETE Column stats: COMPLETE
>                           Select Operator
>                             expressions: _col0 (type: bigint), _col1 (type: bigint),
_col10 (type: bigint), _col11 (type: bigint), _col12 (type: bigint), _col13 (type: decimal(28,12)),
_col14 (type: string), _col18 (type: bigint), _col2 (type: bigint), _col3 (type: bigint),
_col4 (type: bigint), _col5 (type: bigint), _col6 (type: bigint), _col7 (type: bigint), _col8
(type: bigint), _col9 (type: bigint)
>                             outputColumnNames: _col0, _col1, _col10, _col11, _col12,
_col13, _col14, _col18, _col2, _col3, _col4, _col5, _col6, _col7, _col8, _col9
>                             Statistics: Num rows: 869509350 Data size: 266069861100 Basic
stats: COMPLETE Column stats: COMPLETE
>                             Reduce Output Operator
>                               key expressions: _col1 (type: bigint), _col8 (type: bigint),
_col12 (type: bigint), _col5 (type: bigint)
>                               sort order: ++++
>                               Map-reduce partition columns: _col1 (type: bigint), _col8
(type: bigint), _col12 (type: bigint), _col5 (type: bigint)
>                               Statistics: Num rows: 869509350 Data size: 266069861100
Basic stats: COMPLETE Column stats: COMPLETE
>                               value expressions: _col0 (type: bigint), _col2 (type: bigint),
_col3 (type: bigint), _col4 (type: bigint), _col6 (type: bigint), _col7 (type: bigint), _col9
(type: bigint), _col10 (type: bigint), _col11 (type: bigint), _col13 (type: decimal(28,12)),
_col14 (type: string), _col18 (type: bigint)
>             Execution mode: vectorized
>         Map 5
>             Map Operator Tree:
>                 TableScan
>                   alias: act
>                   filterExpr: (((current_ind = 'Y') and (rollup_key = 'TOTACT')) and
activity_id is not null) (type: boolean)
>                   Statistics: Num rows: 40683 Data size: 271025472 Basic stats: COMPLETE
Column stats: COMPLETE
>                   Filter Operator
>                     predicate: (((current_ind = 'Y') and (rollup_key = 'TOTACT')) and
activity_id is not null) (type: boolean)
>                     Statistics: Num rows: 20341 Data size: 5553093 Basic stats: COMPLETE
Column stats: COMPLETE
>                     Select Operator
>                       expressions: lvl5_id (type: string), activity_id (type: bigint)
>                       outputColumnNames: _col0, _col3
>                       Statistics: Num rows: 20341 Data size: 1993418 Basic stats: COMPLETE
Column stats: COMPLETE
>                       Reduce Output Operator
>                         key expressions: _col3 (type: bigint)
>                         sort order: +
>                         Map-reduce partition columns: _col3 (type: bigint)
>                         Statistics: Num rows: 20341 Data size: 1993418 Basic stats: COMPLETE
Column stats: COMPLETE
>                         value expressions: _col0 (type: string)
>             Execution mode: vectorized
>         Map 6
>             Map Operator Tree:
>                 TableScan
>                   alias: fprs
>                   filterExpr: ((current_ind = 'Y') and fprs_id is not null) (type: boolean)
>                   Statistics: Num rows: 1589540 Data size: 6862044180 Basic stats: COMPLETE
Column stats: COMPLETE
>                   Filter Operator
>                     predicate: ((current_ind = 'Y') and fprs_id is not null) (type: boolean)
>                     Statistics: Num rows: 794770 Data size: 80271770 Basic stats: COMPLETE
Column stats: COMPLETE
>                     Select Operator
>                       expressions: lvl5_id (type: bigint), fprs_id (type: bigint)
>                       outputColumnNames: _col0, _col2
>                       Statistics: Num rows: 794770 Data size: 12716320 Basic stats: COMPLETE
Column stats: COMPLETE
>                       Reduce Output Operator
>                         key expressions: _col2 (type: bigint)
>                         sort order: +
>                         Map-reduce partition columns: _col2 (type: bigint)
>                         Statistics: Num rows: 794770 Data size: 12716320 Basic stats:
COMPLETE Column stats: COMPLETE
>                         value expressions: _col0 (type: bigint)
>             Execution mode: vectorized
>         Reducer 2
>             Reduce Operator Tree:
>               Merge Join Operator
>                 condition map:
>                      Inner Join 0 to 1
>                 keys:
>                   0 _col0 (type: bigint), _col2 (type: bigint), _col3 (type: bigint),
_col1 (type: bigint)
>                   1 _col1 (type: bigint), _col8 (type: bigint), _col12 (type: bigint),
_col5 (type: bigint)
>                 outputColumnNames: _col4, _col5, _col6, _col7, _col8, _col9, _col10,
_col11, _col13, _col14, _col15, _col16, _col17, _col18, _col20, _col21, _col25
>                 Statistics: Num rows: 28151027141 Data size: 8389006088018 Basic stats:
COMPLETE Column stats: COMPLETE
>                 Select Operator
>                   expressions: _col7 (type: bigint), _col8 (type: bigint), _col9 (type:
bigint), _col10 (type: bigint), _col11 (type: bigint), _col21 (type: string), _col13 (type:
bigint), _col14 (type: bigint), _col15 (type: bigint), _col16 (type: bigint), _col17 (type:
bigint), _col18 (type: bigint), _col25 (type: bigint), _col4 (type: bigint), _col5 (type:
bigint), (_col20 * _col6) (type: decimal(38,24))
>                   outputColumnNames: _col0, _col1, _col2, _col3, _col4, _col5, _col6,
_col7, _col8, _col9, _col10, _col11, _col12, _col13, _col14, _col15
>                   Statistics: Num rows: 28151027141 Data size: 8389006088018 Basic stats:
COMPLETE Column stats: COMPLETE
>                   Group By Operator
>                     aggregations: sum(_col15)
>                     keys: _col0 (type: bigint), _col1 (type: bigint), _col2 (type: bigint),
_col3 (type: bigint), _col4 (type: bigint), _col5 (type: string), _col6 (type: bigint), _col7
(type: bigint), _col8 (type: bigint), _col9 (type: bigint), _col10 (type: bigint), _col11
(type: bigint), _col12 (type: bigint), _col13 (type: bigint), _col14 (type: bigint)
>                     mode: hash
>                     outputColumnNames: _col0, _col1, _col2, _col3, _col4, _col5, _col6,
_col7, _col8, _col9, _col10, _col11, _col12, _col13, _col14, _col15
>                     Statistics: Num rows: 28151027141 Data size: 8389006088018 Basic
stats: COMPLETE Column stats: COMPLETE
>                     Reduce Output Operator
>                       key expressions: _col0 (type: bigint), _col1 (type: bigint), _col2
(type: bigint), _col3 (type: bigint), _col4 (type: bigint), _col5 (type: string), _col6 (type:
bigint), _col7 (type: bigint), _col8 (type: bigint), _col9 (type: bigint), _col10 (type: bigint),
_col11 (type: bigint), _col12 (type: bigint), _col13 (type: bigint), _col14 (type: bigint)
>                       sort order: +++++++++++++++
>                       Map-reduce partition columns: _col0 (type: bigint), _col1 (type:
bigint), _col2 (type: bigint), _col3 (type: bigint), _col4 (type: bigint), _col5 (type: string),
_col6 (type: bigint), _col7 (type: bigint), _col8 (type: bigint), _col9 (type: bigint), _col10
(type: bigint), _col11 (type: bigint), _col12 (type: bigint), _col13 (type: bigint), _col14
(type: bigint)
>                       Statistics: Num rows: 28151027141 Data size: 8389006088018 Basic
stats: COMPLETE Column stats: COMPLETE
>                       value expressions: _col15 (type: decimal(38,24))
>         Reducer 3
>             Reduce Operator Tree:
>               Group By Operator
>                 aggregations: sum(VALUE._col0)
>                 keys: KEY._col0 (type: bigint), KEY._col1 (type: bigint), KEY._col2 (type:
bigint), KEY._col3 (type: bigint), KEY._col4 (type: bigint), KEY._col5 (type: string), KEY._col6
(type: bigint), KEY._col7 (type: bigint), KEY._col8 (type: bigint), KEY._col9 (type: bigint),
KEY._col10 (type: bigint), KEY._col11 (type: bigint), KEY._col12 (type: bigint), KEY._col13
(type: bigint), KEY._col14 (type: bigint)
>                 mode: mergepartial
>                 outputColumnNames: _col0, _col1, _col2, _col3, _col4, _col5, _col6, _col7,
_col8, _col9, _col10, _col11, _col12, _col13, _col14, _col15
>                 Statistics: Num rows: 28151027141 Data size: 8389006088018 Basic stats:
COMPLETE Column stats: COMPLETE
>                 Select Operator
>                   expressions: _col0 (type: bigint), _col1 (type: bigint), _col2 (type:
bigint), _col3 (type: bigint), _col4 (type: bigint), UDFToLong(_col5) (type: bigint), _col6
(type: bigint), _col7 (type: bigint), _col8 (type: bigint), _col9 (type: bigint), _col10 (type:
bigint), _col11 (type: bigint), _col12 (type: bigint), _col13 (type: bigint), UDFToString(_col14)
(type: string), CAST( _col15 AS decimal(28,12)) (type: decimal(28,12))
>                   outputColumnNames: _col0, _col1, _col2, _col3, _col4, _col5, _col6,
_col7, _col8, _col9, _col10, _col11, _col12, _col13, _col14, _col15
>                   Statistics: Num rows: 28151027141 Data size: 11260410856400 Basic stats:
COMPLETE Column stats: COMPLETE
>                   File Output Operator
>                     compressed: false
>                     Statistics: Num rows: 28151027141 Data size: 11260410856400 Basic
stats: COMPLETE Column stats: COMPLETE
>                     table:
>                         input format: org.apache.hadoop.hive.ql.io.orc.OrcInputFormat
>                         output format: org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat
>                         serde: org.apache.hadoop.hive.ql.io.orc.OrcSerde
>                         name: fidelity.fcap_agg_part_exp_hive_decimal_decimal
>   Stage: Stage-2
>     Dependency Collection
>   Stage: Stage-0
>     Move Operator
>       tables:
>           replace: false
>           table:
>               input format: org.apache.hadoop.hive.ql.io.orc.OrcInputFormat
>               output format: org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat
>               serde: org.apache.hadoop.hive.ql.io.orc.OrcSerde
>               name: fidelity.fcap_agg_part_exp_hive_decimal_decimal
>   Stage: Stage-3
>     Stats-Aggr Operator
> {code}
> Plan #2 with fetch column stats off 
> {code}STAGE DEPENDENCIES:
>   Stage-1 is a root stage
>   Stage-2 depends on stages: Stage-1
>   Stage-0 depends on stages: Stage-2
>   Stage-3 depends on stages: Stage-0
> STAGE PLANS:
>   Stage: Stage-1
>     Tez
>       Edges:
>         Map 1 <- Map 5 (BROADCAST_EDGE)
>         Map 3 <- Map 4 (BROADCAST_EDGE)
>         Map 5 <- Map 3 (BROADCAST_EDGE)
>         Reducer 2 <- Map 1 (SIMPLE_EDGE)
>       DagName: mmokhtar_20150113185454_7c350b7b-53fa-4bbb-a747-24ec0c2a6ab7:4
>       Vertices:
>         Map 1
>             Map Operator Tree:
>                 TableScan
>                   alias: driver
>                   filterExpr: (((time_key_num is not null and plan_id is not null) and
fprs_id is not null) and activity_id is not null) (type: boolean)
>                   Statistics: Num rows: 2926396239 Data size: 468223398240 Basic stats:
COMPLETE Column stats: NONE
>                   Filter Operator
>                     predicate: (((time_key_num is not null and plan_id is not null) and
fprs_id is not null) and activity_id is not null) (type: boolean)
>                     Statistics: Num rows: 182899765 Data size: 29263962400 Basic stats:
COMPLETE Column stats: NONE
>                     Select Operator
>                       expressions: time_key_num (type: bigint), activity_id (type: bigint),
plan_id (type: bigint), fprs_id (type: bigint), ssn_id (type: bigint), account_id (type: bigint),
driver_pct (type: decimal(28,12))
>                       outputColumnNames: _col0, _col1, _col2, _col3, _col4, _col5, _col6
>                       Statistics: Num rows: 182899765 Data size: 29263962400 Basic stats:
COMPLETE Column stats: NONE
>                       Map Join Operator
>                         condition map:
>                              Inner Join 0 to 1
>                         keys:
>                           0 _col0 (type: bigint), _col2 (type: bigint), _col3 (type:
bigint), _col1 (type: bigint)
>                           1 _col1 (type: bigint), _col8 (type: bigint), _col12 (type:
bigint), _col5 (type: bigint)
>                         outputColumnNames: _col4, _col5, _col6, _col7, _col8, _col9,
_col10, _col11, _col13, _col14, _col15, _col16, _col17, _col18, _col20, _col21, _col25
>                         input vertices:
>                           1 Map 5
>                         Statistics: Num rows: 201189745 Data size: 32190359337 Basic
stats: COMPLETE Column stats: NONE
>                         Select Operator
>                           expressions: _col7 (type: bigint), _col8 (type: bigint), _col9
(type: bigint), _col10 (type: bigint), _col11 (type: bigint), _col21 (type: string), _col13
(type: bigint), _col14 (type: bigint), _col15 (type: bigint), _col16 (type: bigint), _col17
(type: bigint), _col18 (type: bigint), _col25 (type: bigint), _col4 (type: bigint), _col5
(type: bigint), (_col20 * _col6) (type: decimal(38,24))
>                           outputColumnNames: _col0, _col1, _col2, _col3, _col4, _col5,
_col6, _col7, _col8, _col9, _col10, _col11, _col12, _col13, _col14, _col15
>                           Statistics: Num rows: 201189745 Data size: 32190359337 Basic
stats: COMPLETE Column stats: NONE
>                           Group By Operator
>                             aggregations: sum(_col15)
>                             keys: _col0 (type: bigint), _col1 (type: bigint), _col2 (type:
bigint), _col3 (type: bigint), _col4 (type: bigint), _col5 (type: string), _col6 (type: bigint),
_col7 (type: bigint), _col8 (type: bigint), _col9 (type: bigint), _col10 (type: bigint), _col11
(type: bigint), _col12 (type: bigint), _col13 (type: bigint), _col14 (type: bigint)
>                             mode: hash
>                             outputColumnNames: _col0, _col1, _col2, _col3, _col4, _col5,
_col6, _col7, _col8, _col9, _col10, _col11, _col12, _col13, _col14, _col15
>                             Statistics: Num rows: 201189745 Data size: 32190359337 Basic
stats: COMPLETE Column stats: NONE
>                             Reduce Output Operator
>                               key expressions: _col0 (type: bigint), _col1 (type: bigint),
_col2 (type: bigint), _col3 (type: bigint), _col4 (type: bigint), _col5 (type: string), _col6
(type: bigint), _col7 (type: bigint), _col8 (type: bigint), _col9 (type: bigint), _col10 (type:
bigint), _col11 (type: bigint), _col12 (type: bigint), _col13 (type: bigint), _col14 (type:
bigint)
>                               sort order: +++++++++++++++
>                               Map-reduce partition columns: _col0 (type: bigint), _col1
(type: bigint), _col2 (type: bigint), _col3 (type: bigint), _col4 (type: bigint), _col5 (type:
string), _col6 (type: bigint), _col7 (type: bigint), _col8 (type: bigint), _col9 (type: bigint),
_col10 (type: bigint), _col11 (type: bigint), _col12 (type: bigint), _col13 (type: bigint),
_col14 (type: bigint)
>                               Statistics: Num rows: 201189745 Data size: 32190359337
Basic stats: COMPLETE Column stats: NONE
>                               value expressions: _col15 (type: decimal(38,24))
>             Execution mode: vectorized
>         Map 3
>             Map Operator Tree:
>                 TableScan
>                   alias: balance
>                   filterExpr: (((activity_id is not null and fprs_id is not null) and
time_key_num is not null) and plan_id is not null) (type: boolean)
>                   Statistics: Num rows: 6870067 Data size: 2102240502 Basic stats: COMPLETE
Column stats: NONE
>                   Filter Operator
>                     predicate: (((activity_id is not null and fprs_id is not null) and
time_key_num is not null) and plan_id is not null) (type: boolean)
>                     Statistics: Num rows: 429380 Data size: 131390279 Basic stats: COMPLETE
Column stats: NONE
>                     Select Operator
>                       expressions: scale (type: bigint), time_key_num (type: bigint),
dataset_code (type: bigint), cost_center_lvl1_id (type: bigint), cost_pool_lvl6_id (type:
bigint), activity_id (type: bigint), view_lvl1_id (type: bigint), from_lvl1_id (type: bigint),
plan_id (type: bigint), client_id (type: bigint), lob_id (type: bigint), product_id (type:
bigint), fprs_id (type: bigint), mtd_balance (type: decimal(28,12))
>                       outputColumnNames: _col0, _col1, _col2, _col3, _col4, _col5, _col6,
_col7, _col8, _col9, _col10, _col11, _col12, _col13
>                       Statistics: Num rows: 429380 Data size: 131390279 Basic stats:
COMPLETE Column stats: NONE
>                       Map Join Operator
>                         condition map:
>                              Inner Join 0 to 1
>                         keys:
>                           0 _col5 (type: bigint)
>                           1 _col3 (type: bigint)
>                         outputColumnNames: _col0, _col1, _col2, _col3, _col4, _col5,
_col6, _col7, _col8, _col9, _col10, _col11, _col12, _col13, _col14
>                         input vertices:
>                           1 Map 4
>                         Statistics: Num rows: 472318 Data size: 144529310 Basic stats:
COMPLETE Column stats: NONE
>                         Reduce Output Operator
>                           key expressions: _col12 (type: bigint)
>                           sort order: +
>                           Map-reduce partition columns: _col12 (type: bigint)
>                           Statistics: Num rows: 472318 Data size: 144529310 Basic stats:
COMPLETE Column stats: NONE
>                           value expressions: _col0 (type: bigint), _col1 (type: bigint),
_col2 (type: bigint), _col3 (type: bigint), _col4 (type: bigint), _col5 (type: bigint), _col6
(type: bigint), _col7 (type: bigint), _col8 (type: bigint), _col9 (type: bigint), _col10 (type:
bigint), _col11 (type: bigint), _col13 (type: decimal(28,12)), _col14 (type: string)
>             Execution mode: vectorized
>         Map 4
>             Map Operator Tree:
>                 TableScan
>                   alias: act
>                   filterExpr: (((current_ind = 'Y') and (rollup_key = 'TOTACT')) and
activity_id is not null) (type: boolean)
>                   Statistics: Num rows: 40683 Data size: 271025472 Basic stats: COMPLETE
Column stats: NONE
>                   Filter Operator
>                     predicate: (((current_ind = 'Y') and (rollup_key = 'TOTACT')) and
activity_id is not null) (type: boolean)
>                     Statistics: Num rows: 5085 Data size: 33875685 Basic stats: COMPLETE
Column stats: NONE
>                     Select Operator
>                       expressions: lvl5_id (type: string), activity_id (type: bigint)
>                       outputColumnNames: _col0, _col3
>                       Statistics: Num rows: 5085 Data size: 33875685 Basic stats: COMPLETE
Column stats: NONE
>                       Reduce Output Operator
>                         key expressions: _col3 (type: bigint)
>                         sort order: +
>                         Map-reduce partition columns: _col3 (type: bigint)
>                         Statistics: Num rows: 5085 Data size: 33875685 Basic stats: COMPLETE
Column stats: NONE
>                         value expressions: _col0 (type: string)
>             Execution mode: vectorized
>         Map 5
>             Map Operator Tree:
>                 TableScan
>                   alias: fprs
>                   filterExpr: ((current_ind = 'Y') and fprs_id is not null) (type: boolean)
>                   Statistics: Num rows: 1589540 Data size: 6862044180 Basic stats: COMPLETE
Column stats: NONE
>                   Filter Operator
>                     predicate: ((current_ind = 'Y') and fprs_id is not null) (type: boolean)
>                     Statistics: Num rows: 397385 Data size: 1715511045 Basic stats: COMPLETE
Column stats: NONE
>                     Select Operator
>                       expressions: lvl5_id (type: bigint), fprs_id (type: bigint)
>                       outputColumnNames: _col0, _col2
>                       Statistics: Num rows: 397385 Data size: 1715511045 Basic stats:
COMPLETE Column stats: NONE
>                       Map Join Operator
>                         condition map:
>                              Inner Join 0 to 1
>                         keys:
>                           0 _col12 (type: bigint)
>                           1 _col2 (type: bigint)
>                         outputColumnNames: _col0, _col1, _col2, _col3, _col4, _col5,
_col6, _col7, _col8, _col9, _col10, _col11, _col12, _col13, _col14, _col18
>                         input vertices:
>                           0 Map 3
>                         Statistics: Num rows: 519549 Data size: 158982244 Basic stats:
COMPLETE Column stats: NONE
>                         Select Operator
>                           expressions: _col0 (type: bigint), _col1 (type: bigint), _col10
(type: bigint), _col11 (type: bigint), _col12 (type: bigint), _col13 (type: decimal(28,12)),
_col14 (type: string), _col18 (type: bigint), _col2 (type: bigint), _col3 (type: bigint),
_col4 (type: bigint), _col5 (type: bigint), _col6 (type: bigint), _col7 (type: bigint), _col8
(type: bigint), _col9 (type: bigint)
>                           outputColumnNames: _col0, _col1, _col10, _col11, _col12, _col13,
_col14, _col18, _col2, _col3, _col4, _col5, _col6, _col7, _col8, _col9
>                           Statistics: Num rows: 519549 Data size: 158982244 Basic stats:
COMPLETE Column stats: NONE
>                           Reduce Output Operator
>                             key expressions: _col1 (type: bigint), _col8 (type: bigint),
_col12 (type: bigint), _col5 (type: bigint)
>                             sort order: ++++
>                             Map-reduce partition columns: _col1 (type: bigint), _col8
(type: bigint), _col12 (type: bigint), _col5 (type: bigint)
>                             Statistics: Num rows: 519549 Data size: 158982244 Basic stats:
COMPLETE Column stats: NONE
>                             value expressions: _col0 (type: bigint), _col2 (type: bigint),
_col3 (type: bigint), _col4 (type: bigint), _col6 (type: bigint), _col7 (type: bigint), _col9
(type: bigint), _col10 (type: bigint), _col11 (type: bigint), _col13 (type: decimal(28,12)),
_col14 (type: string), _col18 (type: bigint)
>             Execution mode: vectorized
>         Reducer 2
>             Reduce Operator Tree:
>               Group By Operator
>                 aggregations: sum(VALUE._col0)
>                 keys: KEY._col0 (type: bigint), KEY._col1 (type: bigint), KEY._col2 (type:
bigint), KEY._col3 (type: bigint), KEY._col4 (type: bigint), KEY._col5 (type: string), KEY._col6
(type: bigint), KEY._col7 (type: bigint), KEY._col8 (type: bigint), KEY._col9 (type: bigint),
KEY._col10 (type: bigint), KEY._col11 (type: bigint), KEY._col12 (type: bigint), KEY._col13
(type: bigint), KEY._col14 (type: bigint)
>                 mode: mergepartial
>                 outputColumnNames: _col0, _col1, _col2, _col3, _col4, _col5, _col6, _col7,
_col8, _col9, _col10, _col11, _col12, _col13, _col14, _col15
>                 Statistics: Num rows: 100594872 Data size: 16095179588 Basic stats: COMPLETE
Column stats: NONE
>                 Select Operator
>                   expressions: _col0 (type: bigint), _col1 (type: bigint), _col2 (type:
bigint), _col3 (type: bigint), _col4 (type: bigint), UDFToLong(_col5) (type: bigint), _col6
(type: bigint), _col7 (type: bigint), _col8 (type: bigint), _col9 (type: bigint), _col10 (type:
bigint), _col11 (type: bigint), _col12 (type: bigint), _col13 (type: bigint), UDFToString(_col14)
(type: string), CAST( _col15 AS decimal(28,12)) (type: decimal(28,12))
>                   outputColumnNames: _col0, _col1, _col2, _col3, _col4, _col5, _col6,
_col7, _col8, _col9, _col10, _col11, _col12, _col13, _col14, _col15
>                   Statistics: Num rows: 100594872 Data size: 16095179588 Basic stats:
COMPLETE Column stats: NONE
>                   File Output Operator
>                     compressed: false
>                     Statistics: Num rows: 100594872 Data size: 16095179588 Basic stats:
COMPLETE Column stats: NONE
>                     table:
>                         input format: org.apache.hadoop.hive.ql.io.orc.OrcInputFormat
>                         output format: org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat
>                         serde: org.apache.hadoop.hive.ql.io.orc.OrcSerde
>                         name: fidelity.fcap_agg_part_exp_hive_decimal_decimal
>   Stage: Stage-2
>     Dependency Collection
>   Stage: Stage-0
>     Move Operator
>       tables:
>           replace: false
>           table:
>               input format: org.apache.hadoop.hive.ql.io.orc.OrcInputFormat
>               output format: org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat
>               serde: org.apache.hadoop.hive.ql.io.orc.OrcSerde
>               name: fidelity.fcap_agg_part_exp_hive_decimal_decimal
>   Stage: Stage-3
>     Stats-Aggr Operator
> {code}
> Query used 
> {code}
> explain insert INTO table fidelity.FCAP_AGG_PART_EXP_hive_decimal_decimal SELECT BALANCE.SCALE,
BALANCE.TIME_KEY_NUM, BALANCE.DATASET_CODE, BALANCE.Cost_Center_LVL1_ID, BALANCE.Cost_Pool_LVL6_ID,
act.lvl5_ID, BALANCE.VIEW_LVL1_ID, BALANCE.FROM_LVL1_ID, BALANCE.PLAN_ID, BALANCE.CLIENT_ID,
BALANCE.LOB_ID, BALANCE.PRODUCT_ID, fprs.lvl5_ID fprs_lvl5_id, DRIVER.SSN_ID, DRIVER.ACCOUNT_ID,
SUM ( BALANCE.MTD_BALANCE * DRIVER.DRIVER_PCT) AS MTD_BALANCE FROM fidelity.FCAP_AGG_PROD_EXP_NOFUND_decimal
BALANCE JOIN fidelity.fcap_drivers_part_exp_inter_bucket_256 DRIVER ON BALANCE.TIME_KEY_NUM
= DRIVER.TIME_KEY_NUM AND BALANCE.PLAN_ID = DRIVER.PLAN_ID AND BALANCE.FPRS_ID = DRIVER.FPRS_ID
AND BALANCE.ACTIVITY_ID = DRIVER.ACTIVITY_ID INNER JOIN fidelity.fobi_fprs_dim_mv_orc fprs
ON balance.FPRS_ID = fprs.fprs_id AND fprs.current_ind = 'Y' inner join fidelity.fobi_activity_dim_mv
act ON BALANCE.activity_id = act.activity_id AND act.current_ind = 'Y' AND act.ROLLUP_KEY
= 'TOTACT' GROUP BY BALANCE.SCALE, BALANCE.TIME_KEY_NUM, BALANCE.DATASET_CODE, BALANCE.Cost_center_LVL1_ID,
BALANCE.Cost_Pool_LVL6_ID, act.lvl5_ID, BALANCE.VIEW_LVL1_ID, BALANCE.FROM_LVL1_ID, BALANCE.PLAN_ID,
BALANCE.CLIENT_ID, BALANCE.LOB_ID, BALANCE.PRODUCT_ID, fprs.lvl5_ID, DRIVER.SSN_ID, DRIVER.ACCOUNT_ID;
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Mime
View raw message