hive-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Mostafa Mokhtar (JIRA)" <j...@apache.org>
Subject [jira] [Created] (HIVE-10107) Union All : Vertex missing stats resulting in OOM and in-efficient plans
Date Thu, 26 Mar 2015 23:26:52 GMT
Mostafa Mokhtar created HIVE-10107:
--------------------------------------

             Summary: Union All : Vertex missing stats resulting in OOM and in-efficient plans
                 Key: HIVE-10107
                 URL: https://issues.apache.org/jira/browse/HIVE-10107
             Project: Hive
          Issue Type: Bug
          Components: Physical Optimizer
    Affects Versions: 0.14.0
            Reporter: Mostafa Mokhtar
            Assignee: Prasanth Jayachandran
             Fix For: 1.2.0


Reducer Vertices sending data to a Union all edge are missing statistics and as a result we
either use very few reducers in the UNION ALL edge or decide to broadcast the results of UNION
ALL.

Query
{code}
select 
    count(*) rowcount
from
    (select 
        ss_item_sk, ss_ticket_number, ss_store_sk
    from
        store_sales a, store_returns b
    where
        a.ss_item_sk = b.sr_item_sk
            and a.ss_ticket_number = b.sr_ticket_number union all select 
        ss_item_sk, ss_ticket_number, ss_store_sk
    from
        store_sales c, store_returns d
    where
        c.ss_item_sk = d.sr_item_sk
            and c.ss_ticket_number = d.sr_ticket_number) t
group by t.ss_store_sk , t.ss_item_sk , t.ss_ticket_number
having rowcount > 100000000;
{code}

Plan snippet 
{code}
 Edges:
        Reducer 2 <- Map 1 (SIMPLE_EDGE), Map 5 (SIMPLE_EDGE), Union 3 (CONTAINS)
        Reducer 4 <- Union 3 (SIMPLE_EDGE)
        Reducer 7 <- Map 6 (SIMPLE_EDGE), Map 8 (SIMPLE_EDGE), Union 3 (CONTAINS)

  Reducer 4
            Reduce Operator Tree:
              Group By Operator
                aggregations: count(VALUE._col0)
                keys: KEY._col0 (type: int), KEY._col1 (type: int), KEY._col2 (type: int)
                mode: mergepartial
                outputColumnNames: _col0, _col1, _col2, _col3
                Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: COMPLETE
                Filter Operator
                  predicate: (_col3 > 100000000) (type: boolean)
                  Statistics: Num rows: 0 Data size: 0 Basic stats: NONE Column stats: COMPLETE
                  Select Operator
                    expressions: _col3 (type: bigint)
                    outputColumnNames: _col0
                    Statistics: Num rows: 0 Data size: 0 Basic stats: NONE Column stats: COMPLETE
                    File Output Operator
                      compressed: false
                      Statistics: Num rows: 0 Data size: 0 Basic stats: NONE Column stats:
COMPLETE
                      table:
                          input format: org.apache.hadoop.mapred.TextInputFormat
                          output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
                          serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
        Reducer 7
            Reduce Operator Tree:
              Merge Join Operator
                condition map:
                     Inner Join 0 to 1
                keys:
                  0 ss_item_sk (type: int), ss_ticket_number (type: int)
                  1 sr_item_sk (type: int), sr_ticket_number (type: int)
                outputColumnNames: _col1, _col6, _col8, _col27, _col34
                Filter Operator
                  predicate: ((_col1 = _col27) and (_col8 = _col34)) (type: boolean)
                  Select Operator
                    expressions: _col1 (type: int), _col8 (type: int), _col6 (type: int)
                    outputColumnNames: _col0, _col1, _col2
                    Group By Operator
                      aggregations: count()
                      keys: _col2 (type: int), _col0 (type: int), _col1 (type: int)
                      mode: hash
                      outputColumnNames: _col0, _col1, _col2, _col3
                      Reduce Output Operator
                        key expressions: _col0 (type: int), _col1 (type: int), _col2 (type:
int)
                        sort order: +++
                        Map-reduce partition columns: _col0 (type: int), _col1 (type: int),
_col2 (type: int)
                        value expressions: _col3 (type: bigint)
{code}

The full explain plan 
{code}
STAGE DEPENDENCIES:
  Stage-1 is a root stage
  Stage-0 depends on stages: Stage-1

STAGE PLANS:
  Stage: Stage-1
    Tez
      Edges:
        Reducer 2 <- Map 1 (SIMPLE_EDGE), Map 5 (SIMPLE_EDGE), Union 3 (CONTAINS)
        Reducer 4 <- Union 3 (SIMPLE_EDGE)
        Reducer 7 <- Map 6 (SIMPLE_EDGE), Map 8 (SIMPLE_EDGE), Union 3 (CONTAINS)
      DagName: mmokhtar_20150214132727_95878ea1-ee6a-4b7e-bc86-843abd5cf664:7
      Vertices:
        Map 1
            Map Operator Tree:
                TableScan
                  alias: a
                  filterExpr: (ss_item_sk is not null and ss_ticket_number is not null) (type:
boolean)
                  Statistics: Num rows: 550076554 Data size: 47370018896 Basic stats: COMPLETE
Column stats: COMPLETE
                  Filter Operator
                    predicate: (ss_item_sk is not null and ss_ticket_number is not null) (type:
boolean)
                    Statistics: Num rows: 550076554 Data size: 6549093948 Basic stats: COMPLETE
Column stats: COMPLETE
                    Reduce Output Operator
                      key expressions: ss_item_sk (type: int), ss_ticket_number (type: int)
                      sort order: ++
                      Map-reduce partition columns: ss_item_sk (type: int), ss_ticket_number
(type: int)
                      Statistics: Num rows: 550076554 Data size: 6549093948 Basic stats: COMPLETE
Column stats: COMPLETE
                      value expressions: ss_store_sk (type: int)
        Map 5
            Map Operator Tree:
                TableScan
                  alias: b
                  filterExpr: (sr_item_sk is not null and sr_ticket_number is not null) (type:
boolean)
                  Statistics: Num rows: 55578005 Data size: 4155315616 Basic stats: COMPLETE
Column stats: COMPLETE
                  Filter Operator
                    predicate: (sr_item_sk is not null and sr_ticket_number is not null) (type:
boolean)
                    Statistics: Num rows: 55578005 Data size: 444624040 Basic stats: COMPLETE
Column stats: COMPLETE
                    Reduce Output Operator
                      key expressions: sr_item_sk (type: int), sr_ticket_number (type: int)
                      sort order: ++
                      Map-reduce partition columns: sr_item_sk (type: int), sr_ticket_number
(type: int)
                      Statistics: Num rows: 55578005 Data size: 444624040 Basic stats: COMPLETE
Column stats: COMPLETE
        Map 6
            Map Operator Tree:
                TableScan
                  alias: c
                  filterExpr: (ss_item_sk is not null and ss_ticket_number is not null) (type:
boolean)
                  Statistics: Num rows: 550076554 Data size: 47370018896 Basic stats: COMPLETE
Column stats: COMPLETE
                  Filter Operator
                    predicate: (ss_item_sk is not null and ss_ticket_number is not null) (type:
boolean)
                    Statistics: Num rows: 550076554 Data size: 6549093948 Basic stats: COMPLETE
Column stats: COMPLETE
                    Reduce Output Operator
                      key expressions: ss_item_sk (type: int), ss_ticket_number (type: int)
                      sort order: ++
                      Map-reduce partition columns: ss_item_sk (type: int), ss_ticket_number
(type: int)
                      Statistics: Num rows: 550076554 Data size: 6549093948 Basic stats: COMPLETE
Column stats: COMPLETE
                      value expressions: ss_store_sk (type: int)
        Map 8
            Map Operator Tree:
                TableScan
                  alias: d
                  filterExpr: (sr_item_sk is not null and sr_ticket_number is not null) (type:
boolean)
                  Statistics: Num rows: 55578005 Data size: 4155315616 Basic stats: COMPLETE
Column stats: COMPLETE
                  Filter Operator
                    predicate: (sr_item_sk is not null and sr_ticket_number is not null) (type:
boolean)
                    Statistics: Num rows: 55578005 Data size: 444624040 Basic stats: COMPLETE
Column stats: COMPLETE
                    Reduce Output Operator
                      key expressions: sr_item_sk (type: int), sr_ticket_number (type: int)
                      sort order: ++
                      Map-reduce partition columns: sr_item_sk (type: int), sr_ticket_number
(type: int)
                      Statistics: Num rows: 55578005 Data size: 444624040 Basic stats: COMPLETE
Column stats: COMPLETE
        Reducer 2
            Reduce Operator Tree:
              Merge Join Operator
                condition map:
                     Inner Join 0 to 1
                keys:
                  0 ss_item_sk (type: int), ss_ticket_number (type: int)
                  1 sr_item_sk (type: int), sr_ticket_number (type: int)
                outputColumnNames: _col1, _col6, _col8, _col27, _col34
                Filter Operator
                  predicate: ((_col1 = _col27) and (_col8 = _col34)) (type: boolean)
                  Select Operator
                    expressions: _col1 (type: int), _col8 (type: int), _col6 (type: int)
                    outputColumnNames: _col0, _col1, _col2
                    Group By Operator
                      aggregations: count()
                      keys: _col2 (type: int), _col0 (type: int), _col1 (type: int)
                      mode: hash
                      outputColumnNames: _col0, _col1, _col2, _col3
                      Reduce Output Operator
                        key expressions: _col0 (type: int), _col1 (type: int), _col2 (type:
int)
                        sort order: +++
                        Map-reduce partition columns: _col0 (type: int), _col1 (type: int),
_col2 (type: int)
                        value expressions: _col3 (type: bigint)
        Reducer 4
            Reduce Operator Tree:
              Group By Operator
                aggregations: count(VALUE._col0)
                keys: KEY._col0 (type: int), KEY._col1 (type: int), KEY._col2 (type: int)
                mode: mergepartial
                outputColumnNames: _col0, _col1, _col2, _col3
                Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: COMPLETE
                Filter Operator
                  predicate: (_col3 > 100000000) (type: boolean)
                  Statistics: Num rows: 0 Data size: 0 Basic stats: NONE Column stats: COMPLETE
                  Select Operator
                    expressions: _col3 (type: bigint)
                    outputColumnNames: _col0
                    Statistics: Num rows: 0 Data size: 0 Basic stats: NONE Column stats: COMPLETE
                    File Output Operator
                      compressed: false
                      Statistics: Num rows: 0 Data size: 0 Basic stats: NONE Column stats:
COMPLETE
                      table:
                          input format: org.apache.hadoop.mapred.TextInputFormat
                          output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
                          serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
        Reducer 7
            Reduce Operator Tree:
              Merge Join Operator
                condition map:
                     Inner Join 0 to 1
                keys:
                  0 ss_item_sk (type: int), ss_ticket_number (type: int)
                  1 sr_item_sk (type: int), sr_ticket_number (type: int)
                outputColumnNames: _col1, _col6, _col8, _col27, _col34
                Filter Operator
                  predicate: ((_col1 = _col27) and (_col8 = _col34)) (type: boolean)
                  Select Operator
                    expressions: _col1 (type: int), _col8 (type: int), _col6 (type: int)
                    outputColumnNames: _col0, _col1, _col2
                    Group By Operator
                      aggregations: count()
                      keys: _col2 (type: int), _col0 (type: int), _col1 (type: int)
                      mode: hash
                      outputColumnNames: _col0, _col1, _col2, _col3
                      Reduce Output Operator
                        key expressions: _col0 (type: int), _col1 (type: int), _col2 (type:
int)
                        sort order: +++
                        Map-reduce partition columns: _col0 (type: int), _col1 (type: int),
_col2 (type: int)
                        value expressions: _col3 (type: bigint)
        Union 3
            Vertex: Union 3

  Stage: Stage-0
    Fetch Operator
      limit: -1
      Processor Tree:
        ListSink
{code}

Also TPC-DS Q54 fails with OOM, this failure happens when we chose a different plan.
The OOM happens in  vertexName=Map 14
{code}
explain  
with my_customers as (
 select  c_customer_sk
        , c_current_addr_sk
 from   
        ( select cs_sold_date_sk sold_date_sk,
                 cs_bill_customer_sk customer_sk,
                 cs_item_sk item_sk
          from   catalog_sales
          union all
          select ws_sold_date_sk sold_date_sk,
                 ws_bill_customer_sk customer_sk,
                 ws_item_sk item_sk
          from   web_sales
         ) cs_or_ws_sales,
         item,
         date_dim,
         customer
 where   sold_date_sk = d_date_sk
         and item_sk = i_item_sk
         and i_category = 'Jewelry'
         and i_class = 'football'
         and c_customer_sk = cs_or_ws_sales.customer_sk
         and d_moy = 3
         and d_year = 2000
         group by  c_customer_sk
        , c_current_addr_sk
 )
 , my_revenue as (
 select c_customer_sk,
        sum(ss_ext_sales_price) as revenue
 from   my_customers,
        store_sales,
        customer_address,
        store,
        date_dim
 where  c_current_addr_sk = ca_address_sk
        and ca_county = s_county
        and ca_state = s_state
        and ss_sold_date_sk = d_date_sk
        and c_customer_sk = ss_customer_sk
        and d_month_seq between (1203)
                           and  (1205)
 group by c_customer_sk
 )
 , segments as
 (select cast((revenue/50) as int) as segment
  from   my_revenue
 )
  select  segment, count(*) as num_customers, segment*50 as segment_base
 from segments
 group by segment
 order by segment, num_customers
 limit 100
OK
STAGE DEPENDENCIES:
  Stage-1 is a root stage
  Stage-0 depends on stages: Stage-1

STAGE PLANS:
  Stage: Stage-1
    Tez
      Edges:
        Map 1 <- Map 5 (BROADCAST_EDGE), Map 6 (BROADCAST_EDGE)
        Map 10 <- Map 13 (BROADCAST_EDGE), Union 11 (CONTAINS)
        Map 12 <- Map 13 (BROADCAST_EDGE), Union 11 (CONTAINS)
        Map 14 <- Union 11 (BROADCAST_EDGE)
        Map 6 <- Map 7 (BROADCAST_EDGE), Reducer 9 (BROADCAST_EDGE)
        Map 8 <- Map 14 (BROADCAST_EDGE)
        Reducer 2 <- Map 1 (SIMPLE_EDGE)
        Reducer 3 <- Reducer 2 (SIMPLE_EDGE)
        Reducer 4 <- Reducer 3 (SIMPLE_EDGE)
        Reducer 9 <- Map 8 (SIMPLE_EDGE)
      DagName: mmokhtar_20150208232525_9976b56b-8f4b-48c8-a909-aa653c20051c:1
      Vertices:
        Map 1 
            Map Operator Tree:
                TableScan
                  alias: store_sales
                  filterExpr: ss_customer_sk is not null (type: boolean)
                  Statistics: Num rows: 82510879939 Data size: 6873789738208 Basic stats:
COMPLETE Column stats: COMPLETE
                  Filter Operator
                    predicate: ss_customer_sk is not null (type: boolean)
                    Statistics: Num rows: 80566020964 Data size: 951594129356 Basic stats:
COMPLETE Column stats: COMPLETE
                    Select Operator
                      expressions: ss_customer_sk (type: int), ss_ext_sales_price (type: float),
ss_sold_date_sk (type: int)
                      outputColumnNames: _col0, _col1, _col2
                      Statistics: Num rows: 80566020964 Data size: 951594129356 Basic stats:
COMPLETE Column stats: COMPLETE
                      Map Join Operator
                        condition map:
                             Inner Join 0 to 1
                        keys:
                          0 _col2 (type: int)
                          1 _col0 (type: int)
                        outputColumnNames: _col0, _col1
                        input vertices:
                          1 Map 5
                        Statistics: Num rows: 90081226648 Data size: 720649813184 Basic stats:
COMPLETE Column stats: COMPLETE
                        Map Join Operator
                          condition map:
                               Inner Join 0 to 1
                          keys:
                            0 _col0 (type: int)
                            1 _col5 (type: int)
                          outputColumnNames: _col1, _col10
                          input vertices:
                            1 Map 6
                          Statistics: Num rows: 99089351460 Data size: 792714811684 Basic
stats: COMPLETE Column stats: NONE
                          Select Operator
                            expressions: _col10 (type: int), _col1 (type: float)
                            outputColumnNames: _col0, _col1
                            Statistics: Num rows: 99089351460 Data size: 792714811684 Basic
stats: COMPLETE Column stats: NONE
                            Group By Operator
                              aggregations: sum(_col1)
                              keys: _col0 (type: int)
                              mode: hash
                              outputColumnNames: _col0, _col1
                              Statistics: Num rows: 99089351460 Data size: 792714811684 Basic
stats: COMPLETE Column stats: NONE
                              Reduce Output Operator
                                key expressions: _col0 (type: int)
                                sort order: +
                                Map-reduce partition columns: _col0 (type: int)
                                Statistics: Num rows: 99089351460 Data size: 792714811684
Basic stats: COMPLETE Column stats: NONE
                                value expressions: _col1 (type: double)
            Execution mode: vectorized
        Map 10 
            Map Operator Tree:
                TableScan
                  alias: catalog_sales
                  filterExpr: (cs_item_sk is not null and cs_bill_customer_sk is not null)
(type: boolean)
                  Filter Operator
                    predicate: (cs_item_sk is not null and cs_bill_customer_sk is not null)
(type: boolean)
                    Select Operator
                      expressions: cs_sold_date_sk (type: int), cs_bill_customer_sk (type:
int), cs_item_sk (type: int)
                      outputColumnNames: _col0, _col1, _col2
                      Map Join Operator
                        condition map:
                             Inner Join 0 to 1
                        keys:
                          0 _col0 (type: int)
                          1 _col0 (type: int)
                        outputColumnNames: _col1, _col2
                        input vertices:
                          1 Map 13
                        Reduce Output Operator
                          key expressions: _col2 (type: int)
                          sort order: +
                          Map-reduce partition columns: _col2 (type: int)
                          value expressions: _col1 (type: int)
            Execution mode: vectorized
        Map 12 
            Map Operator Tree:
                TableScan
                  alias: web_sales
                  filterExpr: (ws_item_sk is not null and ws_bill_customer_sk is not null)
(type: boolean)
                  Filter Operator
                    predicate: (ws_item_sk is not null and ws_bill_customer_sk is not null)
(type: boolean)
                    Select Operator
                      expressions: ws_sold_date_sk (type: int), ws_bill_customer_sk (type:
int), ws_item_sk (type: int)
                      outputColumnNames: _col0, _col1, _col2
                      Map Join Operator
                        condition map:
                             Inner Join 0 to 1
                        keys:
                          0 _col0 (type: int)
                          1 _col0 (type: int)
                        outputColumnNames: _col1, _col2
                        input vertices:
                          1 Map 13
                        Reduce Output Operator
                          key expressions: _col2 (type: int)
                          sort order: +
                          Map-reduce partition columns: _col2 (type: int)
                          value expressions: _col1 (type: int)
            Execution mode: vectorized
        Map 13 
            Map Operator Tree:
                TableScan
                  alias: date_dim
                  filterExpr: (((d_moy = 3) and (d_year = 2000)) and d_date_sk is not null)
(type: boolean)
                  Statistics: Num rows: 73049 Data size: 81741831 Basic stats: COMPLETE Column
stats: COMPLETE
                  Filter Operator
                    predicate: (((d_moy = 3) and (d_year = 2000)) and d_date_sk is not null)
(type: boolean)
                    Statistics: Num rows: 624 Data size: 7488 Basic stats: COMPLETE Column
stats: COMPLETE
                    Select Operator
                      expressions: d_date_sk (type: int)
                      outputColumnNames: _col0
                      Statistics: Num rows: 624 Data size: 2496 Basic stats: COMPLETE Column
stats: COMPLETE
                      Reduce Output Operator
                        key expressions: _col0 (type: int)
                        sort order: +
                        Map-reduce partition columns: _col0 (type: int)
                        Statistics: Num rows: 624 Data size: 2496 Basic stats: COMPLETE Column
stats: COMPLETE
                      Select Operator
                        expressions: _col0 (type: int)
                        outputColumnNames: _col0
                        Statistics: Num rows: 624 Data size: 2496 Basic stats: COMPLETE Column
stats: COMPLETE
                        Group By Operator
                          keys: _col0 (type: int)
                          mode: hash
                          outputColumnNames: _col0
                          Statistics: Num rows: 312 Data size: 1248 Basic stats: COMPLETE
Column stats: COMPLETE
                          Dynamic Partitioning Event Operator
                            Target Input: catalog_sales
                            Partition key expr: cs_sold_date_sk
                            Statistics: Num rows: 312 Data size: 1248 Basic stats: COMPLETE
Column stats: COMPLETE
                            Target column: cs_sold_date_sk
                            Target Vertex: Map 10
                      Select Operator
                        expressions: _col0 (type: int)
                        outputColumnNames: _col0
                        Statistics: Num rows: 624 Data size: 2496 Basic stats: COMPLETE Column
stats: COMPLETE
                        Group By Operator
                          keys: _col0 (type: int)
                          mode: hash
                          outputColumnNames: _col0
                          Statistics: Num rows: 312 Data size: 1248 Basic stats: COMPLETE
Column stats: COMPLETE
                          Dynamic Partitioning Event Operator
                            Target Input: web_sales
                            Partition key expr: ws_sold_date_sk
                            Statistics: Num rows: 312 Data size: 1248 Basic stats: COMPLETE
Column stats: COMPLETE
                            Target column: ws_sold_date_sk
                            Target Vertex: Map 12
                      Reduce Output Operator
                        key expressions: _col0 (type: int)
                        sort order: +
                        Map-reduce partition columns: _col0 (type: int)
                        Statistics: Num rows: 624 Data size: 2496 Basic stats: COMPLETE Column
stats: COMPLETE
            Execution mode: vectorized
        Map 14 
            Map Operator Tree:
                TableScan
                  alias: item
                  filterExpr: (((i_category = 'Jewelry') and (i_class = 'football')) and i_item_sk
is not null) (type: boolean)
                  Statistics: Num rows: 462000 Data size: 663862160 Basic stats: COMPLETE
Column stats: COMPLETE
                  Filter Operator
                    predicate: (((i_category = 'Jewelry') and (i_class = 'football')) and
i_item_sk is not null) (type: boolean)
                    Statistics: Num rows: 4200 Data size: 781200 Basic stats: COMPLETE Column
stats: COMPLETE
                    Select Operator
                      expressions: i_item_sk (type: int)
                      outputColumnNames: _col0
                      Statistics: Num rows: 4200 Data size: 16800 Basic stats: COMPLETE Column
stats: COMPLETE
                      Map Join Operator
                        condition map:
                             Inner Join 0 to 1
                        keys:
                          0 _col2 (type: int)
                          1 _col0 (type: int)
                        outputColumnNames: _col1
                        input vertices:
                          0 Union 11
                        Statistics: Num rows: 79189328781 Data size: 0 Basic stats: PARTIAL
Column stats: NONE
                        Reduce Output Operator
                          key expressions: _col1 (type: int)
                          sort order: +
                          Map-reduce partition columns: _col1 (type: int)
                          Statistics: Num rows: 79189328781 Data size: 0 Basic stats: PARTIAL
Column stats: NONE
            Execution mode: vectorized
        Map 5 
            Map Operator Tree:
                TableScan
                  alias: date_dim
                  filterExpr: (d_month_seq BETWEEN 1203 AND 1205 and d_date_sk is not null)
(type: boolean)
                  Statistics: Num rows: 73049 Data size: 81741831 Basic stats: COMPLETE Column
stats: COMPLETE
                  Filter Operator
                    predicate: (d_month_seq BETWEEN 1203 AND 1205 and d_date_sk is not null)
(type: boolean)
                    Statistics: Num rows: 36524 Data size: 292192 Basic stats: COMPLETE Column
stats: COMPLETE
                    Select Operator
                      expressions: d_date_sk (type: int)
                      outputColumnNames: _col0
                      Statistics: Num rows: 36524 Data size: 146096 Basic stats: COMPLETE
Column stats: COMPLETE
                      Reduce Output Operator
                        key expressions: _col0 (type: int)
                        sort order: +
                        Map-reduce partition columns: _col0 (type: int)
                        Statistics: Num rows: 36524 Data size: 146096 Basic stats: COMPLETE
Column stats: COMPLETE
                      Select Operator
                        expressions: _col0 (type: int)
                        outputColumnNames: _col0
                        Statistics: Num rows: 36524 Data size: 146096 Basic stats: COMPLETE
Column stats: COMPLETE
                        Group By Operator
                          keys: _col0 (type: int)
                          mode: hash
                          outputColumnNames: _col0
                          Statistics: Num rows: 18262 Data size: 73048 Basic stats: COMPLETE
Column stats: COMPLETE
                          Dynamic Partitioning Event Operator
                            Target Input: store_sales
                            Partition key expr: ss_sold_date_sk
                            Statistics: Num rows: 18262 Data size: 73048 Basic stats: COMPLETE
Column stats: COMPLETE
                            Target column: ss_sold_date_sk
                            Target Vertex: Map 1
            Execution mode: vectorized
        Map 6 
            Map Operator Tree:
                TableScan
                  alias: customer_address
                  filterExpr: ((ca_county is not null and ca_state is not null) and ca_address_sk
is not null) (type: boolean)
                  Statistics: Num rows: 40000000 Data size: 40595195284 Basic stats: COMPLETE
Column stats: COMPLETE
                  Filter Operator
                    predicate: ((ca_county is not null and ca_state is not null) and ca_address_sk
is not null) (type: boolean)
                    Statistics: Num rows: 40000000 Data size: 7520000000 Basic stats: COMPLETE
Column stats: COMPLETE
                    Select Operator
                      expressions: ca_address_sk (type: int), ca_county (type: string), ca_state
(type: string)
                      outputColumnNames: _col0, _col1, _col2
                      Statistics: Num rows: 40000000 Data size: 7520000000 Basic stats: COMPLETE
Column stats: COMPLETE
                      Map Join Operator
                        condition map:
                             Inner Join 0 to 1
                        keys:
                          0 _col1 (type: string), _col2 (type: string)
                          1 _col0 (type: string), _col1 (type: string)
                        outputColumnNames: _col0
                        input vertices:
                          1 Map 7
                        Statistics: Num rows: 778829 Data size: 3115316 Basic stats: COMPLETE
Column stats: COMPLETE
                        Map Join Operator
                          condition map:
                               Inner Join 0 to 1
                          keys:
                            0 _col0 (type: int)
                            1 _col1 (type: int)
                          outputColumnNames: _col5
                          input vertices:
                            1 Reducer 9
                          Statistics: Num rows: 47909545988 Data size: 0 Basic stats: PARTIAL
Column stats: NONE
                          Reduce Output Operator
                            key expressions: _col5 (type: int)
                            sort order: +
                            Map-reduce partition columns: _col5 (type: int)
                            Statistics: Num rows: 47909545988 Data size: 0 Basic stats: PARTIAL
Column stats: NONE
            Execution mode: vectorized
        Map 7 
            Map Operator Tree:
                TableScan
                  alias: store
                  filterExpr: (s_county is not null and s_state is not null) (type: boolean)
                  Statistics: Num rows: 1704 Data size: 3256276 Basic stats: COMPLETE Column
stats: COMPLETE
                  Filter Operator
                    predicate: (s_county is not null and s_state is not null) (type: boolean)
                    Statistics: Num rows: 1704 Data size: 313536 Basic stats: COMPLETE Column
stats: COMPLETE
                    Select Operator
                      expressions: s_county (type: string), s_state (type: string)
                      outputColumnNames: _col0, _col1
                      Statistics: Num rows: 1704 Data size: 313536 Basic stats: COMPLETE Column
stats: COMPLETE
                      Reduce Output Operator
                        key expressions: _col0 (type: string), _col1 (type: string)
                        sort order: ++
                        Map-reduce partition columns: _col0 (type: string), _col1 (type: string)
                        Statistics: Num rows: 1704 Data size: 313536 Basic stats: COMPLETE
Column stats: COMPLETE
            Execution mode: vectorized
        Map 8 
            Map Operator Tree:
                TableScan
                  alias: customer
                  filterExpr: (c_customer_sk is not null and c_current_addr_sk is not null)
(type: boolean)
                  Statistics: Num rows: 80000000 Data size: 68801615852 Basic stats: COMPLETE
Column stats: COMPLETE
                  Filter Operator
                    predicate: (c_customer_sk is not null and c_current_addr_sk is not null)
(type: boolean)
                    Statistics: Num rows: 80000000 Data size: 640000000 Basic stats: COMPLETE
Column stats: COMPLETE
                    Select Operator
                      expressions: c_customer_sk (type: int), c_current_addr_sk (type: int)
                      outputColumnNames: _col0, _col1
                      Statistics: Num rows: 80000000 Data size: 640000000 Basic stats: COMPLETE
Column stats: COMPLETE
                      Map Join Operator
                        condition map:
                             Inner Join 0 to 1
                        keys:
                          0 _col0 (type: int)
                          1 _col1 (type: int)
                        outputColumnNames: _col0, _col1
                        input vertices:
                          1 Map 14
                        Statistics: Num rows: 87108263547 Data size: 0 Basic stats: PARTIAL
Column stats: NONE
                        Group By Operator
                          keys: _col0 (type: int), _col1 (type: int)
                          mode: hash
                          outputColumnNames: _col0, _col1
                          Statistics: Num rows: 87108263547 Data size: 0 Basic stats: PARTIAL
Column stats: NONE
                          Reduce Output Operator
                            key expressions: _col0 (type: int), _col1 (type: int)
                            sort order: ++
                            Map-reduce partition columns: _col0 (type: int), _col1 (type:
int)
                            Statistics: Num rows: 87108263547 Data size: 0 Basic stats: PARTIAL
Column stats: NONE
            Execution mode: vectorized
        Reducer 2 
            Reduce Operator Tree:
              Group By Operator
                aggregations: sum(VALUE._col0)
                keys: KEY._col0 (type: int)
                mode: mergepartial
                outputColumnNames: _col0, _col1
                Statistics: Num rows: 49544675730 Data size: 396357405842 Basic stats: COMPLETE
Column stats: NONE
                Select Operator
                  expressions: UDFToInteger((_col1 / 50.0)) (type: int)
                  outputColumnNames: _col0
                  Statistics: Num rows: 49544675730 Data size: 396357405842 Basic stats: COMPLETE
Column stats: NONE
                  Group By Operator
                    aggregations: count()
                    keys: _col0 (type: int)
                    mode: hash
                    outputColumnNames: _col0, _col1
                    Statistics: Num rows: 49544675730 Data size: 396357405842 Basic stats:
COMPLETE Column stats: NONE
                    Reduce Output Operator
                      key expressions: _col0 (type: int)
                      sort order: +
                      Map-reduce partition columns: _col0 (type: int)
                      Statistics: Num rows: 49544675730 Data size: 396357405842 Basic stats:
COMPLETE Column stats: NONE
                      value expressions: _col1 (type: bigint)
        Reducer 3 
            Reduce Operator Tree:
              Group By Operator
                aggregations: count(VALUE._col0)
                keys: KEY._col0 (type: int)
                mode: mergepartial
                outputColumnNames: _col0, _col1
                Statistics: Num rows: 24772337865 Data size: 198178702921 Basic stats: COMPLETE
Column stats: NONE
                Select Operator
                  expressions: _col0 (type: int), _col1 (type: bigint), (_col0 * 50) (type:
int)
                  outputColumnNames: _col0, _col1, _col2
                  Statistics: Num rows: 24772337865 Data size: 198178702921 Basic stats: COMPLETE
Column stats: NONE
                  Reduce Output Operator
                    key expressions: _col0 (type: int), _col1 (type: bigint)
                    sort order: ++
                    Statistics: Num rows: 24772337865 Data size: 198178702921 Basic stats:
COMPLETE Column stats: NONE
                    TopN Hash Memory Usage: 0.04
                    value expressions: _col2 (type: int)
        Reducer 4 
            Reduce Operator Tree:
              Select Operator
                expressions: KEY.reducesinkkey0 (type: int), KEY.reducesinkkey1 (type: bigint),
VALUE._col0 (type: int)
                outputColumnNames: _col0, _col1, _col2
                Statistics: Num rows: 24772337865 Data size: 198178702921 Basic stats: COMPLETE
Column stats: NONE
                Limit
                  Number of rows: 100
                  Statistics: Num rows: 100 Data size: 800 Basic stats: COMPLETE Column stats:
NONE
                  File Output Operator
                    compressed: false
                    Statistics: Num rows: 100 Data size: 800 Basic stats: COMPLETE Column
stats: NONE
                    table:
                        input format: org.apache.hadoop.mapred.TextInputFormat
                        output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
                        serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
        Reducer 9 
            Reduce Operator Tree:
              Group By Operator
                keys: KEY._col0 (type: int), KEY._col1 (type: int)
                mode: mergepartial
                outputColumnNames: _col0, _col1
                Statistics: Num rows: 43554131773 Data size: 0 Basic stats: PARTIAL Column
stats: NONE
                Reduce Output Operator
                  key expressions: _col1 (type: int)
                  sort order: +
                  Map-reduce partition columns: _col1 (type: int)
                  Statistics: Num rows: 43554131773 Data size: 0 Basic stats: PARTIAL Column
stats: NONE
                  value expressions: _col0 (type: int)
        Union 11 
            Vertex: Union 11

  Stage: Stage-0
    Fetch Operator
      limit: 100
      Processor Tree:
        ListSink
{code}


In Map 14 Data size is 0 
{code}
p 14 
            Map Operator Tree:
                TableScan
                  alias: item
                  filterExpr: (((i_category = 'Jewelry') and (i_class = 'football')) and i_item_sk
is not null) (type: boolean)
                  Statistics: Num rows: 462000 Data size: 663862160 Basic stats: COMPLETE
Column stats: COMPLETE
                  Filter Operator
                    predicate: (((i_category = 'Jewelry') and (i_class = 'football')) and
i_item_sk is not null) (type: boolean)
                    Statistics: Num rows: 4200 Data size: 781200 Basic stats: COMPLETE Column
stats: COMPLETE
                    Select Operator
                      expressions: i_item_sk (type: int)
                      outputColumnNames: _col0
                      Statistics: Num rows: 4200 Data size: 16800 Basic stats: COMPLETE Column
stats: COMPLETE
                      Map Join Operator
                        condition map:
                             Inner Join 0 to 1
                        keys:
                          0 _col2 (type: int)
                          1 _col0 (type: int)
                        outputColumnNames: _col1
                        input vertices:
                          0 Union 11
                        Statistics: Num rows: 79189328781 Data size: 0 Basic stats: PARTIAL
Column stats: NONE
                        Reduce Output Operator
                          key expressions: _col1 (type: int)
                          sort order: +
                          Map-reduce partition columns: _col1 (type: int)
                          Statistics: Num rows: 79189328781 Data size: 0 Basic stats: PARTIAL
Column stats: NONE
            Execution mode: vectorized
{code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Mime
View raw message