hive-user mailing list archives

From Tim Harsch <thar...@yarcdata.com>
Subject Re: hive job stuck at map=100%, reduce 0%
Date Tue, 08 Jul 2014 21:35:35 GMT
hive> explain select  i_item_desc , i_category , i_class , i_current_price , i_item_id ,sum(ws_ext_sales_price)
as itemrevenue, sum(ws_ext_sales_price)*100/sum(sum(ws_ext_sales_price)) over (partition by
i_class) as revenueratio
    > from item JOIN web_sales ON (web_sales.ws_item_sk = item.i_item_sk) JOIN date_dim
ON (web_sales.ws_sold_date_sk = date_dim.d_date_sk)
    > where item.i_category in ('Jewelry', 'Sports', 'Books') and date_dim.d_date between
'2001-01-12' and '2001-02-11' and ws_sold_date between '2001-01-12' and '2001-02-11'
    > group by i_item_id ,i_item_desc ,i_category ,i_class ,i_current_price
    > order by i_category ,i_class ,i_item_id ,i_item_desc ,revenueratio
    > limit 100;
OK
ABSTRACT SYNTAX TREE:
  (TOK_QUERY (TOK_FROM (TOK_JOIN (TOK_JOIN (TOK_TABREF (TOK_TABNAME item)) (TOK_TABREF (TOK_TABNAME
web_sales)) (= (. (TOK_TABLE_OR_COL web_sales) ws_item_sk) (. (TOK_TABLE_OR_COL item) i_item_sk)))
(TOK_TABREF (TOK_TABNAME date_dim)) (= (. (TOK_TABLE_OR_COL web_sales) ws_sold_date_sk) (.
(TOK_TABLE_OR_COL date_dim) d_date_sk)))) (TOK_INSERT (TOK_DESTINATION (TOK_DIR TOK_TMP_FILE))
(TOK_SELECT (TOK_SELEXPR (TOK_TABLE_OR_COL i_item_desc)) (TOK_SELEXPR (TOK_TABLE_OR_COL i_category))
(TOK_SELEXPR (TOK_TABLE_OR_COL i_class)) (TOK_SELEXPR (TOK_TABLE_OR_COL i_current_price))
(TOK_SELEXPR (TOK_TABLE_OR_COL i_item_id)) (TOK_SELEXPR (TOK_FUNCTION sum (TOK_TABLE_OR_COL
ws_ext_sales_price)) itemrevenue) (TOK_SELEXPR (/ (* (TOK_FUNCTION sum (TOK_TABLE_OR_COL ws_ext_sales_price))
100) (TOK_FUNCTION sum (TOK_FUNCTION sum (TOK_TABLE_OR_COL ws_ext_sales_price)) (TOK_WINDOWSPEC
(TOK_PARTITIONINGSPEC (TOK_DISTRIBUTEBY (TOK_TABLE_OR_COL i_class)))))) revenueratio)) (TOK_WHERE
(and (and (TOK_FUNCTION in (. (TOK_TABLE_OR_COL item) i_category) 'Jewelry' 'Sports' 'Books')
(TOK_FUNCTION between KW_FALSE (. (TOK_TABLE_OR_COL date_dim) d_date) '2001-01-12' '2001-02-11'))
(TOK_FUNCTION between KW_FALSE (TOK_TABLE_OR_COL ws_sold_date) '2001-01-12' '2001-02-11')))
(TOK_GROUPBY (TOK_TABLE_OR_COL i_item_id) (TOK_TABLE_OR_COL i_item_desc) (TOK_TABLE_OR_COL
i_category) (TOK_TABLE_OR_COL i_class) (TOK_TABLE_OR_COL i_current_price)) (TOK_ORDERBY (TOK_TABSORTCOLNAMEASC
(TOK_TABLE_OR_COL i_category)) (TOK_TABSORTCOLNAMEASC (TOK_TABLE_OR_COL i_class)) (TOK_TABSORTCOLNAMEASC
(TOK_TABLE_OR_COL i_item_id)) (TOK_TABSORTCOLNAMEASC (TOK_TABLE_OR_COL i_item_desc)) (TOK_TABSORTCOLNAMEASC
(TOK_TABLE_OR_COL revenueratio))) (TOK_LIMIT 100)))

STAGE DEPENDENCIES:
  Stage-10 is a root stage
  Stage-3 depends on stages: Stage-10
  Stage-4 depends on stages: Stage-3
  Stage-5 depends on stages: Stage-4
  Stage-0 is a root stage

STAGE PLANS:
  Stage: Stage-10
    Map Reduce Local Work
      Alias -> Map Local Tables:
        date_dim
          Fetch Operator
            limit: -1
        item
          Fetch Operator
            limit: -1
      Alias -> Map Local Operator Tree:
        date_dim
          TableScan
            alias: date_dim
            Filter Operator
              predicate:
                  expr: d_date BETWEEN '2001-01-12' AND '2001-02-11'
                  type: boolean
              HashTable Sink Operator
                condition expressions:
                  0 {_col1} {_col4} {_col5} {_col10} {_col12} {_col47}
                  1
                handleSkewJoin: false
                keys:
                  0 [Column[_col24]]
                  1 [Column[d_date_sk]]
                Position of Big Table: 0
        item
          TableScan
            alias: item
            Filter Operator
              predicate:
                  expr: (i_category) IN ('Jewelry', 'Sports', 'Books')
                  type: boolean
              HashTable Sink Operator
                condition expressions:
                  0 {i_item_id} {i_item_desc} {i_current_price} {i_class} {i_category}
                  1 {ws_sold_date_sk} {ws_ext_sales_price}
                handleSkewJoin: false
                keys:
                  0 [Column[i_item_sk]]
                  1 [Column[ws_item_sk]]
                Position of Big Table: 1

  Stage: Stage-3
    Map Reduce
      Alias -> Map Operator Tree:
        web_sales
          TableScan
            alias: web_sales
            Filter Operator
              predicate:
                  expr: ws_sold_date BETWEEN '2001-01-12' AND '2001-02-11'
                  type: boolean
              Map Join Operator
                condition map:
                     Inner Join 0 to 1
                condition expressions:
                  0 {i_item_id} {i_item_desc} {i_current_price} {i_class} {i_category}
                  1 {ws_sold_date_sk} {ws_ext_sales_price}
                handleSkewJoin: false
                keys:
                  0 [Column[i_item_sk]]
                  1 [Column[ws_item_sk]]
                outputColumnNames: _col1, _col4, _col5, _col10, _col12, _col24, _col47
                Position of Big Table: 1
                Map Join Operator
                  condition map:
                       Inner Join 0 to 1
                  condition expressions:
                    0 {_col1} {_col4} {_col5} {_col10} {_col12} {_col47}
                    1
                  handleSkewJoin: false
                  keys:
                    0 [Column[_col24]]
                    1 [Column[d_date_sk]]
                  outputColumnNames: _col1, _col4, _col5, _col10, _col12, _col47
                  Position of Big Table: 0
                  Select Operator
                    expressions:
                          expr: _col1
                          type: string
                          expr: _col4
                          type: string
                          expr: _col12
                          type: string
                          expr: _col10
                          type: string
                          expr: _col5
                          type: float
                          expr: _col47
                          type: float
                    outputColumnNames: _col1, _col4, _col12, _col10, _col5, _col47
                    Group By Operator
                      aggregations:
                            expr: sum(_col47)
                      bucketGroup: false
                      keys:
                            expr: _col1
                            type: string
                            expr: _col4
                            type: string
                            expr: _col12
                            type: string
                            expr: _col10
                            type: string
                            expr: _col5
                            type: float
                      mode: hash
                      outputColumnNames: _col0, _col1, _col2, _col3, _col4, _col5
                      Reduce Output Operator
                        key expressions:
                              expr: _col0
                              type: string
                              expr: _col1
                              type: string
                              expr: _col2
                              type: string
                              expr: _col3
                              type: string
                              expr: _col4
                              type: float
                        sort order: +++++
                        Map-reduce partition columns:
                              expr: _col0
                              type: string
                              expr: _col1
                              type: string
                              expr: _col2
                              type: string
                              expr: _col3
                              type: string
                              expr: _col4
                              type: float
                        tag: -1
                        value expressions:
                              expr: _col5
                              type: double
      Local Work:
        Map Reduce Local Work
      Reduce Operator Tree:
        Group By Operator
          aggregations:
                expr: sum(VALUE._col0)
          bucketGroup: false
          keys:
                expr: KEY._col0
                type: string
                expr: KEY._col1
                type: string
                expr: KEY._col2
                type: string
                expr: KEY._col3
                type: string
                expr: KEY._col4
                type: float
          mode: mergepartial
          outputColumnNames: _col0, _col1, _col2, _col3, _col4, _col5
          File Output Operator
            compressed: false
            GlobalTableId: 0
            table:
                input format: org.apache.hadoop.mapred.SequenceFileInputFormat
                output format: org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat

  Stage: Stage-4
    Map Reduce
      Alias -> Map Operator Tree:
        hdfs://localhost:9000/tmp/hive-tharsch/hive_2014-07-08_16-34-32_453_181611671719021152-1/-mr-10004
            Reduce Output Operator
              key expressions:
                    expr: _col3
                    type: string
                    expr: _col3
                    type: string
              sort order: ++
              Map-reduce partition columns:
                    expr: _col3
                    type: string
              tag: -1
              value expressions:
                    expr: _col0
                    type: string
                    expr: _col1
                    type: string
                    expr: _col2
                    type: string
                    expr: _col3
                    type: string
                    expr: _col4
                    type: float
                    expr: _col5
                    type: double
      Reduce Operator Tree:
        Extract
          PTF Operator
            Select Operator
              expressions:
                    expr: _col1
                    type: string
                    expr: _col2
                    type: string
                    expr: _col3
                    type: string
                    expr: _col4
                    type: float
                    expr: _col0
                    type: string
                    expr: _col5
                    type: double
                    expr: ((_col5 * 100) / _wcol0)
                    type: double
              outputColumnNames: _col0, _col1, _col2, _col3, _col4, _col5, _col6
              File Output Operator
                compressed: false
                GlobalTableId: 0
                table:
                    input format: org.apache.hadoop.mapred.SequenceFileInputFormat
                    output format: org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat

  Stage: Stage-5
    Map Reduce
      Alias -> Map Operator Tree:
        hdfs://localhost:9000/tmp/hive-tharsch/hive_2014-07-08_16-34-32_453_181611671719021152-1/-mr-10005
            Reduce Output Operator
              key expressions:
                    expr: _col1
                    type: string
                    expr: _col2
                    type: string
                    expr: _col4
                    type: string
                    expr: _col0
                    type: string
                    expr: _col6
                    type: double
              sort order: +++++
              tag: -1
              value expressions:
                    expr: _col0
                    type: string
                    expr: _col1
                    type: string
                    expr: _col2
                    type: string
                    expr: _col3
                    type: float
                    expr: _col4
                    type: string
                    expr: _col5
                    type: double
                    expr: _col6
                    type: double
      Reduce Operator Tree:
        Extract
          Limit
            File Output Operator
              compressed: false
              GlobalTableId: 0
              table:
                  input format: org.apache.hadoop.mapred.TextInputFormat
                  output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat

  Stage: Stage-0
    Fetch Operator
      limit: 100


Time taken: 1.102 seconds, Fetched: 278 row(s)


From: Hari Subramaniyan <hsubramaniyan@hortonworks.com>
Reply-To: "user@hive.apache.org" <user@hive.apache.org>
Date: Tuesday, July 8, 2014 2:12 PM
To: "user@hive.apache.org" <user@hive.apache.org>
Subject: Re: hive job stuck at map=100%, reduce 0%

Hi Tim,
Can you please provide the "explain plan" for the query? If this happens
because of memory issues with the mapjoin in the query, you can set
hive.auto.convert.join=false and give it a try.
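To spell that out, the setting can be applied per-invocation or per-session. A minimal sketch (here `query.sql` is a stand-in name for a file containing the query, not a file mentioned in this thread):

```shell
# Sketch: disable automatic conversion of common joins to map-joins
# for a single Hive invocation. "query.sql" is a placeholder filename.
hive -hiveconf hive.auto.convert.join=false -f query.sql

# Or, equivalently, inside an interactive session before running the query:
#   hive> SET hive.auto.convert.join=false;
```

With the conversion off, Hive falls back to a regular shuffle join, which avoids building the in-memory hash tables that map-joins need.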

Thanks
Hari


On Tue, Jul 8, 2014 at 1:18 PM, Tim Harsch <tharsch@yarcdata.com> wrote:
Hi,
I asked a question on Stack Overflow
(http://stackoverflow.com/questions/24621002/hive-job-stuck-at-map-100-reduce-0)
which hasn't seemed to get much traction, so I'd like to ask it here as
well.

I'm running hive-0.12.0 on hadoop-2.2.0. After submitting the query:

select  i_item_desc
  ,i_category
  ,i_class
  ,i_current_price
  ,i_item_id
  ,sum(ws_ext_sales_price) as itemrevenue
  ,sum(ws_ext_sales_price)*100/sum(sum(ws_ext_sales_price)) over
      (partition by i_class) as revenueratio
from item JOIN web_sales ON (web_sales.ws_item_sk = item.i_item_sk) JOIN
date_dim ON (web_sales.ws_sold_date_sk = date_dim.d_date_sk)
where item.i_category in ('Jewelry', 'Sports', 'Books')
    and date_dim.d_date between '2001-01-12' and '2001-02-11'
    and ws_sold_date between '2001-01-12' and '2001-02-11'
group by
    i_item_id
    ,i_item_desc
    ,i_category
    ,i_class
    ,i_current_price
order by
    i_category
    ,i_class
    ,i_item_id
    ,i_item_desc
    ,revenueratio
limit 100;

I get the following errors in the logs:

Hadoop job information for Stage-3: number of mappers: 1; number of
reducers: 1
2014-07-07 15:26:16,893 Stage-3 map = 0%,  reduce = 0%
2014-07-07 15:26:22,033 Stage-3 map = 100%,  reduce = 0%, Cumulative CPU
1.32 sec

And then the last line repeats every second or so ad infinitum. If I look
at container logs I see:

2014-07-07 17:12:17,477 INFO [AsyncDispatcher event handler]
org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: Diagnostics
report from attempt_1404402886929_0036_m_000000_0: Container killed by the
ApplicationMaster.
Container killed on request. Exit code is 143

I've searched for Exit code 143, but most of the material out there refers
to memory issues, and I have memory set pretty large (following the advice
of Container is running beyond memory limits
<http://stackoverflow.com/questions/21005643/container-is-running-beyond-memory-limits>).
I have even tried adding 6GB to each of the settings in that post, still no
luck.

I've also run the job with:
hive -hiveconf hive.root.logger=DEBUG,console
which really just produces a lot more info, but nothing I see makes clear
what the issue is.
I'm not sure where else to look...



