hive-user mailing list archives

From "Long, Andrew" <loand...@amazon.com>
Subject Cartesian Joins are failing + Controlling the number of mappers in Hive+TEZ
Date Thu, 30 Jun 2016 20:11:53 GMT
Hello everyone,

I’ve run into a situation where Tez badly de-optimizes Cartesian map joins by not creating
enough map tasks (as seen below). It seems the nodes eventually go OOM and become unhealthy.
Is there a way to force Hive to increase the number of map tasks? I’ve tried a number
of settings, but they don’t seem to have any effect.

Cheers Andrew
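
A note on the settings tried in the session below, offered as a sketch rather than a confirmed fix. Two things stand out. First, `SET name=value;` assigns a property while `SET name;` only prints it, so the line typed without `=` is read as a lookup of a property whose name includes the number, which is why it reports "is undefined". Second, on Tez the number of map tasks is generally governed by split grouping (`tez.grouping.min-size`, `tez.grouping.max-size`, `tez.grouping.split-count`) rather than by the MapReduce split-size properties alone. The byte values here are illustrative assumptions, not tuned recommendations:

```sql
-- Assign with '='; without it, SET treats the whole string as a property name to look up.
SET mapreduce.input.fileinputformat.split.maxsize=8000000;

-- Tez groups input splits into tasks; lowering the grouping bounds tends to yield more map tasks.
-- 4000000 and 8000000 bytes are placeholders chosen only to mirror the 8000000 used in the session.
SET tez.grouping.min-size=4000000;
SET tez.grouping.max-size=8000000;

-- Print the values back to confirm they are set in this session.
SET tez.grouping.min-size;
SET tez.grouping.max-size;
```

Whether this changes the 115 tasks reported for Map 2 depends on the Hive and Tez versions and on how the ORC splits are generated, so it is worth re-running EXPLAIN or watching the vertex table after changing them.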



    > SET hive.execution.engine = tez;
hive> SET hive.exec.reducers.bytes.per.reducer=8000000;
hive> SET mapreduce.input.fileinputformat.split.maxsize=8000000;
hive> SET hive.vectorized.execution.enabled=true;
hive> SET hive.stats.join.factor=1.3;
hive> SET hive.exec.reducers.max=4036;
hive> SET  mapred.max.split.size;
mapred.max.split.size=8000000
hive> SET mapreduce.input.fileinputformat.split.maxsize 276480;
mapreduce.input.fileinputformat.split.maxsize 276480 is undefined
hive>
    >
    > CREATE TABLE intial_starting_balance AS
    > SELECT
    > D.caldate as accounting_date_local,
    > RF.account_number as account_number,
    > RF.aggregation_type,
    > RF.application_name,
    > RF.company_code,
    > RF.cost_center,
    > RF.dimension01,
    > RF.dimension02,
    > RF.dimension03,
    > RF.dimension04,
    > RF.dimension05,
    > RF.dimension06,
    > RF.dimension07,
    > RF.dimension08,
    > RF.dimension09,
    > RF.dimension10,
    > RF.dimension11,
    > RF.dimension12,
    > RF.dimension13,
    > RF.dimension14,
    > RF.dimension15,
    > RF.financial_event_type,
    > RF.functional_currency_code,
    > RF.func_currency_amt_sum,
    > RF.func_currency_balance,
    > RF.func_currency_beg_balance,
    > RF.func_journal_balance,
    > RF.func_journal_beg_balance,
    > RF.func_journal_sum,
    > RF.gl_group_id,
    > RF.gl_product_line,
    > RF.jl_description,
    > RF.ledger_id,
    > RF.local_currency_amt_sum,
    > RF.local_currency_balance,
    > RF.local_currency_beg_balance,
    > RF.local_currency_code ,
    > RF.local_journal_balance,
    > RF.local_journal_beg_balance,
    > RF.local_journal_sum,
    > RF.location,
    > RF.ltd_amortization_amount,
    > RF.post_to_gl,
    > RF.principal_amount,
    > RF.project,
    > RF.quantity_sum,
    > RF.rfd_id,
    > RF.sales_channel,
    > RF.sl_db_name,
    > RF.source_system,
    > RF.timezone_id
    > FROM oldest_act_dt_previous_day_rf RF, date_range D;
Warning: Map Join MAPJOIN[7][bigTable=rf] in task 'Map 2' is a cross product
Query ID = hadoop_20160630181616_76699ea6-ed61-4866-b67b-d3f1af345103
Total jobs = 1
Launching Job 1 out of 1


Status: Running (Executing on YARN cluster with App id application_1467163153575_0026)

--------------------------------------------------------------------------------
        VERTICES      STATUS  TOTAL  COMPLETED  RUNNING  PENDING  FAILED  KILLED
--------------------------------------------------------------------------------
Map 1                 KILLED      1          0        0        1       0       1
Map 2                 KILLED    115          0        0      115       0     115
--------------------------------------------------------------------------------
VERTICES: 00/02  [>>--------------------------] 0%    ELAPSED TIME: 5.54 s
--------------------------------------------------------------------------------
Status: Killed
Dag received [DAG_TERMINATE, DAG_KILL] in RUNNING state.
Kill Dag request received from client
Vertex killed, vertexName=Map 1, vertexId=vertex_1467163153575_0026_1_00, diagnostics=[Vertex received Kill while in RUNNING state., Vertex did not succeed due to DAG_TERMINATED, failedTasks:0 killedTasks:1, Vertex vertex_1467163153575_0026_1_00 [Map 1] killed/failed due to:DAG_TERMINATED]
Vertex killed, vertexName=Map 2, vertexId=vertex_1467163153575_0026_1_01, diagnostics=[Vertex received Kill while in RUNNING state., Vertex did not succeed due to DAG_TERMINATED, failedTasks:0 killedTasks:115, Vertex vertex_1467163153575_0026_1_01 [Map 2] killed/failed due to:DAG_TERMINATED]
DAG did not succeed due to DAG_KILL. failedVertices:0 killedVertices:2
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.tez.TezTask
hive> set hive.execution.engine=mr;
hive>
    >
    > CREATE TABLE intial_starting_balance AS
    > SELECT
    > D.caldate as accounting_date_local,
    > RF.account_number as account_number,
    > RF.aggregation_type,
    > RF.application_name,
    > RF.company_code,
    > RF.cost_center,
    > RF.dimension01,
    > RF.dimension02,
    > RF.dimension03,
    > RF.dimension04,
    > RF.dimension05,
    > RF.dimension06,
    > RF.dimension07,
    > RF.dimension08,
    > RF.dimension09,
    > RF.dimension10,
    > RF.dimension11,
    > RF.dimension12,
    > RF.dimension13,
    > RF.dimension14,
    > RF.dimension15,
    > RF.financial_event_type,
    > RF.functional_currency_code,
    > RF.func_currency_amt_sum,
    > RF.func_currency_balance,
    > RF.func_currency_beg_balance,
    > RF.func_journal_balance,
    > RF.func_journal_beg_balance,
    > RF.func_journal_sum,
    > RF.gl_group_id,
    > RF.gl_product_line,
    > RF.jl_description,
    > RF.ledger_id,
    > RF.local_currency_amt_sum,
    > RF.local_currency_balance,
    > RF.local_currency_beg_balance,
    > RF.local_currency_code ,
    > RF.local_journal_balance,
    > RF.local_journal_beg_balance,
    > RF.local_journal_sum,
    > RF.location,
    > RF.ltd_amortization_amount,
    > RF.post_to_gl,
    > RF.principal_amount,
    > RF.project,
    > RF.quantity_sum,
    > RF.rfd_id,
    > RF.sales_channel,
    > RF.sl_db_name,
    > RF.source_system,
    > RF.timezone_id
    > FROM oldest_act_dt_previous_day_rf RF, date_range D;
Warning: Map Join MAPJOIN[7][bigTable=rf] in task 'Stage-4:MAPRED' is a cross product
Query ID = hadoop_20160630181616_804cc05b-8103-45a9-93ff-4c258e995d58
Total jobs = 1
Execution log at: /mnt/tmp/hadoop/hadoop_20160630181616_804cc05b-8103-45a9-93ff-4c258e995d58.log
2016-06-30 06:17:04    Starting to launch local task to process map join;     maximum memory = 932184064
2016-06-30 06:17:06    Dump the side-table for tag: 1 with group count: 1 into file: file:/mnt/tmp/hadoop/57e52e6d-3178-4e8e-be7f-101939fe11c2/hive_2016-06-30_18-16-59_589_3295892868228064144-1/-local-10003/HashTable-Stage-4/MapJoin-mapfile11--.hashtable
2016-06-30 06:17:06    Uploaded 1 File to: file:/mnt/tmp/hadoop/57e52e6d-3178-4e8e-be7f-101939fe11c2/hive_2016-06-30_18-16-59_589_3295892868228064144-1/-local-10003/HashTable-Stage-4/MapJoin-mapfile11--.hashtable (931 bytes)
2016-06-30 06:17:06    End of local task; Time Taken: 1.341 sec.
Execution completed successfully
MapredLocal task succeeded
Launching Job 1 out of 1
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_1467163153575_0027, Tracking URL = http://ip-172-31-1-35.ec2.internal:20888/proxy/application_1467163153575_0027/
Kill Command = /usr/lib/hadoop/bin/hadoop job  -kill job_1467163153575_0027
Hadoop job information for Stage-4: number of mappers: 324; number of reducers: 0
2016-06-30 18:17:15,415 Stage-4 map = 0%,  reduce = 0%

hive> EXPLAIN EXTENDED SELECT
    > D.caldate as accounting_date_local,
    > RF.account_number as account_number,
    > RF.aggregation_type,
    > RF.application_name,
    > RF.company_code,
    > RF.cost_center,
    > RF.dimension01,
    > RF.dimension02,
    > RF.dimension03,
    > RF.dimension04,
    > RF.dimension05,
    > RF.dimension06,
    > RF.dimension07,
    > RF.dimension08,
    > RF.dimension09,
    > RF.dimension10,
    > RF.dimension11,
    > RF.dimension12,
    > RF.dimension13,
    > RF.dimension14,
    > RF.dimension15,
    > RF.financial_event_type,
    > RF.functional_currency_code,
    > RF.func_currency_amt_sum,
    > RF.func_currency_balance,
    > RF.func_currency_beg_balance,
    > RF.func_journal_balance,
    > RF.func_journal_beg_balance,
    > RF.func_journal_sum,
    > RF.gl_group_id,
    > RF.gl_product_line,
    > RF.jl_description,
    > RF.ledger_id,
    > RF.local_currency_amt_sum,
    > RF.local_currency_balance,
    > RF.local_currency_beg_balance,
    > RF.local_currency_code ,
    > RF.local_journal_balance,
    > RF.local_journal_beg_balance,
    > RF.local_journal_sum,
    > RF.location,
    > RF.ltd_amortization_amount,
    > RF.post_to_gl,
    > RF.principal_amount,
    > RF.project,
    > RF.quantity_sum,
    > RF.rfd_id,
    > RF.sales_channel,
    > RF.sl_db_name,
    > RF.source_system,
    > RF.timezone_id
    > FROM oldest_act_dt_previous_day_rf RF, date_range D;
Warning: Map Join MAPJOIN[7][bigTable=rf] in task 'Map 2' is a cross product
OK
ABSTRACT SYNTAX TREE:

TOK_QUERY
   TOK_FROM
      TOK_JOIN
         TOK_TABREF
            TOK_TABNAME
               oldest_act_dt_previous_day_rf
            RF
         TOK_TABREF
            TOK_TABNAME
               date_range
            D
   TOK_INSERT
      TOK_DESTINATION
         TOK_DIR
            TOK_TMP_FILE
      TOK_SELECT
         TOK_SELEXPR
            .
               TOK_TABLE_OR_COL
                  D
               caldate
            accounting_date_local
         TOK_SELEXPR
            .
               TOK_TABLE_OR_COL
                  RF
               account_number
            account_number
         TOK_SELEXPR
            .
               TOK_TABLE_OR_COL
                  RF
               aggregation_type
         TOK_SELEXPR
            .
               TOK_TABLE_OR_COL
                  RF
               application_name
         TOK_SELEXPR
            .
               TOK_TABLE_OR_COL
                  RF
               company_code
         TOK_SELEXPR
            .
               TOK_TABLE_OR_COL
                  RF
               cost_center
         TOK_SELEXPR
            .
               TOK_TABLE_OR_COL
                  RF
               dimension01
         TOK_SELEXPR
            .
               TOK_TABLE_OR_COL
                  RF
               dimension02
         TOK_SELEXPR
            .
               TOK_TABLE_OR_COL
                  RF
               dimension03
         TOK_SELEXPR
            .
               TOK_TABLE_OR_COL
                  RF
               dimension04
         TOK_SELEXPR
            .
               TOK_TABLE_OR_COL
                  RF
               dimension05
         TOK_SELEXPR
            .
               TOK_TABLE_OR_COL
                  RF
               dimension06
         TOK_SELEXPR
            .
               TOK_TABLE_OR_COL
                  RF
               dimension07
         TOK_SELEXPR
            .
               TOK_TABLE_OR_COL
                  RF
               dimension08
         TOK_SELEXPR
            .
               TOK_TABLE_OR_COL
                  RF
               dimension09
         TOK_SELEXPR
            .
               TOK_TABLE_OR_COL
                  RF
               dimension10
         TOK_SELEXPR
            .
               TOK_TABLE_OR_COL
                  RF
               dimension11
         TOK_SELEXPR
            .
               TOK_TABLE_OR_COL
                  RF
               dimension12
         TOK_SELEXPR
            .
               TOK_TABLE_OR_COL
                  RF
               dimension13
         TOK_SELEXPR
            .
               TOK_TABLE_OR_COL
                  RF
               dimension14
         TOK_SELEXPR
            .
               TOK_TABLE_OR_COL
                  RF
               dimension15
         TOK_SELEXPR
            .
               TOK_TABLE_OR_COL
                  RF
               financial_event_type
         TOK_SELEXPR
            .
               TOK_TABLE_OR_COL
                  RF
               functional_currency_code
         TOK_SELEXPR
            .
               TOK_TABLE_OR_COL
                  RF
               func_currency_amt_sum
         TOK_SELEXPR
            .
               TOK_TABLE_OR_COL
                  RF
               func_currency_balance
         TOK_SELEXPR
            .
               TOK_TABLE_OR_COL
                  RF
               func_currency_beg_balance
         TOK_SELEXPR
            .
               TOK_TABLE_OR_COL
                  RF
               func_journal_balance
         TOK_SELEXPR
            .
               TOK_TABLE_OR_COL
                  RF
               func_journal_beg_balance
         TOK_SELEXPR
            .
               TOK_TABLE_OR_COL
                  RF
               func_journal_sum
         TOK_SELEXPR
            .
               TOK_TABLE_OR_COL
                  RF
               gl_group_id
         TOK_SELEXPR
            .
               TOK_TABLE_OR_COL
                  RF
               gl_product_line
         TOK_SELEXPR
            .
               TOK_TABLE_OR_COL
                  RF
               jl_description
         TOK_SELEXPR
            .
               TOK_TABLE_OR_COL
                  RF
               ledger_id
         TOK_SELEXPR
            .
               TOK_TABLE_OR_COL
                  RF
               local_currency_amt_sum
         TOK_SELEXPR
            .
               TOK_TABLE_OR_COL
                  RF
               local_currency_balance
         TOK_SELEXPR
            .
               TOK_TABLE_OR_COL
                  RF
               local_currency_beg_balance
         TOK_SELEXPR
            .
               TOK_TABLE_OR_COL
                  RF
               local_currency_code
         TOK_SELEXPR
            .
               TOK_TABLE_OR_COL
                  RF
               local_journal_balance
         TOK_SELEXPR
            .
               TOK_TABLE_OR_COL
                  RF
               local_journal_beg_balance
         TOK_SELEXPR
            .
               TOK_TABLE_OR_COL
                  RF
               local_journal_sum
         TOK_SELEXPR
            .
               TOK_TABLE_OR_COL
                  RF
               location
         TOK_SELEXPR
            .
               TOK_TABLE_OR_COL
                  RF
               ltd_amortization_amount
         TOK_SELEXPR
            .
               TOK_TABLE_OR_COL
                  RF
               post_to_gl
         TOK_SELEXPR
            .
               TOK_TABLE_OR_COL
                  RF
               principal_amount
         TOK_SELEXPR
            .
               TOK_TABLE_OR_COL
                  RF
               project
         TOK_SELEXPR
            .
               TOK_TABLE_OR_COL
                  RF
               quantity_sum
         TOK_SELEXPR
            .
               TOK_TABLE_OR_COL
                  RF
               rfd_id
         TOK_SELEXPR
            .
               TOK_TABLE_OR_COL
                  RF
               sales_channel
         TOK_SELEXPR
            .
               TOK_TABLE_OR_COL
                  RF
               sl_db_name
         TOK_SELEXPR
            .
               TOK_TABLE_OR_COL
                  RF
               source_system
         TOK_SELEXPR
            .
               TOK_TABLE_OR_COL
                  RF
               timezone_id


STAGE DEPENDENCIES:
  Stage-1 is a root stage
  Stage-0 depends on stages: Stage-1

STAGE PLANS:
  Stage: Stage-1
    Tez
      Edges:
        Map 2 <- Map 1 (BROADCAST_EDGE)
      DagName: hadoop_20160630184242_e2719632-5ad7-4e8c-a92d-ea74daa27abb:2
      Vertices:
        Map 1
            Map Operator Tree:
                TableScan
                  alias: d
                  Statistics: Num rows: 41 Data size: 3854 Basic stats: COMPLETE Column stats: NONE
                  GatherStats: false
                  Reduce Output Operator
                    sort order:
                    Statistics: Num rows: 41 Data size: 3854 Basic stats: COMPLETE Column stats: NONE
                    tag: 1
                    value expressions: caldate (type: string)
                    auto parallelism: false
            Path -> Alias:
              hdfs://ip-172-31-1-35.ec2.internal:8020/user/hive/warehouse/date_range [d]
            Path -> Partition:
              hdfs://ip-172-31-1-35.ec2.internal:8020/user/hive/warehouse/date_range
                Partition
                  base file name: date_range
                  input format: org.apache.hadoop.hive.ql.io.orc.OrcInputFormat
                  output format: org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat
                  properties:
                    COLUMN_STATS_ACCURATE true
                    bucket_count -1
                    columns caldate
                    columns.comments
                    columns.types string
                    file.inputformat org.apache.hadoop.hive.ql.io.orc.OrcInputFormat
                    file.outputformat org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat
                    location hdfs://ip-172-31-1-35.ec2.internal:8020/user/hive/warehouse/date_range
                    name default.date_range
                    numFiles 1
                    numRows 41
                    rawDataSize 3854
                    serialization.ddl struct date_range { string caldate}
                    serialization.format 1
                    serialization.lib org.apache.hadoop.hive.ql.io.orc.OrcSerde
                    totalSize 350
                    transient_lastDdlTime 1467166854
                  serde: org.apache.hadoop.hive.ql.io.orc.OrcSerde

                    input format: org.apache.hadoop.hive.ql.io.orc.OrcInputFormat
                    output format: org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat
                    properties:
                      COLUMN_STATS_ACCURATE true
                      bucket_count -1
                      columns caldate
                      columns.comments
                      columns.types string
                      file.inputformat org.apache.hadoop.hive.ql.io.orc.OrcInputFormat
                      file.outputformat org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat
                      location hdfs://ip-172-31-1-35.ec2.internal:8020/user/hive/warehouse/date_range
                      name default.date_range
                      numFiles 1
                      numRows 41
                      rawDataSize 3854
                      serialization.ddl struct date_range { string caldate}
                      serialization.format 1
                      serialization.lib org.apache.hadoop.hive.ql.io.orc.OrcSerde
                      totalSize 350
                      transient_lastDdlTime 1467166854
                    serde: org.apache.hadoop.hive.ql.io.orc.OrcSerde
                    name: default.date_range
                  name: default.date_range
            Truncated Path -> Alias:
              /date_range [d]
            Execution mode: vectorized
        Map 2
            Map Operator Tree:
                TableScan
                  alias: rf
                  Statistics: Num rows: 87435021 Data size: 343639889520 Basic stats: COMPLETE Column stats: NONE
                  GatherStats: false
                  Map Join Operator
                    condition map:
                         Inner Join 0 to 1
                    condition expressions:
                      0 {account_number} {aggregation_type} {application_name} {company_code} {cost_center} {dimension01} {dimension02} {dimension03} {dimension04} {dimension05} {dimension06} {dimension07} {dimension08} {dimension09} {dimension10} {dimension11} {dimension12} {dimension13} {dimension14} {dimension15} {financial_event_type} {functional_currency_code} {func_currency_amt_sum} {func_currency_balance} {func_currency_beg_balance} {func_journal_balance} {func_journal_beg_balance} {func_journal_sum} {gl_group_id} {gl_product_line} {jl_description} {ledger_id} {local_currency_amt_sum} {local_currency_balance} {local_currency_beg_balance} {local_currency_code} {local_journal_balance} {local_journal_beg_balance} {local_journal_sum} {location} {ltd_amortization_amount} {post_to_gl} {principal_amount} {project} {quantity_sum} {rfd_id} {sales_channel} {sl_db_name} {source_system} {timezone_id}
                      1 {caldate}
                    Estimated key counts: Map 1 => 41
                    keys:
                      0
                      1
                    outputColumnNames: _col1, _col2, _col3, _col4, _col5, _col6, _col7, _col8, _col9, _col10, _col11, _col12, _col13, _col14, _col15, _col16, _col17, _col18, _col19, _col20, _col21, _col22, _col23, _col24, _col25, _col26, _col27, _col28, _col29, _col30, _col31, _col32, _col33, _col34, _col35, _col36, _col37, _col38, _col39, _col40, _col41, _col42, _col43, _col44, _col45, _col46, _col47, _col48, _col49, _col50, _col54
                    input vertices:
                      1 Map 1
                    Position of Big Table: 0
                    Statistics: Num rows: 113665523 Data size: 446731839989 Basic stats: COMPLETE Column stats: NONE
                    Select Operator
                      expressions: _col54 (type: string), _col1 (type: string), _col2 (type: string), _col3 (type: string), _col4 (type: string), _col5 (type: string), _col6 (type: string), _col7 (type: string), _col8 (type: string), _col9 (type: string), _col10 (type: string), _col11 (type: string), _col12 (type: string), _col13 (type: string), _col14 (type: string), _col15 (type: string), _col16 (type: string), _col17 (type: string), _col18 (type: string), _col19 (type: string), _col20 (type: string), _col21 (type: string), _col22 (type: string), _col23 (type: decimal(38,18)), _col24 (type: decimal(38,18)), _col25 (type: decimal(38,18)), _col26 (type: decimal(38,18)), _col27 (type: decimal(38,18)), _col28 (type: decimal(38,18)), _col29 (type: string), _col30 (type: string), _col31 (type: string), _col32 (type: string), _col33 (type: decimal(38,18)), _col34 (type: decimal(38,18)), _col35 (type: decimal(38,18)), _col36 (type: string), _col37 (type: decimal(38,18)), _col38 (type: decimal(38,18)), _col39 (type: decimal(38,18)), _col40 (type: string), _col41 (type: decimal(38,18)), _col42 (type: string), _col43 (type: decimal(38,18)), _col44 (type: string), _col45 (type: decimal(38,18)), _col46 (type: string), _col47 (type: string), _col48 (type: string), _col49 (type: string), _col50 (type: string)
                      outputColumnNames: _col0, _col1, _col2, _col3, _col4, _col5, _col6, _col7, _col8, _col9, _col10, _col11, _col12, _col13, _col14, _col15, _col16, _col17, _col18, _col19, _col20, _col21, _col22, _col23, _col24, _col25, _col26, _col27, _col28, _col29, _col30, _col31, _col32, _col33, _col34, _col35, _col36, _col37, _col38, _col39, _col40, _col41, _col42, _col43, _col44, _col45, _col46, _col47, _col48, _col49, _col50
                      Statistics: Num rows: 113665523 Data size: 446731839989 Basic stats: COMPLETE Column stats: NONE
                      File Output Operator
                        compressed: false
                        GlobalTableId: 0
                        directory: hdfs://ip-172-31-1-35.ec2.internal:8020/tmp/hive/hadoop/daa51bd7-fade-41f1-b96d-e641f35e1bfc/hive_2016-06-30_18-42-00_307_2080077224343537923-1/-ext-10001
                        NumFilesPerFileSink: 1
                        Statistics: Num rows: 113665523 Data size: 446731839989 Basic stats: COMPLETE Column stats: NONE
                        Stats Publishing Key Prefix: hdfs://ip-172-31-1-35.ec2.internal:8020/tmp/hive/hadoop/daa51bd7-fade-41f1-b96d-e641f35e1bfc/hive_2016-06-30_18-42-00_307_2080077224343537923-1/-ext-10001/
                        table:
                            input format: org.apache.hadoop.mapred.TextInputFormat
                            output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
                            properties:
                              columns _col0,_col1,_col2,_col3,_col4,_col5,_col6,_col7,_col8,_col9,_col10,_col11,_col12,_col13,_col14,_col15,_col16,_col17,_col18,_col19,_col20,_col21,_col22,_col23,_col24,_col25,_col26,_col27,_col28,_col29,_col30,_col31,_col32,_col33,_col34,_col35,_col36,_col37,_col38,_col39,_col40,_col41,_col42,_col43,_col44,_col45,_col46,_col47,_col48,_col49,_col50
                              columns.types string:string:string:string:string:string:string:string:string:string:string:string:string:string:string:string:string:string:string:string:string:string:string:decimal(38,18):decimal(38,18):decimal(38,18):decimal(38,18):decimal(38,18):decimal(38,18):string:string:string:string:decimal(38,18):decimal(38,18):decimal(38,18):string:decimal(38,18):decimal(38,18):decimal(38,18):string:decimal(38,18):string:decimal(38,18):string:decimal(38,18):string:string:string:string:string
                              escape.delim \
                              hive.serialization.extend.nesting.levels true
                              serialization.format 1
                              serialization.lib org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
                            serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
                        TotalFiles: 1
                        GatherStats: false
                        MultiFileSpray: false
            Path -> Alias:
              hdfs://ip-172-31-1-35.ec2.internal:8020/user/hive/warehouse/oldest_act_dt_previous_day_rf [rf]
            Path -> Partition:
              hdfs://ip-172-31-1-35.ec2.internal:8020/user/hive/warehouse/oldest_act_dt_previous_day_rf
                Partition
                  base file name: oldest_act_dt_previous_day_rf
                  input format: org.apache.hadoop.hive.ql.io.orc.OrcInputFormat
                  output format: org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat
                  properties:
                    COLUMN_STATS_ACCURATE true
                    bucket_count -1
                    columns accounting_date_local,account_number,aggregation_type,application_name,company_code,cost_center,dimension01,dimension02,dimension03,dimension04,dimension05,dimension06,dimension07,dimension08,dimension09,dimension10,dimension11,dimension12,dimension13,dimension14,dimension15,financial_event_type,functional_currency_code,func_currency_amt_sum,func_currency_balance,func_currency_beg_balance,func_journal_balance,func_journal_beg_balance,func_journal_sum,gl_group_id,gl_product_line,jl_description,ledger_id,local_currency_amt_sum,local_currency_balance,local_currency_beg_balance,local_currency_code,local_journal_balance,local_journal_beg_balance,local_journal_sum,location,ltd_amortization_amount,post_to_gl,principal_amount,project,quantity_sum,rfd_id,sales_channel,sl_db_name,source_system,timezone_id
                    columns.comments
                    columns.types timestamp:string:string:string:string:string:string:string:string:string:string:string:string:string:string:string:string:string:string:string:string:string:string:decimal(38,18):decimal(38,18):decimal(38,18):decimal(38,18):decimal(38,18):decimal(38,18):string:string:string:string:decimal(38,18):decimal(38,18):decimal(38,18):string:decimal(38,18):decimal(38,18):decimal(38,18):string:decimal(38,18):string:decimal(38,18):string:decimal(38,18):string:string:string:string:string
                    file.inputformat org.apache.hadoop.hive.ql.io.orc.OrcInputFormat
                    file.outputformat org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat
                    location hdfs://ip-172-31-1-35.ec2.internal:8020/user/hive/warehouse/oldest_act_dt_previous_day_rf
                    name default.oldest_act_dt_previous_day_rf
                    numFiles 324
                    numRows 87435021
                    rawDataSize 343639889520
                    serialization.ddl struct oldest_act_dt_previous_day_rf { timestamp accounting_date_local, string account_number, string aggregation_type, string application_name, string company_code, string cost_center, string dimension01, string dimension02, string dimension03, string dimension04, string dimension05, string dimension06, string dimension07, string dimension08, string dimension09, string dimension10, string dimension11, string dimension12, string dimension13, string dimension14, string dimension15, string financial_event_type, string functional_currency_code, decimal(38,18) func_currency_amt_sum, decimal(38,18) func_currency_balance, decimal(38,18) func_currency_beg_balance, decimal(38,18) func_journal_balance, decimal(38,18) func_journal_beg_balance, decimal(38,18) func_journal_sum, string gl_group_id, string gl_product_line, string jl_description, string ledger_id, decimal(38,18) local_currency_amt_sum, decimal(38,18) local_currency_balance, decimal(38,18) local_currency_beg_balance, string local_currency_code, decimal(38,18) local_journal_balance, decimal(38,18) local_journal_beg_balance, decimal(38,18) local_journal_sum, string location, decimal(38,18) ltd_amortization_amount, string post_to_gl, decimal(38,18) principal_amount, string project, decimal(38,18) quantity_sum, string rfd_id, string sales_channel, string sl_db_name, string source_system, string timezone_id}
                    serialization.format 1
                    serialization.lib org.apache.hadoop.hive.ql.io.orc.OrcSerde
                    totalSize 4575811884
                    transient_lastDdlTime 1467309711
                  serde: org.apache.hadoop.hive.ql.io.orc.OrcSerde

                    input format: org.apache.hadoop.hive.ql.io.orc.OrcInputFormat
                    output format: org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat
                    properties:
                      COLUMN_STATS_ACCURATE true
                      bucket_count -1
                      columns accounting_date_local,account_number,aggregation_type,application_name,company_code,cost_center,dimension01,dimension02,dimension03,dimension04,dimension05,dimension06,dimension07,dimension08,dimension09,dimension10,dimension11,dimension12,dimension13,dimension14,dimension15,financial_event_type,functional_currency_code,func_currency_amt_sum,func_currency_balance,func_currency_beg_balance,func_journal_balance,func_journal_beg_balance,func_journal_sum,gl_group_id,gl_product_line,jl_description,ledger_id,local_currency_amt_sum,local_currency_balance,local_currency_beg_balance,local_currency_code,local_journal_balance,local_journal_beg_balance,local_journal_sum,location,ltd_amortization_amount,post_to_gl,principal_amount,project,quantity_sum,rfd_id,sales_channel,sl_db_name,source_system,timezone_id
                      columns.comments
                      columns.types timestamp:string:string:string:string:string:string:string:string:string:string:string:string:string:string:string:string:string:string:string:string:string:string:decimal(38,18):decimal(38,18):decimal(38,18):decimal(38,18):decimal(38,18):decimal(38,18):string:string:string:string:decimal(38,18):decimal(38,18):decimal(38,18):string:decimal(38,18):decimal(38,18):decimal(38,18):string:decimal(38,18):string:decimal(38,18):string:decimal(38,18):string:string:string:string:string
                      file.inputformat org.apache.hadoop.hive.ql.io.orc.OrcInputFormat
                      file.outputformat org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat
                      location hdfs://ip-172-31-1-35.ec2.internal:8020/user/hive/warehouse/oldest_act_dt_previous_day_rf
                      name default.oldest_act_dt_previous_day_rf
                      numFiles 324
                      numRows 87435021
                      rawDataSize 343639889520
                      serialization.ddl struct oldest_act_dt_previous_day_rf { timestamp accounting_date_local, string account_number, string aggregation_type, string application_name, string company_code, string cost_center, string dimension01, string dimension02, string dimension03, string dimension04, string dimension05, string dimension06, string dimension07, string dimension08, string dimension09, string dimension10, string dimension11, string dimension12, string dimension13, string dimension14, string dimension15, string financial_event_type, string functional_currency_code, decimal(38,18) func_currency_amt_sum, decimal(38,18) func_currency_balance, decimal(38,18) func_currency_beg_balance, decimal(38,18) func_journal_balance, decimal(38,18) func_journal_beg_balance, decimal(38,18) func_journal_sum, string gl_group_id, string gl_product_line, string jl_description, string ledger_id, decimal(38,18) local_currency_amt_sum, decimal(38,18) local_currency_balance, decimal(38,18) local_currency_beg_balance, string local_currency_code, decimal(38,18) local_journal_balance, decimal(38,18) local_journal_beg_balance, decimal(38,18) local_journal_sum, string location, decimal(38,18) ltd_amortization_amount, string post_to_gl, decimal(38,18) principal_amount, string project, decimal(38,18) quantity_sum, string rfd_id, string sales_channel, string sl_db_name, string source_system, string timezone_id}
                      serialization.format 1
                      serialization.lib org.apache.hadoop.hive.ql.io.orc.OrcSerde
                      totalSize 4575811884
                      transient_lastDdlTime 1467309711
                    serde: org.apache.hadoop.hive.ql.io.orc.OrcSerde
                    name: default.oldest_act_dt_previous_day_rf
                  name: default.oldest_act_dt_previous_day_rf
            Truncated Path -> Alias:
              /oldest_act_dt_previous_day_rf [rf]
            Execution mode: vectorized

  Stage: Stage-0
    Fetch Operator
      limit: -1
      Processor Tree:
        ListSink
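
The plan's numbers can be checked by hand, which helps show why each task is so heavy. This is a back-of-the-envelope sketch using only figures from the output above (87435021 rows and totalSize 4575811884 for rf, 41 rows for d, 115 Tez tasks, 324 MR mappers). The cast to BIGINT is there because the product exceeds the 32-bit INT range.

```sql
-- True size of the cross product: every rf row is paired with every date_range row.
SELECT CAST(87435021 AS BIGINT) * 41;   -- 3584835861 rows

-- Average on-disk ORC bytes read per task under each engine.
SELECT 4575811884 / 115;                -- roughly 39.8 million bytes per Tez task
SELECT 4575811884 / 324;                -- roughly 14.1 million bytes per MR mapper
```

The plan estimates 113665523 output rows, which is close to 87435021 times 1.3 and so appears to come from hive.stats.join.factor=1.3 rather than from the 41-row side table. If that reading is right, the real output is about 31.5 times larger than the estimate, and each Tez task both reads more input and emits far more rows than the optimizer assumed.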


