hive-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Pengcheng Xiong (JIRA)" <j...@apache.org>
Subject [jira] [Created] (HIVE-13837) current_timestamp() output format is different in some cases
Date Tue, 24 May 2016 23:14:14 GMT
Pengcheng Xiong created HIVE-13837:
--------------------------------------

             Summary: current_timestamp() output format is different in some cases
                 Key: HIVE-13837
                 URL: https://issues.apache.org/jira/browse/HIVE-13837
             Project: Hive
          Issue Type: Bug
            Reporter: Pengcheng Xiong
            Assignee: Pengcheng Xiong


As [~jdere] reports:
{code}
current_timestamp() udf returns result with different format in some cases.

select current_timestamp() returns result with decimal precision:
{noformat}
hive> select current_timestamp();
OK
2016-04-14 18:26:58.875
Time taken: 0.077 seconds, Fetched: 1 row(s)
{noformat}

But output format is different for select current_timestamp() from all100k union select current_timestamp()
from over100k limit 5; 
{noformat}
hive> select current_timestamp() from all100k union select current_timestamp() from over100k
limit 5;
Query ID = hrt_qa_20160414182956_c4ed48f2-9913-4b3b-8f09-668ebf55b3e3
Total jobs = 1
Launching Job 1 out of 1
Tez session was closed. Reopening...
Session re-established.


Status: Running (Executing on YARN cluster with App id application_1460611908643_0624)

----------------------------------------------------------------------------------------------
        VERTICES      MODE        STATUS  TOTAL  COMPLETED  RUNNING  PENDING  FAILED  KILLED
 
----------------------------------------------------------------------------------------------
Map 1 ..........      llap     SUCCEEDED      1          1        0        0       0     
 0  
Map 4 ..........      llap     SUCCEEDED      1          1        0        0       0     
 0  
Reducer 3 ......      llap     SUCCEEDED      1          1        0        0       0     
 0  
----------------------------------------------------------------------------------------------
VERTICES: 03/03  [==========================>>] 100%  ELAPSED TIME: 0.92 s     
----------------------------------------------------------------------------------------------
OK
2016-04-14 18:29:56
Time taken: 10.558 seconds, Fetched: 1 row(s)
{noformat}

explain plan for select current_timestamp();
{noformat}
hive> explain extended select current_timestamp();
OK
ABSTRACT SYNTAX TREE:
  
TOK_QUERY
   TOK_INSERT
      TOK_DESTINATION
         TOK_DIR
            TOK_TMP_FILE
      TOK_SELECT
         TOK_SELEXPR
            TOK_FUNCTION
               current_timestamp


STAGE DEPENDENCIES:
  Stage-0 is a root stage

STAGE PLANS:
  Stage: Stage-0
    Fetch Operator
      limit: -1
      Processor Tree:
        TableScan
          alias: _dummy_table
          Row Limit Per Split: 1
          GatherStats: false
          Select Operator
            expressions: 2016-04-14 18:30:57.206 (type: timestamp)
            outputColumnNames: _col0
            ListSink

Time taken: 0.062 seconds, Fetched: 30 row(s)
{noformat}

explain plan for select current_timestamp() from all100k union select current_timestamp()
from over100k limit 5;
{noformat}
hive> explain extended select current_timestamp() from all100k union select current_timestamp()
from over100k limit 5;
OK
ABSTRACT SYNTAX TREE:
  
TOK_QUERY
   TOK_FROM
      TOK_SUBQUERY
         TOK_QUERY
            TOK_FROM
               TOK_SUBQUERY
                  TOK_UNIONALL
                     TOK_QUERY
                        TOK_FROM
                           TOK_TABREF
                              TOK_TABNAME
                                 all100k
                        TOK_INSERT
                           TOK_DESTINATION
                              TOK_DIR
                                 TOK_TMP_FILE
                           TOK_SELECT
                              TOK_SELEXPR
                                 TOK_FUNCTION
                                    current_timestamp
                     TOK_QUERY
                        TOK_FROM
                           TOK_TABREF
                              TOK_TABNAME
                                 over100k
                        TOK_INSERT
                           TOK_DESTINATION
                              TOK_DIR
                                 TOK_TMP_FILE
                           TOK_SELECT
                              TOK_SELEXPR
                                 TOK_FUNCTION
                                    current_timestamp
                  _u1
            TOK_INSERT
               TOK_DESTINATION
                  TOK_DIR
                     TOK_TMP_FILE
               TOK_SELECTDI
                  TOK_SELEXPR
                     TOK_ALLCOLREF
         _u2
   TOK_INSERT
      TOK_DESTINATION
         TOK_DIR
            TOK_TMP_FILE
      TOK_SELECT
         TOK_SELEXPR
            TOK_ALLCOLREF
      TOK_LIMIT
         5


STAGE DEPENDENCIES:
  Stage-1 is a root stage
  Stage-0 depends on stages: Stage-1

STAGE PLANS:
  Stage: Stage-1
    Tez
      DagId: hrt_qa_20160414183119_ec8e109e-8975-4799-a142-4a2289f85910:7
      Edges:
        Map 1 <- Union 2 (CONTAINS)
        Map 4 <- Union 2 (CONTAINS)
        Reducer 3 <- Union 2 (SIMPLE_EDGE)
      DagName: 
      Vertices:
        Map 1 
            Map Operator Tree:
                TableScan
                  alias: all100k
                  Statistics: Num rows: 100000 Data size: 15801336 Basic stats: COMPLETE Column
stats: COMPLETE
                  GatherStats: false
                  Select Operator
                    Statistics: Num rows: 100000 Data size: 4000000 Basic stats: COMPLETE
Column stats: COMPLETE
                    Select Operator
                      expressions: 2016-04-14 18:31:19.0 (type: timestamp)
                      outputColumnNames: _col0
                      Statistics: Num rows: 200000 Data size: 8000000 Basic stats: COMPLETE
Column stats: COMPLETE
                      Group By Operator
                        keys: _col0 (type: timestamp)
                        mode: hash
                        outputColumnNames: _col0
                        Statistics: Num rows: 1 Data size: 40 Basic stats: COMPLETE Column
stats: COMPLETE
                        Reduce Output Operator
                          key expressions: _col0 (type: timestamp)
                          null sort order: a
                          sort order: +
                          Map-reduce partition columns: _col0 (type: timestamp)
                          Statistics: Num rows: 1 Data size: 40 Basic stats: COMPLETE Column
stats: COMPLETE
                          tag: -1
                          TopN: 5
                          TopN Hash Memory Usage: 0.04
                          auto parallelism: true
            Execution mode: llap
            LLAP IO: no inputs
            Path -> Alias:
              hdfs://os-r6-qugztu-hive-1-5.novalocal:8020/user/hcat/tests/data/all100k [all100k]
            Path -> Partition:
              hdfs://os-r6-qugztu-hive-1-5.novalocal:8020/user/hcat/tests/data/all100k 
                Partition
                  base file name: all100k
                  input format: org.apache.hadoop.mapred.TextInputFormat
                  output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
                  properties:
                    COLUMN_STATS_ACCURATE {"BASIC_STATS":"true","COLUMN_STATS":{"t":"true","si":"true","i":"true","b":"true","f":"true","d":"true","s":"true","dc":"true","bo":"true","v":"true","c":"true","ts":"true"}}
                    EXTERNAL TRUE
                    bucket_count -1
                    columns t,si,i,b,f,d,s,dc,bo,v,c,ts,dt
                    columns.comments 
                    columns.types tinyint:smallint:int:bigint:float:double:string:decimal(38,18):boolean:varchar(25):char(25):timestamp:date
                    field.delim |
                    file.inputformat org.apache.hadoop.mapred.TextInputFormat
                    file.outputformat org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
                    location hdfs://os-r6-qugztu-hive-1-5.novalocal:8020/user/hcat/tests/data/all100k
                    name default.all100k
                    numFiles 1
                    numRows 100000
                    rawDataSize 15801336
                    serialization.ddl struct all100k { byte t, i16 si, i32 i, i64 b, float
f, double d, string s, decimal(38,18) dc, bool bo, varchar(25) v, char(25) c, timestamp ts,
date dt}
                    serialization.format |
                    serialization.lib org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
                    totalSize 15901336
                    transient_lastDdlTime 1460612683
                  serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
                
                    input format: org.apache.hadoop.mapred.TextInputFormat
                    output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
                    properties:
                      COLUMN_STATS_ACCURATE {"BASIC_STATS":"true","COLUMN_STATS":{"t":"true","si":"true","i":"true","b":"true","f":"true","d":"true","s":"true","dc":"true","bo":"true","v":"true","c":"true","ts":"true"}}
                      EXTERNAL TRUE
                      bucket_count -1
                      columns t,si,i,b,f,d,s,dc,bo,v,c,ts,dt
                      columns.comments 
                      columns.types tinyint:smallint:int:bigint:float:double:string:decimal(38,18):boolean:varchar(25):char(25):timestamp:date
                      field.delim |
                      file.inputformat org.apache.hadoop.mapred.TextInputFormat
                      file.outputformat org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
                      location hdfs://os-r6-qugztu-hive-1-5.novalocal:8020/user/hcat/tests/data/all100k
                      name default.all100k
                      numFiles 1
                      numRows 100000
                      rawDataSize 15801336
                      serialization.ddl struct all100k { byte t, i16 si, i32 i, i64 b, float
f, double d, string s, decimal(38,18) dc, bool bo, varchar(25) v, char(25) c, timestamp ts,
date dt}
                      serialization.format |
                      serialization.lib org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
                      totalSize 15901336
                      transient_lastDdlTime 1460612683
                    serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
                    name: default.all100k
                  name: default.all100k
            Truncated Path -> Alias:
              hdfs://os-r6-qugztu-hive-1-5.novalocal:8020/user/hcat/tests/data/all100k [all100k]
        Map 4 
            Map Operator Tree:
                TableScan
                  alias: over100k
                  Statistics: Num rows: 100000 Data size: 6631229 Basic stats: COMPLETE Column
stats: COMPLETE
                  GatherStats: false
                  Select Operator
                    Statistics: Num rows: 100000 Data size: 4000000 Basic stats: COMPLETE
Column stats: COMPLETE
                    Select Operator
                      expressions: 2016-04-14 18:31:19.0 (type: timestamp)
                      outputColumnNames: _col0
                      Statistics: Num rows: 200000 Data size: 8000000 Basic stats: COMPLETE
Column stats: COMPLETE
                      Group By Operator
                        keys: _col0 (type: timestamp)
                        mode: hash
                        outputColumnNames: _col0
                        Statistics: Num rows: 1 Data size: 40 Basic stats: COMPLETE Column
stats: COMPLETE
                        Reduce Output Operator
                          key expressions: _col0 (type: timestamp)
                          null sort order: a
                          sort order: +
                          Map-reduce partition columns: _col0 (type: timestamp)
                          Statistics: Num rows: 1 Data size: 40 Basic stats: COMPLETE Column
stats: COMPLETE
                          tag: -1
                          TopN: 5
                          TopN Hash Memory Usage: 0.04
                          auto parallelism: true
            Execution mode: llap
            LLAP IO: no inputs
            Path -> Alias:
              hdfs://os-r6-qugztu-hive-1-5.novalocal:8020/user/hcat/tests/data/over100k [over100k]
            Path -> Partition:
              hdfs://os-r6-qugztu-hive-1-5.novalocal:8020/user/hcat/tests/data/over100k 
                Partition
                  base file name: over100k
                  input format: org.apache.hadoop.mapred.TextInputFormat
                  output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
                  properties:
                    COLUMN_STATS_ACCURATE {"BASIC_STATS":"true","COLUMN_STATS":{"t":"true","si":"true","i":"true","b":"true","f":"true","d":"true","bo":"true","s":"true","bin":"true"}}
                    EXTERNAL TRUE
                    bucket_count -1
                    columns t,si,i,b,f,d,bo,s,bin
                    columns.comments 
                    columns.types tinyint:smallint:int:bigint:float:double:boolean:string:binary
                    field.delim :
                    file.inputformat org.apache.hadoop.mapred.TextInputFormat
                    file.outputformat org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
                    location hdfs://os-r6-qugztu-hive-1-5.novalocal:8020/user/hcat/tests/data/over100k
                    name default.over100k
                    numFiles 1
                    numRows 100000
                    rawDataSize 6631229
                    serialization.ddl struct over100k { byte t, i16 si, i32 i, i64 b, float
f, double d, bool bo, string s, binary bin}
                    serialization.format :
                    serialization.lib org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
                    totalSize 6731229
                    transient_lastDdlTime 1460612798
                  serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
                
                    input format: org.apache.hadoop.mapred.TextInputFormat
                    output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
                    properties:
                      COLUMN_STATS_ACCURATE {"BASIC_STATS":"true","COLUMN_STATS":{"t":"true","si":"true","i":"true","b":"true","f":"true","d":"true","bo":"true","s":"true","bin":"true"}}
                      EXTERNAL TRUE
                      bucket_count -1
                      columns t,si,i,b,f,d,bo,s,bin
                      columns.comments 
                      columns.types tinyint:smallint:int:bigint:float:double:boolean:string:binary
                      field.delim :
                      file.inputformat org.apache.hadoop.mapred.TextInputFormat
                      file.outputformat org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
                      location hdfs://os-r6-qugztu-hive-1-5.novalocal:8020/user/hcat/tests/data/over100k
                      name default.over100k
                      numFiles 1
                      numRows 100000
                      rawDataSize 6631229
                      serialization.ddl struct over100k { byte t, i16 si, i32 i, i64 b, float
f, double d, bool bo, string s, binary bin}
                      serialization.format :
                      serialization.lib org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
                      totalSize 6731229
                      transient_lastDdlTime 1460612798
                    serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
                    name: default.over100k
                  name: default.over100k
            Truncated Path -> Alias:
              hdfs://os-r6-qugztu-hive-1-5.novalocal:8020/user/hcat/tests/data/over100k [over100k]
        Reducer 3 
            Execution mode: vectorized, llap
            Needs Tagging: false
            Reduce Operator Tree:
              Group By Operator
                keys: KEY._col0 (type: timestamp)
                mode: mergepartial
                outputColumnNames: _col0
                Statistics: Num rows: 1 Data size: 40 Basic stats: COMPLETE Column stats:
COMPLETE
                Limit
                  Number of rows: 5
                  Statistics: Num rows: 1 Data size: 40 Basic stats: COMPLETE Column stats:
COMPLETE
                  File Output Operator
                    compressed: false
                    GlobalTableId: 0
                    directory: hdfs://os-r6-qugztu-hive-1-5.novalocal:8020/tmp/hive/hrt_qa/ec0773d7-0ac2-45c7-b9cb-568bbed2c49c/hive_2016-04-14_18-31-19_532_3480081382837900888-1/-mr-10001/.hive-staging_hive_2016-04-14_18-31-19_532_3480081382837900888-1/-ext-10002
                    NumFilesPerFileSink: 1
                    Statistics: Num rows: 1 Data size: 40 Basic stats: COMPLETE Column stats:
COMPLETE
                    Stats Publishing Key Prefix: hdfs://os-r6-qugztu-hive-1-5.novalocal:8020/tmp/hive/hrt_qa/ec0773d7-0ac2-45c7-b9cb-568bbed2c49c/hive_2016-04-14_18-31-19_532_3480081382837900888-1/-mr-10001/.hive-staging_hive_2016-04-14_18-31-19_532_3480081382837900888-1/-ext-10002/
                    table:
                        input format: org.apache.hadoop.mapred.SequenceFileInputFormat
                        output format: org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
                        properties:
                          columns _col0
                          columns.types timestamp
                          escape.delim \
                          hive.serialization.extend.additional.nesting.levels true
                          serialization.escape.crlf true
                          serialization.format 1
                          serialization.lib org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
                        serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
                    TotalFiles: 1
                    GatherStats: false
                    MultiFileSpray: false
        Union 2 
            Vertex: Union 2

  Stage: Stage-0
    Fetch Operator
      limit: 5
      Processor Tree:
        ListSink

Time taken: 0.301 seconds, Fetched: 284 row(s)
{noformat}

Both the queries used return timestamp with YYYY-MM-DD HH:MM:SS.fff format in past releases.
{code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Mime
View raw message