hive-issues mailing list archives

From "Pengcheng Xiong (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HIVE-13837) current_timestamp() output format is different in some cases
Date Tue, 24 May 2016 23:16:12 GMT

    [ https://issues.apache.org/jira/browse/HIVE-13837?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15299132#comment-15299132 ]

Pengcheng Xiong commented on HIVE-13837:
----------------------------------------

[~jdere], could you please review? Thanks.

> current_timestamp() output format is different in some cases
> ------------------------------------------------------------
>
>                 Key: HIVE-13837
>                 URL: https://issues.apache.org/jira/browse/HIVE-13837
>             Project: Hive
>          Issue Type: Bug
>            Reporter: Pengcheng Xiong
>            Assignee: Pengcheng Xiong
>         Attachments: HIVE-13837.01.patch
>
>
> As [~jdere] reports:
> {code}
> The current_timestamp() UDF returns its result in a different format in some cases.
> select current_timestamp() returns a result with fractional seconds:
> {noformat}
> hive> select current_timestamp();
> OK
> 2016-04-14 18:26:58.875
> Time taken: 0.077 seconds, Fetched: 1 row(s)
> {noformat}
> But the output format is different for select current_timestamp() from all100k union select current_timestamp() from over100k limit 5;
> {noformat}
> hive> select current_timestamp() from all100k union select current_timestamp() from over100k limit 5;
> Query ID = hrt_qa_20160414182956_c4ed48f2-9913-4b3b-8f09-668ebf55b3e3
> Total jobs = 1
> Launching Job 1 out of 1
> Tez session was closed. Reopening...
> Session re-established.
> Status: Running (Executing on YARN cluster with App id application_1460611908643_0624)
> ----------------------------------------------------------------------------------------------
>         VERTICES      MODE        STATUS  TOTAL  COMPLETED  RUNNING  PENDING  FAILED  KILLED
> ----------------------------------------------------------------------------------------------
> Map 1 ..........      llap     SUCCEEDED      1          1        0        0       0       0
> Map 4 ..........      llap     SUCCEEDED      1          1        0        0       0       0
> Reducer 3 ......      llap     SUCCEEDED      1          1        0        0       0       0
> ----------------------------------------------------------------------------------------------
> VERTICES: 03/03  [==========================>>] 100%  ELAPSED TIME: 0.92 s
> ----------------------------------------------------------------------------------------------
> OK
> 2016-04-14 18:29:56
> Time taken: 10.558 seconds, Fetched: 1 row(s)
> {noformat}
> explain plan for select current_timestamp();
> {noformat}
> hive> explain extended select current_timestamp();
> OK
> ABSTRACT SYNTAX TREE:
>   
> TOK_QUERY
>    TOK_INSERT
>       TOK_DESTINATION
>          TOK_DIR
>             TOK_TMP_FILE
>       TOK_SELECT
>          TOK_SELEXPR
>             TOK_FUNCTION
>                current_timestamp
> STAGE DEPENDENCIES:
>   Stage-0 is a root stage
> STAGE PLANS:
>   Stage: Stage-0
>     Fetch Operator
>       limit: -1
>       Processor Tree:
>         TableScan
>           alias: _dummy_table
>           Row Limit Per Split: 1
>           GatherStats: false
>           Select Operator
>             expressions: 2016-04-14 18:30:57.206 (type: timestamp)
>             outputColumnNames: _col0
>             ListSink
> Time taken: 0.062 seconds, Fetched: 30 row(s)
> {noformat}
> explain plan for select current_timestamp() from all100k union select current_timestamp() from over100k limit 5;
> {noformat}
> hive> explain extended select current_timestamp() from all100k union select current_timestamp() from over100k limit 5;
> OK
> ABSTRACT SYNTAX TREE:
>   
> TOK_QUERY
>    TOK_FROM
>       TOK_SUBQUERY
>          TOK_QUERY
>             TOK_FROM
>                TOK_SUBQUERY
>                   TOK_UNIONALL
>                      TOK_QUERY
>                         TOK_FROM
>                            TOK_TABREF
>                               TOK_TABNAME
>                                  all100k
>                         TOK_INSERT
>                            TOK_DESTINATION
>                               TOK_DIR
>                                  TOK_TMP_FILE
>                            TOK_SELECT
>                               TOK_SELEXPR
>                                  TOK_FUNCTION
>                                     current_timestamp
>                      TOK_QUERY
>                         TOK_FROM
>                            TOK_TABREF
>                               TOK_TABNAME
>                                  over100k
>                         TOK_INSERT
>                            TOK_DESTINATION
>                               TOK_DIR
>                                  TOK_TMP_FILE
>                            TOK_SELECT
>                               TOK_SELEXPR
>                                  TOK_FUNCTION
>                                     current_timestamp
>                   _u1
>             TOK_INSERT
>                TOK_DESTINATION
>                   TOK_DIR
>                      TOK_TMP_FILE
>                TOK_SELECTDI
>                   TOK_SELEXPR
>                      TOK_ALLCOLREF
>          _u2
>    TOK_INSERT
>       TOK_DESTINATION
>          TOK_DIR
>             TOK_TMP_FILE
>       TOK_SELECT
>          TOK_SELEXPR
>             TOK_ALLCOLREF
>       TOK_LIMIT
>          5
> STAGE DEPENDENCIES:
>   Stage-1 is a root stage
>   Stage-0 depends on stages: Stage-1
> STAGE PLANS:
>   Stage: Stage-1
>     Tez
>       DagId: hrt_qa_20160414183119_ec8e109e-8975-4799-a142-4a2289f85910:7
>       Edges:
>         Map 1 <- Union 2 (CONTAINS)
>         Map 4 <- Union 2 (CONTAINS)
>         Reducer 3 <- Union 2 (SIMPLE_EDGE)
>       DagName: 
>       Vertices:
>         Map 1 
>             Map Operator Tree:
>                 TableScan
>                   alias: all100k
>                   Statistics: Num rows: 100000 Data size: 15801336 Basic stats: COMPLETE Column stats: COMPLETE
>                   GatherStats: false
>                   Select Operator
>                     Statistics: Num rows: 100000 Data size: 4000000 Basic stats: COMPLETE Column stats: COMPLETE
>                     Select Operator
>                       expressions: 2016-04-14 18:31:19.0 (type: timestamp)
>                       outputColumnNames: _col0
>                       Statistics: Num rows: 200000 Data size: 8000000 Basic stats: COMPLETE Column stats: COMPLETE
>                       Group By Operator
>                         keys: _col0 (type: timestamp)
>                         mode: hash
>                         outputColumnNames: _col0
>                         Statistics: Num rows: 1 Data size: 40 Basic stats: COMPLETE Column stats: COMPLETE
>                         Reduce Output Operator
>                           key expressions: _col0 (type: timestamp)
>                           null sort order: a
>                           sort order: +
>                           Map-reduce partition columns: _col0 (type: timestamp)
>                           Statistics: Num rows: 1 Data size: 40 Basic stats: COMPLETE Column stats: COMPLETE
>                           tag: -1
>                           TopN: 5
>                           TopN Hash Memory Usage: 0.04
>                           auto parallelism: true
>             Execution mode: llap
>             LLAP IO: no inputs
>             Path -> Alias:
>               hdfs://os-r6-qugztu-hive-1-5.novalocal:8020/user/hcat/tests/data/all100k [all100k]
>             Path -> Partition:
>               hdfs://os-r6-qugztu-hive-1-5.novalocal:8020/user/hcat/tests/data/all100k
>                 Partition
>                   base file name: all100k
>                   input format: org.apache.hadoop.mapred.TextInputFormat
>                   output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
>                   properties:
>                     COLUMN_STATS_ACCURATE {"BASIC_STATS":"true","COLUMN_STATS":{"t":"true","si":"true","i":"true","b":"true","f":"true","d":"true","s":"true","dc":"true","bo":"true","v":"true","c":"true","ts":"true"}}
>                     EXTERNAL TRUE
>                     bucket_count -1
>                     columns t,si,i,b,f,d,s,dc,bo,v,c,ts,dt
>                     columns.comments 
>                     columns.types tinyint:smallint:int:bigint:float:double:string:decimal(38,18):boolean:varchar(25):char(25):timestamp:date
>                     field.delim |
>                     file.inputformat org.apache.hadoop.mapred.TextInputFormat
>                     file.outputformat org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
>                     location hdfs://os-r6-qugztu-hive-1-5.novalocal:8020/user/hcat/tests/data/all100k
>                     name default.all100k
>                     numFiles 1
>                     numRows 100000
>                     rawDataSize 15801336
>                     serialization.ddl struct all100k { byte t, i16 si, i32 i, i64 b, float f, double d, string s, decimal(38,18) dc, bool bo, varchar(25) v, char(25) c, timestamp ts, date dt}
>                     serialization.format |
>                     serialization.lib org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
>                     totalSize 15901336
>                     transient_lastDdlTime 1460612683
>                   serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
>                 
>                     input format: org.apache.hadoop.mapred.TextInputFormat
>                     output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
>                     properties:
>                       COLUMN_STATS_ACCURATE {"BASIC_STATS":"true","COLUMN_STATS":{"t":"true","si":"true","i":"true","b":"true","f":"true","d":"true","s":"true","dc":"true","bo":"true","v":"true","c":"true","ts":"true"}}
>                       EXTERNAL TRUE
>                       bucket_count -1
>                       columns t,si,i,b,f,d,s,dc,bo,v,c,ts,dt
>                       columns.comments 
>                       columns.types tinyint:smallint:int:bigint:float:double:string:decimal(38,18):boolean:varchar(25):char(25):timestamp:date
>                       field.delim |
>                       file.inputformat org.apache.hadoop.mapred.TextInputFormat
>                       file.outputformat org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
>                       location hdfs://os-r6-qugztu-hive-1-5.novalocal:8020/user/hcat/tests/data/all100k
>                       name default.all100k
>                       numFiles 1
>                       numRows 100000
>                       rawDataSize 15801336
>                       serialization.ddl struct all100k { byte t, i16 si, i32 i, i64 b, float f, double d, string s, decimal(38,18) dc, bool bo, varchar(25) v, char(25) c, timestamp ts, date dt}
>                       serialization.format |
>                       serialization.lib org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
>                       totalSize 15901336
>                       transient_lastDdlTime 1460612683
>                     serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
>                     name: default.all100k
>                   name: default.all100k
>             Truncated Path -> Alias:
>               hdfs://os-r6-qugztu-hive-1-5.novalocal:8020/user/hcat/tests/data/all100k [all100k]
>         Map 4 
>             Map Operator Tree:
>                 TableScan
>                   alias: over100k
>                   Statistics: Num rows: 100000 Data size: 6631229 Basic stats: COMPLETE Column stats: COMPLETE
>                   GatherStats: false
>                   Select Operator
>                     Statistics: Num rows: 100000 Data size: 4000000 Basic stats: COMPLETE Column stats: COMPLETE
>                     Select Operator
>                       expressions: 2016-04-14 18:31:19.0 (type: timestamp)
>                       outputColumnNames: _col0
>                       Statistics: Num rows: 200000 Data size: 8000000 Basic stats: COMPLETE Column stats: COMPLETE
>                       Group By Operator
>                         keys: _col0 (type: timestamp)
>                         mode: hash
>                         outputColumnNames: _col0
>                         Statistics: Num rows: 1 Data size: 40 Basic stats: COMPLETE Column stats: COMPLETE
>                         Reduce Output Operator
>                           key expressions: _col0 (type: timestamp)
>                           null sort order: a
>                           sort order: +
>                           Map-reduce partition columns: _col0 (type: timestamp)
>                           Statistics: Num rows: 1 Data size: 40 Basic stats: COMPLETE Column stats: COMPLETE
>                           tag: -1
>                           TopN: 5
>                           TopN Hash Memory Usage: 0.04
>                           auto parallelism: true
>             Execution mode: llap
>             LLAP IO: no inputs
>             Path -> Alias:
>               hdfs://os-r6-qugztu-hive-1-5.novalocal:8020/user/hcat/tests/data/over100k [over100k]
>             Path -> Partition:
>               hdfs://os-r6-qugztu-hive-1-5.novalocal:8020/user/hcat/tests/data/over100k
>                 Partition
>                   base file name: over100k
>                   input format: org.apache.hadoop.mapred.TextInputFormat
>                   output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
>                   properties:
>                     COLUMN_STATS_ACCURATE {"BASIC_STATS":"true","COLUMN_STATS":{"t":"true","si":"true","i":"true","b":"true","f":"true","d":"true","bo":"true","s":"true","bin":"true"}}
>                     EXTERNAL TRUE
>                     bucket_count -1
>                     columns t,si,i,b,f,d,bo,s,bin
>                     columns.comments 
>                     columns.types tinyint:smallint:int:bigint:float:double:boolean:string:binary
>                     field.delim :
>                     file.inputformat org.apache.hadoop.mapred.TextInputFormat
>                     file.outputformat org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
>                     location hdfs://os-r6-qugztu-hive-1-5.novalocal:8020/user/hcat/tests/data/over100k
>                     name default.over100k
>                     numFiles 1
>                     numRows 100000
>                     rawDataSize 6631229
>                     serialization.ddl struct over100k { byte t, i16 si, i32 i, i64 b, float f, double d, bool bo, string s, binary bin}
>                     serialization.format :
>                     serialization.lib org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
>                     totalSize 6731229
>                     transient_lastDdlTime 1460612798
>                   serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
>                 
>                     input format: org.apache.hadoop.mapred.TextInputFormat
>                     output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
>                     properties:
>                       COLUMN_STATS_ACCURATE {"BASIC_STATS":"true","COLUMN_STATS":{"t":"true","si":"true","i":"true","b":"true","f":"true","d":"true","bo":"true","s":"true","bin":"true"}}
>                       EXTERNAL TRUE
>                       bucket_count -1
>                       columns t,si,i,b,f,d,bo,s,bin
>                       columns.comments 
>                       columns.types tinyint:smallint:int:bigint:float:double:boolean:string:binary
>                       field.delim :
>                       file.inputformat org.apache.hadoop.mapred.TextInputFormat
>                       file.outputformat org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
>                       location hdfs://os-r6-qugztu-hive-1-5.novalocal:8020/user/hcat/tests/data/over100k
>                       name default.over100k
>                       numFiles 1
>                       numRows 100000
>                       rawDataSize 6631229
>                       serialization.ddl struct over100k { byte t, i16 si, i32 i, i64 b, float f, double d, bool bo, string s, binary bin}
>                       serialization.format :
>                       serialization.lib org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
>                       totalSize 6731229
>                       transient_lastDdlTime 1460612798
>                     serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
>                     name: default.over100k
>                   name: default.over100k
>             Truncated Path -> Alias:
>               hdfs://os-r6-qugztu-hive-1-5.novalocal:8020/user/hcat/tests/data/over100k [over100k]
>         Reducer 3 
>             Execution mode: vectorized, llap
>             Needs Tagging: false
>             Reduce Operator Tree:
>               Group By Operator
>                 keys: KEY._col0 (type: timestamp)
>                 mode: mergepartial
>                 outputColumnNames: _col0
>                 Statistics: Num rows: 1 Data size: 40 Basic stats: COMPLETE Column stats: COMPLETE
>                 Limit
>                   Number of rows: 5
>                   Statistics: Num rows: 1 Data size: 40 Basic stats: COMPLETE Column stats: COMPLETE
>                   File Output Operator
>                     compressed: false
>                     GlobalTableId: 0
>                     directory: hdfs://os-r6-qugztu-hive-1-5.novalocal:8020/tmp/hive/hrt_qa/ec0773d7-0ac2-45c7-b9cb-568bbed2c49c/hive_2016-04-14_18-31-19_532_3480081382837900888-1/-mr-10001/.hive-staging_hive_2016-04-14_18-31-19_532_3480081382837900888-1/-ext-10002
>                     NumFilesPerFileSink: 1
>                     Statistics: Num rows: 1 Data size: 40 Basic stats: COMPLETE Column stats: COMPLETE
>                     Stats Publishing Key Prefix: hdfs://os-r6-qugztu-hive-1-5.novalocal:8020/tmp/hive/hrt_qa/ec0773d7-0ac2-45c7-b9cb-568bbed2c49c/hive_2016-04-14_18-31-19_532_3480081382837900888-1/-mr-10001/.hive-staging_hive_2016-04-14_18-31-19_532_3480081382837900888-1/-ext-10002/
>                     table:
>                         input format: org.apache.hadoop.mapred.SequenceFileInputFormat
>                         output format: org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
>                         properties:
>                           columns _col0
>                           columns.types timestamp
>                           escape.delim \
>                           hive.serialization.extend.additional.nesting.levels true
>                           serialization.escape.crlf true
>                           serialization.format 1
>                           serialization.lib org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
>                         serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
>                     TotalFiles: 1
>                     GatherStats: false
>                     MultiFileSpray: false
>         Union 2 
>             Vertex: Union 2
>   Stage: Stage-0
>     Fetch Operator
>       limit: 5
>       Processor Tree:
>         ListSink
> Time taken: 0.301 seconds, Fetched: 284 row(s)
> {noformat}
> In past releases, both queries returned the timestamp in YYYY-MM-DD HH:MM:SS.fff format.
> {code}
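
For illustration only, the difference between the two plans (18:30:57.206 in the simple query versus 18:31:19.0 in the union query) resembles what happens when a timestamp loses its sub-second part before it is rendered. The sketch below uses only the JDK's java.sql.Timestamp; the class name TimestampFormatSketch and the helper truncateToSeconds are hypothetical, and nothing here is taken from the Hive code or from HIVE-13837.01.patch. It is not a statement about the actual root cause.

```java
import java.sql.Timestamp;

public class TimestampFormatSketch {

    // Hypothetical helper: drop the sub-second part of a timestamp,
    // mimicking a value whose milliseconds were lost somewhere upstream.
    static Timestamp truncateToSeconds(Timestamp ts) {
        long millis = ts.getTime();
        // floorMod keeps the arithmetic correct for pre-1970 values as well.
        return new Timestamp(millis - Math.floorMod(millis, 1000L));
    }

    public static void main(String[] args) {
        // Same wall-clock value as in the first query of the report.
        Timestamp full = Timestamp.valueOf("2016-04-14 18:26:58.875");
        Timestamp truncated = truncateToSeconds(full);

        // Timestamp.toString() trims trailing zeros of the fraction
        // but always keeps at least one fractional digit.
        System.out.println(full);       // fraction preserved
        System.out.println(truncated);  // fraction collapses to a single 0
    }
}
```

With plain JDK formatting, the truncated value still prints a trailing ".0", which matches the constant shown in the union plan. Whether that ".0" is then dropped in the final query output depends on how the engine serializes a timestamp with zero nanoseconds, which this sketch does not model.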



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
