hive-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Markovitz, Dudu" <>
Subject RE: Convert date in string format to timestamp in table definition
Date Sun, 05 Jun 2016 09:12:02 GMT
‘Never’ is a strong word.

1.       We’re talking about the metadata so –

a.       The data format is irrelevant

b.      The records number is small (scale of thousands)

I would have sacrificed 1 second of metadata processing for a better user experience

2.       Partitions values are being held in the metastore (at least with MySQL)  as strings


From: Jörn Franke []
Sent: Sunday, June 05, 2016 11:38 AM
Subject: Re: Convert date in string format to timestamp in table definition

Never use string when you can use int - the performance will be much better - especially for
tables in Orc / parquet format

On 04 Jun 2016, at 22:31, Igor Kravzov <<>>
Thanks Dudu.
So if I need actual date I will use view.
Regarding partition column:  I can create 2 external tables based on the same data with integer
or string column partition and see which one is more convenient for our use.

On Sat, Jun 4, 2016 at 2:20 PM, Markovitz, Dudu <<>>
I’m not aware of an option to do what you request in the external table definition but you
might want to that using a view.

I seems to me that defining the partition column as a string would be more user friendly than
integer, e.g. –

select * from threads_test where yyyymmdd like ‘2016%’ – year 2016;
select * from threads_test where yyyymmdd like ‘201603%’ –- March 2016;
select * from threads_test where yyyymmdd like ‘______01’ -- first of every month;

$ hdfs dfs -ls -R /tmp/threads_test
drwxr-xr-x   - cloudera supergroup          0 2016-06-04 10:45 /tmp/threads_test/20160604
-rw-r--r--   1 cloudera supergroup        136 2016-06-04 10:45 /tmp/threads_test/20160604/data.txt

$ hdfs dfs -cat /tmp/threads_test/20160604/data.txt


hive> add jar /usr/lib/hive-hcatalog/share/hcatalog/hive-hcatalog-core.jar;

create external table threads_test
    url         string
   ,pagetype    string
   ,adddate     string
   ,postdate    string
   ,posttext    string
    partitioned by (yyyymmdd string)
    row format serde ''
    location '/tmp/threads_test'

hive> alter table threads_test add partition (yyyymmdd=20160604) location '/tmp/threads_test/20160604';

hive> select * from threads_test;<>    pg1    2016-05-17T02:10:44.527    2016-05-16T02:08:55
 YadaYada      20160604

create view threads_test_v
select      url
           ,cast (concat_ws(' ',substr (adddate ,1,10),substr (adddate ,12)) as timestamp)
 as adddate
           ,cast (concat_ws(' ',substr (postdate,1,10),substr (postdate,12)) as timestamp)
 as postdate

from        threads_test

hive> select * from threads_test_v;<>    pg1    2016-05-17 02:10:44.527    2016-05-16
02:08:55  YadaYada

From: Igor Kravzov [<>]
Sent: Saturday, June 04, 2016 8:13 PM
Subject: Convert date in string format to timestamp in table definition


I have 2 dates in Json file defined like this
"addDate": "2016-05-17T02:10:44.527",
  "postDate": "2016-05-16T02:08:55",

Right now I define external table based on this file like this:
CREATE external TABLE threads_test
(url string,
 pagetype string,
 adddate string,
 postdate string,
 posttext string)
partitioned by (yyyymmdd int)
location 'my location';

is it possible to define these 2 dates as timestamp?
Do I need to change date format in the file? is it possible to specify date format in table
Or I better off with string?

Thanks in advance.

View raw message