hive-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Markovitz, Dudu" <dmarkov...@paypal.com>
Subject RE: Convert date in string format to timestamp in table definition
Date Sun, 05 Jun 2016 09:12:02 GMT
‘Never’ is a strong word.


1.       We’re talking about the metadata so –



a.       The data format is irrelevant



b.      The records number is small (scale of thousands)

I would have sacrificed 1 second of metadata processing for a better user experience



2.       Partitions values are being held in the metastore (at least with MySQL)  as strings

Dudu

From: Jörn Franke [mailto:jornfranke@gmail.com]
Sent: Sunday, June 05, 2016 11:38 AM
To: user@hive.apache.org
Subject: Re: Convert date in string format to timestamp in table definition

Never use string when you can use int - the performance will be much better - especially for
tables in Orc / parquet format

On 04 Jun 2016, at 22:31, Igor Kravzov <igork.inexso@gmail.com<mailto:igork.inexso@gmail.com>>
wrote:
Thanks Dudu.
So if I need actual date I will use view.
Regarding partition column:  I can create 2 external tables based on the same data with integer
or string column partition and see which one is more convenient for our use.

On Sat, Jun 4, 2016 at 2:20 PM, Markovitz, Dudu <dmarkovitz@paypal.com<mailto:dmarkovitz@paypal.com>>
wrote:
I’m not aware of an option to do what you request in the external table definition but you
might want to that using a view.

P.s.
I seems to me that defining the partition column as a string would be more user friendly than
integer, e.g. –

select * from threads_test where yyyymmdd like ‘2016%’ – year 2016;
select * from threads_test where yyyymmdd like ‘201603%’ –- March 2016;
select * from threads_test where yyyymmdd like ‘______01’ -- first of every month;





$ hdfs dfs -ls -R /tmp/threads_test
drwxr-xr-x   - cloudera supergroup          0 2016-06-04 10:45 /tmp/threads_test/20160604
-rw-r--r--   1 cloudera supergroup        136 2016-06-04 10:45 /tmp/threads_test/20160604/data.txt

$ hdfs dfs -cat /tmp/threads_test/20160604/data.txt
{"url":"www.blablabla.com<http://www.blablabla.com>","pageType":"pg1","addDate":"2016-05-17T02:10:44.527","postDate":"2016-05-16T02:08:55","postText":"YadaYada"}

----------------------------------------------------------------------------------------------------


hive> add jar /usr/lib/hive-hcatalog/share/hcatalog/hive-hcatalog-core.jar;

hive>
create external table threads_test
(
    url         string
   ,pagetype    string
   ,adddate     string
   ,postdate    string
   ,posttext    string
)
    partitioned by (yyyymmdd string)
    row format serde 'org.apache.hive.hcatalog.data.JsonSerDe'
    location '/tmp/threads_test'
;

hive> alter table threads_test add partition (yyyymmdd=20160604) location '/tmp/threads_test/20160604';

hive> select * from threads_test;

www.blablabla.com<http://www.blablabla.com>    pg1    2016-05-17T02:10:44.527    2016-05-16T02:08:55
 YadaYada      20160604

hive>
create view threads_test_v
as
select      url
           ,pagetype
           ,cast (concat_ws(' ',substr (adddate ,1,10),substr (adddate ,12)) as timestamp)
 as adddate
           ,cast (concat_ws(' ',substr (postdate,1,10),substr (postdate,12)) as timestamp)
 as postdate
           ,posttext

from        threads_test
;

hive> select * from threads_test_v;

www.blablabla.com<http://www.blablabla.com>    pg1    2016-05-17 02:10:44.527    2016-05-16
02:08:55  YadaYada


From: Igor Kravzov [mailto:igork.inexso@gmail.com<mailto:igork.inexso@gmail.com>]
Sent: Saturday, June 04, 2016 8:13 PM
To: user@hive.apache.org<mailto:user@hive.apache.org>
Subject: Convert date in string format to timestamp in table definition

Hi,

I have 2 dates in Json file defined like this
"addDate": "2016-05-17T02:10:44.527",
  "postDate": "2016-05-16T02:08:55",

Right now I define external table based on this file like this:
CREATE external TABLE threads_test
(url string,
 pagetype string,
 adddate string,
 postdate string,
 posttext string)
partitioned by (yyyymmdd int)
ROW FORMAT SERDE 'org.apache.hive.hcatalog.data.JsonSerDe'
location 'my location';

is it possible to define these 2 dates as timestamp?
Do I need to change date format in the file? is it possible to specify date format in table
definition?
Or I better off with string?

Thanks in advance.

Mime
View raw message