hive-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Jörn Franke <jornfra...@gmail.com>
Subject Re: Convert date in string format to timestamp in table definition
Date Sun, 05 Jun 2016 08:37:49 GMT
Never use string when you can use int - the performance will be much better - especially for
tables in Orc / parquet format 

> On 04 Jun 2016, at 22:31, Igor Kravzov <igork.inexso@gmail.com> wrote:
> 
> Thanks Dudu. 
> So if I need actual date I will use view.
> Regarding partition column:  I can create 2 external tables based on the same data with
integer or string column partition and see which one is more convenient for our use.
> 
>> On Sat, Jun 4, 2016 at 2:20 PM, Markovitz, Dudu <dmarkovitz@paypal.com> wrote:
>> I’m not aware of an option to do what you request in the external table definition
but you might want to that using a view.
>> 
>>  
>> 
>> P.s.
>> 
>> I seems to me that defining the partition column as a string would be more user friendly
than integer, e.g. –
>> 
>>  
>> 
>> select * from threads_test where yyyymmdd like ‘2016%’ – year 2016;
>> 
>> select * from threads_test where yyyymmdd like ‘201603%’ –- March 2016;
>> 
>> select * from threads_test where yyyymmdd like ‘______01’ -- first of every month;
>> 
>>  
>> 
>>  
>> 
>>  
>> 
>>  
>> 
>>  
>> 
>> $ hdfs dfs -ls -R /tmp/threads_test
>> 
>> drwxr-xr-x   - cloudera supergroup          0 2016-06-04 10:45 /tmp/threads_test/20160604
>> 
>> -rw-r--r--   1 cloudera supergroup        136 2016-06-04 10:45 /tmp/threads_test/20160604/data.txt
>> 
>>  
>> 
>> $ hdfs dfs -cat /tmp/threads_test/20160604/data.txt
>> 
>> {"url":"www.blablabla.com","pageType":"pg1","addDate":"2016-05-17T02:10:44.527","postDate":"2016-05-16T02:08:55","postText":"YadaYada"}
>> 
>>  
>> 
>> ----------------------------------------------------------------------------------------------------
>> 
>>  
>> 
>>  
>> 
>> hive> add jar /usr/lib/hive-hcatalog/share/hcatalog/hive-hcatalog-core.jar;
>> 
>>  
>> 
>> hive>
>> 
>> create external table threads_test
>> 
>> (
>> 
>>     url         string
>> 
>>    ,pagetype    string
>> 
>>    ,adddate     string
>> 
>>    ,postdate    string
>> 
>>    ,posttext    string
>> 
>> )
>> 
>>     partitioned by (yyyymmdd string)
>> 
>>     row format serde 'org.apache.hive.hcatalog.data.JsonSerDe'
>> 
>>     location '/tmp/threads_test'
>> 
>> ;
>> 
>>  
>> 
>> hive> alter table threads_test add partition (yyyymmdd=20160604) location '/tmp/threads_test/20160604';
>> 
>>  
>> 
>> hive> select * from threads_test;
>> 
>>  
>> 
>> www.blablabla.com    pg1    2016-05-17T02:10:44.527    2016-05-16T02:08:55  YadaYada
     20160604
>> 
>>  
>> 
>> hive>
>> 
>> create view threads_test_v
>> 
>> as
>> 
>> select      url    
>> 
>>            ,pagetype
>> 
>>            ,cast (concat_ws(' ',substr (adddate ,1,10),substr (adddate ,12)) as timestamp)
 as adddate 
>> 
>>            ,cast (concat_ws(' ',substr (postdate,1,10),substr (postdate,12)) as timestamp)
 as postdate 
>> 
>>            ,posttext
>> 
>>            
>> 
>> from        threads_test
>> 
>> ;
>> 
>>  
>> 
>> hive> select * from threads_test_v;
>> 
>>  
>> 
>> www.blablabla.com    pg1    2016-05-17 02:10:44.527    2016-05-16 02:08:55  YadaYada
>> 
>>  
>> 
>>  
>> 
>> From: Igor Kravzov [mailto:igork.inexso@gmail.com] 
>> Sent: Saturday, June 04, 2016 8:13 PM
>> To: user@hive.apache.org
>> Subject: Convert date in string format to timestamp in table definition
>> 
>>  
>> 
>> Hi,
>> 
>>  
>> 
>> I have 2 dates in Json file defined like this
>> 
>> "addDate": "2016-05-17T02:10:44.527",
>> 
>>   "postDate": "2016-05-16T02:08:55",
>> 
>>  
>> 
>> Right now I define external table based on this file like this:
>> 
>> CREATE external TABLE threads_test 
>> 
>> (url string,
>> 
>>  pagetype string,
>> 
>>  adddate string,
>> 
>>  postdate string,
>> 
>>  posttext string)
>> 
>> partitioned by (yyyymmdd int)
>> 
>> ROW FORMAT SERDE 'org.apache.hive.hcatalog.data.JsonSerDe'
>> 
>> location 'my location';
>> 
>>  
>> 
>> is it possible to define these 2 dates as timestamp? 
>> 
>> Do I need to change date format in the file? is it possible to specify date format
in table definition?
>> 
>> Or I better off with string?
>> 
>>  
>> 
>> Thanks in advance.
>> 
> 

Mime
View raw message