hive-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Igor Kravzov <igork.ine...@gmail.com>
Subject Re: Convert date in string format to timestamp in table definition
Date Sun, 05 Jun 2016 01:09:25 GMT
Thanks Mich. This date is informational for now. Will see once I need it.

On Sat, Jun 4, 2016 at 5:23 PM, Mich Talebzadeh <mich.talebzadeh@gmail.com>
wrote:

> or just create an internal table and do insert/select from external table
> to that table as Dudu mentioned
>
> hive> use test;
> OK
> hive> desc mytime;
> OK
> adddate                 timestamp
>
> hive> insert into
>     > test.mytime
>     > select cast(concat_ws(' ',substring
> ("2016-05-17T02:10:44.527",1,10),substring ("2016-05-17T02:10:44.527",12))
> as timestamp)  as adddate;
>
> HTH
>
> Dr Mich Talebzadeh
>
>
>
> LinkedIn * https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
> <https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw>*
>
>
>
> http://talebzadehmich.wordpress.com
>
>
>
> On 4 June 2016 at 21:31, Igor Kravzov <igork.inexso@gmail.com> wrote:
>
>> Thanks Dudu.
>> So if I need actual date I will use view.
>> Regarding partition column:  I can create 2 external tables based on the
>> same data with integer or string column partition and see which one is more
>> convenient for our use.
>>
>> On Sat, Jun 4, 2016 at 2:20 PM, Markovitz, Dudu <dmarkovitz@paypal.com>
>> wrote:
>>
>>> I’m not aware of an option to do what you request in the external table
>>> definition but you might want to that using a view.
>>>
>>>
>>>
>>> P.s.
>>>
>>> I seems to me that defining the partition column as a string would be
>>> more user friendly than integer, e.g. –
>>>
>>>
>>>
>>> select * from threads_test where yyyymmdd like ‘2016%’ – year 2016;
>>>
>>> select * from threads_test where yyyymmdd like ‘201603%’ –- March 2016;
>>>
>>> select * from threads_test where yyyymmdd like ‘______01’ -- first of
>>> every month;
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>> $ hdfs dfs -ls -R /tmp/threads_test
>>>
>>> drwxr-xr-x   - cloudera supergroup          0 2016-06-04 10:45
>>> /tmp/threads_test/20160604
>>>
>>> -rw-r--r--   1 cloudera supergroup        136 2016-06-04 10:45
>>> /tmp/threads_test/20160604/data.txt
>>>
>>>
>>>
>>> $ hdfs dfs -cat /tmp/threads_test/20160604/data.txt
>>>
>>> {"url":"www.blablabla.com
>>> ","pageType":"pg1","addDate":"2016-05-17T02:10:44.527","postDate":"2016-05-16T02:08:55","postText":"YadaYada"}
>>>
>>>
>>>
>>>
>>> ----------------------------------------------------------------------------------------------------
>>>
>>>
>>>
>>>
>>>
>>> hive> add jar
>>> /usr/lib/hive-hcatalog/share/hcatalog/hive-hcatalog-core.jar;
>>>
>>>
>>>
>>> hive>
>>>
>>> create external table threads_test
>>>
>>> (
>>>
>>>     url         string
>>>
>>>    ,pagetype    string
>>>
>>>    ,adddate     string
>>>
>>>    ,postdate    string
>>>
>>>    ,posttext    string
>>>
>>> )
>>>
>>>     partitioned by (yyyymmdd string)
>>>
>>>     row format serde 'org.apache.hive.hcatalog.data.JsonSerDe'
>>>
>>>     location '/tmp/threads_test'
>>>
>>> ;
>>>
>>>
>>>
>>> hive> alter table threads_test add partition (yyyymmdd=20160604)
>>> location '/tmp/threads_test/20160604';
>>>
>>>
>>>
>>> hive> select * from threads_test;
>>>
>>>
>>>
>>> www.blablabla.com    pg1    2016-05-17T02:10:44.527
>>> 2016-05-16T02:08:55  YadaYada      20160604
>>>
>>>
>>>
>>> hive>
>>>
>>> create view threads_test_v
>>>
>>> as
>>>
>>> select      url
>>>
>>>            ,pagetype
>>>
>>>            ,cast (concat_ws(' ',substr (adddate ,1,10),substr (adddate
>>> ,12)) as timestamp)  as adddate
>>>
>>>            ,cast (concat_ws(' ',substr (postdate,1,10),substr
>>> (postdate,12)) as timestamp)  as postdate
>>>
>>>            ,posttext
>>>
>>>
>>>
>>> from        threads_test
>>>
>>> ;
>>>
>>>
>>>
>>> hive> select * from threads_test_v;
>>>
>>>
>>>
>>> www.blablabla.com    pg1    2016-05-17 02:10:44.527    2016-05-16
>>> 02:08:55  YadaYada
>>>
>>>
>>>
>>>
>>>
>>> *From:* Igor Kravzov [mailto:igork.inexso@gmail.com]
>>> *Sent:* Saturday, June 04, 2016 8:13 PM
>>> *To:* user@hive.apache.org
>>> *Subject:* Convert date in string format to timestamp in table
>>> definition
>>>
>>>
>>>
>>> Hi,
>>>
>>>
>>>
>>> I have 2 dates in Json file defined like this
>>>
>>> "addDate": "2016-05-17T02:10:44.527",
>>>
>>>   "postDate": "2016-05-16T02:08:55",
>>>
>>>
>>>
>>> Right now I define external table based on this file like this:
>>>
>>> CREATE external TABLE threads_test
>>>
>>> (url string,
>>>
>>>  pagetype string,
>>>
>>>  adddate string,
>>>
>>>  postdate string,
>>>
>>>  posttext string)
>>>
>>> partitioned by (yyyymmdd int)
>>>
>>> ROW FORMAT SERDE 'org.apache.hive.hcatalog.data.JsonSerDe'
>>>
>>> location 'my location';
>>>
>>>
>>>
>>> is it possible to define these 2 dates as timestamp?
>>>
>>> Do I need to change date format in the file? is it possible to specify
>>> date format in table definition?
>>>
>>> Or I better off with string?
>>>
>>>
>>>
>>> Thanks in advance.
>>>
>>
>>
>

Mime
View raw message