Subject: Re: unix_timestamp function
From: tom kersnick
To: hive-user@hadoop.apache.org
Reply-To: hive-user@hadoop.apache.org
Date: Thu, 1 Apr 2010 16:37:12 -0700 (PDT)

Ok thanks! I will try it out.

/tom


On Thu, Apr 1, 2010 at 4:31 PM, Zheng Shao wrote:

> Setting TZ in your .bash_profile won't work, because the map/reduce tasks
> run on the Hadoop cluster nodes.
> If you start your Hadoop tasktracker with that TZ setting, it will
> probably work.
>
> Zheng
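A query-level alternative (a sketch only, not Zheng's tasktracker suggestion, and assuming the fixed three-hour offset between Eastern and Pacific time is acceptable for this data) is to shift the epoch value before formatting it. The sample value 1270145333 and table ut2 are taken from the query quoted below:

    -- subtract 3 hours (10800 seconds) so an Eastern-time cluster renders Pacific time
    hive> select from_unixtime(1270145333 - 3 * 3600, 'yyyy-MM-dd HH:mm:ss') from ut2;
    ...
    2010-04-01 11:08:53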
> On Thu, Apr 1, 2010 at 3:32 PM, tom kersnick wrote:
>
>> So it's working, but I'm having a time zone issue.
>>
>> My servers are located in EST, but I need this data in PST.
>>
>> So when it converts this:
>>
>> hive> select from_unixtime(1270145333,'yyyy-MM-dd HH:mm:ss') from ut2;
>> Total MapReduce jobs = 1
>> Launching Job 1 out of 1
>> Number of reduce tasks is set to 0 since there's no reduce operator
>> Starting Job = job_201003031204_0102, Tracking URL =
>> http://master:50030/jobdetails.jsp?jobid=job_201003031204_0102
>> Kill Command = /usr/local/hadoop/bin/../bin/hadoop job
>> -Dmapred.job.tracker=master:54311 -kill job_201003031204_0102
>> 2010-04-01 18:28:23,041 Stage-1 map = 0%, reduce = 0%
>> 2010-04-01 18:28:37,315 Stage-1 map = 67%, reduce = 0%
>> 2010-04-01 18:28:43,386 Stage-1 map = 100%, reduce = 0%
>> 2010-04-01 18:28:46,412 Stage-1 map = 100%, reduce = 100%
>> Ended Job = job_201003031204_0102
>> OK
>> 2010-04-01 14:08:53
>> Time taken: 30.191 seconds
>>
>> I need it to be:
>> 2010-04-01 11:08:53
>>
>> I tried setting the variable in my .bash_profile (TZ=/ /Americas/) -- no go.
>>
>> Nothing in the Hive DDL link you sent is leading me in the right direction.
>> Is there something you guys can recommend? I can write a script outside of
>> Hive, but it would be great if users could handle this within their queries.
>>
>> Thanks in advance!
>>
>> /tom
>>
>>
>> On Thu, Apr 1, 2010 at 2:17 PM, tom kersnick wrote:
>>
>>> ok thanks....
>>>
>>> I should have caught that.
>>>
>>> /tom
>>>
>>>
>>> On Thu, Apr 1, 2010 at 2:13 PM, Carl Steinbach wrote:
>>>
>>>> Hi Tom,
>>>>
>>>> "Unix time" is defined as the number of *seconds* since January 1, 1970.
>>>> It looks like the data you have in cola is in milliseconds. You need to
>>>> divide this value by 1000 before calling from_unixtime() on the result.
>>>>
>>>> Thanks.
>>>>
>>>> Carl
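A minimal sketch of that fix, using the ut table and cola column from the messages quoted below. The cast is an assumption made to match the documented from_unixtime(int) signature, since cola / 1000 would otherwise be a double; the output line assumes the Eastern-time cluster from this thread:

    -- cola holds epoch milliseconds; divide by 1000 and cast for from_unixtime(int)
    hive> select from_unixtime(cast(cola / 1000 as int), 'yyyy-MM-dd HH:mm:ss'), colb from ut;
    ...
    2010-04-01 14:08:53    tuesday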
>>>> On Thu, Apr 1, 2010 at 2:02 PM, tom kersnick wrote:
>>>>
>>>>> Thanks, but there is something fishy going on.
>>>>>
>>>>> I'm using Hive 0.5.0 with Hadoop 0.20.1.
>>>>>
>>>>> I tried the column as both a bigint and a string. According to the Hive
>>>>> DDL:
>>>>>
>>>>>   string from_unixtime(int unixtime)
>>>>>
>>>>>   Converts the number of seconds from unix epoch (1970-01-01 00:00:00
>>>>>   UTC) to a string representing the timestamp of that moment in the
>>>>>   current system time zone in the format of "1970-01-01 00:00:00"
>>>>>
>>>>> It looks like the input is int; that would be too small for my
>>>>> 1270145333155 timestamp.
>>>>>
>>>>> Any ideas?
>>>>>
>>>>> Example below:
>>>>>
>>>>> /tom
>>>>>
>>>>> hive> describe ut;
>>>>> OK
>>>>> cola    bigint
>>>>> colb    string
>>>>> Time taken: 0.101 seconds
>>>>>
>>>>> hive> select * from ut;
>>>>> OK
>>>>> 1270145333155    tuesday
>>>>> Time taken: 0.065 seconds
>>>>>
>>>>> hive> select from_unixtime(cola,'yyyy-MM-dd HH:mm:ss'),colb from ut;
>>>>> Total MapReduce jobs = 1
>>>>> Launching Job 1 out of 1
>>>>> Number of reduce tasks is set to 0 since there's no reduce operator
>>>>> Starting Job = job_201003031204_0083, Tracking URL =
>>>>> http://master:50030/jobdetails.jsp?jobid=job_201003031204_0083
>>>>> Kill Command = /usr/local/hadoop/bin/../bin/hadoop job
>>>>> -Dmapred.job.tracker=master:54311 -kill job_201003031204_0083
>>>>> 2010-04-01 16:57:32,407 Stage-1 map = 0%, reduce = 0%
>>>>> 2010-04-01 16:57:45,577 Stage-1 map = 100%, reduce = 0%
>>>>> 2010-04-01 16:57:48,605 Stage-1 map = 100%, reduce = 100%
>>>>> Ended Job = job_201003031204_0083
>>>>> OK
>>>>> 42219-04-22 00:05:55    tuesday
>>>>> Time taken: 18.066 seconds
>>>>>
>>>>> hive> describe ut;
>>>>> OK
>>>>> cola    string
>>>>> colb    string
>>>>> Time taken: 0.077 seconds
>>>>>
>>>>> hive> select * from ut;
>>>>> OK
>>>>> 1270145333155    tuesday
>>>>> Time taken: 0.065 seconds
>>>>>
>>>>> hive> select from_unixtime(cola,'yyyy-MM-dd HH:mm:ss'),colb from ut;
>>>>> FAILED: Error in semantic analysis: line 1:7 Function Argument Type
>>>>> Mismatch from_unixtime: Looking for UDF "from_unixtime" with parameters
>>>>> [class org.apache.hadoop.io.Text, class org.apache.hadoop.io.Text]
>>>>>
>>>>>
>>>>> On Thu, Apr 1, 2010 at 1:37 PM, Carl Steinbach wrote:
>>>>>
>>>>>> Hi Tom,
>>>>>>
>>>>>> I think you want to use the from_unixtime UDF:
>>>>>>
>>>>>> hive> describe function extended from_unixtime;
>>>>>> OK
>>>>>> from_unixtime(unix_time, format) - returns unix_time in the specified
>>>>>> format
>>>>>> Example:
>>>>>>   > SELECT from_unixtime(0, 'yyyy-MM-dd HH:mm:ss') FROM src LIMIT 1;
>>>>>>   '1970-01-01 00:00:00'
>>>>>> Time taken: 0.647 seconds
>>>>>>
>>>>>> Thanks.
>>>>>>
>>>>>> Carl
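For readers of the archive: unix_timestamp() goes in the opposite direction from from_unixtime() -- it parses a date string into epoch seconds -- which is presumably why calling it on a numeric epoch value (as in the original message quoted below) yields NULL instead of a formatted date. A sketch of the two directions, using values from this thread; the outputs assume the Eastern-time cluster discussed here:

    -- string -> epoch seconds
    hive> select unix_timestamp('2010-04-01 14:08:53') from ut;
    1270145333

    -- epoch seconds -> formatted string, in the cluster's system time zone
    hive> select from_unixtime(1270145333, 'yyyy-MM-dd HH:mm:ss') from ut;
    2010-04-01 14:08:53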
>>>>>> On Thu, Apr 1, 2010 at 1:11 PM, tom kersnick wrote:
>>>>>>
>>>>>>> hive> describe ut;
>>>>>>> OK
>>>>>>> time    bigint
>>>>>>> day     string
>>>>>>> Time taken: 0.128 seconds
>>>>>>>
>>>>>>> hive> select * from ut;
>>>>>>> OK
>>>>>>> 1270145333155    tuesday
>>>>>>> Time taken: 0.085 seconds
>>>>>>>
>>>>>>> When I run this simple query, I'm getting a NULL for the time column
>>>>>>> with data type bigint.
>>>>>>>
>>>>>>> hive> select unix_timestamp(time),day from ut;
>>>>>>> Total MapReduce jobs = 1
>>>>>>> Launching Job 1 out of 1
>>>>>>> Number of reduce tasks is set to 0 since there's no reduce operator
>>>>>>> Starting Job = job_201003031204_0080, Tracking URL =
>>>>>>> http://master:50030/jobdetails.jsp?jobid=job_201003031204_0080
>>>>>>> Kill Command = /usr/local/hadoop/bin/../bin/hadoop job
>>>>>>> -Dmapred.job.tracker=master:54311 -kill job_201003031204_0080
>>>>>>> 2010-04-01 16:03:54,024 Stage-1 map = 0%, reduce = 0%
>>>>>>> 2010-04-01 16:04:06,128 Stage-1 map = 100%, reduce = 0%
>>>>>>> 2010-04-01 16:04:09,150 Stage-1 map = 100%, reduce = 100%
>>>>>>> Ended Job = job_201003031204_0080
>>>>>>> OK
>>>>>>> NULL    tuesday
>>>>>>> Time taken: 16.981 seconds
>>>>>>>
>>>>>>> Any ideas?
>>>>>>>
>>>>>>> Thanks!
>>>>>>>
>>>>>>> /tom
>
> --
> Yours,
> Zheng