hive-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Tucker, Matt" <>
Subject RE: order by date
Date Tue, 13 Mar 2012 18:56:50 GMT
If you don't want to modify your CSV files, I would suggest doing the conversion as part of
the query.  For that, you can either include the conversion in each query, or you can create
a view of your table that includes a column with the converted date.

Either way, you may want to try from_unixtime(unix_timestamp(dateColName, 'M/d/yyyy h:mm')),
which should convert it into a format that you can sort by.

Matt Tucker

-----Original Message-----
From: Keith Wiley [] 
Sent: Tuesday, March 13, 2012 1:47 PM
Subject: Re: order by date

I see how I misled you, sorry.  I wasn't implying that my csv data is cleanly represented
in yyyy-MM-dd format.  I was just asking syntactically how to use date functions in HiveQL
because I hadn't found any examples and I used yyyy-MM-dd in my example.  The dates in my
csv tables are often in "American" format, month first without leading zeroes, e.g., "3/31/2012
7:00".  The lack of leading zeroes and the unsortabled date format make the dates difficult
to work with.  I was thinking I could use the date functions with some other format to sort
them (I guess it would be "M/d/yyyy h:mm" or something like that).

I admit, I didn't actually go to the trouble of providing the correct pattern string in my
earlier post, I was focused on the HiveQL syntax in that post, not the precise date pattern
given to the date function.

So yeah, I'm still trying to determine the best way to sort queries against the date-time
columns.  One option is to read/write the entire tables with a date conversion to a lexicographic
format.  Another option -- my original question in this thread -- was how I might use hive's
date functions at the time a query is performed.

What do you think is the best way to deal with this?


On Mar 13, 2012, at 10:35 , Tucker, Matt wrote:

> I'm a bit confused.  It sounds like you're already storing your dates as strings in a
'yyyy-mm-dd' format.  In that case, you can just sort by dateColName.  There's no issue with
using UNIX_TIMESTAMP() in the order by clause, as it outputs integer values.
> Most of the date functions in hive take arguments in string format, with a few functions
that will translate between unix timestamps and datetime strings.
> Matt Tucker

Keith Wiley

"I do not feel obliged to believe that the same God who has endowed us with sense, reason,
and intellect has intended us to forgo their use."
                                           --  Galileo Galilei ________________________________________________________________________________

View raw message