hadoop-pig-dev mailing list archives

From "Russell Jurney (JIRA)" <j...@apache.org>
Subject [jira] Commented: (PIG-1314) Add DateTime Support to Pig
Date Sun, 30 May 2010 05:57:37 GMT

    [ https://issues.apache.org/jira/browse/PIG-1314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12873382#action_12873382 ]

Russell Jurney commented on PIG-1314:
-------------------------------------

OK, I'm thinking about actually doing this soon, after Boolean.  I'd like to add two new primitives
to Pig: DateTime and Duration.

I'd do this on the wiki, but I don't have edit access.  Can someone please grant the ability
to make a new page to user RussellJurney on the Pig wiki?

Design Notes:

1) I'd like to use Joda-Time for this, as I did in the DateTime UDFs.  It is possible to use
the Java date libs, but it would be painful to do so.  Joda-Time also performs better than
Java's native date classes.  It is Apache 2.0 licensed and is already pulled in via Ivy in
the DateTime UDFs; see PIG-1310.

2) Date format for text/dumps: ISO8601.  Looks like: [YYYY][MM][DD]T[hh][mm]Z  It is a
human-readable, sortable/comparable, international standard.  See http://en.wikipedia.org/wiki/ISO_8601#Dates

2.5) In-memory type: org.joda.time.DateTime.  See http://joda-time.sourceforge.net/apidocs/org/joda/time/DateTime.html

Internally, Joda-Time stores a datetime as a long count of milliseconds since the Unix/POSIX epoch.  See http://joda-time.sourceforge.net/faq.html#internalstorage

3) Duration format for text/dumps: ISO8601.  Looks like: P[n]Y[n]M[n]DT[n]H[n]M[n]S  It is
a human-readable, sortable/comparable, international standard.  See http://en.wikipedia.org/wiki/ISO_8601#Durations

3.5) In-memory format: org.joda.time.Duration.  See http://joda-time.sourceforge.net/apidocs/org/joda/time/Duration.html

4) All date functions in PIG-1310 should be included, except those replaced by the use of
operators on datetimes and durations.  Adding/subtracting datetimes should result in a duration.
Durations can be added/subtracted/divided/multiplied/negated.

Date/Duration truncation, date differences, date parsing/conversion should be included.  Conversion
from int/long POSIX, SQL and datemonth should be included.  Conversion from any string with
a DateFormat string should be included.

5) Casting to and from Integer and Long should be supported, as a Unix/POSIX time.  Casting
to/from chararray in ISO8601 format should be supported.
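
To make notes 2 through 5 concrete, here is a rough sketch of the intended behaviour.  It
is only an illustration under assumptions: it uses the JDK's java.time classes as a stand-in
for Joda-Time so that it runs without an extra jar, and DateTimeSketch and its method names
are hypothetical, not existing Pig or PIG-1310 code.

```java
import java.time.Duration;
import java.time.Instant;
import java.time.LocalDateTime;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;

// Sketch only. ASSUMPTION: java.time (JDK 8+) stands in for Joda-Time so that
// the example needs no external jar. DateTimeSketch and its methods are
// hypothetical names, not part of Pig or of PIG-1310.
final class DateTimeSketch {

    // Note 2: text form [YYYY][MM][DD]T[hh][mm]Z, always rendered in UTC.
    private static final DateTimeFormatter ISO_BASIC =
            DateTimeFormatter.ofPattern("yyyyMMdd'T'HHmm'Z'");

    // Note 5: cast a long Unix/POSIX time (in seconds) to a datetime and back.
    static Instant fromUnixSeconds(long seconds) {
        return Instant.ofEpochSecond(seconds);
    }

    static long toUnixSeconds(Instant t) {
        return t.getEpochSecond();
    }

    // Note 5: cast a datetime to a chararray; this pattern drops the seconds.
    static String toIsoBasic(Instant t) {
        return LocalDateTime.ofInstant(t, ZoneOffset.UTC).format(ISO_BASIC);
    }

    // Note 5: cast a chararray in the note 2 format back to a datetime.
    static Instant parseIsoBasic(String text) {
        return LocalDateTime.parse(text, ISO_BASIC).toInstant(ZoneOffset.UTC);
    }

    // Note 4: subtracting one datetime from another yields a duration.
    static Duration subtract(Instant later, Instant earlier) {
        return Duration.between(earlier, later);
    }

    // Note 3: parse an ISO 8601 duration string such as P1DT2H30M.
    static Duration parseDuration(String text) {
        return Duration.parse(text);
    }

    public static void main(String[] args) {
        Instant sent = fromUnixSeconds(1275199057L);   // 2010-05-30 05:57:37 UTC
        System.out.println(toIsoBasic(sent));

        // Note 4: durations can be divided, negated and added.
        Duration d = parseDuration("P1DT2H30M");
        System.out.println(d.dividedBy(2));
        System.out.println(d.negated().plus(d).isZero());
        System.out.println(subtract(sent.plus(d), sent).equals(d));
    }
}
```

One caveat with this stand-in: java.time.Duration.parse only accepts the day, hour, minute
and second designators, so the year and month parts of the format in note 3 would need
separate handling in a real implementation.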

Comments?  Suggestions?

> Add DateTime Support to Pig
> ---------------------------
>
>                 Key: PIG-1314
>                 URL: https://issues.apache.org/jira/browse/PIG-1314
>             Project: Pig
>          Issue Type: Bug
>          Components: data
>    Affects Versions: 0.7.0
>            Reporter: Russell Jurney
>             Fix For: 0.8.0
>
>   Original Estimate: 672h
>  Remaining Estimate: 672h
>
> Hadoop/Pig are primarily used to parse log data, and most logs have a timestamp component.
> Therefore Pig should support dates as a primitive.
> Can someone familiar with adding types to pig comment on how hard this is?  We're looking
> at doing this, rather than use UDFs.  Is this a patch that would be accepted?

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

