hadoop-pig-dev mailing list archives

From "Alan Gates (JIRA)" <j...@apache.org>
Subject [jira] Commented: (PIG-1314) Add DateTime Support to Pig
Date Mon, 22 Mar 2010 19:03:27 GMT

    [ https://issues.apache.org/jira/browse/PIG-1314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12848285#action_12848285 ]

Alan Gates commented on PIG-1314:

Major +1.  Adding DateTime as a Pig primitive is definitely a good idea.  It's on our list
of things to do (http://wiki.apache.org/pig/PigJournal).  A brief overview of the work to
be done:

# Add support in parser, both for declaring an input to be of type datetime and datetime constants
# Add support in TypeChecker for datetime types, including any allowed type promotions (i.e.
implicit casts)
# Change LoadCaster interface to include bytesToDateTime method, add method to default implementation
# Determine which builtin UDFs we want for datetime and get agreement from the community.
 Implement these UDFs.
# Implement any allowed cast operators for datetime (probably just string <-> datetime).
# Implement a datetime class that represents datetime in memory.  This needs to implement
WritableComparable so that it can be serialized and compared in Hadoop
# Implement a raw comparator for the type so it can be used as a key in group bys and joins.
# Change physical operators and builtin UDFs to handle processing of datetime types.
# Change data conversion and type discovery routines in DataType
# And, of course, add prolific tests
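
As a rough illustration of the in-memory class and raw comparator steps above, here is a minimal sketch.  It assumes the value is stored as milliseconds since the Unix epoch; the class names DateTimeValue and DateTimeRawComparator are hypothetical, and the sketch depends only on java.io rather than on Hadoop.  In Pig itself the class would implement org.apache.hadoop.io.WritableComparable, whose write/readFields/compareTo shapes are mirrored here.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical datetime value: milliseconds since the Unix epoch (an assumption,
// the thread does not settle on a representation).
final class DateTimeValue implements Comparable<DateTimeValue> {
    private long millis;

    DateTimeValue() { }
    DateTimeValue(long millis) { this.millis = millis; }

    long getMillis() { return millis; }

    // Same shape as Writable.write: a fixed-width 8-byte big-endian long.
    public void write(DataOutput out) throws IOException {
        out.writeLong(millis);
    }

    // Same shape as Writable.readFields.
    public void readFields(DataInput in) throws IOException {
        millis = in.readLong();
    }

    @Override
    public int compareTo(DateTimeValue other) {
        return Long.compare(millis, other.millis);
    }

    @Override
    public boolean equals(Object o) {
        return o instanceof DateTimeValue && ((DateTimeValue) o).millis == millis;
    }

    @Override
    public int hashCode() { return Long.hashCode(millis); }

    // Convenience helpers for the sketch; in-memory streams do not fail in practice.
    byte[] toBytes() {
        try {
            ByteArrayOutputStream buf = new ByteArrayOutputStream(8);
            write(new DataOutputStream(buf));
            return buf.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    static DateTimeValue fromBytes(byte[] bytes) {
        try {
            DateTimeValue v = new DateTimeValue();
            v.readFields(new DataInputStream(new ByteArrayInputStream(bytes)));
            return v;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}

// Raw comparator sketch: orders two serialized values straight from their bytes,
// without deserializing them into DateTimeValue objects.
final class DateTimeRawComparator {
    static int compare(byte[] b1, int s1, byte[] b2, int s2) {
        return Long.compare(readLong(b1, s1), readLong(b2, s2));
    }

    // Decode the 8-byte big-endian long written by DateTimeValue.write.
    private static long readLong(byte[] b, int off) {
        long v = 0;
        for (int i = 0; i < 8; i++) {
            v = (v << 8) | (b[off + i] & 0xFFL);
        }
        return v;
    }
}
```

The comparator decodes the long before comparing rather than comparing bytes lexicographically, since a plain byte-wise comparison would misorder negative (pre-1970) values.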

The other question is backward compatibility.  I can think of only two backward incompatible
changes:
# Addition of bytesToDateTime in the LoadCaster interface.  Given that this will only require
a change if people recompile their implementation, and AFAIK there are no implementations
of LoadCaster besides our default implementation, I think this is ok.
# Changes to Pig Latin to specify a field as of type date, plus however we denote datetime
strings.  We need to make these as unobtrusive as possible, but again I think it will be ok,
though we'll need to get community buy in on it.
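
To make the first item concrete, here is a minimal sketch of what a bytesToDateTime conversion might look like.  Everything in it is an assumption for illustration: the class name DateTimeCasterSketch is hypothetical, the input is assumed to be a UTF-8 ISO-8601 instant (the date format is still an open design question), the result is returned as epoch milliseconds, and malformed input is assumed to yield null.  It uses java.time for brevity and is not Pig's actual LoadCaster interface.

```java
import java.nio.charset.StandardCharsets;
import java.time.Instant;
import java.time.format.DateTimeParseException;

// Hypothetical stand-in for the proposed bytesToDateTime method.
final class DateTimeCasterSketch {
    // Parses UTF-8 text such as "2010-03-22T19:03:27Z" into epoch milliseconds.
    // Returns null for null, empty, or unparseable input (an assumed convention).
    static Long bytesToDateTime(byte[] b) {
        if (b == null || b.length == 0) {
            return null;
        }
        String text = new String(b, StandardCharsets.UTF_8).trim();
        try {
            return Instant.parse(text).toEpochMilli();
        } catch (DateTimeParseException e) {
            return null;
        }
    }
}
```

An existing LoadCaster implementation compiled against the old interface would lack this method, which is why the addition only bites when an implementation is recompiled.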

Would such a patch be accepted?  If it's of good quality and deals with backward compatibility
concerns, certainly.  In time for 0.8, I don't know.  We try to do a release every three months,
with a feature cutoff about a month before release (give or take).  Branching and feature
cutoff for 0.7 is today, so branching and feature cutoff for 0.8 will probably be in June.

If you want to pursue this, the first step should be a brief design that says how you'll go
about doing it.  It should cover things like which date format will you use (SQL, something
else)?  Which date functions do you think should be built in?  How do you plan to store this
type in memory?  Are there existing datetime libraries you can leverage or incorporate to
avoid rebuilding the wheel?  It's easiest to write up the design on Pig's wiki and then link
to it on this bug.  This will give users and developers a chance to review your thoughts and
give feedback.

> Add DateTime Support to Pig
> ---------------------------
>                 Key: PIG-1314
>                 URL: https://issues.apache.org/jira/browse/PIG-1314
>             Project: Pig
>          Issue Type: Bug
>          Components: data
>    Affects Versions: 0.7.0
>            Reporter: Russell Jurney
>             Fix For: 0.8.0
>   Original Estimate: 672h
>  Remaining Estimate: 672h
> Hadoop/Pig are primarily used to parse log data, and most logs have a timestamp component.
> Therefore Pig should support dates as a primitive.
> Can someone familiar with adding types to Pig comment on how hard this is?  We're looking
> at doing this, rather than use UDFs.  Is this a patch that would be accepted?

