pig-dev mailing list archives

From "Arun C Murthy (JIRA)" <j...@apache.org>
Subject [jira] Resolved: (PIG-169) Enhance PigStorage to handle complicated Tuples (i.e. automatically flatten them)
Date Wed, 02 Apr 2008 06:28:24 GMT

     [ https://issues.apache.org/jira/browse/PIG-169?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel

Arun C Murthy resolved PIG-169.

    Resolution: Won't Fix

Currently there isn't infrastructure to follow a given alias up the logical tree, check
whether it is the result of a GROUP, and further check whether it has already been flattened,
so marking this as *won't fix*.

> Enhance PigStorage to handle complicated Tuples (i.e. automatically flatten them)
> ---------------------------------------------------------------------------------
>                 Key: PIG-169
>                 URL: https://issues.apache.org/jira/browse/PIG-169
>             Project: Pig
>          Issue Type: Improvement
>          Components: data
>            Reporter: Arun C Murthy
>            Assignee: Arun C Murthy
> Currently PigStorage (actually Tuple.toDelimitedString) only handles the simple case
> of straight DataAtoms as fields and borks if it has any other Datum as a field. It would be
> nice to enhance it to handle the more complicated cases too. Currently users _have to_ use
> a *flatten* to convert these to simpler Tuples, which can then be handled by PigStorage.
> ----
> On a related note, there is an interesting caveat with the GROUP/COGROUP operators: they
> produce tuples whose first field is named 'group' and holds the value on which the
> grouping was performed.
> E.g.
> Input:
>  <A, 1>
>  <A, 2>
> Pig script:
>  INPUT = load 'input';
>  A = group INPUT by $0;
>  B = stream A through `script`;
> Results in A being: 
> (A, {(A, 1), (A, 2)})
> Now, if PigStorage _auto-flattens_ A it results in:
>  (A, A, 1)
>  (A, A, 2)
> However, the user's expectation is probably the straightforward:
>  (A, 1)
>  (A, 2)
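
The two outcomes above can be reproduced with a small Python sketch. This is only an
illustration of the idea, not Pig code: the function name `auto_flatten` and the
`skip_group_field` flag are hypothetical, and a grouped tuple is modelled as a plain
`(key, bag)` pair.

```python
def auto_flatten(group_tuple, skip_group_field=False):
    """Expand a (key, bag) pair into one flat tuple per bag element.

    Hypothetical helper for illustration only; it is not part of PigStorage.
    """
    key, bag = group_tuple
    # Keep the leading 'group' field unless the caller asks to drop it.
    prefix = () if skip_group_field else (key,)
    return [prefix + tuple(row) for row in bag]


# The value of alias A from the example: (A, {(A, 1), (A, 2)})
grouped = ("A", [("A", 1), ("A", 2)])

naive = auto_flatten(grouped)                            # group key is duplicated
expected = auto_flatten(grouped, skip_group_field=True)  # group key is dropped

print(naive)     # [('A', 'A', 1), ('A', 'A', 2)]
print(expected)  # [('A', 1), ('A', 2)]
```

Dropping the prefix is only safe here because the bag rows still carry the grouping
column themselves.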
> ---
> Alan suggested that we could use the LOVisitor infrastructure to visit nodes in the tree,
> save up information (i.e. that a GROUP/COGROUP occurred) and then use that information to get
> PigStorage to 'skip' the group field while auto-flattening. However, this might have to be done
> if, and only if, PigStorage is auto-flattening tuples coming directly from a GROUP/COGROUP
> operator, i.e. when no other EvalSpecs are working on those tuples ...
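
A rough Python sketch of the visitor idea, under loud assumptions: `PlanNode` and
`GroupTrackingVisitor` are hypothetical stand-ins for a logical plan and a visitor, and
do not reflect the actual LOVisitor API. The visitor walks up the plan and records, per
alias, whether its tuples come directly from a GROUP/COGROUP.

```python
from dataclasses import dataclass
from typing import Dict, Optional

GROUPING_OPS = {"GROUP", "COGROUP"}


@dataclass
class PlanNode:
    """Toy logical-plan node (hypothetical; not a Pig class)."""
    alias: str
    op: str                              # e.g. "LOAD", "GROUP", "EVAL"
    source: Optional["PlanNode"] = None  # the node this one reads from


class GroupTrackingVisitor:
    """Records which aliases are produced directly by GROUP/COGROUP."""

    def __init__(self) -> None:
        self.direct_group_output: Dict[str, bool] = {}

    def visit(self, node: PlanNode) -> None:
        # Visit the input first so every alias in the chain is recorded.
        if node.source is not None:
            self.visit(node.source)
        # Only the grouping operator's own output qualifies; anything that
        # has been reworked by a later operator does not.
        self.direct_group_output[node.alias] = node.op in GROUPING_OPS


# Plan loosely mirroring the script above: INPUT -> A (group) -> B (further processing)
load = PlanNode("INPUT", "LOAD")
grouped = PlanNode("A", "GROUP", source=load)
evaluated = PlanNode("B", "EVAL", source=grouped)

visitor = GroupTrackingVisitor()
visitor.visit(evaluated)
print(visitor.direct_group_output)
```

With this bookkeeping, a storage function could skip the group field when storing A
but not when storing B.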
> ---
> Thoughts?

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
