pig-dev mailing list archives

From "Alan Gates (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (PIG-3257) Add unique identifier UDF
Date Tue, 28 May 2013 22:34:20 GMT

    [ https://issues.apache.org/jira/browse/PIG-3257?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13668748#comment-13668748 ]

Alan Gates commented on PIG-3257:
---------------------------------

I don't see how records can be missing or redundant.  Take the following query:

{code}
A = load ...
B = group A by UUID();
C = foreach B...
{code}

This grouping won't reduce the data at all: every record gets its own key, so every group
holds exactly one record.  The particular value of a record's key is irrelevant, because it
is guaranteed to be unique for each record.  So 1) this is a meaningless thing to do; and
2) if a particular map task is rerun or is run under speculative execution, it doesn't matter,
because which particular key UUID generates is irrelevant.  The way this is intended to be
used is something like this:

{code}
A = load 'over100k' using org.apache.hcatalog.pig.HCatLoader();
B = foreach A generate *, UUID() as uuid;
C = group B by s;
D = foreach C generate flatten(B), SUM(B.i) as sum_b;
E = group B by si;
F = foreach E generate flatten(B), SUM(B.f) as sum_f;
G = join D by uuid, F by uuid;
H = foreach G generate D::B::s, sum_b, sum_f;
store H into 'output';
{code}
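
To make the intent concrete, here is a rough plain-Python simulation of the same pattern: tag each record once, aggregate along two different groupings, then use the tag to join the per-record results back together. It is only a sketch, not how Pig executes the script. The sample rows and the helper names (tag, group_sum, run) are made up for illustration, and Python's uuid.uuid4() merely stands in for the proposed UDF.

```python
import uuid
from collections import defaultdict


def tag(records):
    # Attach a fresh unique id to every record (stand-in for UUID() in the script).
    return [dict(r, uuid=uuid.uuid4().hex) for r in records]


def group_sum(records, key, field):
    # Group by `key`, sum `field` per group, and return the group's sum
    # for each record, indexed by that record's unique id.
    totals = defaultdict(float)
    for r in records:
        totals[r[key]] += r[field]
    return {r["uuid"]: totals[r[key]] for r in records}


def run(records):
    b = tag(records)                 # B = foreach A generate *, UUID()
    sum_b = group_sum(b, "s", "i")   # C, D: group by s, SUM(B.i)
    sum_f = group_sum(b, "si", "f")  # E, F: group by si, SUM(B.f)
    # G, H: join the two per-record results on the unique id.
    return sorted((r["s"], sum_b[r["uuid"]], sum_f[r["uuid"]]) for r in b)


# Hypothetical sample data with the field names used in the script.
rows = [
    {"s": "a", "si": 1, "i": 10, "f": 0.5},
    {"s": "a", "si": 2, "i": 20, "f": 1.5},
    {"s": "b", "si": 1, "i": 5, "f": 2.0},
]

print(run(rows))
```

Running run(rows) twice produces different identifiers but the same final rows, which is the point above: only uniqueness matters, not which value a given record receives.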

                
> Add unique identifier UDF
> -------------------------
>
>                 Key: PIG-3257
>                 URL: https://issues.apache.org/jira/browse/PIG-3257
>             Project: Pig
>          Issue Type: Improvement
>          Components: internal-udfs
>            Reporter: Alan Gates
>            Assignee: Alan Gates
>             Fix For: 0.12
>
>         Attachments: PIG-3257.patch
>
>
> It would be good to have a Pig function to generate unique identifiers.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
