pig-dev mailing list archives

From "Koji Noguchi (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (PIG-4608) FOREACH ... UPDATE
Date Thu, 15 Feb 2018 20:53:00 GMT

    [ https://issues.apache.org/jira/browse/PIG-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16366248#comment-16366248 ]

Koji Noguchi commented on PIG-4608:
-----------------------------------

bq. To me, UPDATE $1 with r+$2 means update the first field, regardless of name, with r+second field.
You probably meant: update the second field with r + the third field. (Pig counts positions from 0.)

In any case, I get your point.
[~daijy], [~rohini], [~kpriceyahoo], any preferences? 


bq. UPDATE $1 means n = $1, and the _n_th field is updated accordingly.
My interpretation of $1 should probably be disallowed anyway, since it takes away an optimization opportunity: we would not know at compile time which fields are being updated or dropped.
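To make the zero-based numbering concrete, here is a small sketch using the three_numbers relation and schema (f1:int, f2:int, f3:int) from the issue description. Both statements replace the second field with the sum of the second and third fields using today's FOREACH ... GENERATE; the relation names by_position and by_alias are illustrative only.

```pig
-- Assumes three_numbers was loaded AS (f1:int, f2:int, f3:int), as in the issue description.
-- Positions are zero-based: $0 = f1, $1 = f2, $2 = f3.

-- By position: $1 is the second field (f2), not the first.
by_position = FOREACH three_numbers GENERATE $0, $1 + $2, $2;

-- By alias: the field being replaced is named explicitly, so it is known at compile time.
by_alias = FOREACH three_numbers GENERATE f1, f2 + f3 AS f2, f3;
```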


> FOREACH ... UPDATE
> ------------------
>
>                 Key: PIG-4608
>                 URL: https://issues.apache.org/jira/browse/PIG-4608
>             Project: Pig
>          Issue Type: New Feature
>            Reporter: Haley Thrapp
>            Priority: Major
>
> I would like to propose a new command in Pig, FOREACH...UPDATE.
> Syntactically, it would look much like FOREACH … GENERATE.
> Example:
> Input data:
> (1,2,3)
> (2,3,4)
> (3,4,5)
> -- Load the data
> three_numbers = LOAD 'input_data'
> USING PigStorage()
> AS (f1:int, f2:int, f3:int);
> -- Sum up the row
> updated = FOREACH three_numbers UPDATE
> 5 as f1,
> f1+f2 as new_sum
> ;
> Dump updated;
> (5,2,3,3)
> (5,3,4,5)
> (5,4,5,7)
> Fields to update must be specified by alias. Any fields in the UPDATE that do not match an existing field will be appended to the end of the tuple.
> This command is particularly desirable in scripts that deal with a large number of fields (in the 20-200 range). Often, we need to modify only a few of those fields. The FOREACH ... UPDATE statement allows the developer to focus on the actual logical changes instead of having to list all of the fields that are simply passed through.
> My team has prototyped this with changes to FOREACH ... GENERATE. We believe this can be done with changes to the parser and the creation of a new LOUpdate. No physical plan changes should be needed, because we will leverage what LOGenerate already does.
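For comparison, the UPDATE in the issue description's example can already be written with FOREACH ... GENERATE, at the cost of listing every pass-through field by hand. This is a minimal sketch assuming the input file is tab-delimited, as PigStorage() expects by default; with the sample data it should print the same tuples as the proposed UPDATE: (5,2,3,3), (5,3,4,5), (5,4,5,7).

```pig
-- Same input relation as in the issue description.
three_numbers = LOAD 'input_data'
    USING PigStorage()
    AS (f1:int, f2:int, f3:int);

-- Equivalent of the proposed UPDATE: every pass-through field (f2, f3) must be listed.
-- Expressions read the input tuple, so f1 + f2 uses the loaded f1, not the constant 5.
updated = FOREACH three_numbers GENERATE
    5 AS f1,
    f2,
    f3,
    f1 + f2 AS new_sum;

DUMP updated;
```

With 20-200 fields, the pass-through list is what UPDATE would let a script omit.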



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
