hadoop-pig-dev mailing list archives

From "Rekha (JIRA)" <j...@apache.org>
Subject [jira] Updated: (PIG-994) Provide 'append' keyword to allow appending to different dataset once the feature is available in Hadoop
Date Wed, 07 Oct 2009 05:42:31 GMT

     [ https://issues.apache.org/jira/browse/PIG-994?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Rekha updated PIG-994:
----------------------


Thanks Alan.

I mostly favor the 'option on store' approach, and definitely so if the two approaches are mutually exclusive.

However, for argument's sake, a keyword approach could be considered in addition.

This is because I am hoping append will open the door to easily patching an update feature, along similar lines, into the Pig API (hopefully as part of the same JIRA ticket).
My idea of update is a syntax like "update <DS1> by (join_keys) from <DS2> by (join_keys) parallel $PARALLEL".
This would update dataset 1 (DS1) with data from dataset 2 (DS2) based on key joins.

{code}
update b by (join_key1, join_key2) from c by (join_key1, join_key2); -- this will update the DS b directly
-- or alternatively, making it two-step:
-- x = update b by (join_key1, join_key2) from c by (join_key1, join_key2);
z = foreach b generate $0, $32, $50; -- in case you are taking only a few cols from main (b), new (c)
store z into 'bla' append; -- appends the o/p data into 'bla' directly.
{code}
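
To make the intended update semantics concrete, here is a minimal sketch in plain Python rather than Pig Latin, since the update keyword is only a proposal. The function name update_by_key and the sample rows are hypothetical. It also assumes that rows in c with no matching key in b are ignored rather than inserted, which the proposal above does not specify.

```python
def update_by_key(base, updates, key_cols):
    """Return `base` with each row replaced by the row from `updates`
    that has the same values in the key columns, if there is one."""
    def key(row):
        return tuple(row[i] for i in key_cols)

    # If several update rows share a key, the last one wins (an assumption).
    latest = {key(row): row for row in updates}
    # Rows of `base` without a matching key are kept unchanged.
    return [latest.get(key(row), row) for row in base]


# Hypothetical data: columns 0 and 1 play the role of join_key1, join_key2.
b = [(1, "a", "old"), (2, "b", "old"), (3, "c", "old")]
c = [(2, "b", "new"), (9, "z", "new")]

updated = update_by_key(b, c, key_cols=(0, 1))
```

Here only the row keyed (2, "b") is replaced, and the (9, "z") row from c is dropped because nothing in b matches it.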

For the append case, the construct below would be another way of doing it.
{code}
append b, c; -- appends directly into b.
z = foreach b generate $0, $32, $50; -- in case you are taking only a few cols from main (b), new (c)
store z into 'bla';
{code}
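
As a rough illustration of what an append option on store would change, here is a small Python sketch using a local file as a stand-in for an HDFS output location. The store function and its append parameter are hypothetical. The refusal to overwrite an existing location models how store behaves today, as far as I understand it.

```python
import os
import tempfile


def store(rows, path, append=False):
    """Write rows as tab-separated lines; only add to an existing
    output when append=True, otherwise refuse to overwrite it."""
    if os.path.exists(path) and not append:
        raise FileExistsError("output location already exists: " + path)
    mode = "a" if append else "w"
    with open(path, mode) as out:
        for row in rows:
            out.write("\t".join(str(col) for col in row) + "\n")


# Hypothetical usage: first store creates 'bla', second one appends to it.
target = os.path.join(tempfile.mkdtemp(), "bla")
store([(1, "x")], target)
store([(2, "y")], target, append=True)

with open(target) as f:
    lines = f.read().splitlines()
```

With the option on store, the target is named once at write time. With the keyword form, the dataset being appended to is named up front.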


> Provide 'append' keyword to allow appending to different dataset once the feature is available in Hadoop
> -------------------------------------------------------------------------------------------------------
>
>                 Key: PIG-994
>                 URL: https://issues.apache.org/jira/browse/PIG-994
>             Project: Pig
>          Issue Type: New Feature
>          Components: impl
>    Affects Versions: 0.4.0
>         Environment: Grid clusters
>            Reporter: Rekha
>            Priority: Minor
>
> Provide an 'append' keyword to allow appending to a different dataset on Pig 0.5.0, as it is now on Hadoop 0.20 (which has the append feature)

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

