pig-dev mailing list archives

From "Olga Natkovich (JIRA)" <j...@apache.org>
Subject [jira] Resolved: (PIG-48) LoadFunc API is too limiting
Date Mon, 26 Jan 2009 19:51:59 GMT

     [ https://issues.apache.org/jira/browse/PIG-48?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Olga Natkovich resolved PIG-48.
-------------------------------

    Resolution: Fixed

This can already be done with custom splits.

> LoadFunc API is too limiting
> ----------------------------
>
>                 Key: PIG-48
>                 URL: https://issues.apache.org/jira/browse/PIG-48
>             Project: Pig
>          Issue Type: Improvement
>            Reporter: Sam Pullara
>            Priority: Minor
>
> Currently the LoadFunc API assumes that you are pulling data from a Hadoop filesystem
> and that PIG will have already found the file and split it.  I would like a lower-level API
> that hands me the information so I can find the data and do the split.  For instance, this
> is a very inconvenient way to load data from an RSS URL:
> register /Users/samp/Projects/pigrss/out/getfeed-all.jar
> define getFeed com.sampullara.pig.storage.GetFeed();
> URL = LOAD 'url' using PigStorage() as (url);
> A = FOREACH URL GENERATE FLATTEN(getFeed(url));
> Where GetFeed is an EvalFunc because there was no way to do this as a LoadFunc.  While
> we are at it, we could add the ability to create a literal Tuple in the PIG language :)

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

