hadoop-pig-dev mailing list archives

From "David Ciemiewicz (JIRA)" <j...@apache.org>
Subject [jira] Created: (PIG-756) UDFs should have API for transparently opening and reading files from HDFS or from local file system with only relative path
Date Wed, 08 Apr 2009 14:41:13 GMT
UDFs should have API for transparently opening and reading files from HDFS or from local file
system with only relative path
----------------------------------------------------------------------------------------------------------------------------

                 Key: PIG-756
                 URL: https://issues.apache.org/jira/browse/PIG-756
             Project: Pig
          Issue Type: Bug
            Reporter: David Ciemiewicz


I have a utility function, util.INSETFROMFILE(), to which I pass a file name during initialization.

{code}
define inQuerySet util.INSETFROMFILE('analysis/queries');
A = load 'logs' using PigStorage() as ( date int, query chararray );
B = filter A by inQuerySet(query);
{code}

This provides a computationally inexpensive way to perform map-side joins against small sets.
Functions of this style can also encapsulate more complex matching rules.

For rapid development and debugging, I want this code to run without modification both
against my local file system (when I run pig -exectype local) and against HDFS.

Pig needs to provide an API for UDFs that allows them to either:

1) "know" whether they are running in local or HDFS mode, so that they can open and read files as appropriate
2) just provide a file name and issue read statements, with Pig transparently managing the
local or HDFS opens and reads for the UDF
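
As a rough illustration of option 2, the sketch below shows one shape such an API could take. Every name in it (UdfFileSketch, UdfFileOpener, LocalOpener, loadSet) is hypothetical and is not part of Pig. It assumes the runtime hands the UDF an opener that matches the current execution mode. Only a local-mode opener is shown; an HDFS-backed opener would implement the same interface.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.HashSet;
import java.util.Set;

/** Hypothetical sketch only; none of these types exist in Pig. */
final class UdfFileSketch {

    /** The UDF supplies a relative name; the runtime decides where it lives. */
    interface UdfFileOpener {
        InputStream open(String relativePath) throws IOException;
    }

    /** Assumed local-mode opener: resolves names against a base directory. */
    static final class LocalOpener implements UdfFileOpener {
        private final Path baseDir;

        LocalOpener(Path baseDir) {
            this.baseDir = baseDir;
        }

        @Override
        public InputStream open(String relativePath) throws IOException {
            return Files.newInputStream(baseDir.resolve(relativePath));
        }
    }

    /**
     * Reads one entry per line into a set, skipping blank lines, as a
     * set-membership UDF might do once during initialization.
     */
    static Set<String> loadSet(UdfFileOpener opener, String relativePath)
            throws IOException {
        Set<String> entries = new HashSet<>();
        try (BufferedReader reader = new BufferedReader(new InputStreamReader(
                opener.open(relativePath), StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                String trimmed = line.trim();
                if (!trimmed.isEmpty()) {
                    entries.add(trimmed);
                }
            }
        }
        return entries;
    }
}
```

With this shape, the UDF body only ever calls loadSet(opener, "analysis/queries"), and the choice between local and HDFS access is confined to whichever opener the runtime supplies.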

UDFs need to read configuration information off the filesystem, and it simplifies the process
if one can switch between modes just by passing -exectype local.





