pig-dev mailing list archives

From "Scott Carey (Commented) (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (PIG-2492) AvroStorage should recognize globs and commas
Date Fri, 23 Mar 2012 16:19:28 GMT

    [ https://issues.apache.org/jira/browse/PIG-2492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13236730#comment-13236730 ]

Scott Carey commented on PIG-2492:

Something seems way off here.

I have a custom LoadFunc for Avro (very different feature set, built before AvroStorage).

It has worked with globs since the beginning, with only this:

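  public void setLocation(String location, Job job) throws IOException {
    // Hand the raw location to FileInputFormat: it splits commas and expands globs when listing inputs
    FileInputFormat.setInputPaths(job, location);
  }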

This is much, much simpler.
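Part of why that one call covers commas as well as globs: FileInputFormat.setInputPaths(Job, String) takes a single comma-separated string, and FileInputFormat treats only commas outside {...} as separators, so a brace alternation like {views,clicks} is not broken apart before glob expansion. The stand-alone Java sketch below illustrates that splitting rule. PathListSplitter and splitPathList are hypothetical names, the /archive path is made up, and this is not Hadoop's actual code.

```java
import java.util.ArrayList;
import java.util.List;

public class PathListSplitter {

    /**
     * Hypothetical helper: splits a comma-separated path list on top-level
     * commas only. Commas inside {...} belong to a glob alternation and are kept.
     */
    static List<String> splitPathList(String commaSeparatedPaths) {
        List<String> paths = new ArrayList<>();
        int depth = 0; // current nesting level of curly braces
        int start = 0; // start index of the path being scanned
        for (int i = 0; i < commaSeparatedPaths.length(); i++) {
            char c = commaSeparatedPaths.charAt(i);
            if (c == '{') {
                depth++;
            } else if (c == '}' && depth > 0) {
                depth--;
            } else if (c == ',' && depth == 0) {
                // Top-level comma: end of one path, start of the next
                paths.add(commaSeparatedPaths.substring(start, i));
                start = i + 1;
            }
        }
        paths.add(commaSeparatedPaths.substring(start)); // last (or only) path
        return paths;
    }

    public static void main(String[] args) {
        String location = "/events/2012/03/23/{views,clicks}/*.avro,/archive/2012/03/*.avro";
        for (String p : splitPathList(location)) {
            System.out.println(p);
        }
    }
}
```

Running main prints two paths, /events/2012/03/23/{views,clicks}/*.avro and /archive/2012/03/*.avro; the comma inside the braces is left for the glob to interpret.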

This also solves the "only works with *.avro files" issue, though it changes the syntax you
need in the LOAD statement.

In my scripts, I might do something like:

A = LOAD '/events/2012/03/23/{views,clicks}/*.avro' using MyCustomStorageFunc();

In other words, use globs and the well-tested FileInputFormat to find files; don't reimplement
that in your LoadFunc.
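To check which files a LOAD path like the one above would select, the glob can be tried locally. The sketch below uses java.nio's PathMatcher as a stand-in; its glob syntax (*, ?, {a,b}, [...]) is close to, but not guaranteed identical to, the glob syntax Hadoop uses. GlobDemo and matches are hypothetical names, the candidate file names are made up for illustration, and POSIX-style paths are assumed.

```java
import java.nio.file.FileSystems;
import java.nio.file.PathMatcher;
import java.nio.file.Paths;
import java.util.List;

public class GlobDemo {

    /** Returns true if the path matches the glob (java.nio glob syntax, not Hadoop's). */
    static boolean matches(String glob, String path) {
        PathMatcher matcher = FileSystems.getDefault().getPathMatcher("glob:" + glob);
        return matcher.matches(Paths.get(path));
    }

    public static void main(String[] args) {
        String glob = "/events/2012/03/23/{views,clicks}/*.avro";
        List<String> candidates = List.of(
            "/events/2012/03/23/views/part-00000.avro",    // matches
            "/events/2012/03/23/clicks/part-00001.avro",   // matches
            "/events/2012/03/23/clicks/_SUCCESS",          // no: not *.avro
            "/events/2012/03/23/searches/part-00000.avro"  // no: not in {views,clicks}
        );
        for (String c : candidates) {
            System.out.println(matches(glob, c) + "  " + c);
        }
    }
}
```

Note that * does not cross a directory boundary, so files in nested subdirectories are not selected unless the glob names those levels explicitly. Because the pattern is chosen in the script, nothing forces the .avro suffix either; a pattern such as part-* works just as well.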

> AvroStorage should recognize globs and commas
> ---------------------------------------------
>                 Key: PIG-2492
>                 URL: https://issues.apache.org/jira/browse/PIG-2492
>             Project: Pig
>          Issue Type: Improvement
>          Components: piggybank
>    Affects Versions: 0.9.1
>            Reporter: Stan Rosenberg
>         Attachments: AvroStorage.patch, AvroStorageUtils.patch
> I've patched AvroStorage and AvroStorageUtils to support the same file input syntax as
> currently supported by Hadoop's FileInputFormat. Specifically, globs and commas are supported.
> Somebody should write some unit tests for these changes; I am currently pressed for time.

This message is automatically generated by JIRA.

