pig-dev mailing list archives

From "Cheolsoo Park (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (PIG-3223) AvroStorage does not handle comma separated input paths
Date Fri, 15 Mar 2013 01:37:11 GMT

    [ https://issues.apache.org/jira/browse/PIG-3223?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13603011#comment-13603011 ]

Cheolsoo Park commented on PIG-3223:

[~mkramer], thank you for the clarification.

You're right:
# PigStorage supports comma-separated input paths.
# The fully qualified paths such as \{hdfs://namenode:8020/testdir1/,hdfs://namenode:8020/testdir2\}
don't work in AvroStorage.

I am not against adding support for comma-separated lists as you suggest.

That said, I don't understand your use case:
"The driving force for us comes from how Oozie constructs input paths. If comma separated paths
aren't supported in this way, AvroStorage as is can't be used with Oozie."
Can you please provide your Oozie workflow and Pig script? Why do the input paths need to be
fully qualified?

> AvroStorage does not handle comma separated input paths
> -------------------------------------------------------
>                 Key: PIG-3223
>                 URL: https://issues.apache.org/jira/browse/PIG-3223
>             Project: Pig
>          Issue Type: Bug
>          Components: piggybank
>    Affects Versions: 0.10.0, 0.11
>            Reporter: Michael Kramer
>            Assignee: Johnny Zhang
>         Attachments: AvroStorage.patch, AvroStorage.patch-2, AvroStorageUtils.patch,
>                      AvroStorageUtils.patch-2, PIG-3223.patch.txt
>
> In Pig 0.11, a patch was issued to AvroStorage to support globs and comma separated input
> paths (PIG-2492). While this function works fine for glob-formatted input paths, it fails
> when issued a standard comma separated list of paths. fs.globStatus does not seem to be able
> to parse out such a list, and a java.net.URISyntaxException is thrown when toURI is called
> on the path.
> I have a working fix for this, but it's extremely ugly (basically checking if the string
> of input paths is globbed, otherwise splitting on ","). I'm sure there's a more elegant solution.
> I'd be happy to post the relevant methods and "fixes" if necessary.
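
For illustration only, here is a minimal sketch of one way to avoid the "globbed or not" check described in the report: split the input string on commas that sit outside any {...} glob group, so that each resulting entry can then be expanded as a glob on its own (for example with fs.globStatus). This is not the code from the attached patches. The class name PathListSplitter and the method splitPaths are hypothetical, and the sketch uses only the Java standard library rather than any Pig or Hadoop API.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper, not part of Pig or piggybank. */
public class PathListSplitter {

    /**
     * Splits a comma-separated list of paths. Commas nested inside a
     * {...} glob group are kept as part of the path they belong to.
     */
    public static List<String> splitPaths(String commaSeparatedPaths) {
        List<String> paths = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        int braceDepth = 0; // how many unclosed '{' we are currently inside

        for (int i = 0; i < commaSeparatedPaths.length(); i++) {
            char c = commaSeparatedPaths.charAt(i);
            if (c == '{') {
                braceDepth++;
            } else if (c == '}' && braceDepth > 0) {
                braceDepth--;
            }
            if (c == ',' && braceDepth == 0) {
                // Top-level comma: the current path is complete.
                addIfNotEmpty(paths, current);
                current.setLength(0);
            } else {
                current.append(c);
            }
        }
        addIfNotEmpty(paths, current);
        return paths;
    }

    // Trims whitespace and skips empty entries such as those from "a,,b".
    private static void addIfNotEmpty(List<String> paths, StringBuilder sb) {
        String path = sb.toString().trim();
        if (!path.isEmpty()) {
            paths.add(path);
        }
    }

    public static void main(String[] args) {
        // A plain comma-separated list of fully qualified paths.
        System.out.println(splitPaths(
            "hdfs://namenode:8020/testdir1/,hdfs://namenode:8020/testdir2"));
        // A glob group followed by a second path.
        System.out.println(splitPaths("/data/{a,b}/part-*,/other"));
    }
}
```

With this approach a plain list and a glob pattern go through the same code path, and each element of the returned list is a single path or glob that can be resolved separately.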
