pig-dev mailing list archives

From "Benjamin Reed (JIRA)" <j...@apache.org>
Subject [jira] Commented: (PIG-55) Allow user control over split creation
Date Tue, 15 Jan 2008 16:33:34 GMT

    [ https://issues.apache.org/jira/browse/PIG-55?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12559104#action_12559104 ]

Benjamin Reed commented on PIG-55:

This looks really good. Let me see if I can get the test cases run against it.

The openDFSFile removal seems unrelated. Correct? Or do you really mean to remove it?

The other nit I have is that the split returns RecordReader<Text, Tuple>. There are
two issues with this:

1) The first field (the Text key) is not used.
2) RecordReader and Text lock in a dependency on Hadoop. It would be nice if the split interface
could still be valid if Pig ran on top of another system. Iterator<Tuple> seems more
general and more precise. What do you think? (Granted, we would have to wrap this general iterator
in a Hadoop RecordReader for Hadoop, but at least it gives the programmer a nice interface.)
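
To make the wrapping idea concrete, here is a rough sketch of adapting an Iterator<Tuple> to a key/value reader. Everything in it is an assumption for illustration: Tuple and KeyValueReader are hypothetical stand-ins (not Pig's Tuple or Hadoop's RecordReader), IteratorReader is an invented name, and StringBuilder merely plays the role of the unused Text key. It is not taken from the attached patch.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

/** Hypothetical stand-in for Pig's Tuple: a mutable list of fields. */
final class Tuple {
    private final List<Object> fields = new ArrayList<>();

    Tuple(Object... values) {
        fields.addAll(Arrays.asList(values));
    }

    Object get(int i) {
        return fields.get(i);
    }

    int size() {
        return fields.size();
    }

    /** Overwrites this tuple's fields with those of other. */
    void copyFrom(Tuple other) {
        if (other == this) {
            return; // nothing to copy; avoids clearing our own fields
        }
        fields.clear();
        fields.addAll(other.fields);
    }
}

/** Hypothetical, simplified key/value reader (assumed shape, not Hadoop's API). */
interface KeyValueReader<K, V> {
    K createKey();

    V createValue();

    /** Fills value with the next record; returns false at end of input. */
    boolean next(K key, V value);
}

/** Adapts a plain Iterator<Tuple> to the key/value reader shape. */
final class IteratorReader implements KeyValueReader<StringBuilder, Tuple> {
    private final Iterator<Tuple> tuples;

    IteratorReader(Iterator<Tuple> tuples) {
        this.tuples = tuples;
    }

    public StringBuilder createKey() {
        return new StringBuilder(); // the key is never filled in
    }

    public Tuple createValue() {
        return new Tuple();
    }

    public boolean next(StringBuilder key, Tuple value) {
        if (!tuples.hasNext()) {
            return false;
        }
        value.copyFrom(tuples.next()); // reuse the caller's value object
        return true;
    }
}
```

With something along these lines, a user-supplied split would only need to hand back an Iterator<Tuple>, and the Hadoop-specific wrapping would live in one place inside Pig.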

> Allow user control over split creation
> --------------------------------------
>                 Key: PIG-55
>                 URL: https://issues.apache.org/jira/browse/PIG-55
>             Project: Pig
>          Issue Type: Improvement
>            Reporter: Charlie Groves
>         Attachments: replaceable_PigSplit.diff, replaceable_PigSplit_v2.diff
> I have a dataset in HDFS that's stored in a file per column that I'd like to access from
> pig.  This means I can't use LoadFunc to get at the data as it only allows the loader access
> to a single input stream at a time.  To handle this usage, I've broken the existing split
> creation code out into a few classes and interfaces, and allowed user specified load functions
> to be used in place of the existing code.

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
