hadoop-pig-dev mailing list archives

From "Charlie Groves (JIRA)" <j...@apache.org>
Subject [jira] Updated: (PIG-55) Allow user control over split creation
Date Thu, 10 Jan 2008 01:24:33 GMT

     [ https://issues.apache.org/jira/browse/PIG-55?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Charlie Groves updated PIG-55:
------------------------------

    Attachment: replaceable_PigSplit_v2.diff

This version of the patch moves PigSplit and PigSplitFactory into the main pig package and
greatly reduces the size of the interfaces.  Neither interface now references code from
the impl packages.

Having now looked at EvalSpec, I realize it's probably not the right thing to expose in order
to give access to the fields used from a particular input, so I'm going to leave that part of
my use case out of this patch.

> Allow user control over split creation
> --------------------------------------
>
>                 Key: PIG-55
>                 URL: https://issues.apache.org/jira/browse/PIG-55
>             Project: Pig
>          Issue Type: Improvement
>            Reporter: Charlie Groves
>         Attachments: replaceable_PigSplit.diff, replaceable_PigSplit_v2.diff
>
>
> I have a dataset in HDFS that's stored as one file per column that I'd like to access from
> Pig.  This means I can't use LoadFunc to get at the data, as it only allows the loader access
> to a single input stream at a time.  To handle this usage, I've broken the existing split
> creation code out into a few classes and interfaces, and allowed user-specified load functions
> to be used in place of the existing code.
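
To make the idea in the description concrete, here is a rough, self-contained sketch of a
pluggable split factory. It is not taken from the attached patch: the Split and SplitFactory
interfaces, both factory classes, and the example paths are hypothetical names invented for
illustration, and the sketch uses plain strings instead of Hadoop types. It only shows how
swapping the factory lets one split carry every column file of a partition, instead of one
file (and so one input stream) per split.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class SplitSketch {
    /** Hypothetical split: one unit of work that may span several files. */
    interface Split {
        List<String> paths();
    }

    /** Hypothetical factory: decides how input paths are grouped into splits. */
    interface SplitFactory {
        List<Split> createSplits(List<String> inputPaths);
    }

    /** One split per file, so each loader sees a single input stream. */
    static class FilePerSplitFactory implements SplitFactory {
        public List<Split> createSplits(List<String> inputPaths) {
            List<Split> splits = new ArrayList<>();
            for (String path : inputPaths) {
                List<String> single = Collections.singletonList(path);
                splits.add(() -> single);
            }
            return splits;
        }
    }

    /** One split per directory, so a loader sees all column files together. */
    static class ColumnGroupSplitFactory implements SplitFactory {
        public List<Split> createSplits(List<String> inputPaths) {
            // Group the column files by their parent directory (assumed layout).
            Map<String, List<String>> byDir = new TreeMap<>();
            for (String path : inputPaths) {
                int slash = path.lastIndexOf('/');
                String dir = slash < 0 ? "" : path.substring(0, slash);
                byDir.computeIfAbsent(dir, k -> new ArrayList<>()).add(path);
            }
            List<Split> splits = new ArrayList<>();
            for (List<String> group : byDir.values()) {
                splits.add(() -> group);
            }
            return splits;
        }
    }

    public static void main(String[] args) {
        List<String> inputs = Arrays.asList(
            "/data/part0/user.col", "/data/part0/age.col",
            "/data/part1/user.col", "/data/part1/age.col");
        // The factory is the only thing that changes between the two runs.
        for (SplitFactory factory : Arrays.asList(
                new FilePerSplitFactory(), new ColumnGroupSplitFactory())) {
            System.out.println(factory.getClass().getSimpleName());
            for (Split split : factory.createSplits(inputs)) {
                System.out.println("  split: " + split.paths());
            }
        }
    }
}
```

With the grouping factory, each split lists both column files of a partition, which is the
access pattern a single-stream load function cannot express.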

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

