hadoop-pig-dev mailing list archives

From "Jeff Zhang (JIRA)" <j...@apache.org>
Subject [jira] Commented: (PIG-1148) Move splitable logic from pig latin to InputFormat
Date Wed, 16 Dec 2009 15:15:18 GMT

    [ https://issues.apache.org/jira/browse/PIG-1148?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12791396#action_12791396 ]

Jeff Zhang commented on PIG-1148:

Pradeep, I do not quite understand what you mean, so let me explain my idea again.
Currently users have to write "split by 'file'" in Pig Latin to stop Hadoop from
splitting a file into multiple InputSplits. I don't think it's a good idea to put too
many features into Pig Latin. The principle of the load/store redesign is to integrate
Hadoop's features into Pig as much as possible without tying Pig Latin tightly to Hadoop.
So my suggestion is this: if a user does not want a file to be split, they can provide a
LoadFunc whose InputFormat controls whether files are splittable. That InputFormat extends
FileInputFormat and overrides the method isSplitable(FileSystem fs, Path filename).

Here's a code snippet illustrating my idea:

public class MyPigStorage extends PigStorage {
    public InputFormat getInputFormat() {
        return new MyInputFormat();
    }
}

public class MyInputFormat extends TextInputFormat {
    // Returning false keeps each input file in a single InputSplit.
    protected boolean isSplitable(FileSystem fs, Path filename) {
        return false;
    }
}
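
To show why overriding isSplitable is enough to keep a file whole, here is a minimal, self-contained sketch that does not depend on Hadoop or Pig. The classes SimpleFileFormat and WholeFileFormat are hypothetical stand-ins for FileInputFormat and the custom InputFormat above, and the split computation is deliberately simplified (fixed block size, no slop factor), so treat it as an illustration of the pattern rather than Hadoop's actual algorithm.

```java
import java.util.ArrayList;
import java.util.List;

public class SplitSketch {

    // Hypothetical stand-in for FileInputFormat: splits a file by block size
    // unless isSplitable says the file must stay whole.
    static class SimpleFileFormat {
        protected boolean isSplitable(String filename) {
            return true;
        }

        // Each split is represented as {offset, length}.
        List<long[]> getSplits(String filename, long fileLength, long blockSize) {
            List<long[]> splits = new ArrayList<>();
            if (!isSplitable(filename) || fileLength <= blockSize) {
                // Unsplittable (or small) file: one split covering everything.
                splits.add(new long[] {0L, fileLength});
                return splits;
            }
            for (long offset = 0; offset < fileLength; offset += blockSize) {
                long length = Math.min(blockSize, fileLength - offset);
                splits.add(new long[] {offset, length});
            }
            return splits;
        }
    }

    // Hypothetical stand-in for MyInputFormat: the only change is the override.
    static class WholeFileFormat extends SimpleFileFormat {
        @Override
        protected boolean isSplitable(String filename) {
            return false;
        }
    }

    public static void main(String[] args) {
        long fileLength = 300L;
        long blockSize = 128L;
        int defaultSplits =
            new SimpleFileFormat().getSplits("data.txt", fileLength, blockSize).size();
        int wholeFileSplits =
            new WholeFileFormat().getSplits("data.txt", fileLength, blockSize).size();
        System.out.println("default format splits:    " + defaultSplits);
        System.out.println("whole-file format splits: " + wholeFileSplits);
    }
}
```

The split logic lives entirely in the base class, so the subclass changes behavior with a single override and nothing in the query language needs to know about it.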

> Move splitable logic from pig latin to InputFormat
> --------------------------------------------------
>                 Key: PIG-1148
>                 URL: https://issues.apache.org/jira/browse/PIG-1148
>             Project: Pig
>          Issue Type: Sub-task
>            Reporter: Jeff Zhang
>            Assignee: Jeff Zhang

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
