pig-dev mailing list archives

From "Ted Malaska (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (PIG-2494) Improvement to SequenceFileLoader (NullWritable and Delimiter)
Date Tue, 04 Sep 2012 18:05:07 GMT

    [ https://issues.apache.org/jira/browse/PIG-2494?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13447870#comment-13447870 ]

Ted Malaska commented on PIG-2494:

Hey Dmitriy,

I know it's been a long time, but I'm going to try to finish this issue now.

I just reviewed the SequenceFileLoader code in elephant-bird, and it looks like the major piece
to bring over is the idea of the converter and its ability to transform the raw data and
provide a schema for the output format.

This would add a lot of power to the existing implementation.
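
As a rough illustration of the converter idea, a converter could pair a transformation of the raw value with a description of the schema it produces. This is only a sketch under assumptions: the names `RawConverter`, `convert`, `schema`, and `NameAgeConverter` are hypothetical and are not claimed to match elephant-bird's or Pig's actual API.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch: these types are illustrative stand-ins,
// not classes from Pig, piggybank, or elephant-bird.
class ConverterSketch {

    /** Transforms a raw value of type R and describes the fields it outputs. */
    interface RawConverter<R> {
        List<Object> convert(R raw); // raw record -> output fields
        String schema();             // Pig-style schema string for those fields
    }

    /** Example converter: treats the raw value as "name,age" text. */
    static class NameAgeConverter implements RawConverter<String> {
        @Override
        public List<Object> convert(String raw) {
            // Limit -1 keeps trailing empty fields instead of dropping them.
            String[] parts = raw.split(",", -1);
            return Arrays.<Object>asList(parts[0], Integer.parseInt(parts[1].trim()));
        }

        @Override
        public String schema() {
            return "name:chararray, age:int";
        }
    }
}
```

With something along these lines, the loader itself would only need to read the raw value and delegate to the converter, so new value formats could be supported without changing the loader.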

I'll start on this tonight.
> Improvement to SequenceFileLoader (NullWritable and Delimiter)
> --------------------------------------------------------------
>                 Key: PIG-2494
>                 URL: https://issues.apache.org/jira/browse/PIG-2494
>             Project: Pig
>          Issue Type: Improvement
>          Components: piggybank
>    Affects Versions: 0.9.1
>         Environment: All
>            Reporter: Ted Malaska
>            Priority: Minor
>              Labels: newbie, simple
>         Attachments: SequenceFileLoader.java
> I wanted to add two features to SequenceFileLoader.
> 1. I added a delimiter so it will act more like PigStorage, in that it will split the
>    value if it is of type Text (chararray).
> 2. I added the option of the key being a NullWritable.  I wanted to be able to process
>    my Hive files in both Hive and Pig, but because my Hive sequence files have a NullWritable
>    key, I could not make this work with the current implementation of SequenceFileLoader.
> My change is attached to this Issue.
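
The two features described above can be sketched in plain Java, without the Hadoop or Pig classes. This is not the attached SequenceFileLoader.java: the class `DelimitedRecordSketch` and the method `toFields` are hypothetical, a `null` key stands in for a NullWritable key, and the choice to emit no field for such a key is an assumption made for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: plain Java stand-ins, not the attached SequenceFileLoader.java.
class DelimitedRecordSketch {

    /**
     * Builds the output fields for one record.
     * A null key stands in for a NullWritable key and contributes no field (assumption).
     * The value is split on every occurrence of the delimiter, keeping empty fields.
     */
    static List<String> toFields(String key, String value, char delimiter) {
        List<String> fields = new ArrayList<>();
        if (key != null) {
            fields.add(key);
        }
        int start = 0;
        for (int i = 0; i < value.length(); i++) {
            if (value.charAt(i) == delimiter) {
                fields.add(value.substring(start, i));
                start = i + 1;
            }
        }
        fields.add(value.substring(start)); // last (or only) field
        return fields;
    }
}
```

For a Hive text row that uses Hive's default field delimiter (Ctrl-A, `'\u0001'`), a call such as `DelimitedRecordSketch.toFields(null, row, '\u0001')` would yield one field per Hive column.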

