From: "Ted Malaska (JIRA)"
To: pig-dev@hadoop.apache.org
Reply-To: dev@pig.apache.org
Date: Wed, 5 Sep 2012 08:38:08 +1100 (NCT)
Message-ID: <200447551.35080.1346794688558.JavaMail.jiratomcat@arcas>
In-Reply-To: <1235498892.5624.1327843870131.JavaMail.tomcat@hel.zones.apache.org>
Subject: [jira] [Commented] (PIG-2494) Improvement to SequenceFileLoader (NullWritable and Delimiter)

    [ https://issues.apache.org/jira/browse/PIG-2494?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13448084#comment-13448084 ]

Ted Malaska commented on PIG-2494:
----------------------------------

I see four options for addressing this issue:

1. Update SequenceFileLoader so that it can handle NullWritable keys and also handle delimiters like PigStorage.
2. All of option (1), plus extend the sequence loader into a sequence storage so we can use it to dump data out to sequence files.
3. Bring the elephant-bird implementation over to piggybank and add support for delimiters.
4. Drop the whole delimiter thing, because we can use TOKENIZE.

Let me know.

> Improvement to SequenceFileLoader (NullWritable and Delimiter)
> --------------------------------------------------------------
>
>                 Key: PIG-2494
>                 URL: https://issues.apache.org/jira/browse/PIG-2494
>             Project: Pig
>          Issue Type: Improvement
>          Components: piggybank
>    Affects Versions: 0.9.1
>         Environment: All
>            Reporter: Ted Malaska
>            Priority: Minor
>              Labels: newbie, simple
>         Attachments: SequenceFileLoader.java
>
>
> I wanted to add two features to SequenceFileLoader.
> 1. I added a delimiter so it will act more like PigStorage, in that it will split the value if it is of type Text (chararray).
> 2. I added the option of the key being a NullWritable. I wanted to be able to process my Hive files in both Hive and Pig, but because my Hive sequence files have a NullWritable key, I could not make this work with the current implementation of SequenceFileLoader.
> My change is attached to this issue.
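
For readers following the thread, the two features described in the issue (splitting a Text value on a delimiter, and tolerating a NullWritable key) can be illustrated with a small, dependency-free sketch. This is not the attached SequenceFileLoader.java and it does not use the Hadoop or Pig APIs: the class name SequenceRecordSketch and the method toFields are hypothetical, a null key stands in for NullWritable, and a plain String stands in for Text. Whether the real patch omits the key field or emits a null in its place is not stated in the thread; this sketch assumes it is omitted.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical illustration only; not the patch attached to PIG-2494.
public class SequenceRecordSketch {

    /**
     * Flattens one (key, value) record into a list of fields.
     *
     * Assumptions made for this sketch:
     *  - a null key stands in for Hadoop's NullWritable and is omitted;
     *  - a null delimiter means "do not split", so the value is one field;
     *  - the delimiter is matched literally, as a plain string.
     */
    static List<String> toFields(String key, String value, String delimiter) {
        List<String> fields = new ArrayList<>();
        if (key != null) {
            fields.add(key);
        }
        if (delimiter == null) {
            fields.add(value);
            return fields;
        }
        // Pattern.quote makes the delimiter literal; the -1 limit keeps
        // trailing empty fields instead of silently dropping them.
        for (String part : value.split(Pattern.quote(delimiter), -1)) {
            fields.add(part);
        }
        return fields;
    }

    public static void main(String[] args) {
        // Hive-style record: no key, tab-delimited value.
        System.out.println(toFields(null, "a\tb\tc", "\t")); // prints [a, b, c]
        // Keyed record with a comma-delimited value.
        System.out.println(toFields("k1", "a,b", ","));      // prints [k1, a, b]
    }
}
```

Keeping trailing empty fields is a design choice in this sketch, made so that every record yields the same number of columns even when the last ones are empty.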