From: "Runping Qi (JIRA)"
To: hadoop-dev@lucene.apache.org
Reply-To: hadoop-dev@lucene.apache.org
Date: Tue, 10 Apr 2007 11:09:32 -0700 (PDT)
Subject: [jira] Updated: (HADOOP-1204) Re-factor InputFormat/RecordReader related classes
Message-ID: <28488588.1176228572955.JavaMail.jira@brutus>
In-Reply-To: <28598713.1175716172231.JavaMail.jira@brutus>
Mailing-List: contact hadoop-dev-help@lucene.apache.org; run by ezmlm

     [ https://issues.apache.org/jira/browse/HADOOP-1204?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Runping Qi updated HADOOP-1204:
-------------------------------

    Status: Open  (was: Patch Available)

>
> Re-factor InputFormat/RecordReader related classes
> --------------------------------------------------
>
>                 Key: HADOOP-1204
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1204
>             Project: Hadoop
>          Issue Type: Bug
>          Components: mapred
>            Reporter: Runping Qi
>         Assigned To: Runping Qi
>
> This Jira is the first small step toward unifying the input format/record reader code for streaming
> with the Hadoop main framework.
> It makes the following cleanups to the related parts of the main framework.
> 1. Add a constructor
>        public LineRecordReader(Configuration job, FileSplit split)
>    to LineRecordReader. This gives the constructors of SequenceFileRecordReader and LineRecordReader
>    the same signature, which makes it easier to write a factory class that creates the various record readers
>    once the record reader classes for Hadoop streaming are brought into the main framework.
> 2. Implement the next() method in terms of the following protected method, newly added to the LineRecordReader class:
>        protected long readLine() throws IOException {
>          return LineRecordReader.readLine(in, buffer);
>        }
>    This lets a subclass override the readLine logic to use a different line terminator
>    (e.g. treat '\r' as part of the data, not as a line terminator).
> 3. Rename class InputFormatBase to FileInputFormat to better reflect what the class does.
>    For backward compatibility, keep the InputFormatBase class, but make it a deprecated, thin class
>    that simply extends FileInputFormat.
> 4. Change TextInputFormat and SequenceFileInputFormat to extend FileInputFormat.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
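
To illustrate item 2 of the issue, here is a minimal, self-contained sketch of the "protected readLine() hook" idea. It is not the Hadoop code or the attached patch: ReadLineHookSketch, LineReader and NewlineOnlyReader are hypothetical stand-ins, the input is an in-memory stream rather than a FileSplit, and the default line-terminator policy shown ('\n', '\r' and "\r\n") is an assumption made for the example. The point is only that next() goes through the protected method, so a subclass can change line breaking without touching anything else.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch; none of these classes are part of Hadoop.
class ReadLineHookSketch {

    // Stand-in for a line-oriented record reader.
    static class LineReader {
        protected final InputStream in;   // assumed to support mark/reset
        protected final ByteArrayOutputStream buffer = new ByteArrayOutputStream();

        LineReader(InputStream in) { this.in = in; }

        // Assumed default policy: '\n', '\r' and "\r\n" each end a line.
        // Returns the number of bytes consumed, or 0 at end of input.
        static long readLine(InputStream in, ByteArrayOutputStream out) throws IOException {
            long consumed = 0;
            int b;
            while ((b = in.read()) != -1) {
                consumed++;
                if (b == '\n') break;
                if (b == '\r') {
                    in.mark(1);                    // peek at the byte after '\r'
                    int next = in.read();
                    if (next == '\n') consumed++;  // swallow the '\n' of "\r\n"
                    else if (next != -1) in.reset();
                    break;
                }
                out.write(b);
            }
            return consumed;
        }

        // The overridable hook: by default it delegates to the static helper.
        protected long readLine() throws IOException {
            return LineReader.readLine(in, buffer);
        }

        // next() only calls the hook, so subclasses control line breaking.
        public String next() throws IOException {
            buffer.reset();
            if (readLine() == 0) return null;      // end of input
            return new String(buffer.toByteArray(), StandardCharsets.UTF_8);
        }

        public List<String> readAll() throws IOException {
            List<String> lines = new ArrayList<>();
            for (String line = next(); line != null; line = next()) lines.add(line);
            return lines;
        }
    }

    // Subclass that treats '\r' as ordinary data; only '\n' ends a line.
    static class NewlineOnlyReader extends LineReader {
        NewlineOnlyReader(InputStream in) { super(in); }

        @Override
        protected long readLine() throws IOException {
            long consumed = 0;
            int b;
            while ((b = in.read()) != -1) {
                consumed++;
                if (b == '\n') break;
                buffer.write(b);                   // '\r' is kept in the record
            }
            return consumed;
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] data = "a\rb\nc\n".getBytes(StandardCharsets.UTF_8);
        // Default policy splits on the lone '\r'.
        System.out.println(new LineReader(new ByteArrayInputStream(data)).readAll()); // [a, b, c]
        // Overridden policy keeps '\r' inside the first record.
        System.out.println(new NewlineOnlyReader(new ByteArrayInputStream(data)).readAll().size()); // 2
    }
}
```

The subclass overrides one method and inherits next() unchanged, which is the kind of extension point the issue describes for readers that need a different line terminator.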