Message-ID: <294793381.1258163019840.JavaMail.jira@brutus>
Date: Sat, 14 Nov 2009 01:43:39 +0000 (UTC)
From: "Todd Lipcon (JIRA)"
To: mapreduce-issues@hadoop.apache.org
Reply-To: mapreduce-issues@hadoop.apache.org
Subject: [jira] Commented: (MAPREDUCE-1176) Contribution: FixedLengthInputFormat and FixedLengthRecordReader
In-Reply-To: <2134133201.1257100799409.JavaMail.jira@brutus>
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit

[
https://issues.apache.org/jira/browse/MAPREDUCE-1176?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12777798#action_12777798 ]

Todd Lipcon commented on MAPREDUCE-1176:
----------------------------------------

Hi,

- Please *only* upload the patch to Hudson. Otherwise the QA bot gets confused and tries to apply your .java files as a patch.
- Also, the coding style guidelines for Hadoop have an indentation level of 2 spaces. It looks like your patch is full of tabs. There are a few other style violations. The coding style is http://java.sun.com/docs/codeconv/ with the change of 2 spaces instead of 4. It's probably easier to look through other parts of the Hadoop codebase and simply follow their example.
- There's a comment referring to the 0.20.1 code. Since this patch is slated for trunk, not 0.20.1, please remove that.
- There are some other bits of commented-out code. These are a no-no: either the code works and is important, in which case it should be there, or it's not important (or broken) and it shouldn't.

Thanks again for contributing to Hadoop! The review process can take a while but it's important to maintain style consistency across the codebase.

> Contribution: FixedLengthInputFormat and FixedLengthRecordReader
> ----------------------------------------------------------------
>
>                 Key: MAPREDUCE-1176
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1176
>             Project: Hadoop Map/Reduce
>          Issue Type: New Feature
>   Affects Versions: 0.20.1, 0.20.2
>         Environment: Any
>            Reporter: BitsOfInfo
>            Priority: Minor
>         Attachments: FixedLengthInputFormat.java, FixedLengthRecordReader.java, MAPREDUCE-1176-v1.patch
>
>
> Hello,
> I would like to contribute the following two classes for incorporation into the mapreduce.lib.input package. These two classes can be used when you need to read data from files containing fixed length (fixed width) records.
> Such files have no CR/LF (or any combination thereof) and no delimiters; each record has a fixed length, and any unused space is padded with spaces. The data is one gigantic line within a file.
> Provided are two classes: FixedLengthInputFormat and its corresponding FixedLengthRecordReader. When creating a job that specifies this input format, the job must have the "mapreduce.input.fixedlengthinputformat.record.length" property set as follows:
> myJobConf.setInt("mapreduce.input.fixedlengthinputformat.record.length", [myFixedRecordLength]);
> OR
> myJobConf.setInt(FixedLengthInputFormat.FIXED_RECORD_LENGTH, [myFixedRecordLength]);
> This input format overrides computeSplitSize() to ensure that InputSplits do not contain any partial records, since with fixed-length records there is no way to determine where a record begins if that were to occur. Each InputSplit passed to the FixedLengthRecordReader will start at the beginning of a record, and the last byte in the InputSplit will be the last byte of a record. The override of computeSplitSize() delegates to FileInputFormat's compute method, and then adjusts the returned split size by doing the following:
> (Math.floor(fileInputFormatsComputedSplitSize / fixedRecordLength) * fixedRecordLength)
> This suite of fixed-length input format classes does not support compressed files.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
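
For reference, the split-size adjustment described in the quoted issue can be sketched roughly as below. This is only an illustration of the arithmetic, not the code from the attached patch: the class name FixedLengthSplitMath and the method alignSplitSize are hypothetical, the sketch does not depend on Hadoop, and the clamp to at least one record is an assumption that goes beyond the formula quoted above.

```java
// Hypothetical helper for illustration; not part of the attached patch or of Hadoop.
final class FixedLengthSplitMath {

  private FixedLengthSplitMath() {
  }

  /**
   * Rounds a computed split size down to a whole number of fixed-length
   * records, so that no split ends in the middle of a record.
   */
  static long alignSplitSize(long computedSplitSize, int recordLength) {
    if (recordLength <= 0) {
      throw new IllegalArgumentException("recordLength must be positive: " + recordLength);
    }
    if (computedSplitSize < 0) {
      throw new IllegalArgumentException("computedSplitSize must not be negative: " + computedSplitSize);
    }
    // Integer division truncates, which matches Math.floor for non-negative values.
    long aligned = (computedSplitSize / recordLength) * recordLength;
    // Assumption beyond the quoted formula: never return 0, so a split
    // always holds at least one record.
    return Math.max(aligned, recordLength);
  }

  public static void main(String[] args) {
    // A 64 MB computed split size with 300-byte records.
    long computed = 64L * 1024 * 1024;
    System.out.println(alignSplitSize(computed, 300));
  }
}
```

With a 64 MB computed split size and 300-byte records, the aligned size comes out 64 bytes short of 64 MB, so every split boundary falls on a record boundary.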