Return-Path:
X-Original-To: apmail-hadoop-common-issues-archive@minotaur.apache.org
Delivered-To: apmail-hadoop-common-issues-archive@minotaur.apache.org
Received: from mail.apache.org (hermes.apache.org [140.211.11.3])
	by minotaur.apache.org (Postfix) with SMTP id 12611956A
	for ; Thu, 23 Aug 2012 17:10:44 +0000 (UTC)
Received: (qmail 22979 invoked by uid 500); 23 Aug 2012 17:10:43 -0000
Delivered-To: apmail-hadoop-common-issues-archive@hadoop.apache.org
Received: (qmail 22832 invoked by uid 500); 23 Aug 2012 17:10:43 -0000
Mailing-List: contact common-issues-help@hadoop.apache.org; run by ezmlm
Precedence: bulk
List-Help:
List-Unsubscribe:
List-Post:
List-Id:
Reply-To: common-issues@hadoop.apache.org
Delivered-To: mailing list common-issues@hadoop.apache.org
Received: (qmail 22804 invoked by uid 99); 23 Aug 2012 17:10:43 -0000
Received: from arcas.apache.org (HELO arcas) (140.211.11.28)
	by apache.org (qpsmtpd/0.29) with ESMTP; Thu, 23 Aug 2012 17:10:43 +0000
Received: from arcas.apache.org (localhost [127.0.0.1])
	by arcas (Postfix) with ESMTP id BC6E72C0A5A
	for ; Thu, 23 Aug 2012 17:10:42 +0000 (UTC)
Date: Fri, 24 Aug 2012 04:10:42 +1100 (NCT)
From: "Hudson (JIRA)"
To: common-issues@hadoop.apache.org
Message-ID: <1962034656.6324.1345741842772.JavaMail.jiratomcat@arcas>
Subject: [jira] [Commented] (HADOOP-8655) In TextInputFormat, while specifying textinputformat.record.delimiter the character/character sequences in data file similar to starting character/starting character sequence in delimiter were found missing in certain cases in the Map Output
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit
X-JIRA-FingerPrint: 30527f35849b9dde25b450d4833f0394


    [ https://issues.apache.org/jira/browse/HADOOP-8655?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13440453#comment-13440453 ]

Hudson commented on HADOOP-8655:
--------------------------------

Integrated in Hadoop-Hdfs-trunk-Commit #2691
(See [https://builds.apache.org/job/Hadoop-Hdfs-trunk-Commit/2691/])
HADOOP-8655. Fix TextInputFormat for large deliminators. (Gelesh via bobby) (Revision 1376592)

     Result = SUCCESS
bobby : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1376592
Files :
* /hadoop/common/trunk/hadoop-common-project/hadoop-common/CHANGES.txt
* /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/util/LineReader.java
* /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/util/TestLineReader.java


> In TextInputFormat, while specifying textinputformat.record.delimiter the character/character sequences in data file similar to starting character/starting character sequence in delimiter were found missing in certain cases in the Map Output
> -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: HADOOP-8655
>                 URL: https://issues.apache.org/jira/browse/HADOOP-8655
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: util
>    Affects Versions: 0.20.2
>         Environment: Linux- Ubuntu 10.04
>            Reporter: Arun A K
>              Labels: hadoop, mapreduce, textinputformat, textinputformat.record.delimiter
>             Fix For: 3.0.0, 2.2.0-alpha
>
>         Attachments: HADOOP-8655 (2).patch, HADOOP-8655.patch, HADOOP-8655.patch, HADOOP-8655.patch, MAPREDUCE-4519.patch
>
>   Original Estimate: 168h
>  Remaining Estimate: 168h
>
> Set textinputformat.record.delimiter as ""
> Suppose the input is a text file with the following content
> 1User12User23User34User45User5
> Mapper was expected to get value as
> Value 1 - 1User1
> Value 2 - 2User2
> Value 3 - 3User3
> Value 4 - 4User4
> Value 5 - 5User5
> According to this bug Mapper gets value
> Value 1 - entity>1User1
> Value 2 - id>2User2
> Value 3 - 3id>User3
> Value 4 - 4User4name>
> Value 5 - 5User5
> The pattern shown above need not occur for value 1,2,3 necessarily. The bug occurs at some random positions in the map input.
>

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
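
Illustration of the symptom reported in the issue above. This is a minimal Python sketch, not the Hadoop LineReader code and not the committed patch. It assumes the characters go missing when a read buffer ends in the middle of something that looks like the start of the delimiter but then turns out to be ordinary data. The delimiter "END", the sample data, the chunk size and the names split_records, held_back_length and restore_partial are all hypothetical and chosen only for the demonstration.

```python
from typing import Iterable, Iterator


def held_back_length(text: str, delimiter: str) -> int:
    """Length of the longest suffix of text that is a proper prefix of delimiter."""
    for k in range(min(len(text), len(delimiter) - 1), 0, -1):
        if text.endswith(delimiter[:k]):
            return k
    return 0


def split_records(chunks: Iterable[str], delimiter: str,
                  restore_partial: bool = True) -> Iterator[str]:
    """Split a chunked character stream on a non-empty multi-character delimiter.

    restore_partial=False imitates the reported symptom: characters held back
    at the end of a chunk because they looked like the start of the delimiter
    are dropped when the next chunk does not complete the delimiter.
    """
    record = ""  # confirmed data of the record being assembled
    held = ""    # chunk suffix that might be the start of a delimiter
    for chunk in chunks:
        if restore_partial or chunk.startswith(delimiter[len(held):]):
            text = held + chunk
        else:
            text = chunk  # simulated bug: the held-back characters vanish
        *complete, tail = text.split(delimiter)
        for part in complete:
            yield record + part
            record = ""
        k = held_back_length(tail, delimiter)
        record += tail[:len(tail) - k]
        held = tail[len(tail) - k:]
    record += held  # at end of stream a partial match is ordinary data
    if record:
        yield record


if __name__ == "__main__":
    data = "redENDgreENeENDblueEND"
    # Chunk size 11 places the boundary right after the "EN" inside "greENe".
    chunks = [data[i:i + 11] for i in range(0, len(data), 11)]
    print(list(split_records(chunks, "END")))
    # ['red', 'greENe', 'blue']
    print(list(split_records(chunks, "END", restore_partial=False)))
    # ['red', 'gree', 'blue']
```

Because the loss depends on where the buffer boundary happens to fall relative to the data, a reader with this kind of flaw would corrupt records at positions that look random, which is consistent with the last sentence of the report.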