Return-Path:
Delivered-To: apmail-hadoop-mapreduce-issues-archive@minotaur.apache.org
Received: (qmail 23841 invoked from network); 15 Feb 2011 21:41:19 -0000
Received: from hermes.apache.org (HELO mail.apache.org) (140.211.11.3)
    by minotaur.apache.org with SMTP; 15 Feb 2011 21:41:19 -0000
Received: (qmail 20283 invoked by uid 500); 15 Feb 2011 21:41:19 -0000
Delivered-To: apmail-hadoop-mapreduce-issues-archive@hadoop.apache.org
Received: (qmail 20237 invoked by uid 500); 15 Feb 2011 21:41:19 -0000
Mailing-List: contact mapreduce-issues-help@hadoop.apache.org; run by ezmlm
Precedence: bulk
List-Help:
List-Unsubscribe:
List-Post:
List-Id:
Reply-To: mapreduce-issues@hadoop.apache.org
Delivered-To: mailing list mapreduce-issues@hadoop.apache.org
Received: (qmail 20227 invoked by uid 99); 15 Feb 2011 21:41:19 -0000
Received: from athena.apache.org (HELO athena.apache.org) (140.211.11.136)
    by apache.org (qpsmtpd/0.29) with ESMTP; Tue, 15 Feb 2011 21:41:19 +0000
X-ASF-Spam-Status: No, hits=-2000.0 required=5.0 tests=ALL_TRUSTED,T_RP_MATCHES_RCVD
X-Spam-Check-By: apache.org
Received: from [140.211.11.116] (HELO hel.zones.apache.org) (140.211.11.116)
    by apache.org (qpsmtpd/0.29) with ESMTP; Tue, 15 Feb 2011 21:41:17 +0000
Received: from hel.zones.apache.org (hel.zones.apache.org [140.211.11.116])
    by hel.zones.apache.org (Postfix) with ESMTP id BD1181A668E
    for ; Tue, 15 Feb 2011 21:40:57 +0000 (UTC)
Date: Tue, 15 Feb 2011 21:40:57 +0000 (UTC)
From: "Todd Lipcon (JIRA)"
To: mapreduce-issues@hadoop.apache.org
Message-ID: <323345914.18861.1297806057771.JavaMail.tomcat@hel.zones.apache.org>
In-Reply-To: <30940098.271681294707650812.JavaMail.jira@thor>
Subject: [jira] Commented: (MAPREDUCE-2254) Allow setting of end-of-record delimiter for TextInputFormat
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit
X-JIRA-FingerPrint: 30527f35849b9dde25b450d4833f0394


    [ https://issues.apache.org/jira/browse/MAPREDUCE-2254?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12995017#comment-12995017 ]

Todd Lipcon commented on MAPREDUCE-2254:
----------------------------------------

Looks good except for one minor nit: can you add the Apache license to the new test file?

> Allow setting of end-of-record delimiter for TextInputFormat
> ------------------------------------------------------------
>
>                 Key: MAPREDUCE-2254
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2254
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>            Reporter: Ahmed Radwan
>         Attachments: MAPREDUCE-2245.patch, MAPREDUCE-2254_r2.patch
>
>
> It would be useful to allow setting the end-of-record delimiter for TextInputFormat. The current implementation hardcodes '\n', '\r' or '\r\n' as the only possible record delimiters. This is a problem if users have embedded newlines in their data fields (which is pretty common). It is also a problem for other tools that use TextInputFormat (see, for example, https://issues.apache.org/jira/browse/PIG-836 and https://issues.cloudera.org/browse/SQOOP-136).
> I have written a patch to address this issue. The patch allows users to specify any custom end-of-record delimiter using a newly added configuration property. For backward compatibility, if this new configuration property is absent, exactly the same delimiters as before are used (i.e., '\n', '\r' or '\r\n').

--
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira
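
For readers of the archive, the delimiter behaviour described in the quoted issue can be illustrated with a small stand-alone sketch. This is not the code from the attached patches and does not use any Hadoop API; the class name RecordSplitter and its split method are hypothetical, and the message above does not give the name of the new configuration property. A null delimiter stands in for "property absent" and falls back to '\n', '\r' or '\r\n'.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only: not the MAPREDUCE-2254 patch and not a Hadoop class.
public class RecordSplitter {

    /**
     * Splits data into records.
     * If delimiter is null (standing in for "property not set"), records end at
     * '\n', '\r' or '\r\n'. Otherwise records end only at the custom delimiter,
     * so newlines embedded in a field stay inside the record.
     */
    public static List<String> split(String data, String delimiter) {
        List<String> records = new ArrayList<>();
        int start = 0;
        int i = 0;
        while (i < data.length()) {
            int len = matchLength(data, i, delimiter);
            if (len > 0) {
                records.add(data.substring(start, i));
                i += len;
                start = i;
            } else {
                i++;
            }
        }
        if (start < data.length()) {
            // Last record has no trailing delimiter.
            records.add(data.substring(start));
        }
        return records;
    }

    // Returns the length of the delimiter found at pos, or 0 if there is none.
    private static int matchLength(String data, int pos, String delimiter) {
        if (delimiter != null) {
            return data.startsWith(delimiter, pos) ? delimiter.length() : 0;
        }
        char c = data.charAt(pos);
        if (c == '\n') {
            return 1;
        }
        if (c == '\r') {
            boolean crlf = pos + 1 < data.length() && data.charAt(pos + 1) == '\n';
            return crlf ? 2 : 1;
        }
        return 0;
    }

    public static void main(String[] args) {
        String data = "id=1,note=line one\nline two|id=2,note=ok|";
        // Custom delimiter: the embedded newline stays inside the first record.
        System.out.println(split(data, "|").size());
        // Default delimiters: the embedded newline splits the first record in two.
        System.out.println(split(data, null).size());
    }
}
```

With the custom delimiter "|" the sample input yields two records, the first of which still contains its embedded newline. With the default delimiters the same input is cut at the newline instead, which is the problem the issue describes.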