Date: Thu, 30 Nov 2006 07:21:22 -0800 (PST)
From: "Runping Qi (JIRA)"
To: hadoop-dev@lucene.apache.org
Subject: [jira] Commented: (HADOOP-759) TextInputFormat should allow different treatment on carriage return char '\r'

[ http://issues.apache.org/jira/browse/HADOOP-759?page=comments#action_12454665 ]

Runping Qi commented on HADOOP-759:
-----------------------------------

The case I have at hand is a bit different.
We have a file consisting of a sequence of records separated by LF ('\n'): REC1\nREC2\n... Some of the records may contain '\r', so it is wrong to interpret '\r' as a line break.

> TextInputFormat should allow different treatment on carriage return char '\r'
> -----------------------------------------------------------------------------
>
>                 Key: HADOOP-759
>                 URL: http://issues.apache.org/jira/browse/HADOOP-759
>             Project: Hadoop
>          Issue Type: Improvement
>            Reporter: Runping Qi
>
> The current implementation treats both '\r' and '\n' as line breaks. However, in some cases it is desirable to use '\n' strictly as the sole line break and to treat '\r' as part of the data in a line.
> One way to do this is to make the readline function a member function, so that users can create a subclass that overrides it with the desired behavior.

-- 
This message is automatically generated by JIRA.
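A minimal Java sketch of the subclass-override idea from the issue description, applied to the LF-only case in the comment. The class names `CrLfLineReader`, `LfOnlyLineReader`, and `LineBreakDemo` are hypothetical and are not Hadoop's actual `TextInputFormat` API; the sketch assumes UTF-8 records and reads one byte at a time for clarity rather than speed.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical reader (not Hadoop's API): readLine is an overridable member method. */
class CrLfLineReader {
    protected final InputStream in;
    private int pushedBack = -1; // one byte of lookahead, used to detect "\r\n"

    CrLfLineReader(InputStream in) { this.in = in; }

    /** Returns the pushed-back byte if there is one, otherwise reads from the stream. */
    protected int read() throws IOException {
        if (pushedBack >= 0) { int b = pushedBack; pushedBack = -1; return b; }
        return in.read();
    }

    protected void unread(int b) { pushedBack = b; }

    /** Default behavior: '\n', '\r', or "\r\n" ends a line. Returns null at end of stream. */
    public String readLine() throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        int b = read();
        if (b == -1) return null;
        while (b != -1) {
            if (b == '\n') break;
            if (b == '\r') {
                int next = read();
                // A lone '\r' is a break too; keep the following byte for the next line.
                if (next != '\n' && next != -1) unread(next);
                break;
            }
            buf.write(b);
            b = read();
        }
        return new String(buf.toByteArray(), StandardCharsets.UTF_8);
    }
}

/** Hypothetical subclass: only '\n' ends a record; '\r' is kept as data. */
class LfOnlyLineReader extends CrLfLineReader {
    LfOnlyLineReader(InputStream in) { super(in); }

    @Override
    public String readLine() throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        int b = read();
        if (b == -1) return null;
        while (b != -1 && b != '\n') {
            buf.write(b); // '\r' bytes are copied into the record unchanged
            b = read();
        }
        return new String(buf.toByteArray(), StandardCharsets.UTF_8);
    }
}

public class LineBreakDemo {
    /** Drains a reader into a list of lines. */
    static List<String> readAll(CrLfLineReader reader) throws IOException {
        List<String> lines = new ArrayList<>();
        for (String line = reader.readLine(); line != null; line = reader.readLine()) {
            lines.add(line);
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        byte[] data = "REC1\nRE\rC2\n".getBytes(StandardCharsets.UTF_8);
        System.out.println(readAll(new CrLfLineReader(new ByteArrayInputStream(data))).size());
        System.out.println(readAll(new LfOnlyLineReader(new ByteArrayInputStream(data))).size());
    }
}
```

With the sample input, the default reader splits the second record at the embedded '\r' and yields three lines (REC1, RE, C2), while the LF-only subclass yields two records and keeps the '\r' inside the second one. Because callers only depend on `readLine()`, swapping the subclass in changes the record boundaries without touching the code that consumes them.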