Date: Fri, 21 Oct 2011 23:40:32 +0000 (UTC)
From: "Owen O'Malley (Resolved) (JIRA)"
To: common-issues@hadoop.apache.org
Subject: [jira] [Resolved] (HADOOP-7760) BytesWritable / SequenceFile yields dummy linefeed at end as soon as content has one or more linefeeds.

    [ https://issues.apache.org/jira/browse/HADOOP-7760?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Owen O'Malley resolved HADOOP-7760.
-----------------------------------

    Resolution: Not A Problem

The "problem" is that BytesWritable and Text reuse the same backing byte array when shorter data is set. That avoids reallocation, at the cost of callers needing to check the length of the data: only the first getLength() bytes of the array returned by getBytes() are valid.

> BytesWritable / SequenceFile yields dummy linefeed at end as soon as content has one or more linefeeds.
> -------------------------------------------------------------------------------------------------------
>
>                 Key: HADOOP-7760
>                 URL: https://issues.apache.org/jira/browse/HADOOP-7760
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: record
>    Affects Versions: 0.20.2
>         Environment: Easily reproducible on a Debian Linux cluster, but also on my Arch Linux desktop.
> I am aware there are some newer releases in the 0.20 series, but all changelog and release note links for those at http://hadoop.apache.org/common/releases.html are broken, so I can't check whether this has been fixed or whether it's safe to upgrade.
>            Reporter: Dieter Plaetinck
>            Priority: Minor
>   Original Estimate: 2h
>  Remaining Estimate: 2h
>
> I create SequenceFiles which have BytesWritable as values.
> I notice that if I store content containing no linefeeds ("\n") or one linefeed in the value, the value can be read back out of the SequenceFile correctly.
> However, as soon as I store input containing two or more linefeeds (which is pretty much always the case), writing to the SequenceFile and reading my data back yields one *extra* linefeed at the end of the value, a linefeed that did not exist in the input.
> So this effectively corrupts my data, although I could write a hacky workaround for it.
> I have written a program that demonstrates the behavior by writing two SequenceFiles:
> one with a record whose value contains one linefeed,
> and another with a record whose value contains two linefeeds.
> Upon reading, the latter value contains three linefeeds.
> Test file is: http://pastie.org/2728797
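
The resolution comment can be illustrated with a minimal Java sketch. It does not use Hadoop: `ReusableBytes` is a hypothetical, simplified stand-in for a grow-but-never-shrink buffer like the one BytesWritable keeps, and the 1.5x growth factor is an assumption of this sketch, not a claim about Hadoop's code. Setting a shorter value after a longer one leaves stale bytes (plus zero padding) past `getLength()`, so decoding the whole `getBytes()` array returns trailing data that is not part of the current value, while decoding only the first `getLength()` bytes returns exactly what was set.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class ReusableBytesDemo {

    /**
     * Hypothetical, simplified model of a reusable byte buffer.
     * NOT Hadoop's BytesWritable; it only mimics the "reuse the array
     * for shorter data" behavior described in the resolution comment.
     */
    public static final class ReusableBytes {
        private byte[] bytes = new byte[0];
        private int size = 0;

        /** Copies data in, reallocating only when the current array is too small. */
        public void set(byte[] data) {
            if (data.length > bytes.length) {
                // Grow with headroom; 1.5x is an assumption of this sketch.
                bytes = Arrays.copyOf(bytes, data.length * 3 / 2);
            }
            System.arraycopy(data, 0, bytes, 0, data.length);
            size = data.length;
        }

        /** Returns the whole backing array, including stale and padding bytes. */
        public byte[] getBytes() { return bytes; }

        /** Number of valid bytes at the front of getBytes(). */
        public int getLength() { return size; }
    }

    /** Wrong: decodes the entire backing array. */
    public static String decodeNaive(ReusableBytes b) {
        return new String(b.getBytes(), StandardCharsets.UTF_8);
    }

    /** Right: decodes only the valid prefix. */
    public static String decodeValid(ReusableBytes b) {
        return new String(b.getBytes(), 0, b.getLength(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        ReusableBytes value = new ReusableBytes();
        value.set("a\nb\n\n".getBytes(StandardCharsets.UTF_8)); // longer record: 5 bytes
        value.set("c\n".getBytes(StandardCharsets.UTF_8));      // shorter record reuses the array
        System.out.println(decodeNaive(value).length()); // prints 7: stale "b\n\n" plus two zero bytes
        System.out.println(decodeValid(value).length()); // prints 2: just "c\n"
    }
}
```

With the real Writable classes the same rule applies: pass getLength() along with getBytes(), or copy the valid prefix (for example with `Arrays.copyOf(w.getBytes(), w.getLength())`) before treating the array as the value on its own.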