From: Raghu Angadi
To: hadoop-dev@lucene.apache.org
Date: Wed, 13 Jun 2007 00:05:50 -0700
Subject: Re: \r\n problem in LineRecordReader.java

Bwolen Yang wrote:
> Here is probably the cause of this bug:
>
>   public int read(byte b[], int off, int len) throws IOException {
>     // make sure that it ends at a checksum boundary
>     long curPos = getPos();
>     long endPos = len+curPos/bytesPerSum*bytesPerSum;
>     return readBuffer(b, off, (int)(endPos-curPos));
>   }
>
> Here, the caller calls the function with 127 bytes, and bytesPerSum is 256.

Is this from looking at the code, or did you actually see values like this at runtime?

I think 'len' is never supposed to be less than bytesPerChecksum, because there is a BufferedInputStream in between whose buffer size is io.buffer.size (default 4096). That buffer size is therefore required to be larger than bytesPerChecksum (the requirement changes with HADOOP-1450, which uses a buffer size equal to bytesPerChecksum).

Raghu.
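A small stand-alone Java sketch (not Hadoop code) illustrating both points. readLength copies the arithmetic of the quoted read(), where '/' and '*' bind tighter than '+', so the result is len - curPos % bytesPerSum. RecordingStream and smallestInnerRequest are hypothetical helpers that record the len values a java.io.BufferedInputStream passes to the stream beneath it. The sketch assumes no mark()/reset() is in use, since an active mark can make BufferedInputStream request fewer bytes than its buffer size.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;

public class ChecksumReadDemo {

    /** Same arithmetic as the quoted read(): evaluates to len - curPos % bytesPerSum. */
    static long readLength(long curPos, int len, int bytesPerSum) {
        long endPos = len + curPos / bytesPerSum * bytesPerSum;
        return endPos - curPos;
    }

    /** Hypothetical wrapper that records the smallest 'len' ever requested of it. */
    static class RecordingStream extends FilterInputStream {
        int minRequested = Integer.MAX_VALUE;

        RecordingStream(InputStream in) {
            super(in);
        }

        @Override
        public int read(byte[] b, int off, int len) throws IOException {
            minRequested = Math.min(minRequested, len);
            return super.read(b, off, len);
        }
    }

    /**
     * Hypothetical helper: reads 'total' bytes in 'chunk'-sized calls through a
     * BufferedInputStream and returns the smallest len the inner stream was asked for.
     */
    static int smallestInnerRequest(int total, int chunk, int bufferSize) throws IOException {
        RecordingStream inner = new RecordingStream(new ByteArrayInputStream(new byte[total]));
        try (InputStream in = new BufferedInputStream(inner, bufferSize)) {
            byte[] buf = new byte[chunk];
            while (in.read(buf, 0, chunk) != -1) {
                // discard the data; only the request sizes seen by 'inner' matter
            }
        }
        return inner.minRequested;
    }

    public static void main(String[] args) throws IOException {
        System.out.println(readLength(0, 127, 256));    // 127
        System.out.println(readLength(200, 127, 256));  // -73: a negative read length
        System.out.println(readLength(200, 4096, 256)); // 3896: ends at 4096, a checksum boundary
        System.out.println(smallestInnerRequest(10000, 127, 4096)); // 4096
    }
}
```

Even though the caller asks for 127 bytes at a time, the stream under a 4096-byte BufferedInputStream is only ever asked for 4096 bytes. With len = 4096 and bytesPerSum = 256 the computed end falls on a checksum boundary, whereas len = 127 at curPos = 200 yields a negative length.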