Date: Thu, 14 Sep 2006 16:02:23 -0700 (PDT)
From: "Owen O'Malley (JIRA)"
To: hadoop-dev@lucene.apache.org
Subject: [jira] Updated: (HADOOP-532) Writable underrun in sort example

[ http://issues.apache.org/jira/browse/HADOOP-532?page=all ]

Owen O'Malley updated HADOOP-532:
---------------------------------

    Attachment: seqfile-underread-check.patch

The compression codec is not reading the entire value buffer, but it is getting the correct value. (I suspect the unread bytes are a CRC.)
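The failure mode described above (a correct value decoded while trailing bytes of the value buffer go unread) can be reproduced in miniature. This is a minimal, hypothetical sketch, not Hadoop code: the record layout (payload followed by a 4-byte CRC32 trailer) is an assumption that only echoes the CRC guess above, and the names `UnderreadDemo`, `encodeWithTrailer` and `readValue` are made up for illustration. The reader decodes just the payload, and a byte-count check modeled on the message in the report then flags the four unread bytes.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.util.Random;
import java.util.zip.CRC32;

public class UnderreadDemo {

    /** Hypothetical record layout: payload bytes followed by a 4-byte CRC32 trailer. */
    static byte[] encodeWithTrailer(byte[] payload) throws IOException {
        CRC32 crc = new CRC32();
        crc.update(payload, 0, payload.length);
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(bytes);
        out.write(payload);
        out.writeInt((int) crc.getValue()); // 4 trailing bytes the decoder never looks at
        out.flush();
        return bytes.toByteArray();
    }

    /**
     * Decodes a value whose payload size is already known, then applies a
     * byte-count check in the spirit of the reported error: every byte of
     * the value buffer is expected to have been consumed.
     */
    static byte[] readValue(byte[] valueBuffer, int payloadLength) throws IOException {
        ByteArrayInputStream buffer = new ByteArrayInputStream(valueBuffer);
        DataInputStream in = new DataInputStream(buffer);
        byte[] value = new byte[payloadLength];
        in.readFully(value); // the value itself is decoded correctly...
        int consumed = valueBuffer.length - buffer.available();
        if (consumed != valueBuffer.length) { // ...but the trailer is left unread
            throw new IOException("read " + consumed + " bytes, should read " + valueBuffer.length);
        }
        return value;
    }

    public static void main(String[] args) throws IOException {
        byte[] payload = new byte[2048];
        new Random(42).nextBytes(payload);
        byte[] valueBuffer = encodeWithTrailer(payload);
        try {
            readValue(valueBuffer, payload.length);
        } catch (IOException e) {
            System.out.println(e.getMessage()); // prints: read 2048 bytes, should read 2052
        }
    }
}
```

The decoded value is correct here; only the byte-count check fails, which is the same shape as the observation that the codec returns the right value but leaves part of the buffer unread.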
The error in the quoted report below is the SequenceFile reader complaining that the entire value buffer was not used. This patch:

1. extends the unit test to use bigger values so that we detect the problem.
2. allows the user of the org.apache.hadoop.io.TestSequenceFile main program to control the random seed (and prints out the seed value, even if it is random).
3. checks that the stream is done by trying to read the next byte on the input stream.
4. removes some redundant buffering of the already buffered value stream.
5. marks the start of the value in non-block-compressed sequence files and does a reset at the front of getCurrentValue.

> Writable underrun in sort example
> ---------------------------------
>
>                 Key: HADOOP-532
>                 URL: http://issues.apache.org/jira/browse/HADOOP-532
>             Project: Hadoop
>          Issue Type: Bug
>          Components: io
>    Affects Versions: 0.6.1
>            Reporter: Owen O'Malley
>         Assigned To: Owen O'Malley
>             Fix For: 0.6.2
>
>         Attachments: seqfile-underread-check.patch
>
>
> When running the sort benchmark, I get consistent failures of this sort:
>
> java.lang.RuntimeException: java.io.IOException: org.apache.hadoop.io.BytesWritable@43d748ad read 2048 bytes, should read 2052
>     at org.apache.hadoop.mapred.ReduceTask$ValuesIterator.next(ReduceTask.java:150)
>     at org.apache.hadoop.mapred.lib.IdentityReducer.reduce(IdentityReducer.java:39)
>     at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:271)
>     at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:1066)
> Caused by: java.io.IOException: org.apache.hadoop.io.BytesWritable@43d748ad read 2048 bytes, should read 2052
>     at org.apache.hadoop.io.SequenceFile$Reader.getCurrentValue(SequenceFile.java:1163)
>     at org.apache.hadoop.io.SequenceFile$Reader.next(SequenceFile.java:1239)
>     at org.apache.hadoop.mapred.ReduceTask$ValuesIterator.getNext(ReduceTask.java:181)
>     at org.apache.hadoop.mapred.ReduceTask$ValuesIterator.next(ReduceTask.java:147)
>     ... 3 more
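Items 3 and 5 of the patch list can be sketched with the JDK's `java.util.zip` classes standing in for Hadoop's compression codec. This is a hypothetical illustration, not the code in seqfile-underread-check.patch: the names `StreamDoneCheck` and `ValueReader`, the method signatures, and the use of `DeflaterOutputStream`/`InflaterInputStream` are all assumptions. The reader marks the start of the raw value bytes, resets to that mark at the top of `getCurrentValue`, and after decoding the expected number of bytes checks that the decoding stream is exhausted by reading one more byte, rather than comparing raw bytes consumed against the buffer length.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.util.Arrays;
import java.util.Random;
import java.util.zip.DeflaterOutputStream;
import java.util.zip.InflaterInputStream;

public class StreamDoneCheck {

    /** Hypothetical holder for one record's raw (compressed) value bytes. */
    static final class ValueReader {
        private final ByteArrayInputStream raw;

        ValueReader(byte[] compressedValue) {
            raw = new ByteArrayInputStream(compressedValue);
            raw.mark(compressedValue.length); // remember where the value starts
        }

        /** Decodes the value; safe to call repeatedly because of mark/reset. */
        byte[] getCurrentValue(int valueLength) throws IOException {
            raw.reset(); // always decode from the start of the value
            // Closing is harmless: ByteArrayInputStream.close() has no effect,
            // so reset() still works on the next call.
            try (DataInputStream in = new DataInputStream(new InflaterInputStream(raw))) {
                byte[] value = new byte[valueLength];
                in.readFully(value);
                // Ask the decoding stream itself whether anything is left; the
                // decompressor has to reach the end of its compressed stream
                // before it can answer -1.
                if (in.read() != -1) {
                    throw new IOException("value has more than " + valueLength + " decoded bytes");
                }
                return value;
            }
        }
    }

    static byte[] compress(byte[] data) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DeflaterOutputStream out = new DeflaterOutputStream(bytes)) {
            out.write(data);
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        byte[] payload = new byte[2048];
        new Random(42).nextBytes(payload);
        ValueReader reader = new ValueReader(compress(payload));
        byte[] first = reader.getCurrentValue(payload.length);
        byte[] second = reader.getCurrentValue(payload.length); // re-read after reset
        System.out.println(Arrays.equals(first, payload) && Arrays.equals(second, payload)); // true
    }
}
```

The sketch wraps the raw bytes directly in `InflaterInputStream`, which keeps its own input buffer, so no extra `BufferedInputStream` layer is added, loosely in the spirit of item 4.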