From: "Devaraj Das"
To: hadoop-dev@lucene.apache.org
Subject: RE: some reducers stuck in copying stage
Date: Thu, 1 Mar 2007 09:26:17 +0530

Weird! This looks like some other problem, which happened while merging the
outputs at the Reduce task. The copying stage went through fine. This
requires some more analysis.

> -----Original Message-----
> From: Mike Smith [mailto:mike.smith.dev@gmail.com]
> Sent: Thursday, March 01, 2007 3:44 AM
> To: hadoop-dev@lucene.apache.org
> Subject: Re: some reducers stuck in copying stage
>
> Devaraj,
>
> After applying patch 1043, the copying problem is solved. But I am getting
> new exceptions; the tasks do finish after being reassigned to another
> tasktracker, so the job gets done eventually.
> But I never had this exception before applying this patch (or could it be
> because of changing the back-off time to 5 sec?):
>
> java.lang.NullPointerException
> at org.apache.hadoop.fs.FSDataInputStream$Buffer.seek(FSDataInputStream.java:74)
> at org.apache.hadoop.fs.FSDataInputStream.seek(FSDataInputStream.java:121)
> at org.apache.hadoop.fs.ChecksumFileSystem$FSInputChecker.readBuffer(ChecksumFileSystem.java:217)
> at org.apache.hadoop.fs.ChecksumFileSystem$FSInputChecker.read(ChecksumFileSystem.java:163)
> at org.apache.hadoop.fs.FSDataInputStream$PositionCache.read(FSDataInputStream.java:41)
> at java.io.BufferedInputStream.fill(BufferedInputStream.java:218)
> at java.io.BufferedInputStream.read1(BufferedInputStream.java:258)
> at java.io.BufferedInputStream.read(BufferedInputStream.java:317)
> at java.io.DataInputStream.readFully(DataInputStream.java:178)
> at java.io.DataInputStream.readFully(DataInputStream.java:152)
> at org.apache.hadoop.io.SequenceFile$UncompressedBytes.reset(SequenceFile.java:427)
> at org.apache.hadoop.io.SequenceFile$UncompressedBytes.access$700(SequenceFile.java:414)
> at org.apache.hadoop.io.SequenceFile$Reader.nextRawValue(SequenceFile.java:1665)
> at org.apache.hadoop.io.SequenceFile$Sorter$SegmentDescriptor.nextRawValue(SequenceFile.java:2579)
> at org.apache.hadoop.io.SequenceFile$Sorter$MergeQueue.next(SequenceFile.java:2351)
> at org.apache.hadoop.io.SequenceFile$Sorter.writeFile(SequenceFile.java:2226)
> at org.apache.hadoop.io.SequenceFile$Sorter$MergeQueue.merge(SequenceFile.java:2442)
> at org.apache.hadoop.io.SequenceFile$Sorter.merge(SequenceFile.java:2164)
> at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:270)
> at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:1444)
>
>
> On 2/28/07, Mike Smith wrote:
> >
> > Thanks Devaraj,
> > patch 1042 seems to be already committed. Also, the system never
> > recovered even after 1 min or 300 sec; it stayed stuck there for hours.
> > I will try patch 1043 and also decrease the back-off time to see if
> > those help.
> >
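
For context on the trace quoted above: the NullPointerException is raised
inside seek() on a buffered, seekable input stream while the reduce task is
merging map-output segments (the SequenceFile$Sorter frames). The sketch
below is a minimal, hypothetical illustration of that failure shape, not the
actual Hadoop code; the class and field names (SeekableBuffer, in) are
invented. It only shows how a wrapper whose inner stream has already been
released can throw this kind of NullPointerException when a later merge step
seeks it again.

    import java.io.ByteArrayInputStream;
    import java.io.IOException;

    // Illustrative sketch only -- NOT the Hadoop code. It mimics the shape of
    // the NullPointerException at FSDataInputStream$Buffer.seek in the trace
    // above: a seekable wrapper whose inner stream has been released to null,
    // so a later seek during the merge dereferences null and throws.
    public class SeekAfterReleaseDemo {

        // Hypothetical wrapper, loosely modelled on a buffered, seekable stream.
        static class SeekableBuffer {
            private ByteArrayInputStream in; // underlying data; nulled on close()

            SeekableBuffer(byte[] data) {
                this.in = new ByteArrayInputStream(data);
            }

            void seek(long pos) throws IOException {
                in.reset();            // NullPointerException here if close() already ran
                long skipped = in.skip(pos);
                if (skipped != pos) {
                    throw new IOException("could not seek to " + pos);
                }
            }

            void close() {
                in = null;             // releases the buffer
            }
        }

        public static void main(String[] args) throws IOException {
            SeekableBuffer segment = new SeekableBuffer(new byte[]{1, 2, 3, 4});
            segment.seek(2);           // fine while the segment is open
            segment.close();
            segment.seek(0);           // java.lang.NullPointerException, as in the trace
        }
    }

Whether the real cause is a segment stream that was closed, or never opened,
before the merge re-read it is exactly the further analysis Devaraj mentions;
the sketch only explains why a null inner stream surfaces as an NPE inside
seek() rather than as a clearer error.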