From: Ted Yu <yuzhihong@gmail.com>
To: mapreduce-user@hadoop.apache.org
Subject: Re: block errors
Date: Sat, 24 Jul 2010 06:51:27 -0700
In-Reply-To: <201007131057.o6DAv96D002802@post.webmailer.de>

Check the datanode log on 10.15.46.73.

You should increase dfs.datanode.max.xcievers.
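The "Bad connect ack with firstBadLink" errors below are what the client sees when a datanode refuses another connection, which commonly happens once the datanode's transceiver limit is exhausted. As a sketch (4096 is a commonly suggested value, not something tuned for your cluster), the limit goes in hdfs-site.xml on every datanode, followed by a datanode restart:

  <property>
    <name>dfs.datanode.max.xcievers</name>
    <value>4096</value>
  </property>

Note the property name really is spelled "xcievers". The default in this era is 256, which is easy to exhaust under heavy concurrent block writes like the reduce-output phase in your log.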
On Tue, Jul 13, 2010 at 3:57 AM, Some Body <somebody@squareplanet.de> wrote:

> Hi All,
>
> I had a MR job that processed 2000 small (<3MB ea.) files, and it took 40
> minutes on 8 nodes. Since the files are small, it triggered 2000 tasks.
> I packed my 2000 files into a single 445MB sequence file
> (K,V == Text,Text == <filename>,<file-content>). The new MR job triggers
> 7 map tasks (approx 64MB each), but it takes even longer (49 minutes),
> so I'm trying to figure out why.
>
> I noticed the following errors, and I'm hoping someone can shed some
> light on why they occur.
>
> Before I ran the job I ran 'hadoop fsck /' and everything was healthy:
> no under-replicated blocks, no corrupt blocks, etc.
>
> ......
> 2010-07-13 03:24:20,807 INFO org.apache.hadoop.mapred.ReduceTask: GetMapEventsThread exiting
> 2010-07-13 03:24:20,807 INFO org.apache.hadoop.mapred.ReduceTask: getMapsEventsThread joined.
> 2010-07-13 03:24:20,808 INFO org.apache.hadoop.mapred.ReduceTask: Closed ram manager
> 2010-07-13 03:24:20,808 INFO org.apache.hadoop.mapred.ReduceTask: Interleaved on-disk merge complete: 7 files left.
> 2010-07-13 03:24:20,808 INFO org.apache.hadoop.mapred.ReduceTask: In-memory merge complete: 0 files left.
> 2010-07-13 03:24:20,814 INFO org.apache.hadoop.mapred.ReduceTask: Merging 7 files, 2401573706 bytes from disk
> 2010-07-13 03:24:20,815 INFO org.apache.hadoop.mapred.ReduceTask: Merging 0 segments, 0 bytes from memory into reduce
> 2010-07-13 03:24:20,818 INFO org.apache.hadoop.mapred.Merger: Merging 7 sorted segments
> 2010-07-13 03:24:20,827 INFO org.apache.hadoop.mapred.Merger: Down to the last merge-pass, with 7 segments left of total size: 2401573678 bytes
> 2010-07-13 03:30:42,329 INFO org.apache.hadoop.hdfs.DFSClient: Exception in createBlockOutputStream java.io.IOException: Bad connect ack with firstBadLink 10.15.46.73:50010
> 2010-07-13 03:30:42,329 INFO org.apache.hadoop.hdfs.DFSClient: Abandoning block blk_4304053493083580280_260714
> 2010-07-13 03:31:03,846 INFO org.apache.hadoop.hdfs.DFSClient: Exception in createBlockOutputStream java.io.IOException: Bad connect ack with firstBadLink 10.15.46.35:50010
> 2010-07-13 03:31:03,846 INFO org.apache.hadoop.hdfs.DFSClient: Abandoning block blk_3680469905814989852_260716
> 2010-07-13 03:31:08,233 INFO org.apache.hadoop.hdfs.DFSClient: Exception in createBlockOutputStream java.io.IOException: Bad connect ack with firstBadLink 10.15.46.35:50010
> 2010-07-13 03:31:08,233 INFO org.apache.hadoop.hdfs.DFSClient: Abandoning block blk_-673505196560500372_260717
> 2010-07-13 03:31:14,243 INFO org.apache.hadoop.hdfs.DFSClient: Exception in createBlockOutputStream java.io.IOException: Bad connect ack with firstBadLink 10.15.46.73:50010
> 2010-07-13 03:31:14,243 INFO org.apache.hadoop.hdfs.DFSClient: Abandoning block blk_-7054031797345836167_260717
> ......
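For reference, packing small files into a (Text, Text) sequence file as described above can be done with a short driver like the one below. This is a minimal sketch against the 0.20-era API; the class name SmallFilePacker, the jar and path names, and the choice to read each whole file into memory are all illustrative, reasonable only because each input is under 3MB.

import java.io.DataInputStream;
import java.io.File;
import java.io.FileInputStream;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.Text;

// Sketch: pack local files into one SequenceFile of
// (filename, file-content) Text pairs.
public class SmallFilePacker {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(conf);
    Path out = new Path(args[0]); // output SequenceFile on HDFS
    SequenceFile.Writer writer =
        SequenceFile.createWriter(fs, conf, out, Text.class, Text.class);
    try {
      for (int i = 1; i < args.length; i++) {
        File f = new File(args[i]);
        byte[] bytes = new byte[(int) f.length()];
        DataInputStream in = new DataInputStream(new FileInputStream(f));
        try {
          in.readFully(bytes); // inputs are <3MB each, so one read is fine
        } finally {
          in.close();
        }
        // Text assumes UTF-8; use BytesWritable instead for binary content.
        writer.append(new Text(f.getName()), new Text(bytes));
      }
    } finally {
      writer.close();
    }
  }
}

A run might look like: hadoop jar packer.jar SmallFilePacker /user/somebody/packed.seq local/*.txt (names illustrative). A ~445MB file with 64MB blocks yields 7 splits, hence the 7 map tasks seen above.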