hadoop-mapreduce-user mailing list archives

From rakesh kothari <rkothari_...@hotmail.com>
Subject RE: Failures in the reducers
Date Tue, 12 Oct 2010 20:36:39 GMT

No. It just runs this job. It's a 7-node cluster with 3 mapper and 2 reducer slots per node.

Date: Tue, 12 Oct 2010 13:23:23 -0700
Subject: Re: Failures in the reducers
From: shrijeet@rocketfuel.com
To: mapreduce-user@hadoop.apache.org

Is your cluster busy doing other things while this job is running?

On Tue, Oct 12, 2010 at 1:15 PM, rakesh kothari <rkothari_iit@hotmail.com> wrote:

Thanks, Shrijeet. Yeah, sorry, both of these logs are from the datanodes.

Also, I don't get this error when I run my job on just 1 file (450 MB).

I wonder why this happens in the reduce stage, since I have just 10 reducers and I don't see how those 256 connections are being opened.
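One rough way to see where the connections come from is to count open sockets on a datanode's data-transfer port (50010 in the logs below) while the reducers run; each HDFS read or write pipeline through the node holds one connection and one DataXceiver thread, so a count approaching 256 (the default dfs.datanode.max.xcievers) would line up with the errors. A minimal check, assuming netstat is available on the datanode host:

# counts inbound and outbound connections on the data-transfer port,
# so treat the number as a rough upper bound on active xceivers
netstat -tn | grep -c ':50010'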


-Rakesh

Date: Tue, 12 Oct 2010 13:02:16 -0700
Subject: Re: Failures in the reducers
From: shrijeet@rocketfuel.com
To: mapreduce-user@hadoop.apache.org


Rakesh, that error log looks like it belongs to the DataNode, not the NameNode. Anyway, try raising the parameter dfs.datanode.max.xcievers (shoot for 512). This param belongs in hdfs-site.xml.
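For reference, the entry would look something like this in hdfs-site.xml (512 per the suggestion above; note that the historical misspelling "xcievers" is the actual property name, and the datanodes need a restart to pick up the change):

<property>
  <!-- upper bound on concurrent DataXceiver threads per datanode -->
  <name>dfs.datanode.max.xcievers</name>
  <value>512</value>
</property>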


-Shrijeet

On Tue, Oct 12, 2010 at 12:53 PM, rakesh kothari <rkothari_iit@hotmail.com> wrote:

Hi,

My MR job is processing 24 gzipped files, each around 450 MB. The file block size is 512 MB.

This job is failing consistently in the reduce phase with the exception below. Any ideas on how to troubleshoot this?



Thanks,
-Rakesh

Datanode logs:

INFO org.apache.hadoop.mapred.Merger: Down to the last merge-pass, with 10 segments left of total size: 408736960 bytes

2010-10-12 07:25:01,020 INFO org.apache.hadoop.hdfs.DFSClient: Exception in createBlockOutputStream java.io.IOException: Bad connect ack with firstBadLink 10.185.13.61:50010
2010-10-12 07:25:01,021 INFO org.apache.hadoop.hdfs.DFSClient: Abandoning block blk_-961587459095414398_368580
2010-10-12 07:25:07,206 INFO org.apache.hadoop.hdfs.DFSClient: Exception in createBlockOutputStream java.io.IOException: Bad connect ack with firstBadLink 10.185.13.61:50010
2010-10-12 07:25:07,206 INFO org.apache.hadoop.hdfs.DFSClient: Abandoning block blk_-7795697604292519140_368580
2010-10-12 07:27:05,526 INFO org.apache.hadoop.hdfs.DFSClient: Exception in createBlockOutputStream java.io.EOFException
2010-10-12 07:27:05,527 INFO org.apache.hadoop.hdfs.DFSClient: Abandoning block blk_-7687883740524807660_368625
2010-10-12 07:27:11,713 INFO org.apache.hadoop.hdfs.DFSClient: Exception in createBlockOutputStream java.io.EOFException
2010-10-12 07:27:11,713 INFO org.apache.hadoop.hdfs.DFSClient: Abandoning block blk_-5546440551650461919_368626
2010-10-12 07:27:17,898 INFO org.apache.hadoop.hdfs.DFSClient: Exception in createBlockOutputStream java.io.EOFException
2010-10-12 07:27:17,898 INFO org.apache.hadoop.hdfs.DFSClient: Abandoning block blk_-3894897742813130478_368628
2010-10-12 07:27:24,081 INFO org.apache.hadoop.hdfs.DFSClient: Exception in createBlockOutputStream java.io.EOFException
2010-10-12 07:27:24,081 INFO org.apache.hadoop.hdfs.DFSClient: Abandoning block blk_8687736970664350304_368652
2010-10-12 07:27:30,186 WARN org.apache.hadoop.hdfs.DFSClient: DataStreamer Exception: java.io.IOException: Unable to create new block.
        at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.nextBlockOutputStream(DFSClient.java:2812)
        at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.access$2000(DFSClient.java:2076)
        at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2262)
2010-10-12 07:27:30,186 WARN org.apache.hadoop.hdfs.DFSClient: Error Recovery for block blk_8687736970664350304_368652 bad datanode[0] nodes == null
2010-10-12 07:27:30,186 WARN org.apache.hadoop.hdfs.DFSClient: Could not get block locations. Source file "/tmp/dartlog-json-serializer/20100929_/_temporary/_attempt_201010082153_0040_r_000000_2/jp/dart-imp-json/2010/09/29/17/part-r-00000.gz" - Aborting...
2010-10-12 07:27:30,196 WARN org.apache.hadoop.mapred.TaskTracker: Error running child
java.io.EOFException
        at java.io.DataInputStream.readByte(DataInputStream.java:250)
        at org.apache.hadoop.io.WritableUtils.readVLong(WritableUtils.java:298)
        at org.apache.hadoop.io.WritableUtils.readVInt(WritableUtils.java:319)
        at org.apache.hadoop.io.Text.readString(Text.java:400)
        at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.createBlockOutputStream(DFSClient.java:2868)
        at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.nextBlockOutputStream(DFSClient.java:2793)
        at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.access$2000(DFSClient.java:2076)
        at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2262)
2010-10-12 07:27:30,199 INFO org.apache.hadoop.mapred.TaskRunner: Runnning cleanup for the task



The namenode is throwing the following exception:

2010-10-12 07:27:30,026 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Receiving block blk_-892355450837523222_368657 src: /10.43.102.69:42352 dest: /10.43.102.69:50010
2010-10-12 07:27:30,206 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: writeBlock blk_-892355450837523222_368657 received exception java.io.EOFException
2010-10-12 07:27:30,206 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration(10.43.102.69:50010, storageID=DS-859924705-10.43.102.69-50010-1271546912162, infoPort=8501, ipcPort=50020):DataXceiver
java.io.EOFException
        at java.io.DataInputStream.readByte(DataInputStream.java:250)
        at org.apache.hadoop.io.WritableUtils.readVLong(WritableUtils.java:298)
        at org.apache.hadoop.io.WritableUtils.readVInt(WritableUtils.java:319)
        at org.apache.hadoop.io.Text.readString(Text.java:400)
        at org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:313)
        at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:103)
        at java.lang.Thread.run(Thread.java:619)
2010-10-12 07:27:30,272 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Receiving block blk_786696549206331718_368657 src: /10.184.82.24:53457 dest: /10.43.102.69:50010
2010-10-12 07:27:30,459 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Receiving block blk_-6729043740571856940_368657 src: /10.185.13.60:41816 dest: /10.43.102.69:50010
2010-10-12 07:27:30,468 INFO org.apache.hadoop.hdfs.server.datanode.DataNode.clienttrace: src: /10.185.13.61:48770, dest: /10.43.102.69:50010, bytes: 1626784, op: HDFS_WRITE, cliID: DFSClient_attempt_201010082153_0040_r_000000_2, srvID: DS-859924705-10.43.102.69-50010-1271546912162, blockid: blk_9216465415312085861_368611
2010-10-12 07:27:30,468 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: PacketResponder 0 for block blk_9216465415312085861_368611 terminating
2010-10-12 07:27:30,755 INFO org.apache.hadoop.hdfs.server.datanode.DataBlockScanner: Verification succeeded for blk_5680087852988027619_321244
2010-10-12 07:27:30,759 INFO org.apache.hadoop.hdfs.server.datanode.DataBlockScanner: Verification succeeded for blk_-1637914415591966611_321290

…

2010-10-12 07:27:56,412 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration(10.43.102.69:50010, storageID=DS-859924705-10.43.102.69-50010-1271546912162, infoPort=8501, ipcPort=50020):DataXceiver
java.io.IOException: xceiverCount 258 exceeds the limit of concurrent xcievers 256
        at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:88)
        at java.lang.Thread.run(Thread.java:619)
2010-10-12 07:27:56,976 INFO org.apache.hadoop.hdfs.server.datanode.DataBlockScanner: Verification succeeded for blk_5731266331675183628_321238
2010-10-12 07:27:57,669 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration(10.43.102.69:50010, storageID=DS-859924705-10.43.102.69-50010-1271546912162, infoPort=8501, ipcPort=50020):DataXceiver
java.io.IOException: xceiverCount 258 exceeds the limit of concurrent xcievers 256
        at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:88)
        at java.lang.Thread.run(Thread.java:619)
2010-10-12 07:27:58,976 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration(10.43.102.69:50010, storageID=DS-859924705-10.43.102.69-50010-1271546912162, infoPort=8501, ipcPort=50020):DataXceiver
java.io.IOException: xceiverCount 258 exceeds the limit of concurrent xcievers 256
        at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:88)
        at java.lang.Thread.run(Thread.java:619)
