hadoop-mapreduce-user mailing list archives

From: Shrijeet Paliwal <shrij...@rocketfuel.com>
Subject: Re: Failures in the reducers
Date: Tue, 12 Oct 2010 20:23:23 GMT
Is your cluster busy doing other things while this job is running?

On Tue, Oct 12, 2010 at 1:15 PM, rakesh kothari <rkothari_iit@hotmail.com> wrote:

> Thanks, Shrijeet. Yeah, sorry, both of these logs are from datanodes.
>
> Also, I don't get this error when I run my job on just 1 file (450 MB).
>
> I wonder why this happens in the reduce stage, since I have just 10 reducers
> and don't see how those 256 connections are being opened.
>
> -Rakesh
>
> ------------------------------
> Date: Tue, 12 Oct 2010 13:02:16 -0700
> Subject: Re: Failures in the reducers
> From: shrijeet@rocketfuel.com
> To: mapreduce-user@hadoop.apache.org
>
>
> Rakesh,
> That error log looks like it belongs to the DataNode, not the NameNode. Anyway,
> try bumping the parameter *dfs.datanode.max.xcievers* up (shoot for 512). This
> param belongs in hdfs-site.xml.
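>
> For reference, a minimal hdfs-site.xml entry might look like this (a sketch;
> the misspelled "xcievers" really is the property name, and 512 is only a
> starting value to tune for your cluster):
>
>     <property>
>       <name>dfs.datanode.max.xcievers</name>
>       <value>512</value>
>     </property>
>
> The default is 256, which matches the limit in the errors below; the datanodes
> need a restart to pick up the change.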
>
> -Shrijeet
>
> On Tue, Oct 12, 2010 at 12:53 PM, rakesh kothari <rkothari_iit@hotmail.com> wrote:
>
>  Hi,
>
> My MR job is processing 24 gzipped files, each around 450 MB. The file block
> size is 512 MB.
>
> The job is failing consistently in the reduce phase with the exception below.
> Any ideas on how to troubleshoot this?
>
> Thanks,
> -Rakesh
>
> Datanode logs:
>
> INFO org.apache.hadoop.mapred.Merger: Down to the last merge-pass, with 10 segments left of total size: 408736960 bytes
>
> 2010-10-12 07:25:01,020 INFO org.apache.hadoop.hdfs.DFSClient: Exception in createBlockOutputStream java.io.IOException: Bad connect ack with firstBadLink 10.185.13.61:50010
>
> 2010-10-12 07:25:01,021 INFO org.apache.hadoop.hdfs.DFSClient: Abandoning block blk_-961587459095414398_368580
>
> 2010-10-12 07:25:07,206 INFO org.apache.hadoop.hdfs.DFSClient: Exception in createBlockOutputStream java.io.IOException: Bad connect ack with firstBadLink 10.185.13.61:50010
>
> 2010-10-12 07:25:07,206 INFO org.apache.hadoop.hdfs.DFSClient: Abandoning block blk_-7795697604292519140_368580
>
> 2010-10-12 07:27:05,526 INFO org.apache.hadoop.hdfs.DFSClient: Exception in createBlockOutputStream java.io.EOFException
>
> 2010-10-12 07:27:05,527 INFO org.apache.hadoop.hdfs.DFSClient: Abandoning block blk_-7687883740524807660_368625
>
> 2010-10-12 07:27:11,713 INFO org.apache.hadoop.hdfs.DFSClient: Exception in createBlockOutputStream java.io.EOFException
>
> 2010-10-12 07:27:11,713 INFO org.apache.hadoop.hdfs.DFSClient: Abandoning block blk_-5546440551650461919_368626
>
> 2010-10-12 07:27:17,898 INFO org.apache.hadoop.hdfs.DFSClient: Exception in createBlockOutputStream java.io.EOFException
>
> 2010-10-12 07:27:17,898 INFO org.apache.hadoop.hdfs.DFSClient: Abandoning block blk_-3894897742813130478_368628
>
> 2010-10-12 07:27:24,081 INFO org.apache.hadoop.hdfs.DFSClient: Exception in createBlockOutputStream java.io.EOFException
>
> 2010-10-12 07:27:24,081 INFO org.apache.hadoop.hdfs.DFSClient: Abandoning block blk_8687736970664350304_368652
>
> 2010-10-12 07:27:30,186 WARN org.apache.hadoop.hdfs.DFSClient: DataStreamer Exception: java.io.IOException: Unable to create new block.
>         at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.nextBlockOutputStream(DFSClient.java:2812)
>         at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.access$2000(DFSClient.java:2076)
>         at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2262)
>
> 2010-10-12 07:27:30,186 WARN org.apache.hadoop.hdfs.DFSClient: Error Recovery for block blk_8687736970664350304_368652 bad datanode[0] nodes == null
>
> 2010-10-12 07:27:30,186 WARN org.apache.hadoop.hdfs.DFSClient: Could not get block locations. Source file "/tmp/dartlog-json-serializer/20100929_/_temporary/_attempt_201010082153_0040_r_000000_2/jp/dart-imp-json/2010/09/29/17/part-r-00000.gz" - Aborting...
>
> 2010-10-12 07:27:30,196 WARN org.apache.hadoop.mapred.TaskTracker: Error running child
> java.io.EOFException
>         at java.io.DataInputStream.readByte(DataInputStream.java:250)
>         at org.apache.hadoop.io.WritableUtils.readVLong(WritableUtils.java:298)
>         at org.apache.hadoop.io.WritableUtils.readVInt(WritableUtils.java:319)
>         at org.apache.hadoop.io.Text.readString(Text.java:400)
>         at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.createBlockOutputStream(DFSClient.java:2868)
>         at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.nextBlockOutputStream(DFSClient.java:2793)
>         at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.access$2000(DFSClient.java:2076)
>         at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2262)
>
> 2010-10-12 07:27:30,199 INFO org.apache.hadoop.mapred.TaskRunner: Runnning cleanup for the task
>
>
> The namenode is throwing the following exception:
>
> 2010-10-12 07:27:30,026 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Receiving block blk_-892355450837523222_368657 src: /10.43.102.69:42352 dest: /10.43.102.69:50010
>
> 2010-10-12 07:27:30,206 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: writeBlock blk_-892355450837523222_368657 received exception java.io.EOFException
>
> 2010-10-12 07:27:30,206 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration(10.43.102.69:50010, storageID=DS-859924705-10.43.102.69-50010-1271546912162, infoPort=8501, ipcPort=50020):DataXceiver
> java.io.EOFException
>         at java.io.DataInputStream.readByte(DataInputStream.java:250)
>         at org.apache.hadoop.io.WritableUtils.readVLong(WritableUtils.java:298)
>         at org.apache.hadoop.io.WritableUtils.readVInt(WritableUtils.java:319)
>         at org.apache.hadoop.io.Text.readString(Text.java:400)
>         at org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:313)
>         at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:103)
>         at java.lang.Thread.run(Thread.java:619)
>
> 2010-10-12 07:27:30,272 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Receiving block blk_786696549206331718_368657 src: /10.184.82.24:53457 dest: /10.43.102.69:50010
>
> 2010-10-12 07:27:30,459 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Receiving block blk_-6729043740571856940_368657 src: /10.185.13.60:41816 dest: /10.43.102.69:50010
>
> 2010-10-12 07:27:30,468 INFO org.apache.hadoop.hdfs.server.datanode.DataNode.clienttrace: src: /10.185.13.61:48770, dest: /10.43.102.69:50010, bytes: 1626784, op: HDFS_WRITE, cliID: DFSClient_attempt_201010082153_0040_r_000000_2, srvID: DS-859924705-10.43.102.69-50010-1271546912162, blockid: blk_9216465415312085861_368611
>
> 2010-10-12 07:27:30,468 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: PacketResponder 0 for block blk_9216465415312085861_368611 terminating
>
> 2010-10-12 07:27:30,755 INFO org.apache.hadoop.hdfs.server.datanode.DataBlockScanner: Verification succeeded for blk_5680087852988027619_321244
>
> 2010-10-12 07:27:30,759 INFO org.apache.hadoop.hdfs.server.datanode.DataBlockScanner: Verification succeeded for blk_-1637914415591966611_321290
>
> …
>
> 2010-10-12 07:27:56,412 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration(10.43.102.69:50010, storageID=DS-859924705-10.43.102.69-50010-1271546912162, infoPort=8501, ipcPort=50020):DataXceiver
> java.io.IOException: xceiverCount 258 exceeds the limit of concurrent xcievers 256
>         at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:88)
>         at java.lang.Thread.run(Thread.java:619)
>
> 2010-10-12 07:27:56,976 INFO org.apache.hadoop.hdfs.server.datanode.DataBlockScanner: Verification succeeded for blk_5731266331675183628_321238
>
> 2010-10-12 07:27:57,669 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration(10.43.102.69:50010, storageID=DS-859924705-10.43.102.69-50010-1271546912162, infoPort=8501, ipcPort=50020):DataXceiver
> java.io.IOException: xceiverCount 258 exceeds the limit of concurrent xcievers 256
>         at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:88)
>         at java.lang.Thread.run(Thread.java:619)
>
> 2010-10-12 07:27:58,976 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration(10.43.102.69:50010, storageID=DS-859924705-10.43.102.69-50010-1271546912162, infoPort=8501, ipcPort=50020):DataXceiver
> java.io.IOException: xceiverCount 258 exceeds the limit of concurrent xcievers 256
>         at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:88)
>         at java.lang.Thread.run(Thread.java:619)
