hadoop-common-user mailing list archives

From Raghu Angadi <rang...@apache.org>
Subject Re: Lot of ClosedChannelExceptions in namenode
Date Fri, 02 Oct 2009 05:37:16 GMT
This is more likely a symptom than a cause. One possible reason is that
clients are dying or restarting.
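
One way to check whether the errors come in bursts (as they would if clients die or restart together) is to count them per minute in the namenode log. The following is only a rough sketch: it assumes each log entry starts with a log4j-style timestamp such as "2009-10-01 00:44:32,329", and the function name `errors_per_minute` is made up for illustration.

```python
import re
from collections import Counter
from typing import Iterable

# Assumption: entries start with "YYYY-MM-DD HH:MM:SS,mmm "; we keep the minute part.
STAMP = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}):\d{2},\d{3} ")

# The two messages reported in this thread.
PATTERNS = ("ClosedChannelException", "Connection reset by peer")


def errors_per_minute(lines: Iterable[str]) -> Counter:
    """Count timestamped log lines that mention one of PATTERNS, keyed by minute."""
    counts: Counter = Counter()
    for line in lines:
        match = STAMP.match(line)
        # Lines without a timestamp (stack traces, repeated messages) are skipped.
        if match and any(pattern in line for pattern in PATTERNS):
            counts[match.group(1)] += 1
    return counts


if __name__ == "__main__":
    import sys

    # Usage: python errors_per_minute.py <path to namenode log>
    with open(sys.argv[1]) as log_file:
        for minute, count in sorted(errors_per_minute(log_file).items()):
            print(minute, count)
```

If the busy minutes line up with the task attempt failures or with client restarts, that would point to the clients rather than the namenode.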

It looks like the "caught: ..." log messages include a stack trace as well.
Could you include a couple of those?
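
If it helps, something like the following could pull those entries out of the namenode log together with their stack traces. It is a minimal sketch under one assumption: every log entry begins with a log4j-style timestamp, and any line without one (such as a stack frame) belongs to the entry before it. The name `caught_entries` is hypothetical.

```python
import re
from typing import Iterable, Iterator, List

# Assumption: each log entry starts with "YYYY-MM-DD HH:MM:SS,mmm ".
ENTRY_START = re.compile(r"^\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3} ")


def caught_entries(lines: Iterable[str], needle: str = "caught:") -> Iterator[List[str]]:
    """Yield each entry whose first line contains `needle`, plus its continuation lines."""
    entry: List[str] = []
    for raw in lines:
        line = raw.rstrip("\n")
        if ENTRY_START.match(line):
            # A new entry begins; emit the previous one if it matched.
            if entry and needle in entry[0]:
                yield entry
            entry = [line]
        elif entry:
            # No timestamp: treat as a continuation (for example a stack frame).
            entry.append(line)
    if entry and needle in entry[0]:
        yield entry


if __name__ == "__main__":
    import sys

    # Usage: python caught_entries.py <path to namenode log>
    with open(sys.argv[1]) as log_file:
        for found in caught_entries(log_file):
            print("\n".join(found))
            print()
```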

Changing the number of handlers might not change the situation. Even if the
number of handlers is not enough for your load, that would only slow down the
cluster (as long as the clients can handle it).

How many clients do you have?

Raghu.

On Thu, Oct 1, 2009 at 8:28 PM, Murali Krishna. P <muralikpbhat@yahoo.com> wrote:

> Hi,
>  We have a 200-node Hadoop cluster (0.20.0) and have tweaked the namenode and
> datanode handler counts to 40 and 10. The xcievers also had to be changed to
> 8192. But during the mapred jobs, we are seeing a lot of task attempt failures
> saying "connection reset by peer". The following exceptions are in the
> namenode logs.
> The TCP connection failures on the namenode also seem to be high. What
> could be wrong? (I have >6G heap for the namenode.) Do we need to increase the
> handlers further?
>
>   Another side effect of this issue is that the SequenceFile output of the
> job seems to be corrupted. The next job is not able to read some of the
> sequence files created by the previous job (even though the first job
> eventually succeeds after a lot of connection-related failures).
>
> namenode exceptions:
>
> java.io.IOException: Connection reset by peer
> 2009-10-01 00:44:32,329 INFO org.apache.hadoop.ipc.Server: IPC Server
> handler 24 on 9000 caught: java.nio.channels.ClosedChannelException
> 2009-10-01 00:44:32,330 INFO org.apache.hadoop.ipc.Server: IPC Server
> handler 5 on 9000 caught: java.nio.channels.ClosedChannelException
> 2009-10-01 00:44:32,331 INFO org.apache.hadoop.ipc.Server: IPC Server
> handler 29 on 9000 caught: java.nio.channels.ClosedChannelException
> 2009-10-01 00:44:32,343 INFO org.apache.hadoop.ipc.Server: IPC Server
> handler 2 on 9000 caught: java.nio.channels.ClosedChannelException
> 2009-10-01 00:44:34,575 INFO org.apache.hadoop.ipc.Server: IPC Server
> handler 21 on 9000 caught: java.nio.channels.ClosedChannelException
> 2009-10-01 00:44:34,601 INFO org.apache.hadoop.ipc.Server: IPC Server
> listener on 9000: readAndProcess threw exception java.io.IOException:
> Connection reset by peer. Count of bytes read: 0
> java.io.IOException: Connection reset by peer
> 2009-10-01 00:44:34,943 INFO org.apache.hadoop.ipc.Server: IPC Server
> listener on 9000: readAndProcess threw exception java.io.IOException:
> Connection reset by peer. Count of bytes read: 0
> java.io.IOException: Connection reset by peer
> 2009-10-01 00:44:35,641 INFO org.apache.hadoop.ipc.Server: IPC Server
> handler 16 on 9000 caught: java.nio.channels.ClosedChannelException
> 2009-10-01 00:44:40,380 INFO org.apache.hadoop.ipc.Server: IPC Server
> listener on 9000: readAndProcess threw exception java.io.IOException:
> Connection reset by peer. Count of bytes read: 0
>
>
> Exception reading sequence file by the next job:
>
> java.io.EOFException
>        at java.io.DataInputStream.readFully(DataInputStream.java:180)
>        at org.apache.hadoop.io.DataOutputBuffer$Buffer.write(DataOutputBuffer.java:63)
>        at org.apache.hadoop.io.DataOutputBuffer.write(DataOutputBuffer.java:101)
>        at org.apache.hadoop.io.SequenceFile$Reader.next(SequenceFile.java:1930)
>        at org.apache.hadoop.io.SequenceFile$Reader.next(SequenceFile.java:2062)
>        at org.apache.hadoop.mapred.SequenceFileRecordReader.next(SequenceFileRecordReader.java:76)
>        at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.moveToNext(MapTask.java:191)
>        at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.next(MapTask.java:175)
>        at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:48)
>
>  Thanks,
> Murali Krishna
>
