hadoop-common-user mailing list archives

From: Meng Mao <meng...@gmail.com>
Subject: Re: EOFException and BadLink, but file descriptors number is ok?
Date: Wed, 03 Feb 2010 21:04:39 GMT
Also, which ulimit is the one that matters: the one for the user running
the job, or the one for the hadoop user that owns the Hadoop processes?
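
For what it's worth, here's the quick check I've been using to see which
limit the running daemons actually get (just a sketch; it assumes a Linux
/proc with per-process limits, and that the daemons run as the hadoop user):

  # effective open-files limit of the live DataNode process
  DN_PID=$(pgrep -u hadoop -f DataNode | head -1)
  grep 'Max open files' /proc/$DN_PID/limits

  # versus what my own (job-submitting) shell gets
  ulimit -n

If those two disagree, that would at least tell me whose limit the
datanodes are really running under.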

On Tue, Feb 2, 2010 at 7:29 PM, Meng Mao <mengmao@gmail.com> wrote:

> I've been trying to run a fairly small input file (300MB) on Cloudera
> Hadoop 0.20.1. The job probably writes to over 1000 part-files at once
> across the whole grid, which has 33 nodes. I get the following exception
> in the run logs:
>
> 10/01/30 17:24:25 INFO mapred.JobClient:  map 100% reduce 12%
> 10/01/30 17:24:25 INFO mapred.JobClient: Task Id : attempt_201001261532_1137_r_000013_0, Status : FAILED
> java.io.EOFException
>     at java.io.DataInputStream.readByte(DataInputStream.java:250)
>     at org.apache.hadoop.io.WritableUtils.readVLong(WritableUtils.java:298)
>     at org.apache.hadoop.io.WritableUtils.readVInt(WritableUtils.java:319)
>     at org.apache.hadoop.io.Text.readString(Text.java:400)
>     at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.createBlockOutputStream(DFSClient.java:2869)
>     at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.nextBlockOutputStream(DFSClient.java:2794)
>     at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.access$2000(DFSClient.java:2077)
>     at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2263)
>
> ....lots of EOFExceptions....
>
> 10/01/30 17:24:25 INFO mapred.JobClient: Task Id : attempt_201001261532_1137_r_000019_0, Status : FAILED
> java.io.IOException: Bad connect ack with firstBadLink 10.2.19.1:50010
>     at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.createBlockOutputStream(DFSClient.java:2871)
>     at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.nextBlockOutputStream(DFSClient.java:2794)
>     at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.access$2000(DFSClient.java:2077)
>     at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2263)
>
> 10/01/30 17:24:36 INFO mapred.JobClient:  map 100% reduce 11%
> 10/01/30 17:24:42 INFO mapred.JobClient:  map 100% reduce 12%
> 10/01/30 17:24:49 INFO mapred.JobClient:  map 100% reduce 13%
> 10/01/30 17:24:55 INFO mapred.JobClient:  map 100% reduce 14%
> 10/01/30 17:25:00 INFO mapred.JobClient:  map 100% reduce 15%
>
> From searching around, it seems the most common cause of BadLink and
> EOFExceptions is nodes that don't have enough file descriptors available.
> But across all the grid machines, fs.file-max has been set to 1573039.
> Furthermore, we set ulimit -n to 65536 in hadoop-env.sh.
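>
> (For reference, this is roughly the check I run to verify both settings
> on each node; "grid-hosts" here is just a stand-in for our list of
> hostnames:)
>
>   for h in $(cat grid-hosts); do
>     echo "== $h =="
>     ssh $h 'cat /proc/sys/fs/file-max'   # system-wide descriptor ceiling
>     ssh hadoop@$h 'ulimit -n'            # hadoop user's per-process limit
>   done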
>
> Where else should I be looking for what's causing this?
>
