hadoop-common-user mailing list archives

From alo alt <wget.n...@googlemail.com>
Subject Re: MapReduce job failing when a node of cluster is rebooted
Date Tue, 27 Dec 2011 13:55:59 GMT
Is the DN you just rebooted connecting to the NN? Most likely the
datanode daemon isn't running; check it:

ps waux | grep "DataNode" | grep -v "grep"
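
For example, a quick sketch of how to confirm whether the rebooted node
has re-registered with the NN (0.20.x commands; the log path below is
just the usual default and may differ on your install):

  # on the NameNode: list the datanodes HDFS currently knows about
  hadoop dfsadmin -report | grep -A 2 "Name: 192.168.100.5"

  # on the rebooted node: start the daemon if it isn't running,
  # then watch its log for registration/connection errors
  $HADOOP_HOME/bin/hadoop-daemon.sh start datanode
  tail -f $HADOOP_HOME/logs/hadoop-*-datanode-*.log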

- Alex

On Tue, Dec 27, 2011 at 2:44 PM, Rajat Goel <rajatgoel06@gmail.com> wrote:
> Yes. HDFS- and MapReduce-related dirs are set outside of /tmp.
>
> On Tue, Dec 27, 2011 at 6:48 PM, alo alt <wget.null@googlemail.com> wrote:
>
>> Hi,
>>
>> Did you set the HDFS-related dirs outside of /tmp? Most *nix systems
>> clean them up on reboot.
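
For reference, a minimal sketch of keeping those dirs off /tmp (property
names are the 0.20.x ones; the /data/hadoop/... paths are only placeholders):

  <!-- core-site.xml: catch-all scratch dir; many other paths default under it -->
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/data/hadoop/tmp</value>
  </property>

  <!-- hdfs-site.xml: NameNode metadata and DataNode block storage -->
  <property>
    <name>dfs.name.dir</name>
    <value>/data/hadoop/dfs/nn</value>
  </property>
  <property>
    <name>dfs.data.dir</name>
    <value>/data/hadoop/dfs/dn</value>
  </property>

  <!-- mapred-site.xml: TaskTracker local scratch space -->
  <property>
    <name>mapred.local.dir</name>
    <value>/data/hadoop/mapred/local</value>
  </property>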
>>
>> - Alex
>>
>> On Tue, Dec 27, 2011 at 2:09 PM, Rajat Goel <rajatgoel06@gmail.com> wrote:
>> > Hi,
>> >
>> > I have a 7-node setup (1 Namenode/JobTracker, 6 Datanodes/TaskTrackers)
>> > running Hadoop version 0.20.203.
>> >
>> > I performed the following test:
>> > Initially the cluster is running smoothly. Just before launching a MapReduce
>> > job (about one or two minutes before), I shut down one of the data nodes
>> > (rebooted the machine). Then my MapReduce job starts but immediately
>> > fails with the following messages on stderr:
>> >
>> > WARNING: org.apache.hadoop.metrics.jvm.EventCounter is deprecated. Please use org.apache.hadoop.log.metrics.EventCounter in all the log4j.properties files.
>> > WARNING: org.apache.hadoop.metrics.jvm.EventCounter is deprecated. Please use org.apache.hadoop.log.metrics.EventCounter in all the log4j.properties files.
>> > WARNING: org.apache.hadoop.metrics.jvm.EventCounter is deprecated. Please use org.apache.hadoop.log.metrics.EventCounter in all the log4j.properties files.
>> > WARNING: org.apache.hadoop.metrics.jvm.EventCounter is deprecated. Please use org.apache.hadoop.log.metrics.EventCounter in all the log4j.properties files.
>> > NOTICE: Configuration: /device.map    /region.map    /url.map
>> > /data/output/2011/12/26/08
>> >  PS:192.168.100.206:11111    3600    true    Notice
>> > 11/12/26 09:10:26 WARN mapred.JobClient: Use GenericOptionsParser for
>> > parsing the arguments. Applications should implement Tool for the same.
>> > 11/12/26 09:10:26 INFO input.FileInputFormat: Total input paths to process : 24
>> > 11/12/26 09:10:37 INFO hdfs.DFSClient: Exception in createBlockOutputStream
>> > java.io.IOException: Bad connect ack with firstBadLink as 192.168.100.5:50010
>> > 11/12/26 09:10:37 INFO hdfs.DFSClient: Abandoning block blk_-6309642664478517067_35619
>> > 11/12/26 09:10:37 INFO hdfs.DFSClient: Waiting to find target node: 192.168.100.7:50010
>> > 11/12/26 09:10:44 INFO hdfs.DFSClient: Exception in createBlockOutputStream
>> > java.net.NoRouteToHostException: No route to host
>> > 11/12/26 09:10:44 INFO hdfs.DFSClient: Abandoning block blk_4129088682008611797_35619
>> > 11/12/26 09:10:53 INFO hdfs.DFSClient: Exception in createBlockOutputStream
>> > java.io.IOException: Bad connect ack with firstBadLink as 192.168.100.5:50010
>> > 11/12/26 09:10:53 INFO hdfs.DFSClient: Abandoning block blk_3596375242483863157_35619
>> > 11/12/26 09:11:01 INFO hdfs.DFSClient: Exception in createBlockOutputStream
>> > java.io.IOException: Bad connect ack with firstBadLink as 192.168.100.5:50010
>> > 11/12/26 09:11:01 INFO hdfs.DFSClient: Abandoning block blk_724369205729364853_35619
>> > 11/12/26 09:11:07 WARN hdfs.DFSClient: DataStreamer Exception:
>> > java.io.IOException: Unable to create new block.
>> >    at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.nextBlockOutputStream(DFSClient.java:3002)
>> >    at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.access$2000(DFSClient.java:2255)
>> >    at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2446)
>> >
>> > 11/12/26 09:11:07 WARN hdfs.DFSClient: Error Recovery for block blk_724369205729364853_35619 bad datanode[1] nodes == null
>> > 11/12/26 09:11:07 WARN hdfs.DFSClient: Could not get block locations. Source file "/data/hadoop-admin/mapred/staging/admin/.staging/job_201112200923_0292/job.split" - Aborting...
>> > 11/12/26 09:11:07 INFO mapred.JobClient: Cleaning up the staging area hdfs://machine-100-205:9000/data/hadoop-admin/mapred/staging/admin/.staging/job_201112200923_0292
>> > Exception in thread "main" java.io.IOException: Bad connect ack with firstBadLink as 192.168.100.5:50010
>> >    at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.createBlockOutputStream(DFSClient.java:3068)
>> >    at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.nextBlockOutputStream(DFSClient.java:2983)
>> >    at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.access$2000(DFSClient.java:2255)
>> >    at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2446)
>> > 11/12/26 09:11:07 ERROR hdfs.DFSClient: Exception closing file /data/hadoop-admin/mapred/staging/admin/.staging/job_201112200923_0292/job.split : java.io.IOException: Bad connect ack with firstBadLink as 192.168.100.5:50010
>> > java.io.IOException: Bad connect ack with firstBadLink as 192.168.100.5:50010
>> >    at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.createBlockOutputStream(DFSClient.java:3068)
>> >    at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.nextBlockOutputStream(DFSClient.java:2983)
>> >    at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.access$2000(DFSClient.java:2255)
>> >    at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2446)
>> >
>> >
>> > - In the above logs, 192.168.100.5 is the machine I rebooted.
>> > - The JobTracker's log file has no entries in the above time period.
>> > - The NameNode's log file doesn't have any exceptions or messages related
>> > to the above errors.
>> > - All nodes can access each other via IP or hostnames.
>> > - The ulimit value for open files is set to 1024, but I don't see many connections
>> > in CLOSE_WAIT state (Googled a bit and some people suggest that this value
>> > could be a culprit in some cases; see the ulimit sketch after this list).
>> > - My Hadoop configuration files set the number of mappers (8),
>> > reducers (4), and io.sort.mb (512 MB). Most of the other parameters are
>> > left at their default values.
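
On the ulimit point above, a quick sketch of how the open-file limit is
usually checked and raised for the user that runs the Hadoop daemons
(1024 is low for a busy DataNode; 32768 is a commonly used value, but the
exact number is site-specific and the "hadoop" user below is only an example):

  # check the current limit as the daemon user
  ulimit -n

  # /etc/security/limits.conf entries (re-login / restart the daemons afterwards)
  hadoop  soft  nofile  32768
  hadoop  hard  nofile  32768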
>> >
>> > Can someone please provide some pointers to a solution for this problem?
>> >
>> > Thanks,
>> > Rajat
>>
>>
>>
>> --
>> Alexander Lorenz
>> http://mapredit.blogspot.com
>>
>> Think of the environment: please don't print this email unless you
>> really need to.
>>



-- 
Alexander Lorenz
http://mapredit.blogspot.com

Think of the environment: please don't print this email unless you
really need to.
