hadoop-mapreduce-user mailing list archives

From Jason Yang <lin.yang.ja...@gmail.com>
Subject Job failed with large volume of small data: java.io.EOFException
Date Thu, 20 Sep 2012 14:28:48 GMT
Hi, all

I have encountered a weird problem: I have an MR job that always fails
when there is a large number of input files (e.g. 400), but always
succeeds when there are only a few (e.g. 20).

In this job, the map phase reads all the input files and interprets
each of them as a set of records. The intermediate output of the mapper
is <record.type, record>, and the reducer writes records of the same
type to the same file using a MultipleSequenceFileOutputFormat.
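
For reference, the job is laid out roughly like the sketch below (old
mapred API; the class names, the Text key/value types, and the
parseType() helper are illustrative placeholders, not my actual code):

import java.io.IOException;
import java.util.Iterator;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.*;
import org.apache.hadoop.mapred.lib.MultipleSequenceFileOutputFormat;

// Mapper: interpret each input file as a set of records and emit
// <record.type, record>.
class RecordMapper extends MapReduceBase
    implements Mapper<Text, Text, Text, Text> {
  public void map(Text key, Text value,
                  OutputCollector<Text, Text> out, Reporter reporter)
      throws IOException {
    out.collect(new Text(parseType(value)), value);
  }
  // Placeholder for the real record interpretation.
  private String parseType(Text record) { return "..."; }
}

// Reducer: pass every record through; the output format below routes
// records with the same type to the same file.
class RecordReducer extends MapReduceBase
    implements Reducer<Text, Text, Text, Text> {
  public void reduce(Text type, Iterator<Text> records,
                     OutputCollector<Text, Text> out, Reporter reporter)
      throws IOException {
    while (records.hasNext()) {
      out.collect(type, records.next());
    }
  }
}

// One sequence file per record type, named by the key.
class TypeOutputFormat
    extends MultipleSequenceFileOutputFormat<Text, Text> {
  @Override
  protected String generateFileNameForKeyValue(Text key, Text value,
                                               String name) {
    return key.toString() + "-" + name;
  }
}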

According to the running status attached below, all of the reducers
failed, and the error is an EOFException, which confuses me even more.
From the stack trace, the EOFException is thrown inside
DFSClient$DFSOutputStream.createBlockOutputStream, i.e. while the
reduce task is writing its output to HDFS.

Does anyone have a suggestion on how to fix this?

-----
Hadoop job_201209191629_0013 on node10
User: root
Job Name: localClustering.jar
Job File: hdfs://node10:9000/mnt/md5/mapred/system/job_201209191629_0013/job.xml
Job Setup: Successful
Status: Failed
Started at: Thu Sep 20 21:55:11 CST 2012
Failed at: Thu Sep 20 22:03:51 CST 2012
Failed in: 8mins, 40sec
Job Cleanup: Successful
------------------------------
Kind     % Complete   Num Tasks   Pending   Running   Complete   Killed   Failed / Killed
map      100.00%      400         0         0         400        0        0 / 8
reduce   100.00%      7           0         0         0          7        19 / 7


Counter                     Map           Reduce   Total
Job Counters
  Launched reduce tasks     0             0        26
  Rack-local map tasks      0             0        1
  Launched map tasks        0             0        408
  Data-local map tasks      0             0        407
  Failed reduce tasks       0             0        1
FileSystemCounters
  HDFS_BYTES_READ           899,202,342   0        899,202,342
  FILE_BYTES_WRITTEN        742,195,952   0        742,195,952
  HDFS_BYTES_WRITTEN        1,038,960     0        1,038,960
Map-Reduce Framework
  Combine output records    0             0        0
  Map input records         400           0        400
  Spilled Records           992,124       0        992,124
  Map output bytes          738,140,256   0        738,140,256
  Map input bytes           567,520,400   0        567,520,400
  Map output records        992,124       0        992,124
  Combine input records     0             0        0

task_201209191629_0013_r_000000 on node6: FAILED

java.io.EOFException
	at java.io.DataInputStream.readByte(DataInputStream.java:250)
	at org.apache.hadoop.io.WritableUtils.readVLong(WritableUtils.java:298)
	at org.apache.hadoop.io.WritableUtils.readVInt(WritableUtils.java:319)
	at org.apache.hadoop.io.Text.readString(Text.java:400)
	at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.createBlockOutputStream(DFSClient.java:2901)
	at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.nextBlockOutputStream(DFSClient.java:2826)
	at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.access$2000(DFSClient.java:2102)
	at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2288)



syslog logs:

ReduceTask: Read 267038 bytes from map-output for
attempt_201209191629_0013_m_000392_0
2012-09-21 05:57:25,249 INFO org.apache.hadoop.mapred.ReduceTask: Rec
#1 from attempt_201209191629_0013_m_000392_0 -> (15, 729) from node6
2012-09-21 05:57:25,435 INFO org.apache.hadoop.mapred.ReduceTask:
GetMapEventsThread exiting
2012-09-21 05:57:25,435 INFO org.apache.hadoop.mapred.ReduceTask:
getMapsEventsThread joined.
2012-09-21 05:57:25,436 INFO org.apache.hadoop.mapred.ReduceTask:
Closed ram manager
2012-09-21 05:57:25,437 INFO org.apache.hadoop.mapred.ReduceTask:
Interleaved on-disk merge complete: 1 files left.
2012-09-21 05:57:25,437 INFO org.apache.hadoop.mapred.ReduceTask:
In-memory merge complete: 72 files left.
2012-09-21 05:57:25,446 INFO org.apache.hadoop.mapred.Merger: Merging
72 sorted segments
2012-09-21 05:57:25,447 INFO org.apache.hadoop.mapred.Merger: Down to
the last merge-pass, with 42 segments left of total size: 22125176
bytes
2012-09-21 05:57:25,755 INFO org.apache.hadoop.mapred.ReduceTask:
Merged 72 segments, 22125236 bytes to disk to satisfy reduce memory
limit
2012-09-21 05:57:25,757 INFO org.apache.hadoop.mapred.ReduceTask:
Merging 2 files, 108299192 bytes from disk
2012-09-21 05:57:25,758 INFO org.apache.hadoop.mapred.ReduceTask:
Merging 0 segments, 0 bytes from memory into reduce
2012-09-21 05:57:25,758 INFO org.apache.hadoop.mapred.Merger: Merging
2 sorted segments
2012-09-21 05:57:25,764 INFO org.apache.hadoop.mapred.Merger: Down to
the last merge-pass, with 2 segments left of total size: 108299184
bytes
2012-09-21 05:57:29,727 INFO org.apache.hadoop.hdfs.DFSClient:
Exception in createBlockOutputStream java.io.EOFException
2012-09-21 05:57:29,727 INFO org.apache.hadoop.hdfs.DFSClient:
Abandoning block blk_-2683295125469062550_13791
2012-09-21 05:57:35,734 INFO org.apache.hadoop.hdfs.DFSClient:
Exception in createBlockOutputStream java.io.EOFException
2012-09-21 05:57:35,734 INFO org.apache.hadoop.hdfs.DFSClient:
Abandoning block blk_2048430611271251978_13803
2012-09-21 05:57:41,742 INFO org.apache.hadoop.hdfs.DFSClient:
Exception in createBlockOutputStream java.io.EOFException
2012-09-21 05:57:41,742 INFO org.apache.hadoop.hdfs.DFSClient:
Abandoning block blk_4739785392963375165_13815
2012-09-21 05:57:47,749 INFO org.apache.hadoop.hdfs.DFSClient:
Exception in createBlockOutputStream java.io.EOFException
2012-09-21 05:57:47,749 INFO org.apache.hadoop.hdfs.DFSClient:
Abandoning block blk_-6981138506714889098_13819
2012-09-21 05:57:53,753 WARN org.apache.hadoop.hdfs.DFSClient:
DataStreamer Exception: java.io.IOException: Unable to create new
block.
	at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.nextBlockOutputStream(DFSClient.java:2845)
	at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.access$2000(DFSClient.java:2102)
	at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2288)

2012-09-21 05:57:53,753 WARN org.apache.hadoop.hdfs.DFSClient: Error
Recovery for block blk_-6981138506714889098_13819 bad datanode[0]
nodes == null
2012-09-21 05:57:53,754 WARN org.apache.hadoop.hdfs.DFSClient: Could
not get block locations. Source file
"/work/icc/intermediateoutput/lc/RR/_temporary/_attempt_201209191629_0013_r_000000_0/RR-LC-022033-1"
- Aborting...
2012-09-21 05:57:54,539 WARN org.apache.hadoop.mapred.TaskTracker:
Error running child
java.io.EOFException
	at java.io.DataInputStream.readByte(DataInputStream.java:250)
	at org.apache.hadoop.io.WritableUtils.readVLong(WritableUtils.java:298)
	at org.apache.hadoop.io.WritableUtils.readVInt(WritableUtils.java:319)
	at org.apache.hadoop.io.Text.readString(Text.java:400)
	at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.createBlockOutputStream(DFSClient.java:2901)
	at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.nextBlockOutputStream(DFSClient.java:2826)
	at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.access$2000(DFSClient.java:2102)
	at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2288)
2012-09-21 05:57:54,542 INFO org.apache.hadoop.mapred.TaskRunner:
Runnning cleanup for the task




-- 
YANG, Lin
