From: Jason Yang <lin.yang.jason@gmail.com>
Date: Thu, 20 Sep 2012 22:28:48 +0800
Subject: Job failed with large volume of small data: java.io.EOFException
To: user@hadoop.apache.org

Hi, all

I have encountered a weird problem: I have an MR job that always fails when there is a large number of input files (e.g. 400 input files), but always succeeds when there are only a few input files (e.g. 20 input files).

In this job, the map phase reads all the input files and interprets each of them as a set of records. The intermediate output of the mapper is <record.type, record>, and the reducer simply writes records of the same type to the same file using a MultipleSequenceFileOutputFormat.

According to the running status attached below, all the reducers failed, and the error is an EOFException, which makes me even more confused.

Is there any suggestion to fix this?
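To give a rough idea of the output side: the per-type routing is done with a MultipleSequenceFileOutputFormat along the following lines. This is only a simplified sketch, not the exact code from the job; the actual key/value classes are custom record types, and Text is used here purely for illustration.

import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.lib.MultipleSequenceFileOutputFormat;

// Simplified sketch: route every record type to its own sequence file.
public class PerTypeSequenceFileOutputFormat
        extends MultipleSequenceFileOutputFormat<Text, Text> {

    @Override
    protected String generateFileNameForKeyValue(Text key, Text value, String name) {
        // Prefix the default part name ("part-00000", ...) with the record type,
        // so all records of the same type end up in the same output file.
        return key.toString() + "-" + name;
    }
}

On the JobConf side this would be wired up with conf.setOutputFormat(PerTypeSequenceFileOutputFormat.class), with the mapper emitting the record type as the key.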
-----
Hadoop job_201209191629_0013 on node10

User:         root
Job Name:     localClustering.jar
Job File:     hdfs://node10:9000/mnt/md5/mapred/system/job_201209191629_0013/job.xml
Job Setup:    Successful
Status:       Failed
Started at:   Thu Sep 20 21:55:11 CST 2012
Failed at:    Thu Sep 20 22:03:51 CST 2012
Failed in:    8mins, 40sec
Job Cleanup:  Successful
------------------------------

Kind    % Complete   Num Tasks   Pending   Running   Complete   Killed   Failed/Killed Task Attempts
map     100.00%      400         0         0         400        0        0 / 8
reduce  100.00%      7           0         0         0          7        19 / 7

Counter                              Map           Reduce   Total
Job Counters
  Launched reduce tasks              0             0        26
  Rack-local map tasks               0             0        1
  Launched map tasks                 0             0        408
  Data-local map tasks               0             0        407
  Failed reduce tasks                0             0        1
FileSystemCounters
  HDFS_BYTES_READ                    899,202,342   0        899,202,342
  FILE_BYTES_WRITTEN                 742,195,952   0        742,195,952
  HDFS_BYTES_WRITTEN                 1,038,960     0        1,038,960
Map-Reduce Framework
  Combine output records             0             0        0
  Map input records                  400           0        400
  Spilled Records                    992,124       0        992,124
  Map output bytes                   738,140,256   0        738,140,256
  Map input bytes                    567,520,400   0        567,520,400
  Map output records                 992,124       0        992,124
  Combine input records              0             0        0

task_201209191629_0013_r_000000   node6   FAILED
java.io.EOFException
	at java.io.DataInputStream.readByte(DataInputStream.java:250)
	at org.apache.hadoop.io.WritableUtils.readVLong(WritableUtils.java:298)
	at org.apache.hadoop.io.WritableUtils.readVInt(WritableUtils.java:319)
	at org.apache.hadoop.io.Text.readString(Text.java:400)
	at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.createBlockOutputStream(DFSClient.java:2901)
	at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.nextBlockOutputStream(DFSClient.java:2826)
	at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.access$2000(DFSClient.java:2102)
	at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2288)

syslog logs:

ReduceTask: Read 267038 bytes from map-output for attempt_201209191629_0013_m_000392_0
2012-09-21 05:57:25,249 INFO org.apache.hadoop.mapred.ReduceTask: Rec #1 from attempt_201209191629_0013_m_000392_0 -> (15, 729) from node6
2012-09-21 05:57:25,435 INFO org.apache.hadoop.mapred.ReduceTask: GetMapEventsThread exiting
2012-09-21 05:57:25,435 INFO org.apache.hadoop.mapred.ReduceTask: getMapsEventsThread joined.
2012-09-21 05:57:25,436 INFO org.apache.hadoop.mapred.ReduceTask: Closed ram manager
2012-09-21 05:57:25,437 INFO org.apache.hadoop.mapred.ReduceTask: Interleaved on-disk merge complete: 1 files left.
2012-09-21 05:57:25,437 INFO org.apache.hadoop.mapred.ReduceTask: In-memory merge complete: 72 files left.
2012-09-21 05:57:25,446 INFO org.apache.hadoop.mapred.Merger: Merging 72 sorted segments
2012-09-21 05:57:25,447 INFO org.apache.hadoop.mapred.Merger: Down to the last merge-pass, with 42 segments left of total size: 22125176 bytes
2012-09-21 05:57:25,755 INFO org.apache.hadoop.mapred.ReduceTask: Merged 72 segments, 22125236 bytes to disk to satisfy reduce memory limit
2012-09-21 05:57:25,757 INFO org.apache.hadoop.mapred.ReduceTask: Merging 2 files, 108299192 bytes from disk
2012-09-21 05:57:25,758 INFO org.apache.hadoop.mapred.ReduceTask: Merging 0 segments, 0 bytes from memory into reduce
2012-09-21 05:57:25,758 INFO org.apache.hadoop.mapred.Merger: Merging 2 sorted segments
2012-09-21 05:57:25,764 INFO org.apache.hadoop.mapred.Merger: Down to the last merge-pass, with 2 segments left of total size: 108299184 bytes
2012-09-21 05:57:29,727 INFO org.apache.hadoop.hdfs.DFSClient: Exception in createBlockOutputStream java.io.EOFException
2012-09-21 05:57:29,727 INFO org.apache.hadoop.hdfs.DFSClient: Abandoning block blk_-2683295125469062550_13791
2012-09-21 05:57:35,734 INFO org.apache.hadoop.hdfs.DFSClient: Exception in createBlockOutputStream java.io.EOFException
2012-09-21 05:57:35,734 INFO org.apache.hadoop.hdfs.DFSClient: Abandoning block blk_2048430611271251978_13803
2012-09-21 05:57:41,742 INFO org.apache.hadoop.hdfs.DFSClient: Exception in createBlockOutputStream java.io.EOFException
2012-09-21 05:57:41,742 INFO org.apache.hadoop.hdfs.DFSClient: Abandoning block blk_4739785392963375165_13815
2012-09-21 05:57:47,749 INFO org.apache.hadoop.hdfs.DFSClient: Exception in createBlockOutputStream java.io.EOFException
2012-09-21 05:57:47,749 INFO org.apache.hadoop.hdfs.DFSClient: Abandoning block blk_-6981138506714889098_13819
2012-09-21 05:57:53,753 WARN org.apache.hadoop.hdfs.DFSClient: DataStreamer Exception: java.io.IOException: Unable to create new block.
	at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.nextBlockOutputStream(DFSClient.java:2845)
	at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.access$2000(DFSClient.java:2102)
	at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2288)
2012-09-21 05:57:53,753 WARN org.apache.hadoop.hdfs.DFSClient: Error Recovery for block blk_-6981138506714889098_13819 bad datanode[0] nodes == null
2012-09-21 05:57:53,754 WARN org.apache.hadoop.hdfs.DFSClient: Could not get block locations. Source file "/work/icc/intermediateoutput/lc/RR/_temporary/_attempt_201209191629_0013_r_000000_0/RR-LC-022033-1" - Aborting...
2012-09-21 05:57:54,539 WARN org.apache.hadoop.mapred.TaskTracker: Error running child
java.io.EOFException
	at java.io.DataInputStream.readByte(DataInputStream.java:250)
	at org.apache.hadoop.io.WritableUtils.readVLong(WritableUtils.java:298)
	at org.apache.hadoop.io.WritableUtils.readVInt(WritableUtils.java:319)
	at org.apache.hadoop.io.Text.readString(Text.java:400)
	at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.createBlockOutputStream(DFSClient.java:2901)
	at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.nextBlockOutputStream(DFSClient.java:2826)
	at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.access$2000(DFSClient.java:2102)
	at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2288)
2012-09-21 05:57:54,542 INFO org.apache.hadoop.mapred.TaskRunner: Runnning cleanup for the task

--
YANG, Lin
