hadoop-user mailing list archives

From Jason Yang <lin.yang.ja...@gmail.com>
Subject Not able to place enough replicas in Reduce
Date Mon, 24 Sep 2012 13:25:46 GMT
Hi, all

I have been stuck on a weird problem for a long time, so I was wondering
whether anyone could give me some advice.

I have a MapReduce job, in which:

1. The mapper reads a whole file as a single split (by overriding
FileInputFormat.isSplitable) and translates it into a set of points, then
clusters these points into 4 groups randomly. The output of the mapper is
<splitID_GroupID, point>, so if I have 20 input files, there will be
20*4 = 80 unique keys in total. (The input format is sketched below.)
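
For reference, my WholeFileInputFormat boils down to roughly the following;
the RecordReader, which reads the entire file into one record, is omitted,
and the key/value types and the WholeFileRecordReader name are only
placeholders:

> // Returning false from isSplitable() makes each mapper receive
> // exactly one whole file as its split.
> public class WholeFileInputFormat
>         extends FileInputFormat<NullWritable, BytesWritable>
> {
>     @Override
>     protected boolean isSplitable(FileSystem fs, Path filename)
>     {
>         return false;
>     }
>
>     @Override
>     public RecordReader<NullWritable, BytesWritable> getRecordReader(
>             InputSplit split, JobConf job, Reporter reporter)
>             throws IOException
>     {
>         // Reads the whole file as a single record (omitted here).
>         return new WholeFileRecordReader((FileSplit) split, job);
>     }
> }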

2. I have implemented a KeyasNameMSFOutputFormat which returns the output
file name according to the key:

> public class KeyasNameMSFOutputFormat
>         extends MultipleSequenceFileOutputFormat<Text, Point>
> {
>     @Override
>     protected String generateFileNameForKeyValue(Text key,
>             Point value, String name)
>     {
>         // Use the key itself as the file name, so each unique key
>         // goes to its own output file.
>         return key.toString();
>     }
> }
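
Note that, if I understand MultipleOutputFormat correctly, it caches one
RecordWriter per generated file name, so with 80 unique keys the single
reduce task ends up writing 80 output files on HDFS concurrently.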


3. The reducer is just the IdentityReducer, and the JobConf is set up as follows:

> // set path for input and output
> ...
>
> // set format for input and output
> jobConf.setInputFormat(WholeFileInputFormat.class);
> jobConf.setOutputFormat(KeyasNameMSFOutputFormat.class);
>
> // set class of output key and value
> jobConf.setOutputKeyClass(Text.class);
> jobConf.setOutputValueClass(Point.class);
>
> // set mapper and reducer
> jobConf.setMapperClass(LocalClusteringMapper.class);
> jobConf.setReducerClass(IdentityReducer.class);
>
> // run the job
> JobClient.runJob(jobConf);
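
In case it matters, Point is my own value class; it implements Writable so
it can be written to the SequenceFiles. A simplified sketch (the real fields
are replaced by two doubles for illustration):

> public class Point implements Writable
> {
>     private double x;
>     private double y;
>
>     // Serialization used when writing the value to a SequenceFile.
>     @Override
>     public void write(DataOutput out) throws IOException
>     {
>         out.writeDouble(x);
>         out.writeDouble(y);
>     }
>
>     @Override
>     public void readFields(DataInput in) throws IOException
>     {
>         x = in.readDouble();
>         y = in.readDouble();
>     }
> }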


On a pseudo-distributed Hadoop cluster, this job runs fine with a small
number of input files (e.g. 5), but it always fails in the reduce phase
when there are more than 20 input files.

I have checked the logs and found some strange entries:

*----JobTracker Log----*
2012-09-24 19:48:59,490 INFO org.apache.hadoop.mapred.JobInProgress: Task 'attempt_201209241945_0001_m_000019_0' has completed task_201209241945_0001_m_000019 successfully.
2012-09-24 19:48:59,490 INFO org.apache.hadoop.mapred.ResourceEstimator: completedMapsUpdates:20  completedMapsInputSize:28649108  completedMapsOutputSize:41385220
2012-09-24 19:49:14,668 INFO org.apache.hadoop.mapred.TaskInProgress: Error from attempt_201209241945_0001_r_000000_0: org.apache.hadoop.ipc.RemoteException: java.io.IOException: File /work/lab/ecg/tmpoutput/lc/RR/_temporary/_attempt_201209241945_0001_r_000000_0/RR-LC-002033-2 could only be replicated to 0 nodes, instead of 1
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:1271)
    at org.apache.hadoop.hdfs.server.namenode.NameNode.addBlock(NameNode.java:422)
    at sun.reflect.GeneratedMethodAccessor9.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:508)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:959)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:955)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:396)
    at org.apache.hadoop.ipc.Server$Handler.run(Server.java:953)

    at org.apache.hadoop.ipc.Client.call(Client.java:740)
    at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:220)
    at $Proxy1.addBlock(Unknown Source)
    at sun.reflect.GeneratedMethodAccessor5.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:82)
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:59)
    at $Proxy1.addBlock(Unknown Source)
    at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.locateFollowingBlock(DFSClient.java:2937)
    at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.nextBlockOutputStream(DFSClient.java:2819)
    at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.access$2000(DFSClient.java:2102)
    at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2288)


*----DataNode Log----*
2012-09-24 19:49:08,156 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Receiving block blk_-8681085577211416125_16056 src: /127.0.0.1:39500 dest: /127.0.0.1:50010
2012-09-24 19:49:08,223 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Receiving block blk_5641670145666195897_16057 src: /127.0.0.1:39501 dest: /127.0.0.1:50010
2012-09-24 19:49:08,324 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Receiving block blk_2433781882701878319_16058 src: /127.0.0.1:39504 dest: /127.0.0.1:50010
2012-09-24 19:49:08,389 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Receiving block blk_7063241539808048732_16059 src: /127.0.0.1:39505 dest: /127.0.0.1:50010
2012-09-24 19:49:08,465 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Receiving block blk_-8805810864714094811_16060 src: /127.0.0.1:39506 dest: /127.0.0.1:50010
2012-09-24 19:49:09,850 INFO org.apache.hadoop.hdfs.server.datanode.DataNode.clienttrace: src: /127.0.0.1:39463, dest: /127.0.0.1:50010, bytes: 513321, op: HDFS_WRITE, cliID: DFSClient_attempt_201209241945_0001_r_000000_0, srvID: DS-373612585-127.0.1.1-50010-1347885793411, blockid: blk_5143138923881025306_16019
2012-09-24 19:49:09,851 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: PacketResponder 0 for block blk_5143138923881025306_16019 terminating
2012-09-24 19:49:09,941 INFO org.apache.hadoop.hdfs.server.datanode.DataNode.clienttrace: src: /127.0.0.1:39464, dest: /127.0.0.1:50010, bytes: 483928, op: HDFS_WRITE, cliID: DFSClient_attempt_201209241945_0001_r_000000_0, srvID: DS-373612585-127.0.1.1-50010-1347885793411, blockid: blk_1512576479762656406_16020
2012-09-24 19:49:09,944 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: PacketResponder 0 for block blk_1512576479762656406_16020 terminating
2012-09-24 19:49:09,975 INFO org.apache.hadoop.hdfs.server.datanode.DataNode.clienttrace: src: /127.0.0.1:39465, dest: /127.0.0.1:50010, bytes: 496747, op: HDFS_WRITE, cliID: DFSClient_attempt_201209241945_0001_r_000000_0, srvID: DS-373612585-127.0.1.1-50010-1347885793411, blockid: blk_8640569083674549899_16021

*----NameNode Log----*
2012-09-24 19:49:09,836 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem.audit: ugi=yanglin,yanglin,adm,dialout,cdrom,plugdev,lpadmin,admin,sambashare ip=/127.0.0.1 cmd=create src=/work/lab/ecg/tmpoutput/lc/RR/_temporary/_attempt_201209241945_0001_r_000000_0/RR-LC-002033-2 dst=null perm=yanglin:supergroup:rw-r--r--
2012-09-24 19:49:09,845 WARN org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Not able to place enough replicas, still in need of 1
2012-09-24 19:49:09,845 INFO org.apache.hadoop.ipc.Server: IPC Server handler 8 on 8020, call addBlock(/work/lab/ecg/tmpoutput/lc/RR/_temporary/_attempt_201209241945_0001_r_000000_0/RR-LC-002033-2, DFSClient_attempt_201209241945_0001_r_000000_0) from 127.0.0.1:37972: error: java.io.IOException: File /work/lab/ecg/tmpoutput/lc/RR/_temporary/_attempt_201209241945_0001_r_000000_0/RR-LC-002033-2 could only be replicated to 0 nodes, instead of 1
java.io.IOException: File /work/lab/ecg/tmpoutput/lc/RR/_temporary/_attempt_201209241945_0001_r_000000_0/RR-LC-002033-2 could only be replicated to 0 nodes, instead of 1
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:1271)
    at org.apache.hadoop.hdfs.server.namenode.NameNode.addBlock(NameNode.java:422)
    at sun.reflect.GeneratedMethodAccessor9.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:508)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:959)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:955)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:396)
    at org.apache.hadoop.ipc.Server$Handler.run(Server.java:953)
2012-09-24 19:49:09,851 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* NameSystem.addStoredBlock: blockMap updated: 127.0.0.1:50010 is added to blk_5143138923881025306_16019 size 513321
2012-09-24 19:49:09,852 INFO org.apache.hadoop.hdfs.StateChange: DIR* NameSystem.completeFile: file /work/lab/ecg/tmpoutput/lc/RR/_temporary/_attempt_201209241945_0001_r_000000_0/RR-LC-000000-0 is closed by DFSClient_attempt_201209241945_0001_r_000000_0


*----TaskTracker Log----*
2012-09-24 19:49:00,813 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201209241945_0001_r_000000_0 0.3% reduce > copy (18 of 20 at 0.54 MB/s) >
2012-09-24 19:49:02,451 INFO org.apache.hadoop.mapred.TaskTracker: Sent out 1873109 bytes for reduce: 0 from map: attempt_201209241945_0001_m_000018_0 given 1873109/1873105
2012-09-24 19:49:02,452 INFO org.apache.hadoop.mapred.TaskTracker.clienttrace: src: 127.0.1.1:50060, dest: 127.0.0.1:48717, bytes: 1873109, op: MAPRED_SHUFFLE, cliID: attempt_201209241945_0001_m_000018_0
2012-09-24 19:49:02,468 INFO org.apache.hadoop.mapred.TaskTracker: Sent out 1328490 bytes for reduce: 0 from map: attempt_201209241945_0001_m_000019_0 given 1328490/1328486
2012-09-24 19:49:02,469 INFO org.apache.hadoop.mapred.TaskTracker.clienttrace: src: 127.0.1.1:50060, dest: 127.0.0.1:48717, bytes: 1328490, op: MAPRED_SHUFFLE, cliID: attempt_201209241945_0001_m_000019_0
2012-09-24 19:49:03,054 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201209241945_0001_r_000000_0 0.3% reduce > copy (18 of 20 at 0.54 MB/s) >
2012-09-24 19:49:03,641 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201209241945_0001_r_000000_0 0.3% reduce > copy (18 of 20 at 0.54 MB/s) >
2012-09-24 19:49:06,828 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201209241945_0001_r_000000_0 0.7520079% reduce > reduce
2012-09-24 19:49:09,830 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201209241945_0001_r_000000_0 0.85404265% reduce > reduce
2012-09-24 19:49:12,445 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201209241945_0001_r_000000_0 0.85404265% reduce > reduce
2012-09-24 19:49:12,538 INFO org.apache.hadoop.mapred.JvmManager: JVM : jvm_201209241945_0001_r_1599614869 exited. Number of tasks it ran: 0
2012-09-24 19:49:15,538 INFO org.apache.hadoop.mapred.TaskRunner: attempt_201209241945_0001_r_000000_0 done; removing files.
2012-09-24 19:49:15,558 INFO org.apache.hadoop.mapred.TaskTracker: addFreeSlot : current free slots : 2
2012-09-24 19:49:17,677 INFO org.apache.hadoop.mapred.TaskTracker: LaunchTaskAction (registerTask): attempt_201209241945_0001_r_000000_1 task's state:UNASSIGNED
2012-09-24 19:49:17,678 INFO org.apache.hadoop.mapred.TaskTracker: Trying to launch : attempt_201209241945_0001_r_000000_1

In case it is relevant, here is my development environment:
Hadoop Version: 0.20.2
OS: Ubuntu 12.04 LTS

*yanglin@ubuntu:~$ ulimit -a*
core file size          (blocks, -c) 0
data seg size           (kbytes, -d) unlimited
scheduling priority             (-e) 0
file size               (blocks, -f) unlimited
pending signals                 (-i) 32059
max locked memory       (kbytes, -l) 64
max memory size         (kbytes, -m) unlimited
open files                      (-n) 10240
pipe size            (512 bytes, -p) 8
POSIX message queues     (bytes, -q) 819200
real-time priority              (-r) 0
stack size              (kbytes, -s) 20480
cpu time               (seconds, -t) unlimited
max user processes              (-u) 32059
virtual memory          (kbytes, -v) unlimited
file locks                      (-x) unlimited

--
YANG, Lin
