hadoop-user mailing list archives

From Ted Reynolds <t...@hortonworks.com>
Subject Re: Not able to place enough replicas in Reduce
Date Mon, 24 Sep 2012 14:47:56 GMT
Jason,

The line in the JobTracker log - "could only be replicated to 0 nodes,
instead of 1" - points to a problem with your DataNode.  It generally means
that your DataNode is either down or not functioning correctly.  What is
the output of the "jps" command?  ("jps" is found in <JAVA_HOME>/bin.)
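On a healthy pseudo-distributed 0.20 cluster, "jps" should list all five Hadoop
daemons (NameNode, SecondaryNameNode, DataNode, JobTracker, TaskTracker) in
addition to Jps itself; if DataNode is missing from that list, its log under the
Hadoop logs directory should say why it did not start. Running
"hadoop dfsadmin -report" will also show whether the DataNode has registered
with the NameNode and how much space it is reporting.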

Ted Reynolds
Technical Support Engineer
Hortonworks
Work Phone: 408-645-7079

<http://hortonworks.com/download/>



On Mon, Sep 24, 2012 at 6:25 AM, Jason Yang <lin.yang.jason@gmail.com> wrote:

> Hi, all
>
> I have been stuck on a weird problem for a long time, so I was wondering
> if anyone could give me some advice?
>
> I have a MapReduce job in which:
>
> 1. the mapper reads a whole file as a single split (by overriding
> FileInputFormat.isSplitable) and translates it into a set of points, then
> clusters these points into 4 groups randomly. The output of the mapper is
> <splitID_GroupID, Point>, so if I have 20 input files there will be
> 20*4=80 unique keys in total (see the input-format sketch after the job
> configuration below).
>
> 2. I have implemented a KeyasNameMSFOutputFormat which returns a file name
> based on the key:
>
>> // output file name = key, so each unique key ends up in its own file
>> public class KeyasNameMSFOutputFormat
>>     extends MultipleSequenceFileOutputFormat<Text, Point>
>> {
>>     @Override
>>     protected String generateFileNameForKeyValue(Text key,
>>         Point value, String name)
>>     {
>>         return key.toString();
>>     }
>> }
>
>
>  3. Use the IdentityReducer. The jobConf is set up like this:
>
>> // set path for input and output
>> ...
>>
>> // set format for input and output
>> jobConf.setInputFormat(WholeFileInputFormat.class);
>> jobConf.setOutputFormat(KeyasNameMSFOutputFormat.class);
>>
>> // set class of output key and value
>> jobConf.setOutputKeyClass(Text.class);
>> jobConf.setOutputValueClass(Point.class);
>>
>> // set mapper and reducer
>> jobConf.setMapperClass(LocalClusteringMapper.class);
>> jobConf.setReducerClass(IdentityReducer.class);
>>
>> // run the job
>> JobClient.runJob(jobConf);
>
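> For reference, the WholeFileInputFormat follows the usual "one file per
> split" pattern; a minimal sketch of the idea (the NullWritable/BytesWritable
> types and the WholeFileRecordReader helper are placeholders, not the actual
> code) looks like:
>
>> import java.io.IOException;
>> import org.apache.hadoop.fs.FileSystem;
>> import org.apache.hadoop.fs.Path;
>> import org.apache.hadoop.io.BytesWritable;
>> import org.apache.hadoop.io.NullWritable;
>> import org.apache.hadoop.mapred.*;
>>
>> // Sketch only: treats each input file as a single, unsplittable record.
>> public class WholeFileInputFormat
>>     extends FileInputFormat<NullWritable, BytesWritable>
>> {
>>     @Override
>>     protected boolean isSplitable(FileSystem fs, Path filename)
>>     {
>>         // never split: one mapper per whole file
>>         return false;
>>     }
>>
>>     @Override
>>     public RecordReader<NullWritable, BytesWritable> getRecordReader(
>>         InputSplit split, JobConf job, Reporter reporter) throws IOException
>>     {
>>         // WholeFileRecordReader (not shown, placeholder) returns the
>>         // entire file content as one BytesWritable value
>>         return new WholeFileRecordReader((FileSplit) split, job);
>>     }
>> }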
>
> On the pseudo-distributed Hadoop cluster, this job runs fine with a small
> number of input files (e.g. 5 input files), but it always fails in the
> reduce phase when there are more than 20 input files.
>
> I have checked the logs and found some strange entries:
>
> *----JobTracker Log----*
> 2012-09-24 19:48:59,490 INFO org.apache.hadoop.mapred.JobInProgress: Task
> 'attempt_201209241945_0001_m_000019_0' has completed
> task_201209241945_0001_m_000019 successfully.
> 2012-09-24 19:48:59,490 INFO org.apache.hadoop.mapred.ResourceEstimator:
> completedMapsUpdates:20  completedMapsInputSize:28649108
>  completedMapsOutputSize:41385220
> 2012-09-24 19:49:14,668 INFO org.apache.hadoop.mapred.TaskInProgress:
> Error from attempt_201209241945_0001_r_000000_0:
> org.apache.hadoop.ipc.RemoteException: java.io.IOException: File
> /work/lab/ecg/tmpoutput/lc/RR/_temporary/_attempt_201209241945_0001_r_000000_0/RR-LC-002033-2
> could only be replicated to 0 nodes, instead of 1
>  at
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:1271)
> at
> org.apache.hadoop.hdfs.server.namenode.NameNode.addBlock(NameNode.java:422)
>  at sun.reflect.GeneratedMethodAccessor9.invoke(Unknown Source)
> at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>  at java.lang.reflect.Method.invoke(Method.java:597)
> at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:508)
>  at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:959)
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:955)
>  at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:396)
>  at org.apache.hadoop.ipc.Server$Handler.run(Server.java:953)
>
> at org.apache.hadoop.ipc.Client.call(Client.java:740)
>  at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:220)
> at $Proxy1.addBlock(Unknown Source)
>  at sun.reflect.GeneratedMethodAccessor5.invoke(Unknown Source)
> at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>  at java.lang.reflect.Method.invoke(Method.java:597)
> at
> org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:82)
>  at
> org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:59)
> at $Proxy1.addBlock(Unknown Source)
>  at
> org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.locateFollowingBlock(DFSClient.java:2937)
> at
> org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.nextBlockOutputStream(DFSClient.java:2819)
>  at
> org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.access$2000(DFSClient.java:2102)
> at
> org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2288)
>
>
> *----DataNode log----*
> 2012-09-24 19:49:08,156 INFO
> org.apache.hadoop.hdfs.server.datanode.DataNode: Receiving block
> blk_-8681085577211416125_16056 src: /127.0.0.1:39500 dest: /
> 127.0.0.1:50010
> 2012-09-24 19:49:08,223 INFO
> org.apache.hadoop.hdfs.server.datanode.DataNode: Receiving block
> blk_5641670145666195897_16057 src: /127.0.0.1:39501 dest: /127.0.0.1:50010
> 2012-09-24 19:49:08,324 INFO
> org.apache.hadoop.hdfs.server.datanode.DataNode: Receiving block
> blk_2433781882701878319_16058 src: /127.0.0.1:39504 dest: /127.0.0.1:50010
> 2012-09-24 19:49:08,389 INFO
> org.apache.hadoop.hdfs.server.datanode.DataNode: Receiving block
> blk_7063241539808048732_16059 src: /127.0.0.1:39505 dest: /127.0.0.1:50010
> 2012-09-24 19:49:08,465 INFO
> org.apache.hadoop.hdfs.server.datanode.DataNode: Receiving block
> blk_-8805810864714094811_16060 src: /127.0.0.1:39506 dest: /
> 127.0.0.1:50010
> 2012-09-24 19:49:09,850 INFO
> org.apache.hadoop.hdfs.server.datanode.DataNode.clienttrace: src: /
> 127.0.0.1:39463, dest: /127.0.0.1:50010, bytes: 513321, op: HDFS_WRITE,
> cliID: DFSClient_attempt_201209241945_0001_r_000000_0, srvID:
> DS-373612585-127.0.1.1-50010-1347885793411, blockid:
> blk_5143138923881025306_16019
> 2012-09-24 19:49:09,851 INFO org.apache.hadoop.hdfs.server.datanode.DataNode:
> PacketResponder 0 for block blk_5143138923881025306_16019 terminating
> 2012-09-24 19:49:09,941 INFO
> org.apache.hadoop.hdfs.server.datanode.DataNode.clienttrace: src: /
> 127.0.0.1:39464, dest: /127.0.0.1:50010, bytes: 483928, op: HDFS_WRITE,
> cliID: DFSClient_attempt_201209241945_0001_r_000000_0, srvID:
> DS-373612585-127.0.1.1-50010-1347885793411, blockid:
> blk_1512576479762656406_16020
> 2012-09-24 19:49:09,944 INFO
> org.apache.hadoop.hdfs.server.datanode.DataNode: PacketResponder 0 for
> block blk_1512576479762656406_16020 terminating
> 2012-09-24 19:49:09,975 INFO
> org.apache.hadoop.hdfs.server.datanode.DataNode.clienttrace: src: /
> 127.0.0.1:39465, dest: /127.0.0.1:50010, bytes: 496747, op: HDFS_WRITE,
> cliID: DFSClient_attempt_201209241945_0001_r_000000_0, srvID:
> DS-373612585-127.0.1.1-50010-1347885793411, blockid:
> blk_8640569083674549899_16021
>
> *----NameNode Log----*
> 2012-09-24 19:49:09,836 INFO
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.audit:
> ugi=yanglin,yanglin,adm,dialout,cdrom,plugdev,lpadmin,admin,sambashare
> ip=/127.0.0.1 cmd=create
> src=/work/lab/ecg/tmpoutput/lc/RR/_temporary/_attempt_201209241945_0001_r_000000_0/RR-LC-002033-2
> dst=null perm=yanglin:supergroup:rw-r--r--
> 2012-09-24 19:49:09,845 WARN
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Not able to place
> enough replicas, still in need of 1
> 2012-09-24 19:49:09,845 INFO org.apache.hadoop.ipc.Server: IPC Server
> handler 8 on 8020, call
> addBlock(/work/lab/ecg/tmpoutput/lc/RR/_temporary/_attempt_201209241945_0001_r_000000_0/RR-LC-002033-2,
> DFSClient_attempt_201209241945_0001_r_000000_0) from 127.0.0.1:37972: error:
> java.io.IOException: File
> /work/lab/ecg/tmpoutput/lc/RR/_temporary/_attempt_201209241945_0001_r_000000_0/RR-LC-002033-2
> could only be replicated to 0 nodes, instead of 1
> java.io.IOException: File
> /work/lab/ecg/tmpoutput/lc/RR/_temporary/_attempt_201209241945_0001_r_000000_0/RR-LC-002033-2
> could only be replicated to 0 nodes, instead of 1
> at
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:1271)
>  at
> org.apache.hadoop.hdfs.server.namenode.NameNode.addBlock(NameNode.java:422)
> at sun.reflect.GeneratedMethodAccessor9.invoke(Unknown Source)
>  at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> at java.lang.reflect.Method.invoke(Method.java:597)
>  at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:508)
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:959)
>  at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:955)
> at java.security.AccessController.doPrivileged(Native Method)
>  at javax.security.auth.Subject.doAs(Subject.java:396)
> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:953)
> 2012-09-24 19:49:09,851 INFO org.apache.hadoop.hdfs.StateChange: BLOCK*
> NameSystem.addStoredBlock: blockMap updated: 127.0.0.1:50010 is added to
> blk_5143138923881025306_16019 size 513321
> 2012-09-24 19:49:09,852 INFO org.apache.hadoop.hdfs.StateChange: DIR*
> NameSystem.completeFile: file
> /work/lab/ecg/tmpoutput/lc/RR/_temporary/_attempt_201209241945_0001_r_000000_0/RR-LC-000000-0
> is closed by DFSClient_attempt_201209241945_0001_r_000000_0
>
>
> *----taskTracker log----*
> 2012-09-24 19:49:00,813 INFO org.apache.hadoop.mapred.TaskTracker:
> attempt_201209241945_0001_r_000000_0 0.3% reduce > copy (18 of 20 at 0.54
> MB/s) >
> 2012-09-24 19:49:02,451 INFO org.apache.hadoop.mapred.TaskTracker: Sent
> out 1873109 bytes for reduce: 0 from map:
> attempt_201209241945_0001_m_000018_0 given 1873109/1873105
> 2012-09-24 19:49:02,452 INFO
> org.apache.hadoop.mapred.TaskTracker.clienttrace: src: 127.0.1.1:50060,
> dest: 127.0.0.1:48717, bytes: 1873109, op: MAPRED_SHUFFLE, cliID:
> attempt_201209241945_0001_m_000018_0
> 2012-09-24 19:49:02,468 INFO org.apache.hadoop.mapred.TaskTracker: Sent
> out 1328490 bytes for reduce: 0 from map:
> attempt_201209241945_0001_m_000019_0 given 1328490/1328486
> 2012-09-24 19:49:02,469 INFO
> org.apache.hadoop.mapred.TaskTracker.clienttrace: src: 127.0.1.1:50060,
> dest: 127.0.0.1:48717, bytes: 1328490, op: MAPRED_SHUFFLE, cliID:
> attempt_201209241945_0001_m_000019_0
> 2012-09-24 19:49:03,054 INFO org.apache.hadoop.mapred.TaskTracker:
> attempt_201209241945_0001_r_000000_0 0.3% reduce > copy (18 of 20 at 0.54
> MB/s) >
> 2012-09-24 19:49:03,641 INFO org.apache.hadoop.mapred.TaskTracker:
> attempt_201209241945_0001_r_000000_0 0.3% reduce > copy (18 of 20 at 0.54
> MB/s) >
> 2012-09-24 19:49:06,828 INFO org.apache.hadoop.mapred.TaskTracker:
> attempt_201209241945_0001_r_000000_0 0.7520079% reduce > reduce
> 2012-09-24 19:49:09,830 INFO org.apache.hadoop.mapred.TaskTracker:
> attempt_201209241945_0001_r_000000_0 0.85404265% reduce > reduce
> 2012-09-24 19:49:12,445 INFO org.apache.hadoop.mapred.TaskTracker:
> attempt_201209241945_0001_r_000000_0 0.85404265% reduce > reduce
> 2012-09-24 19:49:12,538 INFO org.apache.hadoop.mapred.JvmManager: JVM :
> jvm_201209241945_0001_r_1599614869 exited. Number of tasks it ran: 0
> 2012-09-24 19:49:15,538 INFO org.apache.hadoop.mapred.TaskRunner:
> attempt_201209241945_0001_r_000000_0 done; removing files.
> 2012-09-24 19:49:15,558 INFO org.apache.hadoop.mapred.TaskTracker:
> addFreeSlot : current free slots : 2
> 2012-09-24 19:49:17,677 INFO org.apache.hadoop.mapred.TaskTracker:
> LaunchTaskAction (registerTask): attempt_201209241945_0001_r_000000_1
> task's state:UNASSIGNED
> 2012-09-24 19:49:17,678 INFO org.apache.hadoop.mapred.TaskTracker: Trying
> to launch : attempt_201209241945_0001_r_000000_1
>
> In case it's relevant, here is my development environment:
> Hadoop Version: 0.20.2
> OS: Ubuntu 12.04 LTS
>
> *yanglin@ubuntu:~$ ulimit -a*
> core file size          (blocks, -c) 0
> data seg size           (kbytes, -d) unlimited
> scheduling priority             (-e) 0
> file size               (blocks, -f) unlimited
> pending signals                 (-i) 32059
> max locked memory       (kbytes, -l) 64
> max memory size         (kbytes, -m) unlimited
> open files                      (-n) 10240
> pipe size            (512 bytes, -p) 8
> POSIX message queues     (bytes, -q) 819200
> real-time priority              (-r) 0
> stack size              (kbytes, -s) 20480
> cpu time               (seconds, -t) unlimited
> max user processes              (-u) 32059
> virtual memory          (kbytes, -v) unlimited
> file locks                      (-x) unlimited
>
> --
> YANG, Lin
>
>
