hadoop-common-user mailing list archives

From Eric Sammer <esam...@cloudera.com>
Subject Re: Does error "could only be replicated to 0 nodes, instead of 1 " mean no datanodes available?
Date Wed, 26 May 2010 16:23:36 GMT
Alex:

From the data node / secondary NN exceptions, it appears that nothing
can talk to your name node. Take a look in the name node logs and look
for where data node registration happens. Is it possible the NN disk
is full? My guess is that there's something odd happening with the
state on the name node. What does hadoop fsck / look like?
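
Something along these lines should narrow it down (the log path below is only
the 0.20 default under $HADOOP_HOME/logs, and df is pointed at the dfs.name.dir
from your hdfs-site.xml; adjust both to your install):

  # Is the partition holding dfs.name.dir out of space?
  df -h /home/alex/hadoop/namenode

  # Did the datanodes ever register with the namenode?
  grep -i registerDatanode $HADOOP_HOME/logs/hadoop-*-namenode-*.log | tail -20

  # Overall namespace health as the namenode sees it
  hadoop fsck /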

On Wed, May 26, 2010 at 6:53 AM, Alex Luya <alexander.luya@gmail.com> wrote:
> Hello:
>   I got this error when putting files into HDFS. It seems to be an old issue, and I
> followed the solution in this link:
> ----------------------------------------------------------------------------------------------------------------------------
> http://adityadesai.wordpress.com/2009/02/26/another-problem-with-hadoop-jobjar-could-only-be-replicated-to-0-nodes-instead-of-1io-exception/
> -----------------------------------------------------------------------------------------------------------------------------
>
> but the problem still exists, so I tried to figure it out from the source code:
> -----------------------------------------------------------------------------------------------------------------------------------
>  org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock()
> -----------------------------------------------------------------------------------------------------------------------------------
>  // choose targets for the new block to be allocated.
>    DatanodeDescriptor targets[] = replicator.chooseTarget(replication,
>                                                           clientNode,
>                                                           null,
>                                                           blockSize);
>    if (targets.length < this.minReplication) {
>      throw new IOException("File " + src + " could only be replicated to " +
>                            targets.length + " nodes, instead of " +
>                            minReplication);
>    }
> --------------------------------------------------------------------------------------------------------------------------------------
>
> I think "DatanodeDescriptor" represents datanode,so here "targets.length"
> means the number of datanode,clearly,it is 0,in other words,no datanode is
> available.But in the web interface:localhost:50070,I can see 4 live nodes(I
> have 4 nodes only),and "hadoop dfsadmin -report" shows 4 nodes also.that is
> strange.
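>
>        A grep like this on the namenode log should show whether chooseTarget is
> skipping the live nodes for some reason (the warning text and log location below
> are only the 0.20 defaults, so they may need adjusting):
>
>        grep -i "Not able to place enough replicas" $HADOOP_HOME/logs/hadoop-*-namenode-*.log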
>        And I got this error message on the secondary namenode:
> ---------------------------------------------------------------------------------------------------------------------------------
> 2010-05-26 16:26:39,588 INFO org.apache.hadoop.hdfs.server.common.Storage:
> Recovering storage directory /home/alex/tmp/dfs/namesecondary from failed
> checkpoint.
> 2010-05-26 16:26:39,593 ERROR
> org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode: Exception in
> doCheckpoint:
> 2010-05-26 16:26:39,594 ERROR
> org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode:
> java.net.ConnectException: Connection refused
>        at java.net.PlainSocketImpl.socketConnect(Native Method)
>        at java.net.PlainSocketImpl.doConnect(PlainSocketImpl.java:333)
>        at java.net.PlainSocketImpl.connectToAddress(PlainSocketImpl.java:193)
>        at java.net.PlainSocketImpl.connect(PlainSocketImpl.java:182)
>        at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:366)
> ..................................
> ---------------------------------------------------------------------------------------------------------------------------------
> and this error message on a datanode:
> ---------------------------------------------------------------------------------------------------------------------------------
> 2010-05-26 16:07:49,039 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode:
> DatanodeRegistration(192.168.1.3:50010,
> storageID=DS-1180479012-192.168.1.3-50010-1274799233678, infoPort=50075,
> ipcPort=50020):DataXceiver
> java.io.IOException: Connection reset by peer
>        at sun.nio.ch.FileDispatcher.read0(Native Method)
>        at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:21)
>        at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:233)
>        at sun.nio.ch.IOUtil.read(IOUtil.java:206)
> .........................
> ---------------------------------------------------------------------------------------------------------------------------------
>
> It seems like the network ports aren't open, but after scanning with nmap I can
> confirm that all relevant ports on these nodes are actually open. After two days
> of effort, the result is zero.
>
> Can anybody help me troubleshoot? Thank you.
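>
>        A quick reachability check from one of the datanodes would be something like
> the following (8020 is only assumed as the namenode RPC port, since fs.default.name
> here carries no explicit port and 8020 is the default):
>
>        nc -zv AlexLuya 8020     # namenode RPC
>        nc -zv AlexLuya 50070    # namenode web UI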
>
>
>
>      (following is the relevant info: my cluster configuration, the contents of the
> conf files, the output of "hadoop dfsadmin -report", and the Java error stack)
>
>
>
> -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
> my configuration is:
> -----------------------------------------------------------------------------------------
> Ubuntu 10.04 64-bit + JDK 1.6.0_20 + Hadoop 0.20.2
> -----------------------------------------------------------------------------------------
>
>
>
> core-site.xml
> -----------------------------------------------------------------------------------------
> <configuration>
> <property>
>    <name>fs.default.name</name>
>    <value>hdfs://AlexLuya</value>
> </property>
> <property>
>    <name>hadoop.tmp.dir</name>
>    <value>/home/alex/tmp</value>
>
> </property>
> </configuration>
>
> -----------------------------------------------------------------------------------------
>
>
> hdfs-site.xml
> -----------------------------------------------------------------------------------------
> <configuration>
>        <property>
>                <name>dfs.replication</name>
>                <value>3</value>
>        </property>
>        <property>
>                <name>dfs.name.dir</name>
>                <value>/home/alex/hadoop/namenode</value>
>        </property>
>        <property>
>                <name>dfs.data.dir</name>
>                <value>/home/alex/hadoop/dfs</value>
>        </property>
>        <property>
>                <name>dfs.block.size</name>
>                <value>134217728</value>
>        </property>
>        <property>
>                <name>dfs.datanode.max.xcievers</name>
>                <value>2047</value>
>        </property>
> </configuration>
>
> -----------------------------------------------------------------------------------------
> masters
> -----------------------------------------------------------------------------------------
> 192.168.1.2
> -----------------------------------------------------------------------------------------
> slaves
> -----------------------------------------------------------------------------------------
> 192.168.1.3
> 192.168.1.4
> 192.168.1.5
> 192.168.1.6
>
> -----------------------------------------------------------------------------------------
> result of hadoop dfsadmin -report
> -----------------------------------------------------------------------------------------
> Configured Capacity: 6836518912 (6.37 GB)
> Present Capacity: 1406951424 (1.31 GB)
> DFS Remaining: 1406853120 (1.31 GB)
> DFS Used: 98304 (96 KB)
> DFS Used%: 0.01%
> Under replicated blocks: 0
> Blocks with corrupt replicas: 0
> Missing blocks: 0
>
> -------------------------------------------------
> Datanodes available: 4 (4 total, 0 dead)
>
> Name: 192.168.1.5:50010
> Decommission Status : Normal
> Configured Capacity: 1709129728 (1.59 GB)
> DFS Used: 24576 (24 KB)
> Non DFS Used: 1345765376 (1.25 GB)
> DFS Remaining: 363339776(346.51 MB)
> DFS Used%: 0%
> DFS Remaining%: 21.26%
> Last contact: Tue May 25 20:51:09 CST 2010
>
>
> Name: 192.168.1.3:50010
> Decommission Status : Normal
> Configured Capacity: 1709129728 (1.59 GB)
> DFS Used: 24576 (24 KB)
> Non DFS Used: 1373503488 (1.28 GB)
> DFS Remaining: 335601664(320.05 MB)
> DFS Used%: 0%
> DFS Remaining%: 19.64%
> Last contact: Tue May 25 20:51:10 CST 2010
>
>
> Name: 192.168.1.6:50010
> Decommission Status : Normal
> Configured Capacity: 1709129728 (1.59 GB)
> DFS Used: 24576 (24 KB)
> Non DFS Used: 1346879488 (1.25 GB)
> DFS Remaining: 362225664(345.45 MB)
> DFS Used%: 0%
> DFS Remaining%: 21.19%
> Last contact: Tue May 25 20:51:08 CST 2010
>
>
> Name: 192.168.1.4:50010
> Decommission Status : Normal
> Configured Capacity: 1709129728 (1.59 GB)
> DFS Used: 24576 (24 KB)
> Non DFS Used: 1363419136 (1.27 GB)
> DFS Remaining: 345686016(329.67 MB)
> DFS Used%: 0%
> DFS Remaining%: 20.23%
> Last contact: Tue May 25 20:51:08 CST 2010
>
> -----------------------------------------------------------------------------------------
> Java error stack:
> -----------------------------------------------------------------------------------------
> 10/05/25 20:43:24 WARN hdfs.DFSClient: DataStreamer Exception:
> org.apache.hadoop.ipc.RemoteException: java.io.IOException: File
> /user/alex/input could only be replicated to 0 nodes, instead of 1
>        at
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:1271)
>        at
> org.apache.hadoop.hdfs.server.namenode.NameNode.addBlock(NameNode.java:422)
>        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>        at
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>        at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>        at java.lang.reflect.Method.invoke(Method.java:597)
>        at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:508)
>        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:959)
>        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:955)
>        at java.security.AccessController.doPrivileged(Native Method)
>        at javax.security.auth.Subject.doAs(Subject.java:396)
>        at org.apache.hadoop.ipc.Server$Handler.run(Server.java:953)
>
>        at org.apache.hadoop.ipc.Client.call(Client.java:740)
>        at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:220)
>        at $Proxy0.addBlock(Unknown Source)
>        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>        at
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>        at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>        at java.lang.reflect.Method.invoke(Method.java:597)
>        at
> org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:82)
>        at
> org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:59)
>        at $Proxy0.addBlock(Unknown Source)
>        at
> org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.locateFollowingBlock(DFSClient.java:2937)
>        at
> org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.nextBlockOutputStream(DFSClient.java:2819)
>        at
> org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.access$2000(DFSClient.java:2102)
>        at
> org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2288)
>
> 10/05/25 20:43:24 WARN hdfs.DFSClient: Error Recovery for block null bad
> datanode[0] nodes == null
> 10/05/25 20:43:24 WARN hdfs.DFSClient: Could not get block locations. Source
> file "/user/alex/input" - Aborting...
> put: java.io.IOException: File /user/alex/input could only be replicated to 0
> nodes, instead of 1
> 10/05/25 20:43:24 ERROR hdfs.DFSClient: Exception closing file /user/alex/input
> : org.apache.hadoop.ipc.RemoteException: java.io.IOException: File
> /user/alex/input could only be replicated to 0 nodes, instead of 1
>        at
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:1271)
>        at
> org.apache.hadoop.hdfs.server.namenode.NameNode.addBlock(NameNode.java:422)
>        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>        at
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>        at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>        at java.lang.reflect.Method.invoke(Method.java:597)
>        at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:508)
>        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:959)
>        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:955)
>        at java.security.AccessController.doPrivileged(Native Method)
>        at javax.security.auth.Subject.doAs(Subject.java:396)
>        at org.apache.hadoop.ipc.Server$Handler.run(Server.java:953)
>
> org.apache.hadoop.ipc.RemoteException: java.io.IOException: File
> /user/alex/input could only be replicated to 0 nodes, instead of 1
>        at
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:1271)
>        at
> org.apache.hadoop.hdfs.server.namenode.NameNode.addBlock(NameNode.java:422)
>        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>        at
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>        at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>        at java.lang.reflect.Method.invoke(Method.java:597)
>        at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:508)
>        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:959)
>        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:955)
>        at java.security.AccessController.doPrivileged(Native Method)
>        at javax.security.auth.Subject.doAs(Subject.java:396)
>        at org.apache.hadoop.ipc.Server$Handler.run(Server.java:953)
>
>        at org.apache.hadoop.ipc.Client.call(Client.java:740)
>        at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:220)
>        at $Proxy0.addBlock(Unknown Source)
>        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>        at
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>        at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>        at java.lang.reflect.Method.invoke(Method.java:597)
>        at
> org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:82)
>        at
> org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:59)
>        at $Proxy0.addBlock(Unknown Source)
>        at
> org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.locateFollowingBlock(DFSClient.java:2937)
>        at
> org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.nextBlockOutputStream(DFSClient.java:2819)
>        at
> org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.access$2000(DFSClient.java:2102)
>        at
> org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2288)
>
> -----------------------------------------------------------------------------------------
>



-- 
Eric Sammer
phone: +1-917-287-2675
twitter: esammer
data: www.cloudera.com
