hadoop-common-user mailing list archives

From Medha Atre <medha.a...@gmail.com>
Subject Re: Problem in copyFromLocal
Date Fri, 10 Sep 2010 03:22:53 GMT
I did do "jps" on datanodes to check if they were running after doing
"start-dfs.sh" and "start-mapred.sh". Their logfiles did not show any
crash or error messages.
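
(For what it is worth, I can also ask the namenode directly how many
datanodes have registered with it -- assuming the stock 0.20 scripts --
with something like

$ bin/hadoop dfsadmin -report | grep "Datanodes available"

and check that right after a failed -copyFromLocal.)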

Also, I encountered this problem even while running Hadoop in a
single-node "pseudo-cluster" configuration, as given in the tutorial
I mentioned below.

I have tried with 3 different configurations:
- single node
- 2 nodes with one master, and 2 slaves (master included)
- 3 nodes with one exclusive master, and other 2 slaves

and I have faced this problem in all 3 configurations. I will recheck
the logs on the datanodes, but if that is the root cause, how should I
fix the datanode start problem?
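
(When I recheck, I plan to simply tail the datanode log on each slave,
e.g. something along the lines of

$ tail -n 100 $HADOOP_HOME/logs/hadoop-*-datanode-*.log

assuming the default log directory and naming, and look for the first
exception after startup.)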


On Thu, Sep 9, 2010 at 9:31 PM, Zhang Jianfeng <jzhang.ch@gmail.com> wrote:
> I guess that your datanode did not start correctly. You can check the number
> of datanodes via the web UI.
>
> On Fri, Sep 10, 2010 at 9:08 AM, Jeff Zhang <zjffdu@gmail.com> wrote:
>
>> check the datanode's log to see whether it started correctly
>>
>>
>> On Thu, Sep 9, 2010 at 8:51 AM, Medha Atre <medha.atre@gmail.com> wrote:
>> > Sorry for the typo in the earlier message:
>> > --------------------------------------------------------
>> >
>> > Hi,
>> >
>> > I am a new Hadoop user. I followed the tutorial by Michael Noll on
>> > http://www.michael-noll.com/wiki/Running_Hadoop_On_Ubuntu_Linux_%28Multi-Node_Cluster%29
>> > (as well as for single node) with Hadoop-0.20 and Hadoop-0.21. I keep
>> > facing one problem intermittently:
>> >
>> > My NameNode, JobTracker, DataNode, and TaskTrackers get started without
>> > any problem, and "jps" shows them running too. I can format the DFS space
>> > without any problems. But when I try to use the -copyFromLocal command,
>> > it fails with the following exception:
>> >
>> > 2010-09-09 05:54:04,216 INFO org.apache.hadoop.ipc.Server: IPC Server
>> > handler 2 on 54310, call addBlock(/user/hadoop/multinode/advsh12.txt,
>> > DFSClient_2010062748, null, null) from 9.59.225.190:53125: error:
>> > java.io.IOException: File /user/hadoop/multinode/advsh12.txt could only
>> > be replicated to 0 nodes, instead of 1
>> > java.io.IOException: File /user/hadoop/multinode/advsh12.txt could only
>> > be replicated to 0 nodes, instead of 1
>> >       at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:1448)
>> >       at org.apache.hadoop.hdfs.server.namenode.NameNode.addBlock(NameNode.java:690)
>> >       at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>> >       at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>> >       at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>> >       at java.lang.reflect.Method.invoke(Method.java:597)
>> >       at org.apache.hadoop.ipc.WritableRpcEngine$Server.call(WritableRpcEngine.java:342)
>> >       at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1350)
>> >       at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1346)
>> >       at java.security.AccessController.doPrivileged(Native Method)
>> >       at javax.security.auth.Subject.doAs(Subject.java:396)
>> >       at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:742)
>> >       at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1344)
>> >
>> > The notable thing is: if I let a sufficiently long time pass between a
>> > failure of the command and its repeat execution, it executes successfully
>> > the next time.
>> >
>> > But if I try to execute the same command without waiting much in between,
>> > it fails with the same exception. (I do shut down all servers/Java
>> > processes, delete the DFS space manually with "rm -rf", and reformat it
>> > with "namenode -format" between repeat executions of the -copyFromLocal
>> > command.)
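>> >
>> > (For concreteness, the sequence I use between attempts is roughly the
>> > following, where /app/hadoop/tmp is the hadoop.tmp.dir from the tutorial,
>> > so adjust the path if yours differs:
>> >
>> > $ bin/stop-all.sh
>> > $ rm -rf /app/hadoop/tmp/*
>> > $ bin/hadoop namenode -format
>> > $ bin/start-dfs.sh
>> > $ bin/start-mapred.sh
>> >
>> > and then I retry the -copyFromLocal.)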
>> >
>> > I checked the mailing list archives for this problem. One thread
>> > (http://www.mail-archive.com/common-user@hadoop.apache.org/msg00851.html)
>> > suggested checking and increasing the allowed number of open file
>> > descriptors. So I checked that on my system.
>> >
>> > $ cat /proc/sys/fs/file-max
>> > 1977900
>> > $
>> >
>> > This is a pretty large number.
>> >
>> > I also checked and updated the shell's open file limit through
>> > /etc/security/limits.conf (the exact entries are shown after the ulimit
>> > output below). Now it looks like this:
>> >
>> > $ ulimit -a
>> > <snip>
>> > file size               (blocks, -f) unlimited
>> > pending signals                 (-i) 172032
>> > max locked memory       (kbytes, -l) 32
>> > max memory size         (kbytes, -m) unlimited
>> > open files                      (-n) *65535*
>> > pipe size            (512 bytes, -p) 8
>> > POSIX message queues     (bytes, -q) 819200
>> > real-time priority              (-r) 0
>> > stack size              (kbytes, -s) 8192
>> > cpu time               (seconds, -t) unlimited
>> > max user processes              (-u) 172032
>> > virtual memory          (kbytes, -v) unlimited
>> > file locks                      (-x) unlimited
>> >
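>> > (The limits.conf entries I added are along these lines, with "hadoop"
>> > being the user that runs the Hadoop daemons on my nodes:
>> >
>> > hadoop  soft  nofile  65535
>> > hadoop  hard  nofile  65535
>> >
>> > followed by logging out and back in so the new limit takes effect.)
>> >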
>> > So I was wondering what might be the root cause of the problem and how I
>> > can fix it (either in Hadoop or in my system)?
>> >
>> > Could someone please help me?
>> >
>> > Thanks.
