hadoop-common-user mailing list archives

From Harsh J <qwertyman...@gmail.com>
Subject Re: Hadoop Distributed System Problems: Does not recognise any slave nodes
Date Thu, 24 Mar 2011 08:48:39 GMT
Hello Andy,

The list forbids some attachments; could you paste your logs on any
available paste service and post back a link to that here?
http://paste.pocoo.org is a good one.

Your configuration looks alright for a homogeneous cluster setup. When
you say "does not recognize", do you mean that you have 1 live node
(master) and all the rest are dead? Have you ensured that your TaskTracker
and DataNode services started successfully on all the slave machines
(as listed in the conf/slaves file)? Check the logs of any service that
does not start successfully; that should help you track down issues.
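
If it helps, the check above can be scripted. This is only a rough sketch,
not an official Hadoop tool: it assumes Python 3 on the master, that `jps`
(from the JDK) is on the PATH of a non-interactive ssh session on each
slave, and that conf/slaves lists one host per line. The function names
(read_slaves, missing_daemons, check_cluster) are made up for illustration.

```python
import subprocess
from pathlib import Path

# Daemons expected on every slave, as named in the message above.
EXPECTED_DAEMONS = {"DataNode", "TaskTracker"}


def read_slaves(text):
    """Return host names from conf/slaves contents, skipping blanks and # comments."""
    hosts = []
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()
        if line:
            hosts.append(line)
    return hosts


def missing_daemons(jps_output):
    """Return the expected daemons that do not appear in `jps` output."""
    running = set()
    for line in jps_output.splitlines():
        parts = line.split()
        if len(parts) >= 2:  # jps prints "<pid> <class name>"
            running.add(parts[1])
    return sorted(EXPECTED_DAEMONS - running)


def check_cluster(slaves_path="conf/slaves"):
    """Run jps on each slave over ssh and print which daemons are missing."""
    for host in read_slaves(Path(slaves_path).read_text()):
        try:
            result = subprocess.run(
                ["ssh", host, "jps"],
                capture_output=True, text=True, timeout=30,
            )
        except subprocess.TimeoutExpired:
            print(host + ": ssh timed out")
            continue
        if result.returncode != 0:
            print(host + ": could not run jps: " + result.stderr.strip())
            continue
        missing = missing_daemons(result.stdout)
        status = "OK" if not missing else "missing " + ", ".join(missing)
        print(host + ": " + status)
```

Calling check_cluster() from the Hadoop install directory prints one line
per slave. For any host reported as missing a daemon, the matching log
under the logs/ directory on that host is the next place to look.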

On Thu, Mar 24, 2011 at 1:13 PM, Andy XUE <andyxueyuan@gmail.com> wrote:
> Hi there:
> I'm new to Hadoop and Nutch, and I am trying to run the crawler Nutch
> on a distributed system powered by Hadoop. However, as it turns out, the
> distributed system does not recognise any slave nodes in the cluster. I've
> been stuck at this point for months and am desperate for a solution. I
> would appreciate it if anyone would be kind enough to spend 10 minutes of
> their valuable time to help.
> Thank you so much!!
> This is what I currently encounter:
> ==================================
> In order to set up Hadoop clusters, I followed the instructions described in
> both of:
>         http://wiki.apache.org/nutch/NutchHadoopTutorial
>         http://hadoop.apache.org/common/docs/current/cluster_setup.html
> The problem is that, when we have a distributed file system (HDFS in
> Hadoop), the files should be stored on both of the computers. However, all
> data in HDFS, which is supposed to be replicated or stored onto every
> computer in the cluster, is only found on the master node. It is not
> replicated to the slave nodes in the cluster, which causes subsequent
> tasks such as the jobtracker to fail. I've attached a jobtracker log file.
> It works fine when there is only one computer (the master node) in the
> cluster and everything is stored on the master node. However, the problem
> arises when the program tries to write files onto another computer (slave
> node). The weird part is that HDFS can create folders on the slave nodes but
> not the files. Therefore the HDFS folders on the slave nodes are all empty.
> On the web interface (http://materNode:50070 and http://materNode:50030)
> which shows the status of HDFS and jobtracker, it indicates that there is
> only one active node (i.e., the master node). It fails to recognize any of
> the slave nodes.
> I use Nutch 1.2 and Hadoop 0.20 in the experiment.
> Here are the things that I've done:
> I followed the instructions in the aforementioned documentation. I created
> users with identical usernames on multiple computers, which belong to the
> same local network, with Ubuntu 10.10 installed. I set passphrase-less ssh
> keys for all computers, and experiments show that every node in the cluster
> can ssh to any other without requiring a password. I've shut down the
> firewall with "sudo ufw disable". I've tried to search for solutions on the
> Internet, but have had no luck so far.
> I appreciate the help.
> The Hadoop configuration files (core-site.xml, hdfs-site.xml,
> mapred-site.xml, and hadoop-env.sh) and the log file with error message
> (hadoop-rui-jobtracker-ss2.log) are attached.
> ==================================
> Regards
> Andy
> The University of Melbourne

Harsh J
