hadoop-common-user mailing list archives

From "Richard Yang" <richardy...@richardyang.net>
Subject RE: System is hanging while executing bin/start-all.sh and bin/stop-all.sh
Date Fri, 23 Mar 2007 07:37:19 GMT
Hello Jaya,

It would be useful to check the namenode and datanode log files.  In my
past experience I would sometimes get java.io... kinds of exceptions,
which I think may have been related to my running under VMware. At other
times the namenode/master simply could not connect to the
datanodes/slaves (showing up as some IPC error message). Restarting the
namenode/master the same way brought the cluster back up healthy.  Too
bad this phenomenon is not consistent, so I couldn't pinpoint how or why
it happened.  Would you let me know if you find the cause?  Thanks.
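If it helps, a minimal sketch of the kind of log check I mean (it assumes
the default log location from your hadoop-env.sh; adjust the file names to
whatever actually appears in the logs directory):

   # On the master: check the tail of the namenode log for exceptions
   tail -n 100 /opt/hadoop-0.11.0/logs/hadoop-jaya-namenode-*.log

   # On each slave: the same for the datanode log
   tail -n 100 /opt/hadoop-0.11.0/logs/hadoop-jaya-datanode-*.log

   # Grepping across all logs for errors narrows things down quickly
   grep -iE "exception|error" /opt/hadoop-0.11.0/logs/*.log | tail -n 40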

Best Regards
 
Richard Yang
richardyang@richardyang.net
kusanagiyang@gmail.com
 
 
-----Original Message-----
From: jaylac [mailto:Jayalakshmi.Muniasamy@cognizant.com] 
Sent: Thursday, March 22, 2007 8:43 PM
To: hadoop-user@lucene.apache.org
Subject: System is hanging while executing bin/start-all.sh and bin/stop-all.sh


Hi

Whenever I execute the bin/start-all.sh command, the slave node hangs.
Sometimes the master node hangs too. If I restart the system and run the
job again, I get the proper output... The same problem occurs while
stopping all the daemons...

Has anyone faced this problem?

Could someone please tell me the solution?

I'm using two Red Hat Linux machines: one master (10.229.62.6) and one
slave (10.229.62.56).
On the master node, the user name is jaya.
On the slave node, the user name is jaya.

The steps I follow are:

Edit the /home/jaya/.bashrc file
          This is where I set the HADOOP_CONF_DIR environment variable
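For reference, a minimal sketch of what that .bashrc entry might look like
(the exact path is an assumption, based on the
HADOOP_HOME=/opt/hadoop-0.11.0 setting used below):

          # Point Hadoop's scripts at the configuration directory
          # (path assumed; adjust to your install)
          export HADOOP_CONF_DIR=/opt/hadoop-0.11.0/conf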

MASTER NODE

1. Edit conf/slaves file....
        Contents
        ====================
         localhost
          jaya@10.229.62.56
         ====================

2. Edit conf/hadoop-env.sh file
      # The java implementation to use.  Required.
      export JAVA_HOME=/usr/java/jdk1.6.0

      # The maximum amount of heap to use, in MB. Default is 1000.
      export HADOOP_HEAPSIZE=200

      export HADOOP_HOME=/opt/hadoop-0.11.0

      # Extra Java runtime options.  Empty by default.
      export HADOOP_OPTS=-server
      
      # Where log files are stored.  $HADOOP_HOME/logs by default.
      export HADOOP_LOG_DIR=${HADOOP_HOME}/logs

      # File naming remote slave hosts.  $HADOOP_HOME/conf/slaves by default.
      export HADOOP_SLAVES=${HADOOP_HOME}/conf/slaves

      That's it... no other changes in this file.

3. Edit conf/hadoop-site.xml file
       Contents
        ===========================================
         <?xml version="1.0"?>
         <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

         <!-- Put site-specific property overrides in this file. -->

         <configuration>

         <property>
         <name>fs.default.name</name>
         <value>10.229.62.6:50010</value>
         </property>

         <property>
         <name>mapred.job.tracker</name>
         <value>10.229.62.6:50011</value>
         </property>
         
         <property>
         <name>dfs.replication</name>
         <value>3</value>
         </property>
         
         </configuration>
         ====================================
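As a quick sanity check that the addresses and ports above are reachable
from the slave, something like the following may help (this assumes nc is
installed; telnet against the same host and port works too):

         # From the slave, check that the namenode and jobtracker
         # ports on the master are open
         nc -z -v 10.229.62.6 50010    # fs.default.name
         nc -z -v 10.229.62.6 50011    # mapred.job.tracker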

 SLAVE NODE

1. Edit conf/masters file....
        Contents
        ====================
          jaya@10.229.62.56
         ====================

2. Edit conf/hadoop-env.sh file
      # The java implementation to use.  Required.
      export JAVA_HOME=/usr/java/jdk1.6.0

      # The maximum amount of heap to use, in MB. Default is 1000.
      export HADOOP_HEAPSIZE=200

      export HADOOP_HOME=/opt/hadoop-0.11.0

      # Extra Java runtime options.  Empty by default.
      export HADOOP_OPTS=-server
      
      # Where log files are stored.  $HADOOP_HOME/logs by default.
      export HADOOP_LOG_DIR=${HADOOP_HOME}/logs

      # File naming remote slave hosts.  $HADOOP_HOME/conf/slaves by default.
      export HADOOP_SLAVES=${HADOOP_HOME}/conf/slaves

      That's it... no other changes in this file.

3. Edit conf/hadoop-site.xml file
       Contents
        ===========================================
         <?xml version="1.0"?>
         <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

         <!-- Put site-specific property overrides in this file. -->

         <configuration>

         <property>
         <name>fs.default.name</name>
         <value>10.229.62.6:50010</value>
         </property>

         <property>
         <name>mapred.job.tracker</name>
         <value>10.229.62.6:50011</value>
         </property>
         
         <property>
         <name>dfs.replication</name>
         <value>3</value>
         </property>
         
         </configuration>
         ====================================


I've already done the steps for passwordless login.
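For completeness, a minimal sketch of that passwordless-login setup
(standard ssh-keygen steps; adjust if your setup differs):

   # On the master, generate a key pair if one does not exist yet
   ssh-keygen -t rsa

   # Copy the public key to the slave (or append ~/.ssh/id_rsa.pub to
   # the slave's ~/.ssh/authorized_keys by hand)
   ssh-copy-id jaya@10.229.62.56

   # Verify: this should log in without prompting for a password
   ssh jaya@10.229.62.56 hostname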

That's all. Then I perform the following operations:

In the HADOOP_HOME directory,

[jaya@localhost hadoop-0.11.0]$ bin/hadoop namenode -format
Re-format filesystem in /tmp/hadoop-146736/dfs/name ? (Y or N) Y
Formatted /tmp/hadoop-146736/dfs/name
[jaya@localhost hadoop-0.11.0]$
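One thing worth noting: the formatted filesystem metadata lives under /tmp
here, and many systems clear /tmp on reboot; since the machines get
restarted to work around the hang, it may be worth confirming the name
directory survives (path taken from the format output above):

   # After a reboot, confirm the namenode's name directory is still there
   ls -l /tmp/hadoop-146736/dfs/name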

Then

[jaya@localhost hadoop-0.11.0]$ bin/start-all.sh
starting namenode, logging to /opt/hadoop-0.11.0/logs/hadoop-jaya-namenode-localhost.localdomain.out
10.229.62.109: starting datanode, logging to /opt/hadoop-0.11.0/logs/hadoop-jaya-datanode-auriga.out
localhost: starting datanode, logging to /opt/hadoop-0.11.0/logs/hadoop-jaya-datanode-localhost.localdomain.out
localhost: starting secondarynamenode, logging to /opt/hadoop-0.11.0/logs/hadoop-jaya-secondarynamenode-localhost.localdomain.out
starting jobtracker, logging to /opt/hadoop-0.11.0/logs/hadoop-jaya-jobtracker-localhost.localdomain.out
10.229.62.109: starting tasktracker, logging to /opt/hadoop-0.11.0/logs/hadoop-jaya-tasktracker-auriga.out
localhost: starting tasktracker, logging to /opt/hadoop-0.11.0/logs/hadoop-jaya-tasktracker-localhost.localdomain.out

At this point the slave node hangs...
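A quick way to see which daemons actually came up on a node is jps, which
ships with the JDK (the path below assumes the JAVA_HOME set in
hadoop-env.sh):

   # Lists this user's running Java processes by name
   /usr/java/jdk1.6.0/bin/jps
   # On the master I would expect NameNode, SecondaryNameNode and
   # JobTracker, plus DataNode and TaskTracker since localhost is also
   # listed in conf/slaves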

I'll restart the slave node...

Then I get the proper output when I execute "bin/hadoop jar
hadoop-0.11.0-examples.jar input output"
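In case it matters, the examples jar normally expects an example name
before the input and output directories, so the full command would look
something like this (wordcount is only an illustration):

   # Run the wordcount example over the input directory
   bin/hadoop jar hadoop-0.11.0-examples.jar wordcount input output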

Similarly, when I stop the daemons, the slave node hangs... Sometimes the
master node hangs too...

Please help me as soon as possible.

Thanks in advance 

Jaya
-- 
View this message in context:
http://www.nabble.com/System-is-hanging-while-executing-bin-start-all.sh-and-bin-stop-all.sh-tf3451821.html#a9628679
Sent from the Hadoop Users mailing list archive at Nabble.com.



