hadoop-common-user mailing list archives

From Michael Stack <st...@duboce.net>
Subject Re: Hbase scripts problem
Date Mon, 27 Aug 2007 17:23:19 GMT
Michele Catasta wrote:
> Hi,
> we are having problems with the hbase scripts. Basically, when we run the
> stop script, it's not able to kill the HMaster gracefully (instead, it
> forks another HMaster that dies after a short time).

Hello Michele:

If you trace it, you will find that the stop-hbase.sh script invokes 
HMaster.main, which launches a client (HBaseAdmin) to invoke the shutdown 
method on the actual cluster HMaster.  I'm guessing HBaseAdmin is stuck, 
unable to contact the remote HMaster, perhaps because it's trying to 
access the wrong address.  (Because HBaseAdmin runs inside 
HMaster.main, it looks like there are two HMasters running when you do 
a process listing.)
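
For the curious, the stop path more or less boils down to the below 
(the script location assumes the contrib layout and the exact argument 
is from memory, so treat it as approximate):

# roughly what stop-hbase.sh ends up running
$HADOOP_HOME/src/contrib/hbase/bin/hbase master stop
# 'master stop' drops into HMaster.main, which builds an HBaseAdmin
# client and calls its shutdown method against the master address in
# the configuration; that short-lived client JVM is the second
# 'HMaster' you see in the process listing.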

Check the logs to see if you can get a clue as to what is going on.  Did 
the cluster HMaster get the shutdown signal?  (Is it running the 
shutdown sequence?)  Logs are in $HADOOP_HOME/logs.  Look at the 
hbase-USERID-master-*.log content.  It might help if you up the log level 
to DEBUG (add the line 'log4j.logger.org.apache.hadoop.hbase.HMaster=DEBUG' 
to $HADOOP_HOME/conf/log4j.properties).  Stack traces are also useful for 
figuring out where the programs are hung (send a 'kill -QUIT PROCESS_ID'; 
the output will appear in the '*.out' logs).
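
Concretely, something along these lines (USERID and PROCESS_ID are 
placeholders for your own user and the master's process id):

# bump HMaster logging to DEBUG before the next run
echo 'log4j.logger.org.apache.hadoop.hbase.HMaster=DEBUG' >> $HADOOP_HOME/conf/log4j.properties

# after reproducing, look for the shutdown sequence in the master log
grep -i shutdown $HADOOP_HOME/logs/hbase-USERID-master-*.log

# dump thread stacks of a hung process; output lands in the matching *.out file
kill -QUIT PROCESS_ID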

Make an issue and attach the logs if it's not obvious to you what's going 
on and we'll take a look.

> The start script, on the other hand, is not able to start up the
> HRegionServer, probably because the HMaster was killed improperly
> before. Taking a look at the logs, I found that it was complaining
> about an already existing directory inside HDFS (the regionserver log
> directory, IIRC). 
The outstanding log on improper shutdown should have been addressed by 

> After I deleted it, it dies for another reason
> that I cannot understand:
> 2007-08-27 14:40:12,979 INFO org.apache.hadoop.ipc.Server: IPC Server
> handler 8 on 60010: starting
> 2007-08-27 14:40:12,979 INFO org.apache.hadoop.hbase.HRegionServer:
> HRegionServer started at:
> 2007-08-27 14:40:12,980 INFO org.apache.hadoop.ipc.Server: IPC Server
> handler 9 on 60010: starting
> 2007-08-27 14:40:12,984 INFO org.apache.hadoop.hbase.Leases: closing leases
> 2007-08-27 14:40:12,984 INFO org.apache.hadoop.hbase.Leases: leases closed
> 2007-08-27 14:40:12,984 INFO org.apache.hadoop.ipc.Server: Stopping
> server on 60010
> 2007-08-27 14:40:12,985 INFO org.apache.hadoop.ipc.Server: IPC Server
> handler 0 on 60010: exiting
> This is a snippet of the log file. Handlers are initialized and then
> stopped soon after. Maybe it's because I used start-hbase.sh even though
> an HMaster instance was already up?
Should still work.  The logging should say why the region server decided 
to shut down.  Perhaps the reason will show up if you set the level to 
DEBUG?  (I'm guessing it's because it can't find the master -- have you 
set the hbase.master property in hbase-site.xml appropriately for your 
cluster?)  Add the following to your log4j.properties file:

log4j.logger.org.apache.hadoop.hbase.HRegionServer=DEBUG
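
For hbase.master, the value is the master's host:port.  Something like 
the below in $HADOOP_HOME/conf/hbase-site.xml should do (the hostname is 
a placeholder and 60000 is, as far as I recall, the default master port, 
so adjust both for your setup):

<property>
  <name>hbase.master</name>
  <value>your-master-host:60000</value>
</property>
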
> We would like to solve this situation in a graceful way, because
> last time we had to erase all our hbase content.
> Best Regards,
>     -Michele Catasta
