hbase-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Chris Bates <christopher.andrew.ba...@gmail.com>
Subject Re: HBase 0.20.1 Distributed Install Problems
Date Wed, 11 Nov 2009 15:09:38 GMT
I'll go ahead and post the log of one of the machines that didn't have that
META information at the end:

Wed Nov 11 09:14:10 EST 2009 Starting regionserver on crunch2ulimit -n
10242009-11-11 09:14:11,822 INFO
org.apache.hadoop.hbase.regionserver.HRegionServer:
vmInputArguments=[-Xmx1000m, -XX:+HeapDumpOnOutOfMemoryError,
-XX:+UseConcMarkSweepGC, -XX:+CMSIncrementalMode,
-Dhbase.log.dir=/opt/hadoop/hbase-0.20.1/bin/../logs,
-Dhbase.log.file=hbase-hadoop-regionserver-crunch2.log,
-Dhbase.home.dir=/opt/hadoop/hbase-0.20.1/bin/.., -Dhbase.id.str=hadoop,
-Dhbase.root.logger=INFO,DRFA,
-Djava.library.path=/opt/hadoop/hbase-0.20.1/bin/../lib/native/Linux-i386-32]2009-11-11
09:14:11,919 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: My
address is crunch2.local:600202009-11-11 09:14:12,060 INFO
org.apache.hadoop.hbase.ipc.HBaseRpcMetrics: Initializing RPC Metrics with
hostName=HRegionServer, port=600202009-11-11 09:14:12,222 INFO
org.apache.hadoop.hbase.regionserver.MemStoreFlusher:
globalMemStoreLimit=399.4m, globalMemStoreLimitLowMark=249.6m,
maxHeap=998.4m
2009-11-11 09:14:12,227 INFO
org.apache.hadoop.hbase.regionserver.HRegionServer: Runs every 10000000ms
2009-11-11 09:14:12,305 INFO org.apache.zookeeper.ZooKeeper: Client
environment:zookeeper.version=3.2.1-808558, built on 08/27/2009 18:48 GMT
2009-11-11 09:14:12,305 INFO org.apache.zookeeper.ZooKeeper: Client
environment:host.name=crunch2.local
2009-11-11 09:14:12,305 INFO org.apache.zookeeper.ZooKeeper: Client
environment:java.version=1.6.0_14
2009-11-11 09:14:12,305 INFO org.apache.zookeeper.ZooKeeper: Client
environment:java.vendor=Sun Microsystems Inc.
2009-11-11 09:14:12,305 INFO org.apache.zookeeper.ZooKeeper: Client
environment:java.home=/usr/lib/jvm/java-6-sun-1.6.0.14/jre
2009-11-11 09:14:12,305 INFO org.apache.zookeeper.ZooKeeper: Client
environment:java.class.path=/opt/hadoop/hbase-0.20.1/bin/../conf:/usr/lib/jvm/java-6-sun/lib/tools.jar:/opt/hadoop/hbase-0.20.1/bin/..:
/opt/hadoop/hbase-0.20.1/bin/../hbase-0.20.1.jar:/opt/hadoop/hbase-0.20.1/bin/../hbase-0.20.1-test.jar:/opt/hadoop/hbase-0.20.1/bin/../lib/AgileJSON-2009-03-30.jar:/opt/hadoop/hbase-0.20.1/bin/../lib/com
mons-cli-2.0-SNAPSHOT.jar:/opt/hadoop/hbase-0.20.1/bin/../lib/commons-el-from-jetty-5.1.4.jar:/opt/hadoop/hbase-0.20.1/bin/../lib/commons-httpclient-3.0.1.jar:/opt/hadoop/hbase-0.20.1/bin/../lib/commons-
logging-1.0.4.jar:/opt/hadoop/hbase-0.20.1/bin/../lib/commons-logging-api-1.0.4.jar:/opt/hadoop/hbase-0.20.1/bin/../lib/commons-math-1.1.jar:/opt/hadoop/hbase-0.20.1/bin/../lib/hadoop-0.20.1-hdfs127-core.jar:/opt/hadoop/hbase-0.20.1/bin/../lib/hadoop-0.20.1-test.jar:/opt/hadoop/hbase-0.20.1/bin/../lib/jasper-compiler-5.5.12.jar:/opt/hadoop/hbase-0.20.1/bin/../lib/jasper-runtime-5.5.12.jar:/opt/hadoop/hb
ase-0.20.1/bin/../lib/jetty-6.1.14.jar:/opt/hadoop/hbase-0.20.1/bin/../lib/jetty-util-6.1.14.jar:/opt/hadoop/hbase-0.20.1/bin/../lib/jruby-complete-1.2.0.jar:/opt/hadoop/hbase-0.20.1/bin/../lib/json.jar:
/opt/hadoop/hbase-0.20.1/bin/../lib/junit-3.8.1.jar:/opt/hadoop/hbase-0.20.1/bin/../lib/libthrift-r771587.jar:/opt/hadoop/hbase-0.20.1/bin/../lib/log4j-1.2.15.jar:/opt/hadoop/hbase-0.20.1/bin/../lib/luce
ne-core-2.2.0.jar:/opt/hadoop/hbase-0.20.1/bin/../lib/servlet-api-2.5-6.1.14.jar:/opt/hadoop/hbase-0.20.1/bin/../lib/xmlenc-0.52.jar:/opt/hadoop/hbase-0.20.1/bin/../lib/zookeeper-3.2.1.jar:/opt/hadoop/hb
ase-0.20.1/bin/../lib/jsp-2.1/jsp-2.1.jar:/opt/hadoop/hbase-0.20.1/bin/../lib/jsp-2.1/jsp-api-2.1.jar
2009-11-11 09:14:12,305 INFO org.apache.zookeeper.ZooKeeper: Client
environment:java.library.path=/opt/hadoop/hbase-0.20.1/bin/../lib/native/Linux-i386-32
2009-11-11 09:14:12,305 INFO org.apache.zookeeper.ZooKeeper: Client
environment:java.io.tmpdir=/tmp
2009-11-11 09:14:12,306 INFO org.apache.zookeeper.ZooKeeper: Client
environment:java.compiler=<NA>
2009-11-11 09:14:12,306 INFO org.apache.zookeeper.ZooKeeper: Client
environment:os.name=Linux
2009-11-11 09:14:12,306 INFO org.apache.zookeeper.ZooKeeper: Client
environment:os.arch=i386
2009-11-11 09:14:12,306 INFO org.apache.zookeeper.ZooKeeper: Client
environment:os.version=2.6.28-11-generic
2009-11-11 09:14:12,306 INFO org.apache.zookeeper.ZooKeeper: Client
environment:user.name=hadoop
2009-11-11 09:14:12,306 INFO org.apache.zookeeper.ZooKeeper: Client
environment:user.home=/home/hadoop
2009-11-11 09:14:12,306 INFO org.apache.zookeeper.ZooKeeper: Client
environment:user.dir=/opt/hadoop/hbase-0.20.1
2009-11-11 09:14:12,308 INFO org.apache.zookeeper.ZooKeeper: Initiating
client connection,
connectString=crunch2:2181,chanel2:2181,chanel:2181,chris:2181,crunch3:2181
sessionTimeout=60000 watcher=org.apa
che.hadoop.hbase.regionserver.HRegionServer@dc67e
2009-11-11 09:14:12,312 INFO org.apache.zookeeper.ClientCnxn:
zookeeper.disableAutoWatchReset is false
2009-11-11 09:14:12,347 INFO org.apache.zookeeper.ClientCnxn: Attempting
connection to server chanel2/172.16.1.46:2181
2009-11-11 09:14:12,349 INFO org.apache.zookeeper.ClientCnxn: Priming
connection to java.nio.channels.SocketChannel[connected local=/
172.16.1.95:39519 remote=chanel2/172.16.1.46:2181]
2009-11-11 09:14:12,366 INFO org.apache.zookeeper.ClientCnxn: Server
connection successful
2009-11-11 09:14:12,397 INFO
org.apache.hadoop.hbase.regionserver.HRegionServer: Got ZooKeeper event,
state: SyncConnected, type: None, path: null
2009-11-11 09:14:12,679 INFO
org.apache.hadoop.hbase.regionserver.HRegionServer: Telling master at
172.16.1.46:60000 that we are up
2009-11-11 09:14:34,162 INFO
org.apache.hadoop.hbase.regionserver.HRegionServer: Master passed us address
to use. Was=172.16.1.95:60020, Now=172.16.1.95
2009-11-11 09:14:34,577 INFO org.apache.hadoop.hbase.regionserver.HLog: HLog
configuration: blocksize=67108864, rollsize=63753420, enabled=true,
flushlogentries=100, optionallogflushinternal=10000ms
2009-11-11 09:14:34,814 INFO org.apache.hadoop.hbase.regionserver.HLog: New
hlog /hbase/.logs/crunch2.local,60020,1257948873837/hlog.dat.1257948874578
2009-11-11 09:14:34,848 INFO org.apache.hadoop.metrics.jvm.JvmMetrics:
Initializing JVM Metrics with processName=RegionServer,
sessionId=regionserver/172.16.1.95:60020
2009-11-11 09:14:34,852 INFO
org.apache.hadoop.hbase.regionserver.metrics.RegionServerMetrics:
Initialized
2009-11-11 09:14:35,248 INFO org.apache.hadoop.http.HttpServer: Port
returned by webServer.getConnectors()[0].getLocalPort() before open() is -1.
Opening the listener on 60030
2009-11-11 09:14:35,249 INFO org.apache.hadoop.http.HttpServer:
listener.getLocalPort() returned 60030
webServer.getConnectors()[0].getLocalPort() returned 60030
2009-11-11 09:14:35,249 INFO org.apache.hadoop.http.HttpServer: Jetty bound
to port 60030
2009-11-11 09:14:44,769 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server
Responder: starting
2009-11-11 09:14:44,772 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server
listener on 60020: starting
2009-11-11 09:14:44,792 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server
handler 0 on 60020: starting
2009-11-11 09:14:44,793 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server
handler 1 on 60020: starting
2009-11-11 09:14:44,801 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server
handler 2 on 60020: starting
2009-11-11 09:14:44,802 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server
handler 3 on 60020: starting
2009-11-11 09:14:44,804 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server
handler 4 on 60020: starting
2009-11-11 09:14:44,804 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server
handler 5 on 60020: starting
2009-11-11 09:14:44,805 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server
handler 6 on 60020: starting
2009-11-11 09:14:44,805 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server
handler 7 on 60020: starting
2009-11-11 09:14:44,806 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server
handler 8 on 60020: starting
2009-11-11 09:14:44,806 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server
handler 9 on 60020: starting
2009-11-11 09:14:44,806 INFO
org.apache.hadoop.hbase.regionserver.HRegionServer: HRegionServer started
at: 172.16.1.95:60020
2009-11-11 09:14:44,818 INFO org.apache.hadoop.hbase.regionserver.StoreFile:
Allocating LruBlockCache with maximum size 199.7m


On Wed, Nov 11, 2009 at 9:12 AM, Chris Bates <
christopher.andrew.bates@gmail.com> wrote:

> @Lars -- We're configured like in the "Getting Started" guide -- tried to
> follow the instructions verbatim.
>
> Ok. Thanks for the tip. I'm not sure what causes the orphan behavior, but
> thats good to know.  I'll also know what to look for in the jstack.  After
> stopping the region servers individually, then doing a stop / restart, I get
> regionserver logs
>
> Here is M2 (chanel)
> Wed Nov 11 08:49:30 EST 2009 Starting regionserver on chanel
> ulimit -n 1024
> 2009-11-11 08:49:32,081 INFO
> org.apache.hadoop.hbase.regionserver.HRegionServer:
> vmInputArguments=[-Xmx1000m, -XX:+HeapDumpOnOutOfMemoryError,
> -XX:+UseConcMarkSweepGC, -XX:+CMSIncrementalMode, -Dhbase.lo
> 2009-11-11 08:49:32,229 INFO
> org.apache.hadoop.hbase.regionserver.HRegionServer: My address is
> chanel.local:60020
> 2009-11-11 08:49:32,418 INFO org.apache.hadoop.hbase.ipc.HBaseRpcMetrics:
> Initializing RPC Metrics with hostName=HRegionServer, port=60020
> 2009-11-11 08:49:32,629 INFO
> org.apache.hadoop.hbase.regionserver.MemStoreFlusher:
> globalMemStoreLimit=398.7m, globalMemStoreLimitLowMark=249.2m,
> maxHeap=996.8m
> 2009-11-11 08:49:32,641 INFO
> org.apache.hadoop.hbase.regionserver.HRegionServer: Runs every 10000000ms
> 2009-11-11 08:49:32,749 INFO org.apache.zookeeper.ZooKeeper: Client
> environment:zookeeper.version=3.2.1-808558, built on 08/27/2009 18:48 GMT
> 2009-11-11 08:49:32,749 INFO org.apache.zookeeper.ZooKeeper: Client
> environment:host.name=chanel.local
> 2009-11-11 08:49:32,749 INFO org.apache.zookeeper.ZooKeeper: Client
> environment:java.version=1.6.0_16
> 2009-11-11 08:49:32,749 INFO org.apache.zookeeper.ZooKeeper: Client
> environment:java.vendor=Sun Microsystems Inc.
> 2009-11-11 08:49:32,749 INFO org.apache.zookeeper.ZooKeeper: Client
> environment:java.home=/usr/lib/jvm/java-6-sun-1.6.0.16/jre
> 2009-11-11 08:49:32,749 INFO org.apache.zookeeper.ZooKeeper: Client
> environment:java.class.path=/opt/hadoop/hbase-0.20.1/bin/../conf:/usr/lib/jvm/java-6-sun/lib/tools.jar:/opt/hadoop/hbase-0.20.1/bin/..:
> 2009-11-11 08:49:32,749 INFO org.apache.zookeeper.ZooKeeper: Client
> environment:java.library.path=/opt/hadoop/hbase-0.20.1/bin/../lib/native/Linux-i386-32
> 2009-11-11 08:49:32,749 INFO org.apache.zookeeper.ZooKeeper: Client
> environment:java.io.tmpdir=/tmp
> 2009-11-11 08:49:32,749 INFO org.apache.zookeeper.ZooKeeper: Client
> environment:java.compiler=<NA>
> 2009-11-11 08:49:32,749 INFO org.apache.zookeeper.ZooKeeper: Client
> environment:os.name=Linux
> 2009-11-11 08:49:32,750 INFO org.apache.zookeeper.ZooKeeper: Client
> environment:os.arch=i386
> 2009-11-11 08:49:32,750 INFO org.apache.zookeeper.ZooKeeper: Client
> environment:os.version=2.6.24-24-generic
> 2009-11-11 08:49:32,750 INFO org.apache.zookeeper.ZooKeeper: Client
> environment:user.name=hadoop
> 2009-11-11 08:49:32,750 INFO org.apache.zookeeper.ZooKeeper: Client
> environment:user.home=/home/hadoop
> 2009-11-11 08:49:32,750 INFO org.apache.zookeeper.ZooKeeper: Client
> environment:user.dir=/opt/hadoop/hbase-0.20.1
> 2009-11-11 08:49:32,760 INFO org.apache.zookeeper.ZooKeeper: Initiating
> client connection,
> connectString=crunch2:2181,chanel2:2181,chanel:2181,chris:2181,crunch3:2181
> sessionTimeout=60000 watcher=org.apa
> 2009-11-11 08:49:32,781 INFO org.apache.zookeeper.ClientCnxn:
> zookeeper.disableAutoWatchReset is false
> 2009-11-11 08:49:32,800 INFO org.apache.zookeeper.ClientCnxn: Attempting
> connection to server crunch2/172.16.1.95:2181
> 2009-11-11 08:49:32,802 INFO org.apache.zookeeper.ClientCnxn: Priming
> connection to java.nio.channels.SocketChannel[connected local=/
> 172.16.1.45:33873 remote=crunch2/172.16.1.95:2181]
> 2009-11-11 08:49:32,814 INFO org.apache.zookeeper.ClientCnxn: Server
> connection successful
> 2009-11-11 08:49:32,829 INFO
> org.apache.hadoop.hbase.regionserver.HRegionServer: Got ZooKeeper event,
> state: SyncConnected, type: None, path: null
> 2009-11-11 08:49:32,973 INFO
> org.apache.hadoop.hbase.regionserver.HRegionServer: Telling master at
> 172.16.1.46:60000 that we are up
> 2009-11-11 08:49:33,509 INFO
> org.apache.hadoop.hbase.regionserver.HRegionServer: Master passed us address
> to use. Was=172.16.1.45:60020, Now=172.16.1.45
> 2009-11-11 08:49:34,090 INFO org.apache.hadoop.hbase.regionserver.HLog:
> HLog configuration: blocksize=67108864, rollsize=63753420, enabled=true,
> flushlogentries=100, optionallogflushinternal=10000ms
> 2009-11-11 08:49:34,281 INFO org.apache.hadoop.hbase.regionserver.HLog: New
> hlog /hbase/.logs/chanel.local,60020,1257947373328/hlog.dat.1257947374090
> 2009-11-11 08:49:34,319 INFO org.apache.hadoop.metrics.jvm.JvmMetrics:
> Initializing JVM Metrics with processName=RegionServer,
> sessionId=regionserver/172.16.1.45:60020
>  2009-11-11 08:49:34,350 INFO
> org.apache.hadoop.hbase.regionserver.metrics.RegionServerMetrics:
> Initialized
> 2009-11-11 08:49:34,723 INFO org.apache.hadoop.http.HttpServer: Port
> returned by webServer.getConnectors()[0].getLocalPort() before open() is -1.
> Opening the listener on 60030
> 2009-11-11 08:49:34,724 INFO org.apache.hadoop.http.HttpServer:
> listener.getLocalPort() returned 60030
> webServer.getConnectors()[0].getLocalPort() returned 60030
> 2009-11-11 08:49:34,724 INFO org.apache.hadoop.http.HttpServer: Jetty bound
> to port 60030
> 2009-11-11 08:49:35,673 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server
> Responder: starting
> 2009-11-11 08:49:35,674 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server
> listener on 60020: starting
> 2009-11-11 08:49:35,675 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server
> handler 0 on 60020: starting
> 2009-11-11 08:49:35,675 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server
> handler 1 on 60020: starting
> 2009-11-11 08:49:35,675 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server
> handler 2 on 60020: starting
> 2009-11-11 08:49:35,675 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server
> handler 3 on 60020: starting
> 2009-11-11 08:49:35,676 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server
> handler 4 on 60020: starting
> 2009-11-11 08:49:35,676 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server
> handler 5 on 60020: starting
> 2009-11-11 08:49:35,676 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server
> handler 6 on 60020: starting
> 2009-11-11 08:49:35,676 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server
> handler 7 on 60020: starting
> 2009-11-11 08:49:35,676 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server
> handler 8 on 60020: starting
> 2009-11-11 08:49:35,676 INFO
> org.apache.hadoop.hbase.regionserver.HRegionServer: HRegionServer started
> at: 172.16.1.45:60020
> 2009-11-11 08:49:35,677 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server
> handler 9 on 60020: starting
> 2009-11-11 08:49:35,694 INFO
> org.apache.hadoop.hbase.regionserver.StoreFile: Allocating LruBlockCache
> with maximum size 199.4m
> 2009-11-11 08:49:38,798 INFO
> org.apache.hadoop.hbase.regionserver.HRegionServer: MSG_REGION_OPEN:
> .META.,,1
> 2009-11-11 08:49:38,811 INFO
> org.apache.hadoop.hbase.regionserver.HRegionServer: Worker: MSG_REGION_OPEN:
> .META.,,1
> 2009-11-11 08:49:38,932 INFO org.apache.hadoop.hbase.regionserver.HRegion:
> region .META.,,1/1028785192 available; sequence id is 0
>
> The only difference between that log file and the rest is that the others
> don't have
> 2009-11-11 08:49:38,798 INFO
> org.apache.hadoop.hbase.regionserver.HRegionServer: MSG_REGION_OPEN:
> .META.,,1
> 2009-11-11 08:49:38,811 INFO
> org.apache.hadoop.hbase.regionserver.HRegionServer: Worker: MSG_REGION_OPEN:
> .META.,,1
> 2009-11-11 08:49:38,932 INFO org.apache.hadoop.hbase.regionserver.HRegion:
> region .META.,,1/1028785192 available; sequence id is 0
>
> except M5 which has it and says id is 3....if you want me to paste all the
> logs, I can...didn't think it was necessary.
>
> hbase(main):001:0> zk_dump
>
> HBase tree in ZooKeeper is rooted at /hbase
>   Cluster up? true
>   In safe mode? false
>   Master address: 172.16.1.46:60000
>   Region server holding ROOT: 172.16.1.96:60020
>   Region servers:
>     - 172.16.1.95:60020
>     - 172.16.1.96:60020
>     - 172.16.1.83:60020
>     - 172.16.1.45:60020
> hbase(main):002:0> status 'simple'
> 4 live servers
>     crunch3.local:60020 1257928281006
>         requests=0, regions=1, usedHeap=31, maxHeap=998
>     chris.local:60020 1257947367197
>         requests=0, regions=0, usedHeap=29, maxHeap=996
>     crunch2.local:60020 1257947370011
>         requests=0, regions=0, usedHeap=29, maxHeap=998
>     chanel.local:60020 1257947373328
>         requests=0, regions=1, usedHeap=31, maxHeap=996
> 0 dead servers
>
>
> I started and stopped Hbase a couple times without deleting the HDFS copy,
> and everything seems to work fine....was that it??? What caused the
> regionservers to orphan?
>
> On Wed, Nov 11, 2009 at 5:17 AM, Lars George <lars@worldlingo.com> wrote:
>
>> Hi Chris,
>>
>> As Tatsuya says, but I would expect you have to "kill" them as they are in
>> a state where a daemon stop may not work as they will not be perceptible to
>> that event yet.
>>
>> I saw the same as Tatsuya and assumed you have an issue with your quorum
>> setting on the slaves. I have not followed the whole discussion in all the
>> details so let me ask you how you configured it?
>>
>> I have personally done a symlink of my standalone zoo.cfg from the
>> zookeeper/conf to the hbase/conf directory. You could also have set the
>> quorum servers in the hbase-site.xml as per the "Getting started" guide.
>>
>> If you have done one of those ways already then make sure all ZK servers
>> are up and can be reached from the region servers.
>>
>> Lars
>>
>> Tatsuya Kawano schrieb:
>>
>>  Hi Chris, and thanks Lars for help.
>>>
>>> OK. So "jstack 22200" shows your region server is trying to finish
>>> starting up, but stuck in a middle when try to get IP address of the
>>> master from ZooKeeper.
>>>
>>> ===========================================================
>>> "main" prio=10 tid=0x0805a800 nid=0x56d2 waiting on condition
>>> [0xb72f2000]
>>>  java.lang.Thread.State: TIMED_WAITING (sleeping)
>>> at java.lang.Thread.sleep(Native Method)
>>>  at org.apache.hadoop.hbase.util.Sleeper.sleep(Sleeper.java:74)
>>> at org.apache.hadoop.hbase.util.Sleeper.sleep(Sleeper.java:51)
>>>  at
>>>
>>> org.apache.hadoop.hbase.regionserver.HRegionServer.watchMasterAddress(HRegionServer.java:387)
>>> at
>>>
>>> org.apache.hadoop.hbase.regionserver.HRegionServer.reinitializeZooKeeper(HRegionServer.java:315)
>>>  at
>>>
>>> org.apache.hadoop.hbase.regionserver.HRegionServer.reinitialize(HRegionServer.java:306)
>>> at
>>>
>>> org.apache.hadoop.hbase.regionserver.HRegionServer.<init>(HRegionServer.java:276)
>>> ===========================================================
>>>
>>> I still need to see the regionserver logs to figure out why this is
>>> happening.
>>>
>>>
>>> Also,
>>>
>>>
>>>
>>>> This is what I see when I run start-hbase.sh -- I can ssh into any of
>>>> the
>>>> boxes with no password just fine, it just gives me a weird first time
>>>> host
>>>> message...we get the same thing when we start up hadoop.
>>>>
>>>>
>>> ...
>>>
>>>
>>>> crunch2: regionserver running as process 6950. Stop it first.
>>>> chanel: regionserver running as process 22200. Stop it first.
>>>> crunch3: regionserver running as process 28962. Stop it first.
>>>> chris: regionserver running as process 28719. Stop it first.
>>>>
>>>>
>>>
>>> This "Stop it first" message means your region servers didn't stop
>>> when you ran stop-hbase.sh. The master couldn't locate those region
>>> servers so it couldn't tell them to shutdown. This is why you've got
>>> those orphan region servers. So until we finish setting your HBase
>>> cluster up, you'll have to stop those region servers by hand.
>>>
>>> To do this, ssh to M2 -- M5, and type the following command:
>>> ${HBASE_HOME}/bin/hbase-daemon.sh stop regionserver
>>>
>>> Then jps again to make sure HRegionServer doesn't exist. If the above
>>> command doesn't work, you can use Unix "kill" command. Then ssh to M1,
>>> run stop-hbase.sh to stop the master and ZooKeepers.
>>>
>>>
>>> It's still a mystery you don't have regionserver logs while you have
>>> zookeeper logs. Maybe those orphan region servers was the reason?  I
>>> don't know, but you can give it another try after stopping them. So,
>>> try to stop whole HBase / ZooKeeper process by above way, then run
>>> start-hbase.sh once again. If you can get the regionserver log to us,
>>> that would be great.
>>>
>>> Thanks,
>>>
>>>
>>>
>>
>

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message