hbase-user mailing list archives

From Chris Bates <christopher.andrew.ba...@gmail.com>
Subject Re: HBase 0.20.1 Distributed Install Problems
Date Wed, 11 Nov 2009 14:12:05 GMT
@Lars -- We're configured as in the "Getting Started" guide -- we tried to
follow the instructions verbatim.

Ok. Thanks for the tip. I'm not sure what causes the orphan behavior, but
that's good to know. I'll also know what to look for in the jstack. After
stopping the region servers individually and then doing a stop / restart,
I get the following regionserver logs.

Here is M2 (chanel):
Wed Nov 11 08:49:30 EST 2009 Starting regionserver on chanel
ulimit -n 1024
2009-11-11 08:49:32,081 INFO
org.apache.hadoop.hbase.regionserver.HRegionServer:
vmInputArguments=[-Xmx1000m, -XX:+HeapDumpOnOutOfMemoryError,
-XX:+UseConcMarkSweepGC, -XX:+CMSIncrementalMode, -Dhbase.lo
2009-11-11 08:49:32,229 INFO
org.apache.hadoop.hbase.regionserver.HRegionServer: My address is
chanel.local:60020
2009-11-11 08:49:32,418 INFO org.apache.hadoop.hbase.ipc.HBaseRpcMetrics:
Initializing RPC Metrics with hostName=HRegionServer, port=60020
2009-11-11 08:49:32,629 INFO
org.apache.hadoop.hbase.regionserver.MemStoreFlusher:
globalMemStoreLimit=398.7m, globalMemStoreLimitLowMark=249.2m,
maxHeap=996.8m
2009-11-11 08:49:32,641 INFO
org.apache.hadoop.hbase.regionserver.HRegionServer: Runs every 10000000ms
2009-11-11 08:49:32,749 INFO org.apache.zookeeper.ZooKeeper: Client
environment:zookeeper.version=3.2.1-808558, built on 08/27/2009 18:48 GMT
2009-11-11 08:49:32,749 INFO org.apache.zookeeper.ZooKeeper: Client
environment:host.name=chanel.local
2009-11-11 08:49:32,749 INFO org.apache.zookeeper.ZooKeeper: Client
environment:java.version=1.6.0_16
2009-11-11 08:49:32,749 INFO org.apache.zookeeper.ZooKeeper: Client
environment:java.vendor=Sun Microsystems Inc.
2009-11-11 08:49:32,749 INFO org.apache.zookeeper.ZooKeeper: Client
environment:java.home=/usr/lib/jvm/java-6-sun-1.6.0.16/jre
2009-11-11 08:49:32,749 INFO org.apache.zookeeper.ZooKeeper: Client
environment:java.class.path=/opt/hadoop/hbase-0.20.1/bin/../conf:/usr/lib/jvm/java-6-sun/lib/tools.jar:/opt/hadoop/hbase-0.20.1/bin/..:
2009-11-11 08:49:32,749 INFO org.apache.zookeeper.ZooKeeper: Client
environment:java.library.path=/opt/hadoop/hbase-0.20.1/bin/../lib/native/Linux-i386-32
2009-11-11 08:49:32,749 INFO org.apache.zookeeper.ZooKeeper: Client
environment:java.io.tmpdir=/tmp
2009-11-11 08:49:32,749 INFO org.apache.zookeeper.ZooKeeper: Client
environment:java.compiler=<NA>
2009-11-11 08:49:32,749 INFO org.apache.zookeeper.ZooKeeper: Client
environment:os.name=Linux
2009-11-11 08:49:32,750 INFO org.apache.zookeeper.ZooKeeper: Client
environment:os.arch=i386
2009-11-11 08:49:32,750 INFO org.apache.zookeeper.ZooKeeper: Client
environment:os.version=2.6.24-24-generic
2009-11-11 08:49:32,750 INFO org.apache.zookeeper.ZooKeeper: Client
environment:user.name=hadoop
2009-11-11 08:49:32,750 INFO org.apache.zookeeper.ZooKeeper: Client
environment:user.home=/home/hadoop
2009-11-11 08:49:32,750 INFO org.apache.zookeeper.ZooKeeper: Client
environment:user.dir=/opt/hadoop/hbase-0.20.1
2009-11-11 08:49:32,760 INFO org.apache.zookeeper.ZooKeeper: Initiating
client connection,
connectString=crunch2:2181,chanel2:2181,chanel:2181,chris:2181,crunch3:2181
sessionTimeout=60000 watcher=org.apa
2009-11-11 08:49:32,781 INFO org.apache.zookeeper.ClientCnxn:
zookeeper.disableAutoWatchReset is false
2009-11-11 08:49:32,800 INFO org.apache.zookeeper.ClientCnxn: Attempting
connection to server crunch2/172.16.1.95:2181
2009-11-11 08:49:32,802 INFO org.apache.zookeeper.ClientCnxn: Priming
connection to java.nio.channels.SocketChannel[connected local=/
172.16.1.45:33873 remote=crunch2/172.16.1.95:2181]
2009-11-11 08:49:32,814 INFO org.apache.zookeeper.ClientCnxn: Server
connection successful
2009-11-11 08:49:32,829 INFO
org.apache.hadoop.hbase.regionserver.HRegionServer: Got ZooKeeper event,
state: SyncConnected, type: None, path: null
2009-11-11 08:49:32,973 INFO
org.apache.hadoop.hbase.regionserver.HRegionServer: Telling master at
172.16.1.46:60000 that we are up
2009-11-11 08:49:33,509 INFO
org.apache.hadoop.hbase.regionserver.HRegionServer: Master passed us address
to use. Was=172.16.1.45:60020, Now=172.16.1.45
2009-11-11 08:49:34,090 INFO org.apache.hadoop.hbase.regionserver.HLog: HLog
configuration: blocksize=67108864, rollsize=63753420, enabled=true,
flushlogentries=100, optionallogflushinternal=10000ms
2009-11-11 08:49:34,281 INFO org.apache.hadoop.hbase.regionserver.HLog: New
hlog /hbase/.logs/chanel.local,60020,1257947373328/hlog.dat.1257947374090
2009-11-11 08:49:34,319 INFO org.apache.hadoop.metrics.jvm.JvmMetrics:
Initializing JVM Metrics with processName=RegionServer,
sessionId=regionserver/172.16.1.45:60020
2009-11-11 08:49:34,350 INFO
org.apache.hadoop.hbase.regionserver.metrics.RegionServerMetrics:
Initialized
2009-11-11 08:49:34,723 INFO org.apache.hadoop.http.HttpServer: Port
returned by webServer.getConnectors()[0].getLocalPort() before open() is -1.
Opening the listener on 60030
2009-11-11 08:49:34,724 INFO org.apache.hadoop.http.HttpServer:
listener.getLocalPort() returned 60030
webServer.getConnectors()[0].getLocalPort() returned 60030
2009-11-11 08:49:34,724 INFO org.apache.hadoop.http.HttpServer: Jetty bound
to port 60030
2009-11-11 08:49:35,673 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server
Responder: starting
2009-11-11 08:49:35,674 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server
listener on 60020: starting
2009-11-11 08:49:35,675 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server
handler 0 on 60020: starting
2009-11-11 08:49:35,675 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server
handler 1 on 60020: starting
2009-11-11 08:49:35,675 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server
handler 2 on 60020: starting
2009-11-11 08:49:35,675 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server
handler 3 on 60020: starting
2009-11-11 08:49:35,676 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server
handler 4 on 60020: starting
2009-11-11 08:49:35,676 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server
handler 5 on 60020: starting
2009-11-11 08:49:35,676 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server
handler 6 on 60020: starting
2009-11-11 08:49:35,676 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server
handler 7 on 60020: starting
2009-11-11 08:49:35,676 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server
handler 8 on 60020: starting
2009-11-11 08:49:35,676 INFO
org.apache.hadoop.hbase.regionserver.HRegionServer: HRegionServer started
at: 172.16.1.45:60020
2009-11-11 08:49:35,677 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server
handler 9 on 60020: starting
2009-11-11 08:49:35,694 INFO org.apache.hadoop.hbase.regionserver.StoreFile:
Allocating LruBlockCache with maximum size 199.4m
2009-11-11 08:49:38,798 INFO
org.apache.hadoop.hbase.regionserver.HRegionServer: MSG_REGION_OPEN:
.META.,,1
2009-11-11 08:49:38,811 INFO
org.apache.hadoop.hbase.regionserver.HRegionServer: Worker: MSG_REGION_OPEN:
.META.,,1
2009-11-11 08:49:38,932 INFO org.apache.hadoop.hbase.regionserver.HRegion:
region .META.,,1/1028785192 available; sequence id is 0

The only difference between that log file and the rest is that the others
don't have:
2009-11-11 08:49:38,798 INFO
org.apache.hadoop.hbase.regionserver.HRegionServer: MSG_REGION_OPEN:
.META.,,1
2009-11-11 08:49:38,811 INFO
org.apache.hadoop.hbase.regionserver.HRegionServer: Worker: MSG_REGION_OPEN:
.META.,,1
2009-11-11 08:49:38,932 INFO org.apache.hadoop.hbase.regionserver.HRegion:
region .META.,,1/1028785192 available; sequence id is 0

except M5, which has those lines but says the sequence id is 3. If you
want me to paste all the logs, I can -- I didn't think it was necessary.
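A quick way to check every log for the .META. open without pasting them all, sketched here against a sample line (in practice you would grep the real files under the HBase install's logs/ directory -- the path below is illustrative):

```shell
# Sample line taken from the M2 log above; substitute
# "cat $HBASE_HOME/logs/*regionserver*.log" for the printf in real use.
line='2009-11-11 08:49:38,798 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: MSG_REGION_OPEN: .META.,,1'
# grep -c counts matching lines; only the server hosting .META. logs this.
n=$(printf '%s\n' "$line" | grep -c 'MSG_REGION_OPEN: .META.')
echo "matches: $n"
```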

hbase(main):001:0> zk_dump

HBase tree in ZooKeeper is rooted at /hbase
  Cluster up? true
  In safe mode? false
  Master address: 172.16.1.46:60000
  Region server holding ROOT: 172.16.1.96:60020
  Region servers:
    - 172.16.1.95:60020
    - 172.16.1.96:60020
    - 172.16.1.83:60020
    - 172.16.1.45:60020
hbase(main):002:0> status 'simple'
4 live servers
    crunch3.local:60020 1257928281006
        requests=0, regions=1, usedHeap=31, maxHeap=998
    chris.local:60020 1257947367197
        requests=0, regions=0, usedHeap=29, maxHeap=996
    crunch2.local:60020 1257947370011
        requests=0, regions=0, usedHeap=29, maxHeap=998
    chanel.local:60020 1257947373328
        requests=0, regions=1, usedHeap=31, maxHeap=996
0 dead servers
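The 'status' output above can be sanity-checked by tallying the regions= fields; on a cluster with no user tables yet, the total should be 2 (-ROOT- plus .META.). A sketch against the pasted values:

```shell
# The four per-server lines from the 'status' output above, inlined as data.
status='requests=0, regions=1, usedHeap=31, maxHeap=998
requests=0, regions=0, usedHeap=29, maxHeap=996
requests=0, regions=0, usedHeap=29, maxHeap=998
requests=0, regions=1, usedHeap=31, maxHeap=996'
# Extract each regions= value with sed, then sum with awk.
total=$(printf '%s\n' "$status" | sed -n 's/.*regions=\([0-9]*\).*/\1/p' | awk '{s+=$1} END {print s}')
echo "total regions: $total"
```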


I started and stopped HBase a couple of times without deleting the HDFS
copy, and everything seems to work fine. Was that it? What caused the
regionservers to orphan?
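For the record, the by-hand stop Tatsuya describes below boils down to one command per slave. A dry-run sketch (it only prints the commands instead of executing them; the host names are this cluster's M2 -- M5, and $HBASE_HOME is assumed to be the install root on every box):

```shell
# Slave hosts running region servers (M2 -- M5 in this thread).
SLAVES='chanel chris crunch2 crunch3'
for h in $SLAVES; do
  # Dry run: print the stop command that would be run on each host.
  # Drop the echo (and run jps afterwards) to actually stop them.
  echo "ssh $h '\$HBASE_HOME/bin/hbase-daemon.sh stop regionserver'"
done
```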

On Wed, Nov 11, 2009 at 5:17 AM, Lars George <lars@worldlingo.com> wrote:

> Hi Chris,
>
> As Tatsuya says, but I would expect you have to "kill" them, as they are
> in a state where a daemon stop may not work because they are not yet
> responsive to that event.
>
> I saw the same as Tatsuya and assumed you have an issue with your quorum
> setting on the slaves. I have not followed the whole discussion in all
> its details, so let me ask: how did you configure it?
>
> I have personally symlinked my standalone zoo.cfg from zookeeper/conf
> into the hbase/conf directory. You could also have set the quorum
> servers in hbase-site.xml as per the "Getting Started" guide.
>
> If you have already done it one of those ways, make sure all ZK servers
> are up and can be reached from the region servers.
>
> Lars
>
> Tatsuya Kawano wrote:
>
>  Hi Chris, and thanks Lars for help.
>>
>> OK. So "jstack 22200" shows your region server is trying to finish
>> starting up, but it is stuck in the middle, trying to get the master's
>> IP address from ZooKeeper.
>>
>> ===========================================================
>> "main" prio=10 tid=0x0805a800 nid=0x56d2 waiting on condition [0xb72f2000]
>>  java.lang.Thread.State: TIMED_WAITING (sleeping)
>> at java.lang.Thread.sleep(Native Method)
>>  at org.apache.hadoop.hbase.util.Sleeper.sleep(Sleeper.java:74)
>> at org.apache.hadoop.hbase.util.Sleeper.sleep(Sleeper.java:51)
>>  at
>>
>> org.apache.hadoop.hbase.regionserver.HRegionServer.watchMasterAddress(HRegionServer.java:387)
>> at
>>
>> org.apache.hadoop.hbase.regionserver.HRegionServer.reinitializeZooKeeper(HRegionServer.java:315)
>>  at
>>
>> org.apache.hadoop.hbase.regionserver.HRegionServer.reinitialize(HRegionServer.java:306)
>> at
>>
>> org.apache.hadoop.hbase.regionserver.HRegionServer.<init>(HRegionServer.java:276)
>> ===========================================================
>>
>> I still need to see the regionserver logs to figure out why this is
>> happening.
>>
>>
>> Also,
>>
>>
>>
>>> This is what I see when I run start-hbase.sh -- I can ssh into any of the
>>> boxes with no password just fine, it just gives me a weird first time
>>> host
>>> message...we get the same thing when we start up hadoop.
>>>
>>>
>> ...
>>
>>
>>> crunch2: regionserver running as process 6950. Stop it first.
>>> chanel: regionserver running as process 22200. Stop it first.
>>> crunch3: regionserver running as process 28962. Stop it first.
>>> chris: regionserver running as process 28719. Stop it first.
>>>
>>>
>>
>> This "Stop it first" message means your region servers didn't stop
>> when you ran stop-hbase.sh. The master couldn't locate those region
>> servers, so it couldn't tell them to shut down. This is why you've got
>> those orphan region servers. So until we finish setting your HBase
>> cluster up, you'll have to stop those region servers by hand.
>>
>> To do this, ssh to M2 -- M5, and type the following command:
>> ${HBASE_HOME}/bin/hbase-daemon.sh stop regionserver
>>
>> Then run jps again to make sure HRegionServer is gone. If the above
>> command doesn't work, you can use the Unix "kill" command. Then ssh to
>> M1 and run stop-hbase.sh to stop the master and the ZooKeeper servers.
>>
>>
>> It's still a mystery that you don't have regionserver logs while you
>> have zookeeper logs. Maybe those orphan region servers were the
>> reason? I don't know, but you can give it another try after stopping
>> them. So stop the whole HBase / ZooKeeper process as described above,
>> then run start-hbase.sh once again. If you can get the regionserver
>> logs to us, that would be great.
>>
>> Thanks,
>>
>>
>>
>
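Lars's check that all ZK servers are reachable from the region servers can be sketched with ZooKeeper's "ruok" four-letter command (a live server replies "imok"). This dry run only prints the probe for each host; the host list is the quorum from the connectString in the regionserver log, and you would run the real probes from each region server box:

```shell
# ZooKeeper quorum hosts, as listed in the connectString in the log.
QUORUM='crunch2 chanel2 chanel chris crunch3'
for zk in $QUORUM; do
  # Dry run: print the probe. Drop the outer echo to actually send
  # "ruok" to port 2181; a healthy server answers "imok".
  echo "echo ruok | nc $zk 2181"
done
```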
