hbase-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Chris Bates <christopher.andrew.ba...@gmail.com>
Subject Re: HBase 0.20.1 Distributed Install Problems
Date Wed, 11 Nov 2009 17:57:46 GMT
Ok, thats what I was wondering. I started reading through the Hbase chapter
in OReilly's Hadoop book.

All -- thank you for your help!  I hope to do a short write up in a blog I
just started.  Hopefully this discussion will be helpful to other new users.

On Wed, Nov 11, 2009 at 12:41 PM, Lars George <lars@worldlingo.com> wrote:

> Hi Chris,
>
> Please note that only one server gets to host the .META. table and another
> the -ROOT- table. That is it. All user regions are then equally distributed
> across all region servers.
>
> So it sounds like you are all set. Not sure what happened initially with
> the Zookeeper hang up. Keep an eye on it.
>
> Lars
>
>
> On Nov 11, 2009, at 15:12, Chris Bates <christopher.andrew.bates@gmail.com>
> wrote:
>
>  @Lars -- We're configured like in the "Getting Started" guide -- tried to
>> follow the instructions verbatim.
>>
>> Ok. Thanks for the tip. I'm not sure what causes the orphan behavior, but
>> thats good to know.  I'll also know what to look for in the jstack.  After
>> stopping the region servers individually, then doing a stop / restart, I
>> get
>> regionserver logs
>>
>> Here is M2 (chanel)
>> Wed Nov 11 08:49:30 EST 2009 Starting regionserver on chanel
>> ulimit -n 1024
>> 2009-11-11 08:49:32,081 INFO
>> org.apache.hadoop.hbase.regionserver.HRegionServer:
>> vmInputArguments=[-Xmx1000m, -XX:+HeapDumpOnOutOfMemoryError,
>> -XX:+UseConcMarkSweepGC, -XX:+CMSIncrementalMode, -Dhbase.lo
>> 2009-11-11 08:49:32,229 INFO
>> org.apache.hadoop.hbase.regionserver.HRegionServer: My address is
>> chanel.local:60020
>> 2009-11-11 08:49:32,418 INFO org.apache.hadoop.hbase.ipc.HBaseRpcMetrics:
>> Initializing RPC Metrics with hostName=HRegionServer, port=60020
>> 2009-11-11 08:49:32,629 INFO
>> org.apache.hadoop.hbase.regionserver.MemStoreFlusher:
>> globalMemStoreLimit=398.7m, globalMemStoreLimitLowMark=249.2m,
>> maxHeap=996.8m
>> 2009-11-11 08:49:32,641 INFO
>> org.apache.hadoop.hbase.regionserver.HRegionServer: Runs every 10000000ms
>> 2009-11-11 08:49:32,749 INFO org.apache.zookeeper.ZooKeeper: Client
>> environment:zookeeper.version=3.2.1-808558, built on 08/27/2009 18:48 GMT
>> 2009-11-11 08:49:32,749 INFO org.apache.zookeeper.ZooKeeper: Client
>> environment:host.name=chanel.local
>> 2009-11-11 08:49:32,749 INFO org.apache.zookeeper.ZooKeeper: Client
>> environment:java.version=1.6.0_16
>> 2009-11-11 08:49:32,749 INFO org.apache.zookeeper.ZooKeeper: Client
>> environment:java.vendor=Sun Microsystems Inc.
>> 2009-11-11 08:49:32,749 INFO org.apache.zookeeper.ZooKeeper: Client
>> environment:java.home=/usr/lib/jvm/java-6-sun-1.6.0.16/jre
>> 2009-11-11 08:49:32,749 INFO org.apache.zookeeper.ZooKeeper: Client
>>
>> environment:java.class.path=/opt/hadoop/hbase-0.20.1/bin/../conf:/usr/lib/jvm/java-6-sun/lib/tools.jar:/opt/hadoop/hbase-0.20.1/bin/..:
>> 2009-11-11 08:49:32,749 INFO org.apache.zookeeper.ZooKeeper: Client
>>
>> environment:java.library.path=/opt/hadoop/hbase-0.20.1/bin/../lib/native/Linux-i386-32
>> 2009-11-11 08:49:32,749 INFO org.apache.zookeeper.ZooKeeper: Client
>> environment:java.io.tmpdir=/tmp
>> 2009-11-11 08:49:32,749 INFO org.apache.zookeeper.ZooKeeper: Client
>> environment:java.compiler=<NA>
>> 2009-11-11 08:49:32,749 INFO org.apache.zookeeper.ZooKeeper: Client
>> environment:os.name=Linux
>> 2009-11-11 08:49:32,750 INFO org.apache.zookeeper.ZooKeeper: Client
>> environment:os.arch=i386
>> 2009-11-11 08:49:32,750 INFO org.apache.zookeeper.ZooKeeper: Client
>> environment:os.version=2.6.24-24-generic
>> 2009-11-11 08:49:32,750 INFO org.apache.zookeeper.ZooKeeper: Client
>> environment:user.name=hadoop
>> 2009-11-11 08:49:32,750 INFO org.apache.zookeeper.ZooKeeper: Client
>> environment:user.home=/home/hadoop
>> 2009-11-11 08:49:32,750 INFO org.apache.zookeeper.ZooKeeper: Client
>> environment:user.dir=/opt/hadoop/hbase-0.20.1
>> 2009-11-11 08:49:32,760 INFO org.apache.zookeeper.ZooKeeper: Initiating
>> client connection,
>>
>> connectString=crunch2:2181,chanel2:2181,chanel:2181,chris:2181,crunch3:2181
>> sessionTimeout=60000 watcher=org.apa
>> 2009-11-11 08:49:32,781 INFO org.apache.zookeeper.ClientCnxn:
>> zookeeper.disableAutoWatchReset is false
>> 2009-11-11 08:49:32,800 INFO org.apache.zookeeper.ClientCnxn: Attempting
>> connection to server crunch2/172.16.1.95:2181
>> 2009-11-11 08:49:32,802 INFO org.apache.zookeeper.ClientCnxn: Priming
>> connection to java.nio.channels.SocketChannel[connected local=/
>> 172.16.1.45:33873 remote=crunch2/172.16.1.95:2181]
>> 2009-11-11 08:49:32,814 INFO org.apache.zookeeper.ClientCnxn: Server
>> connection successful
>> 2009-11-11 08:49:32,829 INFO
>> org.apache.hadoop.hbase.regionserver.HRegionServer: Got ZooKeeper event,
>> state: SyncConnected, type: None, path: null
>> 2009-11-11 08:49:32,973 INFO
>> org.apache.hadoop.hbase.regionserver.HRegionServer: Telling master at
>> 172.16.1.46:60000 that we are up
>> 2009-11-11 08:49:33,509 INFO
>> org.apache.hadoop.hbase.regionserver.HRegionServer: Master passed us
>> address
>> to use. Was=172.16.1.45:60020, Now=172.16.1.45
>> 2009-11-11 08:49:34,090 INFO org.apache.hadoop.hbase.regionserver.HLog:
>> HLog
>> configuration: blocksize=67108864, rollsize=63753420, enabled=true,
>> flushlogentries=100, optionallogflushinternal=10000ms
>> 2009-11-11 08:49:34,281 INFO org.apache.hadoop.hbase.regionserver.HLog:
>> New
>> hlog /hbase/.logs/chanel.local,60020,1257947373328/hlog.dat.1257947374090
>> 2009-11-11 08:49:34,319 INFO org.apache.hadoop.metrics.jvm.JvmMetrics:
>> Initializing JVM Metrics with processName=RegionServer,
>> sessionId=regionserver/172.16.1.45:60020
>> 2009-11-11 08:49:34,350 INFO
>> org.apache.hadoop.hbase.regionserver.metrics.RegionServerMetrics:
>> Initialized
>> 2009-11-11 08:49:34,723 INFO org.apache.hadoop.http.HttpServer: Port
>> returned by webServer.getConnectors()[0].getLocalPort() before open() is
>> -1.
>> Opening the listener on 60030
>> 2009-11-11 08:49:34,724 INFO org.apache.hadoop.http.HttpServer:
>> listener.getLocalPort() returned 60030
>> webServer.getConnectors()[0].getLocalPort() returned 60030
>> 2009-11-11 08:49:34,724 INFO org.apache.hadoop.http.HttpServer: Jetty
>> bound
>> to port 60030
>> 2009-11-11 08:49:35,673 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server
>> Responder: starting
>> 2009-11-11 08:49:35,674 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server
>> listener on 60020: starting
>> 2009-11-11 08:49:35,675 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server
>> handler 0 on 60020: starting
>> 2009-11-11 08:49:35,675 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server
>> handler 1 on 60020: starting
>> 2009-11-11 08:49:35,675 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server
>> handler 2 on 60020: starting
>> 2009-11-11 08:49:35,675 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server
>> handler 3 on 60020: starting
>> 2009-11-11 08:49:35,676 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server
>> handler 4 on 60020: starting
>> 2009-11-11 08:49:35,676 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server
>> handler 5 on 60020: starting
>> 2009-11-11 08:49:35,676 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server
>> handler 6 on 60020: starting
>> 2009-11-11 08:49:35,676 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server
>> handler 7 on 60020: starting
>> 2009-11-11 08:49:35,676 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server
>> handler 8 on 60020: starting
>> 2009-11-11 08:49:35,676 INFO
>> org.apache.hadoop.hbase.regionserver.HRegionServer: HRegionServer started
>> at: 172.16.1.45:60020
>> 2009-11-11 08:49:35,677 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server
>> handler 9 on 60020: starting
>> 2009-11-11 08:49:35,694 INFO
>> org.apache.hadoop.hbase.regionserver.StoreFile:
>> Allocating LruBlockCache with maximum size 199.4m
>> 2009-11-11 08:49:38,798 INFO
>> org.apache.hadoop.hbase.regionserver.HRegionServer: MSG_REGION_OPEN:
>> .META.,,1
>> 2009-11-11 08:49:38,811 INFO
>> org.apache.hadoop.hbase.regionserver.HRegionServer: Worker:
>> MSG_REGION_OPEN:
>> .META.,,1
>> 2009-11-11 08:49:38,932 INFO org.apache.hadoop.hbase.regionserver.HRegion:
>> region .META.,,1/1028785192 available; sequence id is 0
>>
>> The only difference between that log file and the rest is that the others
>> don't have
>> 2009-11-11 08:49:38,798 INFO
>> org.apache.hadoop.hbase.regionserver.HRegionServer: MSG_REGION_OPEN:
>> .META.,,1
>> 2009-11-11 08:49:38,811 INFO
>> org.apache.hadoop.hbase.regionserver.HRegionServer: Worker:
>> MSG_REGION_OPEN:
>> .META.,,1
>> 2009-11-11 08:49:38,932 INFO org.apache.hadoop.hbase.regionserver.HRegion:
>> region .META.,,1/1028785192 available; sequence id is 0
>>
>> except M5 which has it and says id is 3....if you want me to paste all the
>> logs, I can...didn't think it was necessary.
>>
>> hbase(main):001:0> zk_dump
>>
>> HBase tree in ZooKeeper is rooted at /hbase
>>  Cluster up? true
>>  In safe mode? false
>>  Master address: 172.16.1.46:60000
>>  Region server holding ROOT: 172.16.1.96:60020
>>  Region servers:
>>   - 172.16.1.95:60020
>>   - 172.16.1.96:60020
>>   - 172.16.1.83:60020
>>   - 172.16.1.45:60020
>> hbase(main):002:0> status 'simple'
>> 4 live servers
>>   crunch3.local:60020 1257928281006
>>       requests=0, regions=1, usedHeap=31, maxHeap=998
>>   chris.local:60020 1257947367197
>>       requests=0, regions=0, usedHeap=29, maxHeap=996
>>   crunch2.local:60020 1257947370011
>>       requests=0, regions=0, usedHeap=29, maxHeap=998
>>   chanel.local:60020 1257947373328
>>       requests=0, regions=1, usedHeap=31, maxHeap=996
>> 0 dead servers
>>
>>
>> I started and stopped Hbase a couple times without deleting the HDFS copy,
>> and everything seems to work fine....was that it??? What caused the
>> regionservers to orphan?
>>
>> On Wed, Nov 11, 2009 at 5:17 AM, Lars George <lars@worldlingo.com> wrote:
>>
>>  Hi Chris,
>>>
>>> As Tatsuya says, but I would expect you have to "kill" them as they are
>>> in
>>> a state where a daemon stop may not work as they will not be perceptible
>>> to
>>> that event yet.
>>>
>>> I saw the same as Tatsuya and assumed you have an issue with your quorum
>>> setting on the slaves. I have not followed the whole discussion in all
>>> the
>>> details so let me ask you how you configured it?
>>>
>>> I have personally done a symlink of my standalone zoo.cfg from the
>>> zookeeper/conf to the hbase/conf directory. You could also have set the
>>> quorum servers in the hbase-site.xml as per the "Getting started" guide.
>>>
>>> If you have done one of those ways already then make sure all ZK servers
>>> are up and can be reached from the region servers.
>>>
>>> Lars
>>>
>>> Tatsuya Kawano schrieb:
>>>
>>> Hi Chris, and thanks Lars for help.
>>>
>>>>
>>>> OK. So "jstack 22200" shows your region server is trying to finish
>>>> starting up, but stuck in a middle when try to get IP address of the
>>>> master from ZooKeeper.
>>>>
>>>> ===========================================================
>>>> "main" prio=10 tid=0x0805a800 nid=0x56d2 waiting on condition
>>>> [0xb72f2000]
>>>> java.lang.Thread.State: TIMED_WAITING (sleeping)
>>>> at java.lang.Thread.sleep(Native Method)
>>>> at org.apache.hadoop.hbase.util.Sleeper.sleep(Sleeper.java:74)
>>>> at org.apache.hadoop.hbase.util.Sleeper.sleep(Sleeper.java:51)
>>>> at
>>>>
>>>>
>>>> org.apache.hadoop.hbase.regionserver.HRegionServer.watchMasterAddress(HRegionServer.java:387)
>>>> at
>>>>
>>>>
>>>> org.apache.hadoop.hbase.regionserver.HRegionServer.reinitializeZooKeeper(HRegionServer.java:315)
>>>> at
>>>>
>>>>
>>>> org.apache.hadoop.hbase.regionserver.HRegionServer.reinitialize(HRegionServer.java:306)
>>>> at
>>>>
>>>>
>>>> org.apache.hadoop.hbase.regionserver.HRegionServer.<init>(HRegionServer.java:276)
>>>> ===========================================================
>>>>
>>>> I still need to see the regionserver logs to figure out why this is
>>>> happening.
>>>>
>>>>
>>>> Also,
>>>>
>>>>
>>>>
>>>>  This is what I see when I run start-hbase.sh -- I can ssh into any of
>>>>> the
>>>>> boxes with no password just fine, it just gives me a weird first time
>>>>> host
>>>>> message...we get the same thing when we start up hadoop.
>>>>>
>>>>>
>>>>>  ...
>>>>
>>>>
>>>>  crunch2: regionserver running as process 6950. Stop it first.
>>>>> chanel: regionserver running as process 22200. Stop it first.
>>>>> crunch3: regionserver running as process 28962. Stop it first.
>>>>> chris: regionserver running as process 28719. Stop it first.
>>>>>
>>>>>
>>>>>
>>>> This "Stop it first" message means your region servers didn't stop
>>>> when you ran stop-hbase.sh. The master couldn't locate those region
>>>> servers so it couldn't tell them to shutdown. This is why you've got
>>>> those orphan region servers. So until we finish setting your HBase
>>>> cluster up, you'll have to stop those region servers by hand.
>>>>
>>>> To do this, ssh to M2 -- M5, and type the following command:
>>>> ${HBASE_HOME}/bin/hbase-daemon.sh stop regionserver
>>>>
>>>> Then jps again to make sure HRegionServer doesn't exist. If the above
>>>> command doesn't work, you can use Unix "kill" command. Then ssh to M1,
>>>> run stop-hbase.sh to stop the master and ZooKeepers.
>>>>
>>>>
>>>> It's still a mystery you don't have regionserver logs while you have
>>>> zookeeper logs. Maybe those orphan region servers was the reason?  I
>>>> don't know, but you can give it another try after stopping them. So,
>>>> try to stop whole HBase / ZooKeeper process by above way, then run
>>>> start-hbase.sh once again. If you can get the regionserver log to us,
>>>> that would be great.
>>>>
>>>> Thanks,
>>>>
>>>>
>>>>
>>>>
>>>

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message