hbase-user mailing list archives

From Chris Bates <christopher.andrew.ba...@gmail.com>
Subject Re: HBase 0.20.1 Distributed Install Problems
Date Wed, 11 Nov 2009 03:47:07 GMT
Thanks everyone for your help.  We discovered a couple of things:

1) Our Master Node was not in the ZK quorum.
2) Our hosts file was set up so that the regionservers resolved their own
hostnames to themselves, so we removed that line from our hosts file and made
them go to DNS to resolve their identity.  This is still a little unclear to
me, as one of my co-workers fixed this issue.
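
The hosts-file problem in item 2 can be checked on each box before starting
HBase.  Below is a minimal Java sketch, not part of HBase, that asks the JVM's
resolver whether a name maps to a loopback address.  The class name
HostResolutionCheck is hypothetical, and the assumption that a loopback
mapping is what caused the trouble here is a guess based on the description
above.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

// Hypothetical helper (not part of HBase): checks whether a host name
// resolves to a loopback address such as 127.0.0.1.
class HostResolutionCheck {

    // Returns true if any address the JVM's resolver returns for `host`
    // is a loopback address. Whether the hosts file or DNS is consulted
    // first is decided by the operating system, not by this code.
    static boolean resolvesToLoopback(String host) throws UnknownHostException {
        for (InetAddress addr : InetAddress.getAllByName(host)) {
            if (addr.isLoopbackAddress()) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        try {
            // Check this box's own hostname.
            String self = InetAddress.getLocalHost().getHostName();
            System.out.println(self + " resolves to loopback: " + resolvesToLoopback(self));
        } catch (UnknownHostException e) {
            System.out.println("hostname does not resolve at all: " + e.getMessage());
        }
    }
}
```

If this prints true on a region server box, other machines are likely to be
told an address that only works locally.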

We had some other problems, probably due to us messing with the configuration
files so many times.  So I removed HBase from all the boxes.  Then I
followed these instructions
http://hadoop.apache.org/hbase/docs/r0.20.1/api/overview-summary.html#overview_description
as stack had suggested.  I then scp'd everything over to the other
boxes, so passwordless ssh was working.

The UI works.  I was able to run "list" and "create" at the command shell.
One weird thing, though, is my output from zk_dump:
HBase tree in ZooKeeper is rooted at /hbase
  Cluster up? true
  In safe mode? false
  Master address: 172.16.1.46:60000
  Region server holding ROOT: 172.16.1.46:60020
  Region servers:
    - 172.16.1.46:60020

Which says I only have 1 region server.  When I check the master UI it says
there are 5 servers in the quorum, but only 1 regionserver.  All the
regionservers are supposed to be on port 2181 like in the Wiki; if I change
it to 2222 as someone had mentioned, nothing works.  I also have the same
regionservers file, listing all 5 servers, in the conf directory on each box.
When I check the regionserver UI log on port 60030, it says this:

2009-11-10 22:37:31,683 INFO org.apache.zookeeper.ClientCnxn: Server connection successful
2009-11-10 22:37:31,708 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Got ZooKeeper event, state: SyncConnected, type: None, path: null
2009-11-10 22:37:31,860 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Telling master at 172.16.1.46:60000 that we are up
2009-11-10 22:38:03,070 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Master passed us address to use. Was=172.16.1.46:60020, Now=172.16.1.46
2009-11-10 22:38:03,505 INFO org.apache.hadoop.hbase.regionserver.HLog: HLog configuration: blocksize=67108864, rollsize=63753420, enabled=true, flushlogentries=100, optionallogflushinternal=10000ms
2009-11-10 22:38:03,727 INFO org.apache.hadoop.hbase.regionserver.HLog: New hlog /hbase/.logs/chanel2.local,60020,1257910682720/hlog.dat.1257910683505
2009-11-10 22:38:03,759 INFO org.apache.hadoop.metrics.jvm.JvmMetrics: Initializing JVM Metrics with processName=RegionServer, sessionId=regionserver/172.16.1.46:60020
2009-11-10 22:38:03,769 INFO org.apache.hadoop.hbase.regionserver.metrics.RegionServerMetrics: Initialized
2009-11-10 22:38:04,143 INFO org.apache.hadoop.http.HttpServer: Port returned by webServer.getConnectors()[0].getLocalPort() before open() is -1. Opening the listener on 60030
2009-11-10 22:38:04,144 INFO org.apache.hadoop.http.HttpServer: listener.getLocalPort() returned 60030 webServer.getConnectors()[0].getLocalPort() returned 60030
2009-11-10 22:38:04,145 INFO org.apache.hadoop.http.HttpServer: Jetty bound to port 60030
2009-11-10 22:39:12,514 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server Responder: starting
2009-11-10 22:39:12,515 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server listener on 60020: starting
2009-11-10 22:39:12,517 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 0 on 60020: starting
2009-11-10 22:39:12,518 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 1 on 60020: starting
2009-11-10 22:39:12,518 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 2 on 60020: starting
2009-11-10 22:39:12,518 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 3 on 60020: starting
2009-11-10 22:39:12,519 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 4 on 60020: starting
2009-11-10 22:39:12,519 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 5 on 60020: starting
2009-11-10 22:39:12,519 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 6 on 60020: starting
2009-11-10 22:39:12,519 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 7 on 60020: starting
2009-11-10 22:39:12,520 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 8 on 60020: starting
2009-11-10 22:39:12,520 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 9 on 60020: starting
2009-11-10 22:39:12,520 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: HRegionServer started at: 172.16.1.46:60020
2009-11-10 22:39:12,532 INFO org.apache.hadoop.hbase.regionserver.StoreFile: Allocating LruBlockCache with maximum size 199.7m
2009-11-10 22:39:12,587 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: MSG_REGION_OPEN: -ROOT-,,0
2009-11-10 22:39:12,595 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Worker: MSG_REGION_OPEN: -ROOT-,,0
2009-11-10 22:39:12,725 INFO org.apache.hadoop.hbase.regionserver.HRegion: region -ROOT-,,0/70236052 available; sequence id is 3
2009-11-10 22:39:18,700 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: MSG_REGION_OPEN: .META.,,1
2009-11-10 22:39:18,706 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Worker: MSG_REGION_OPEN: .META.,,1
2009-11-10 22:39:18,729 INFO org.apache.hadoop.hbase.regionserver.HRegion: region .META.,,1/1028785192 available; sequence id is 0
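
To see which hosts from conf/regionservers never registered in ZooKeeper, the
"Region servers:" list in the zk_dump output can be compared against that
file.  This is a hedged Java sketch: the class RegionServerDiff and the sample
regionservers contents are hypothetical, and it assumes the zk_dump format
shown above (a "Region servers:" line followed by "- ip:port" entries).
Hostnames in the file are resolved with InetAddress, so it should be run on a
box that resolves names the same way the cluster does.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper (not part of HBase): compares the region servers that
// zk_dump reports against the hosts listed in a conf/regionservers file.
class RegionServerDiff {

    // Collects the "- host:port" entries that follow the "Region servers:" line.
    static List<String> registeredServers(String zkDump) {
        List<String> servers = new ArrayList<>();
        boolean inList = false;
        for (String raw : zkDump.split("\\R")) {
            String line = raw.trim();
            if (line.startsWith("Region servers:")) {
                inList = true;
            } else if (inList && line.startsWith("- ")) {
                servers.add(line.substring(2).trim());
            } else if (inList && !line.isEmpty()) {
                inList = false; // any other non-blank line ends the list
            }
        }
        return servers;
    }

    // Returns the entries of the regionservers file (one host per line, blank
    // lines and '#' comments skipped) whose resolved IP address does not
    // appear among the registered ip:port entries.
    static Set<String> missingHosts(String regionserversFile, List<String> registered) {
        Set<String> registeredAddrs = new LinkedHashSet<>();
        for (String hostPort : registered) {
            int colon = hostPort.lastIndexOf(':');
            registeredAddrs.add(colon >= 0 ? hostPort.substring(0, colon) : hostPort);
        }
        Set<String> missing = new LinkedHashSet<>();
        for (String raw : regionserversFile.split("\\R")) {
            String host = raw.trim();
            if (host.isEmpty() || host.startsWith("#")) {
                continue;
            }
            try {
                String addr = InetAddress.getByName(host).getHostAddress();
                if (!registeredAddrs.contains(addr)) {
                    missing.add(host);
                }
            } catch (UnknownHostException e) {
                missing.add(host); // an unresolvable host cannot have registered either
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        String zkDump = String.join("\n",
            "HBase tree in ZooKeeper is rooted at /hbase",
            "  Region server holding ROOT: 172.16.1.46:60020",
            "  Region servers:",
            "    - 172.16.1.46:60020");
        // Hypothetical regionservers contents; substitute the real file.
        String regionservers = "172.16.1.46\n172.16.1.47\n";
        List<String> registered = registeredServers(zkDump);
        System.out.println("registered: " + registered);
        System.out.println("never registered: " + missingHosts(regionservers, registered));
    }
}
```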



Another thing I don't understand: if I stop and start HBase without first
deleting the old HBase copy in HDFS, I get this error when I check the
Master UI:

HTTP ERROR: 500

Trying to contact region server null for region , row '', but failed after 3 attempts.
Exceptions:
org.apache.hadoop.hbase.client.NoServerForRegionException: Timed out trying to locate root region
org.apache.hadoop.hbase.client.NoServerForRegionException: Timed out trying to locate root region
org.apache.hadoop.hbase.client.NoServerForRegionException: Timed out trying to locate root region

RequestURI=/master.jsp
Caused by:

org.apache.hadoop.hbase.client.RetriesExhaustedException: Trying to contact region server null for region , row '', but failed after 3 attempts.
Exceptions:
org.apache.hadoop.hbase.client.NoServerForRegionException: Timed out trying to locate root region
org.apache.hadoop.hbase.client.NoServerForRegionException: Timed out trying to locate root region
org.apache.hadoop.hbase.client.NoServerForRegionException: Timed out trying to locate root region

	at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.getRegionServerWithRetries(HConnectionManager.java:1001)
	at org.apache.hadoop.hbase.client.MetaScanner.metaScan(MetaScanner.java:55)
	at org.apache.hadoop.hbase.client.MetaScanner.metaScan(MetaScanner.java:28)
	at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.listTables(HConnectionManager.java:432)
	at org.apache.hadoop.hbase.client.HBaseAdmin.listTables(HBaseAdmin.java:127)
	at org.apache.hadoop.hbase.generated.master.master_jsp._jspService(master_jsp.java:125)
	at org.apache.jasper.runtime.HttpJspBase.service(HttpJspBase.java:97)
	at javax.servlet.http.HttpServlet.service(HttpServlet.java:820)
	at org.mortbay.jetty.servlet.ServletHolder.handle(ServletHolder.java:502)
	at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:363)
	at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
	at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:181)
	at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766)
	at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:417)
	at org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:230)
	at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)
	at org.mortbay.jetty.Server.handle(Server.java:324)
	at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:534)
	at org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:864)
	at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:533)
	at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:207)
	at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:403)
	at org.mortbay.io.nio.SelectChannelEndPoint.run(SelectChannelEndPoint.java:409)
	at org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:522)



On Mon, Nov 9, 2009 at 11:47 PM, Tatsuya Kawano <tatsuyaml@snowcocoa.info> wrote:

> Hello,
>
> It looks like the master and the region servers cannot locate each
> other. HBase 0.20.x uses ZooKeeper (zk) to locate other cluster
> members, so maybe your zk has wrong information.
>
> Can you type zk_dump from the hbase shell and let us know the result?
>
> If the cluster is properly configured, you'll get something like this:
> =====================================
> hbase(main):007:0> zk_dump
>
> HBase tree in ZooKeeper is rooted at /hbase
>  Cluster up? true
>  In safe mode? false
>  Master address: 172.16.80.26:60000
>  Region server holding ROOT: 172.16.80.27:60020
>  Region servers:
>   - 172.16.80.27:60020
>   - 172.16.80.29:60020
>   - 172.16.80.28:60020
> =====================================
>
>
> > one of my co-workers apparently can log into his box and submit jobs, but
> > me or anyone else is still unable to log in.
>
> Maybe you're a bit confused; your co-worker seems to be able to use
> Hadoop Map/Reduce, not HBase.
>
>
> > Does Hbase allow concurrent connections?
>
> Yes.
>
>
> >> I think it also says the master is on port 60000
> >> when the install directions say it's supposed to be 60010?
>
> Port 60000 is correct. The master uses port 60000 to accept connections
> from the hbase shell and region servers. Port 60010 is for the web-based
> HBase console.
>
>
> > We tried applying this fix (to explicitly set the master):
> > http://osdir.com/ml/hbase-user-hadoop-apache/2009-05/msg00321.html
>
> No, this is an old way to configure a cluster. You shouldn't use this
> with HBase 0.20.x.
>
>
> Thanks,
>
> --
> Tatsuya Kawano (Mr.)
> Tokyo, Japan
>
>
>
> On Tue, Nov 10, 2009 at 1:10 PM, Chris Bates
> <christopher.andrew.bates@gmail.com> wrote:
> > Another interesting data point.  We tried applying this fix (to explicitly
> > set the master):
> > http://osdir.com/ml/hbase-user-hadoop-apache/2009-05/msg00321.html
> >
> > But when I log in to the master node, it takes really long to submit a query
> > and I get this in response:
> > hbase(main):001:0> list
> > NativeException: org.apache.hadoop.hbase.client.RetriesExhaustedException:
> > Trying to contact region server null for region , row '', but failed after 5
> > attempts.
> > Exceptions:
> > org.apache.hadoop.hbase.client.NoServerForRegionException: Timed out trying to locate root region
> > org.apache.hadoop.hbase.client.NoServerForRegionException: Timed out trying to locate root region
> > org.apache.hadoop.hbase.client.NoServerForRegionException: Timed out trying to locate root region
> > org.apache.hadoop.hbase.client.NoServerForRegionException: Timed out trying to locate root region
> > org.apache.hadoop.hbase.client.NoServerForRegionException: Timed out trying to locate root region
> >
> > from org/apache/hadoop/hbase/client/HConnectionManager.java:1001:in `getRegionServerWithRetries'
> > from org/apache/hadoop/hbase/client/MetaScanner.java:55:in `metaScan'
> > from org/apache/hadoop/hbase/client/MetaScanner.java:28:in `metaScan'
> > from org/apache/hadoop/hbase/client/HConnectionManager.java:432:in `listTables'
> > from org/apache/hadoop/hbase/client/HBaseAdmin.java:127:in `listTables'
> > from sun/reflect/NativeMethodAccessorImpl.java:-2:in `invoke0'
> > from sun/reflect/NativeMethodAccessorImpl.java:39:in `invoke'
> > from sun/reflect/DelegatingMethodAccessorImpl.java:25:in `invoke'
> > from java/lang/reflect/Method.java:597:in `invoke'
> > from org/jruby/javasupport/JavaMethod.java:298:in `invokeWithExceptionHandling'
> > from org/jruby/javasupport/JavaMethod.java:259:in `invoke'
> > from org/jruby/java/invokers/InstanceMethodInvoker.java:36:in `call'
> > from org/jruby/runtime/callsite/CachingCallSite.java:253:in `cacheAndCall'
> > from org/jruby/runtime/callsite/CachingCallSite.java:72:in `call'
> > from org/jruby/ast/CallNoArgNode.java:61:in `interpret'
> > from org/jruby/ast/ForNode.java:104:in `interpret'
> > ... 116 levels...
> > from opt/hadoop/hbase_minus_0_dot_20_dot_1/bin/$_dot_dot_/bin/hirb#start:-1:in `call'
> > from org/jruby/internal/runtime/methods/DynamicMethod.java:226:in `call'
> > from org/jruby/internal/runtime/methods/CompiledMethod.java:211:in `call'
> > from org/jruby/internal/runtime/methods/CompiledMethod.java:71:in `call'
> > from org/jruby/runtime/callsite/CachingCallSite.java:253:in `cacheAndCall'
> > from org/jruby/runtime/callsite/CachingCallSite.java:72:in `call'
> > from opt/hadoop/hbase_minus_0_dot_20_dot_1/bin/$_dot_dot_/bin/hirb.rb:497:in `__file__'
> > from opt/hadoop/hbase_minus_0_dot_20_dot_1/bin/$_dot_dot_/bin/hirb.rb:-1:in `load'
> > from org/jruby/Ruby.java:577:in `runScript'
> > from org/jruby/Ruby.java:480:in `runNormally'
> > from org/jruby/Ruby.java:354:in `runFromMain'
> > from org/jruby/Main.java:229:in `run'
> > from org/jruby/Main.java:110:in `run'
> > from org/jruby/Main.java:94:in `main'
> > from /opt/hadoop/hbase-0.20.1/bin/../bin/hirb.rb:338:in `list'
> > from (hbase):2
> > hbase(main):002:0>
> >
> >
> > On Mon, Nov 9, 2009 at 10:52 PM, Chris Bates <christopher.andrew.bates@gmail.com> wrote:
> >
> >> thanks for your response Sujee.  These boxes are all on an internal DNS and
> >> they all resolve.
> >>
> >> one of my co-workers apparently can log into his box and submit jobs, but
> >> me or anyone else is still unable to log in.  Does Hbase allow concurrent
> >> connections?  In Hive I remember having to configure the metastore to be in
> >> server mode if multiple people were using it.
> >>
> >>
> >> On Mon, Nov 9, 2009 at 10:13 PM, Sujee Maniyam <sujee@sujee.net> wrote:
> >>
> >>> > [hadoop@crunch hbase-0.20.1]$ bin/start-hbase.sh
> >>> >
> >>> > crunch2: Warning: Permanently added 'crunch2' (RSA) to the list of known
> >>> > hosts.
> >>>
> >>>
> >>> Is your SSH set up correctly?  From the master, you need to be able to
> >>> log in to all slaves/regionservers without a password.
> >>>
> >>> And I see you are using short hostnames (crunch2, crunch3).  Do they
> >>> all resolve correctly, or do you need to update /etc/hosts to resolve
> >>> these to an IP address on all machines?
> >>>
> >>> regards
> >>> Sujee Maniyam
> >>> --
> >>> http://sujee.net
> >>>
> >>
> >>
> >
>
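
Tatsuya's reply distinguishes the master's RPC port (60000) from its web
console (60010); the region server log above shows 60020 and 60030 in use, and
2181 is the ZooKeeper client port mentioned earlier.  A minimal Java sketch
that only checks whether a TCP connection to each of these default ports can
be opened from a given box.  The class name PortProbe is hypothetical, and it
does not speak the HBase or ZooKeeper protocol, so an open port says nothing
about whether the service behind it is healthy.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical connectivity probe (not part of HBase) for the default
// ports discussed in this thread.
class PortProbe {

    static final Map<Integer, String> DEFAULT_PORTS = new LinkedHashMap<>();
    static {
        DEFAULT_PORTS.put(2181, "ZooKeeper client port");
        DEFAULT_PORTS.put(60000, "HBase master RPC");
        DEFAULT_PORTS.put(60010, "HBase master web UI");
        DEFAULT_PORTS.put(60020, "region server RPC");
        DEFAULT_PORTS.put(60030, "region server web UI");
    }

    // Returns true if a TCP connection to host:port completes within timeoutMs.
    // Unresolvable hosts, refusals and timeouts all count as unreachable.
    static boolean canConnect(String host, int port, int timeoutMs) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMs);
            return true;
        } catch (IOException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String host = args.length > 0 ? args[0] : "localhost";
        for (Map.Entry<Integer, String> e : DEFAULT_PORTS.entrySet()) {
            boolean ok = canConnect(host, e.getKey(), 2000);
            System.out.printf("%s:%d (%s) %s%n", host, e.getKey(), e.getValue(),
                    ok ? "open" : "unreachable");
        }
    }
}
```

Running it from every box against the master and each region server (for
example, java PortProbe crunch2) separates network or name-resolution
problems from HBase configuration problems.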
