hbase-user mailing list archives

From Amandeep Khurana <ama...@gmail.com>
Subject Re: HBASE -- Session Expire ?
Date Wed, 04 Jul 2012 04:40:57 GMT
Jay, 

You need to modify the zoo.cfg to reflect the quorum.

server.0=localhost:2888:3888 will change to something like

server.0=zk_host_1:2888:3888
server.1=zk_host_2:2888:3888
server.2=zk_host_3:2888:3888

The same config needs to be on all the zookeeper hosts.
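
For illustration, here is a minimal Python sketch that prints the server.N lines for a list of quorum hosts, so the same block can be pasted into zoo.cfg on every ZK host. The host names are the placeholders from above, the function name is made up for this example, and numbering consecutively from 0 is an assumption. On a self-managed ZK, each host also needs a myid file in its dataDir containing its own N.

```python
def render_quorum_lines(hosts, peer_port=2888, election_port=3888, first_id=0):
    """Return the server.N lines of zoo.cfg for the given ZooKeeper hosts.

    Ids are assigned consecutively starting at first_id (an assumption;
    ZooKeeper only requires that each id is unique and matches the
    myid file on the corresponding host).
    """
    return [
        f"server.{server_id}={host}:{peer_port}:{election_port}"
        for server_id, host in enumerate(hosts, start=first_id)
    ]


if __name__ == "__main__":
    # Placeholder host names; substitute the real quorum members.
    for line in render_quorum_lines(["zk_host_1", "zk_host_2", "zk_host_3"]):
        print(line)
```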

Also, I assume it's a self-managed ZK.

Secondly, I'm seeing session timeouts between RS and ZK, which means something is preventing
the RS from talking to ZK. This could happen for the following reasons:

1. The RS is loaded and is not able to communicate with ZK. This could be due to a GC pause as
well. Based on what you are saying, there is nothing happening on the cluster, so that should
not be the case.

2. The network is acting up. It is very much possible that packets are getting dropped. I
have seen that happen myself and it was really hard to debug. The NoRouteToHostException
hints at that. I'm seeing those in your RS logs too, although that one is about the RS not being
able to talk to HDFS:

> 2012-07-03 18:47:25,161 INFO org.apache.hadoop.hdfs.DFSClient: Exception in createBlockOutputStream 172.18.0.18:50010 java.net.NoRouteToHostException: No route to host

Do you have monitoring in place? Can you get more info on what's going on on the hosts and
the network?
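
If there is no monitoring yet, a rough connectivity probe run from each region server can at least show whether the ZK and DataNode ports are reachable. The Python sketch below is only an illustration: it assumes ZK listens on the default client port 2181 (your hbase-site.xml may say otherwise), it uses ZooKeeper's "ruok" four-letter command, which a healthy server answers with "imok", and the function names are made up for this example. The host names and the DataNode address are the ones mentioned in this thread.

```python
import socket


def can_connect(host, port, timeout=3.0):
    """Try a plain TCP connect; return (ok, error_text)."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True, ""
    except OSError as exc:  # covers refused, timed out, no route to host, DNS failure
        return False, str(exc)


def zk_ruok(host, port=2181, timeout=3.0):
    """Send ZooKeeper's 'ruok' command; True if the server replies 'imok'."""
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            sock.sendall(b"ruok")
            return sock.recv(16) == b"imok"
    except OSError:
        return False


if __name__ == "__main__":
    # ZK hosts from the thread; port 2181 is an assumed default.
    for zk_host in ["devrackA-03", "devrackA-04", "devrackA-05"]:
        print(zk_host, "ruok ->", zk_ruok(zk_host))
    # DataNode address taken from the NoRouteToHostException log line.
    ok, err = can_connect("172.18.0.18", 50010)
    print("172.18.0.18:50010 reachable ->", ok, err)
```

Running this in a loop for longer than the 15 to 20 minutes it takes for the session to expire would show whether connectivity drops intermittently.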

Also, you can collocate datanodes and region servers, which is not what you have done currently.

What's the hardware config on these boxes?

-Amandeep 


On Tuesday, July 3, 2012 at 8:16 PM, Jay Wilson wrote:

> First, thank you for looking at this for me.
> 
> Second, the network is up. It is dedicated to the cluster and it appears
> stable.
> 
> Third, I haven't modified the zoo.cfg; however, I have put it on
> pastebin. I made all my zookeeper changes in hbase-site.xml
> 
> zoo.cfg -- http://pastebin.com/download.php?i=askC9VRG
> hbase-site.xml -- http://pastebin.com/download.php?i=DkLGr57G
> 
> HMASTER LOG -- http://pastebin.com/download.php?i=i4U52cWf
> 
> ZK (devrackA-03) -- http://pastebin.com/download.php?i=CRyQFKFF
> ZK (devrackA-04) -- http://pastebin.com/download.php?i=WAqAhjdh
> ZK (devrackA-05) -- http://pastebin.com/download.php?i=cS1Gm19x
> 
> RS (devrackA-06) -- http://pastebin.com/download.php?i=XayB2HeX
> RS (devrackB-07) -- http://pastebin.com/download.php?i=RQZ45a8j
> RS (devrackB-08) -- http://pastebin.com/download.php?i=ZDZD0z7B
> 
> ---
> Jay Wilson
> 
> On 7/3/2012 5:23 PM, Amandeep Khurana wrote:
> > Can you put your zoo.cfg and hbase-site.xml on pastebin and put the links here?
> > Have you verified that your network is fine?
> > Also, can you put up your RS and ZK logs too?
> > 
> > 
> > 
> > On Tuesday, July 3, 2012 at 5:19 PM, Jay Wilson wrote:
> > 
> > > I have reread the sections in the O'Reilly HBase book on cluster
> > > configuration and troubleshooting and I am still getting "session
> > > expired" after X number of minutes. X being anywhere from 15 to 20 minutes.
> > > 
> > > There is 0 load on the cluster and it's using a dedicated isolated
> > > network. No jobs are running, just the Hadoop/HBase Java processes.
> > > 
> > > I have separated the Hadoop and HBase processes as follows:
> > > 
> > > devrackA-00 (NameNode)
> > > devrackA-01 (SecondaryNameNode)
> > > devrackA-03 (HQuorumPeer + HMaster)
> > > devrackA-04 (HQuorumPeer)
> > > devrackA-05 (HQuorumPeer)
> > > devrackA-06 (HRegionServer)
> > > devrackA-07
> > > to (DataNode)
> > > devrackA-20
> > > devrackB-00
> > > to (DataNode)
> > > devrackB-06
> > > devrackB-07 (HRegionServer)
> > > devrackB-08 (HRegionServer)
> > > devrackB-09
> > > to (DataNode)
> > > devrackB-20
> > > 
> > > I did have DataNode on my HQuorumPeers and HRegionServers, but I have
> > > excluded them and verified they are excluded:
> > > 
> > > Name: devrackA-03
> > > Decommission Status : Normal
> > > Configured Capacity: 0 (0 KB)
> > > DFS Used: 0 (0 KB)
> > > Non DFS Used: 0 (0 KB)
> > > DFS Remaining: 0(0 KB)
> > > DFS Used%: 100%
> > > DFS Remaining%: 0%
> > > Last contact: Wed Dec 31 16:00:00 PST 1969
> > > 
> > > 
> > > Name: devrackA-04
> > > Decommission Status : Normal
> > > Configured Capacity: 0 (0 KB)
> > > DFS Used: 0 (0 KB)
> > > Non DFS Used: 0 (0 KB)
> > > DFS Remaining: 0(0 KB)
> > > DFS Used%: 100%
> > > DFS Remaining%: 0%
> > > Last contact: Wed Dec 31 16:00:00 PST 1969
> > > 
> > > 
> > > Name: devrackA-05
> > > Decommission Status : Normal
> > > Configured Capacity: 0 (0 KB)
> > > DFS Used: 0 (0 KB)
> > > Non DFS Used: 0 (0 KB)
> > > DFS Remaining: 0(0 KB)
> > > DFS Used%: 100%
> > > DFS Remaining%: 0%
> > > Last contact: Wed Dec 31 16:00:00 PST 1969
> > > 
> > > Name: devrackA-06
> > > Decommission Status : Normal
> > > Configured Capacity: 0 (0 KB)
> > > DFS Used: 0 (0 KB)
> > > Non DFS Used: 0 (0 KB)
> > > DFS Remaining: 0(0 KB)
> > > DFS Used%: 100%
> > > DFS Remaining%: 0%
> > > Last contact: Wed Dec 31 16:00:00 PST 1969
> > > 
> > > 
> > > Name: devrackB-07
> > > Decommission Status : Normal
> > > Configured Capacity: 0 (0 KB)
> > > DFS Used: 0 (0 KB)
> > > Non DFS Used: 0 (0 KB)
> > > DFS Remaining: 0(0 KB)
> > > DFS Used%: 100%
> > > DFS Remaining%: 0%
> > > Last contact: Wed Dec 31 16:00:00 PST 1969
> > > 
> > > 
> > > Name: devrackB-08
> > > Decommission Status : Normal
> > > Configured Capacity: 0 (0 KB)
> > > DFS Used: 0 (0 KB)
> > > Non DFS Used: 0 (0 KB)
> > > DFS Remaining: 0(0 KB)
> > > DFS Used%: 100%
> > > DFS Remaining%: 0%
> > > Last contact: Wed Dec 31 16:00:00 PST 1969
> > > 
> > > I must be missing a fundamental setting. Thoughts?
> > > 
> > > Thank You
> > > ---
> > > Jay Wilson
> > > 
> > 
> > 
> 
> 
> 


