hbase-user mailing list archives

From "Buttler, David" <buttl...@llnl.gov>
Subject RE: Trying to get the region servers working....
Date Thu, 03 Jun 2010 16:49:39 GMT
Just to be clear, you are not actually running exactly 2 ZK nodes, are you?  I think one ZK
node on your master is sufficient for this size of cluster.  If that node goes down, your entire
cluster is gone in any case.  And remember, you need to have an odd number of ZK nodes.  And
3 nodes probably doesn't make sense either -- if you have a large enough cluster to need a
ZK quorum, then you probably want to have the ability to take one node offline and have the
cluster still work with an additional failure.
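
The arithmetic behind the odd-number advice can be sketched as follows. This is a minimal illustration in Python, assuming ZooKeeper's usual majority rule (the ensemble stays available only while more than half of its servers are up); the function names are made up for this example.

```python
def majority(ensemble_size: int) -> int:
    """Smallest number of servers that forms a strict majority."""
    return ensemble_size // 2 + 1


def tolerated_failures(ensemble_size: int) -> int:
    """How many servers can be lost while a majority is still reachable."""
    return ensemble_size - majority(ensemble_size)


if __name__ == "__main__":
    for n in range(1, 6):
        print(f"{n} ZK node(s): majority={majority(n)}, "
              f"can lose {tolerated_failures(n)}")
```

Under that assumption, 2 nodes tolerate no more failures than 1, 3 nodes tolerate one, and 5 nodes tolerate two, which is the "one node offline plus an additional failure" case.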
Dave


From: Anthony Ikeda [mailto:Anthony.Ikeda@cardlink.com.au]
Sent: Wednesday, June 02, 2010 5:38 PM
To: user@hbase.apache.org
Subject: Trying to get the region servers working....

I've successfully got hadoop installed and running:
Server1 (172.28.1.138) - master, namenode, jobtracker, tasktracker
Server2 (172.28.1.139) - slave, datanode
Server3 (172.28.2.136) - slave, datanode
Server4 (172.28.2.137) - slave, datanode

I'm now trying to get HBase up and running, with HBase managing ZooKeeper.

My HBase setup is:
Server1 - master, zookeeper1
Server2 - slave, regionserver
Server3 - slave, regionserver, zookeeper2
Server4 - slave, regionserver

However, the region servers keep resolving the master server to 127.0.0.1:60000.

This is the log entry (${HBASE_HOME}/logs/hbase-hbase-regionserver-SVRH127.log):
2010-06-03 09:57:52,394 INFO org.apache.zookeeper.ClientCnxn: Server connection successful
2010-06-03 09:57:52,432 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Got ZooKeeper event, state: SyncConnected, type: None, path: null
2010-06-03 09:57:52,433 DEBUG org.apache.hadoop.hbase.zookeeper.ZooKeeperWrapper: Set watcher on master address ZNode /hbase/master
2010-06-03 09:57:52,485 DEBUG org.apache.hadoop.hbase.zookeeper.ZooKeeperWrapper: Read ZNode /hbase/master got 127.0.0.1:60000
2010-06-03 09:57:52,486 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Telling master at 127.0.0.1:60000 that we are up
2010-06-03 09:58:52,914 WARN org.apache.hadoop.hbase.regionserver.HRegionServer: Unable to connect to master. Retrying. Error was:
java.net.ConnectException: Connection refused
        at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
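
To spot this symptom across several region server logs, the master address read from ZooKeeper can be pulled out of the log text and checked for loopback. This is only a sketch: the regular expression is written against the "Read ZNode /hbase/master got ..." message shown in the log above, and the helper names are hypothetical, not part of HBase.

```python
import ipaddress
import re

# Matches the ZooKeeperWrapper message from the log above, even when a mail
# client has wrapped it across two lines.
MASTER_ZNODE = re.compile(r"Read ZNode\s+/hbase/master\s+got\s+(\S+):(\d+)")


def advertised_master(log_text: str):
    """Return (host, port) of the last master address read from ZooKeeper."""
    matches = MASTER_ZNODE.findall(log_text)
    if not matches:
        return None
    host, port = matches[-1]
    return host, int(port)


def is_loopback(host: str) -> bool:
    """True for addresses such as 127.0.0.1; False for names or other IPs."""
    try:
        return ipaddress.ip_address(host).is_loopback
    except ValueError:
        return False  # not an IP literal, e.g. a hostname
```

A loopback result here would mean the address stored in /hbase/master is only reachable from the master machine itself, which is consistent with the "Connection refused" error the region server reports.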


From what I can tell in the ZooKeeper logs, it has started successfully and is communicating.
${HBASE_HOME}/logs/hbase-hbase-zookeeper-SVRH124.log:
2010-06-03 10:05:47,286 INFO org.apache.zookeeper.server.ZooKeeperServer: Created server
2010-06-03 10:05:47,288 INFO org.apache.zookeeper.server.quorum.Follower: Following /172.28.2.136:2888
2010-06-03 10:05:47,290 INFO org.apache.zookeeper.server.quorum.FastLeaderElection: Sending
new notification.
2010-06-03 10:05:47,321 INFO org.apache.zookeeper.server.quorum.Follower: Getting a snapshot
from leader
2010-06-03 10:05:47,335 INFO org.apache.zookeeper.server.persistence.FileTxnSnapLog: Snapshotting:
200000000
2010-06-03 10:06:07,272 WARN org.apache.zookeeper.server.quorum.Follower: Got zxid 0x200000001
expected 0x1
Thu Jun  3 10:14:09 EST 2010 Stopping zookeeper
Thu Jun  3 10:14:09 EST 2010 Killing zookeeper

And ${HBASE_HOME}/logs/hbase-hbase-zookeeper-SVRH127.log:
2010-06-03 10:05:48,008 INFO org.apache.zookeeper.server.ZooKeeperServer: Created server
2010-06-03 10:05:48,015 INFO org.apache.zookeeper.server.quorum.FastLeaderElection: Sending
new notification.
2010-06-03 10:05:48,016 INFO org.apache.zookeeper.server.persistence.FileSnap: Reading snapshot
/home/hbase/zkeeper/data/version-2/snapshot.0
2010-06-03 10:05:48,020 INFO org.apache.zookeeper.server.persistence.FileTxnSnapLog: Snapshotting:
10000000b
2010-06-03 10:05:48,041 INFO org.apache.zookeeper.server.quorum.FollowerHandler: Follower
sid: 1 : info : org.apache.zookeeper.server.quorum.QuorumPeer$QuorumServer@6f878144
2010-06-03 10:05:48,041 WARN org.apache.zookeeper.server.quorum.FollowerHandler: Sending snapshot
last zxid of peer is 0x10000000b  zxid of leader is 0x200000000
2010-06-03 10:05:48,048 WARN org.apache.zookeeper.server.quorum.Leader: Commiting zxid 0x200000000
from /172.28.2.136:2888 not first!
2010-06-03 10:05:48,048 WARN org.apache.zookeeper.server.quorum.Leader: First is 0
2010-06-03 10:06:07,992 INFO org.apache.zookeeper.server.NIOServerCnxn: Connected to /172.28.1.138:23600
lastZxid 0
2010-06-03 10:06:07,992 INFO org.apache.zookeeper.server.NIOServerCnxn: Creating new session
0x228fb20c3760000
2010-06-03 10:06:08,010 INFO org.apache.zookeeper.server.NIOServerCnxn: Finished init of 0x228fb20c3760000
valid:true
2010-06-03 10:06:30,002 INFO org.apache.zookeeper.server.SessionTrackerImpl: Expiring session
0x128fb1975310000
2010-06-03 10:06:30,003 INFO org.apache.zookeeper.server.ZooKeeperServer: Expiring session
0x128fb1975310000
2010-06-03 10:06:30,004 INFO org.apache.zookeeper.server.PrepRequestProcessor: Processed session
termination request for id: 0x128fb1975310000
2010-06-03 10:06:30,004 INFO org.apache.zookeeper.server.SessionTrackerImpl: Expiring session
0x128fb1975310003
2010-06-03 10:06:30,004 INFO org.apache.zookeeper.server.ZooKeeperServer: Expiring session
0x128fb1975310003
2010-06-03 10:06:30,004 INFO org.apache.zookeeper.server.PrepRequestProcessor: Processed session
termination request for id: 0x128fb1975310003
2010-06-03 10:06:30,005 INFO org.apache.zookeeper.server.SessionTrackerImpl: Expiring session
0x128fb1975310001
2010-06-03 10:06:30,005 INFO org.apache.zookeeper.server.ZooKeeperServer: Expiring session
0x128fb1975310001
2010-06-03 10:06:30,005 INFO org.apache.zookeeper.server.PrepRequestProcessor: Processed session
termination request for id: 0x128fb1975310001
2010-06-03 10:06:30,005 INFO org.apache.zookeeper.server.SessionTrackerImpl: Expiring session
0x128fb1975310002
2010-06-03 10:06:30,005 INFO org.apache.zookeeper.server.ZooKeeperServer: Expiring session
0x128fb1975310002
2010-06-03 10:06:30,005 INFO org.apache.zookeeper.server.PrepRequestProcessor: Processed session
termination request for id: 0x128fb1975310002
2010-06-03 10:09:59,904 INFO org.apache.zookeeper.server.PrepRequestProcessor: Processed session
termination request for id: 0x228fb20c3760000
2010-06-03 10:09:59,906 INFO org.apache.zookeeper.server.NIOServerCnxn: closing session:0x228fb20c3760000
NIOServerCnxn: java.nio.channels.SocketChannel[connected local=/172.28.2.136:2181 remote=/172.28.1.138:23600]
Thu Jun  3 10:14:09 EST 2010 Stopping zookeeper
Thu Jun  3 10:14:09 EST 2010 Killing zookeeper


The hbase-site.xml for each server is configured as:
<configuration>
        <property>
                <name>hbase.rootdir</name>
                <value>hdfs://172.28.1.138/hbase</value>
        </property>
        <property>
                <name>hbase.master</name>
                <value>172.28.1.138:60000</value>
        </property>
        <property>
                <name>hbase.cluster.distributed</name>
                <value>true</value>
        </property>
        <property>
                <name>hbase.zookeeper.quorum</name>
                <value>172.28.1.138,172.28.2.136</value>
        </property>
</configuration>
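
As a quick sanity check on a configuration like the one above, the quorum list can be read out of hbase-site.xml and counted. This is a minimal sketch using only the Python standard library; the function names are hypothetical, and it assumes the file follows the same `<property><name>/<value>` layout shown here.

```python
import xml.etree.ElementTree as ET


def read_properties(xml_text: str) -> dict:
    """Map each <name> in a Hadoop-style configuration to its <value>."""
    root = ET.fromstring(xml_text)
    props = {}
    for prop in root.findall("property"):
        name = prop.findtext("name")
        value = prop.findtext("value")
        if name is not None:
            props[name.strip()] = (value or "").strip()
    return props


def quorum_hosts(props: dict) -> list:
    """Split hbase.zookeeper.quorum into a list of host strings."""
    raw = props.get("hbase.zookeeper.quorum", "")
    return [host.strip() for host in raw.split(",") if host.strip()]


def has_even_quorum(props: dict) -> bool:
    """True when the quorum lists an even, non-zero number of hosts."""
    hosts = quorum_hosts(props)
    return len(hosts) > 0 and len(hosts) % 2 == 0
```

Run against the configuration in this message, `quorum_hosts` would return the two addresses 172.28.1.138 and 172.28.2.136, so `has_even_quorum` flags it.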

My ${HBASE_HOME}/conf/regionservers files are:
Server1 (172.28.1.138):
172.28.2.136
172.28.2.137
172.28.1.139

Server2 (172.28.1.139):
172.28.2.136
172.28.2.137
172.28.1.139

Server3 (172.28.2.136):
172.28.2.136
172.28.2.137
172.28.1.139

Server4 (172.28.2.137):
172.28.2.136
172.28.2.137
172.28.1.139

Question:
Why can't the region servers contact the master? I've checked the /etc/hosts file and there
are 2 entries to resolve the server name (127.0.0.1 and 172.28.x.x) with 127.0.0.1 coming
first. But I've been told not to change this as it affects other functions of the server.
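
To see which address a host name maps to without editing anything, the hosts file can be inspected read-only. The sketch below is an illustration in Python with hypothetical helper names; `server1` in the usage note is a placeholder, not a name taken from this cluster. It lists every address a name maps to in file order, and separately asks the operating system's resolver what it actually returns.

```python
import socket


def addresses_for(hosts_text: str, hostname: str) -> list:
    """Addresses that a hosts-file body maps to hostname, in file order."""
    found = []
    for line in hosts_text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        address, *names = line.split()
        if hostname in names:
            found.append(address)
    return found


def local_resolution() -> str:
    """Address the OS resolver returns for this machine's own host name."""
    return socket.gethostbyname(socket.gethostname())
```

For example, `addresses_for(open("/etc/hosts").read(), "server1")` returning `["127.0.0.1", "172.28.1.138"]` would match the situation described above. On many Linux setups the first matching entry is the one returned, so `local_resolution()` on the master would likely report 127.0.0.1 in that case, which may be how that address ends up in /hbase/master.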

Anthony Ikeda
Java Analyst/Programmer
Cardlink Services Limited
Level 4, 3 Rider Boulevard
Rhodes NSW 2138

Web: www.cardlink.com.au | Tel: + 61 2 9646 9221 | Fax: + 61 2 9646 9283


