hbase-issues mailing list archives

From "Enis Soztutar (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-6721) RegionServer Group based Assignment
Date Sat, 31 Oct 2015 00:17:27 GMT

    [ https://issues.apache.org/jira/browse/HBASE-6721?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14983663#comment-14983663 ]

Enis Soztutar commented on HBASE-6721:
--------------------------------------

Finally got around to testing the v15 patch on the 1.1 code base with a 7 node cluster. Here are
my test notes. Nothing too concerning, but we have to address some of these in the patch.
This is the configuration to add to enable groups: 
{code}
    <property>
      <name>hbase.coprocessor.master.classes</name>
      <value>org.apache.hadoop.hbase.group.GroupAdminEndpoint</value>
    </property>
    <property>
      <name>hbase.master.loadbalancer.class</name>
      <value>org.apache.hadoop.hbase.group.GroupBasedLoadBalancer</value>
    </property>
{code}


1. This diff is needed so that the new PB files get compiled with the -Pcompile-protobuf profile:

{code}
diff --git hbase-protocol/pom.xml hbase-protocol/pom.xml
index 8034576..d352373 100644
--- hbase-protocol/pom.xml
+++ hbase-protocol/pom.xml
@@ -180,6 +180,8 @@
                           <include>ErrorHandling.proto</include>
                           <include>Filter.proto</include>
                           <include>FS.proto</include>
+                          <include>Group.proto</include>
+                          <include>GroupAdmin.proto</include>
                           <include>HBase.proto</include>
                           <include>HFile.proto</include>
                           <include>LoadBalancer.proto</include>
{code}

2. NPE and nil errors in group shell commands when the group does not exist: 
{code}
hbase(main):015:0* balance_group 'nonexisting' 

ERROR: java.io.IOException
	at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2156)
	at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:101)
	at org.apache.hadoop.hbase.ipc.RpcExecutor.consumerLoop(RpcExecutor.java:130)
	at org.apache.hadoop.hbase.ipc.RpcExecutor$1.run(RpcExecutor.java:107)
	at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.NullPointerException
	at org.apache.hadoop.hbase.group.GroupAdminServer.groupGetRegionsInTransition(GroupAdminServer.java:412)
	at org.apache.hadoop.hbase.group.GroupAdminServer.balanceGroup(GroupAdminServer.java:348)
	at org.apache.hadoop.hbase.group.GroupAdminEndpoint.balanceGroup(GroupAdminEndpoint.java:229)
	at org.apache.hadoop.hbase.protobuf.generated.GroupAdminProtos$GroupAdminService.callMethod(GroupAdminProtos.java:11156)
	at org.apache.hadoop.hbase.master.MasterRpcServices.execMasterService(MasterRpcServices.java:666)
	at org.apache.hadoop.hbase.protobuf.generated.MasterProtos$MasterService$2.callBlockingMethod(MasterProtos.java:51121)
	at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2114)
{code}

and 

{code}
hbase(main):030:0> get_group 'nonexisting'
GROUP INFORMATION                                                                        
                                                                                         
                                                                               
Servers:                                                                                 
                                                                                         
                                                                               

ERROR: undefined method `getServers' for nil:NilClass

Here is some help for this command:
Get a region server group's information.

Example:

  hbase> get_group 'default'
{code}

and 

{code}
hbase(main):077:0* move_group_tables 'nonexisting'

ERROR: undefined method `each' for nil:NilClass

Here is some help for this command:
Reassign tables from one group to another.

  hbase> move_group_tables 'dest',['table1','table2']
{code}

and 
{code}
hbase(main):173:0* move_group_servers 'nonexisting'

ERROR: undefined method `each' for nil:NilClass

Here is some help for this command:
Reassign a region server from one group to another.

  hbase> move_group_servers 'dest',['server1:port','server2:port']
{code}
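
The failures above all come from an unchecked lookup of a group that does not exist. As a minimal sketch of the kind of guard that would avoid them (this is not the patch's code; the class name, the map-based registry and the error message are all hypothetical), the lookup could fail with a descriptive exception instead of passing a null along:

```java
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical stand-in for a group registry; not the actual GroupInfoManager.
class GroupLookupSketch {
    // Group name -> servers assigned to that group.
    private final Map<String, Set<String>> groups = new HashMap<>();

    void addGroup(String name) {
        groups.put(name, new TreeSet<>());
    }

    // Returns the servers of a group. An unknown group produces an
    // IOException with a readable message rather than a NullPointerException.
    Set<String> getServers(String groupName) throws IOException {
        Set<String> servers = groups.get(groupName);
        if (servers == null) {
            throw new IOException("Group does not exist: " + groupName);
        }
        return servers;
    }
}
```

With a check like this on the server side, the shell wrappers would receive an error message they can print, instead of calling methods on nil.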

3. Group names should be restricted to alphanumeric characters only. This one is pretty easy, but important.
The following caused the master to abort, and the master cannot restart after this point
(without manually removing the rsgroup entry from the table, which you cannot do without a master).
I had to nuke the hdfs and zk to start over. 
{code}
hbase(main):033:0> add_group 'a-/:*'

ERROR: java.io.IOException: Failed to write to groupZNode
	at org.apache.hadoop.hbase.group.GroupInfoManagerImpl.flushConfig(GroupInfoManagerImpl.java:419)
	at org.apache.hadoop.hbase.group.GroupInfoManagerImpl.addGroup(GroupInfoManagerImpl.java:152)
	at org.apache.hadoop.hbase.group.GroupAdminServer.addGroup(GroupAdminServer.java:298)
	at org.apache.hadoop.hbase.group.GroupAdminEndpoint.addGroup(GroupAdminEndpoint.java:197)
	at org.apache.hadoop.hbase.protobuf.generated.GroupAdminProtos$GroupAdminService.callMethod(GroupAdminProtos.java:11146)
	at org.apache.hadoop.hbase.master.MasterRpcServices.execMasterService(MasterRpcServices.java:666)
	at org.apache.hadoop.hbase.protobuf.generated.MasterProtos$MasterService$2.callBlockingMethod(MasterProtos.java:51121)
	at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2114)
	at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:101)
	at org.apache.hadoop.hbase.ipc.RpcExecutor.consumerLoop(RpcExecutor.java:130)
	at org.apache.hadoop.hbase.ipc.RpcExecutor$1.run(RpcExecutor.java:107)
	at java.lang.Thread.run(Thread.java:745)
Caused by: org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode
for /hbase-unsecure/groupInfo/a-/:*
	at org.apache.zookeeper.KeeperException.create(KeeperException.java:111)
	at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
	at org.apache.zookeeper.ZooKeeper.create(ZooKeeper.java:783)
	at org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.createNonSequential(RecoverableZooKeeper.java:575)
	at org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.create(RecoverableZooKeeper.java:554)
	at org.apache.hadoop.hbase.zookeeper.ZKUtil.createAndFailSilent(ZKUtil.java:1261)
	at org.apache.hadoop.hbase.zookeeper.ZKUtil.createAndFailSilent(ZKUtil.java:1250)
	at org.apache.hadoop.hbase.zookeeper.ZKUtil.createAndFailSilent(ZKUtil.java:1233)
	at org.apache.hadoop.hbase.group.GroupInfoManagerImpl.flushConfig(GroupInfoManagerImpl.java:408)
{code}
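
A minimal sketch of the name check suggested in item 3, to be run before anything is written to the rsgroup table or to ZooKeeper. The class and method names are hypothetical, and "alphanumeric" is assumed here to mean ASCII letters and digits with at least one character:

```java
import java.util.regex.Pattern;

// Hypothetical helper; not part of the v15 patch.
class GroupNameValidator {
    // Assumption: only ASCII letters and digits are allowed, and the name is non-empty.
    private static final Pattern VALID_NAME = Pattern.compile("[a-zA-Z0-9]+");

    static boolean isValid(String name) {
        return name != null && VALID_NAME.matcher(name).matches();
    }

    // Rejects an invalid name before any state is persisted.
    static void check(String name) {
        if (!isValid(name)) {
            throw new IllegalArgumentException(
                "Group name must contain only alphanumeric characters: " + name);
        }
    }
}
```

Rejecting the name up front means a string such as 'a-/:*' never reaches the znode path, so it cannot leave behind an entry that blocks master startup.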
 
4. {{get_table_group}} and {{get_server_group}} shell commands do not work:
{code}
hbase(main):019:0* get_table_group 'nonexisting'

ERROR: undefined local variable or method `s' for #<Hbase::GroupAdmin:0x64518270>

Here is some help for this command:
Get the group name the given table is a member of.

  hbase> get_table_group 'myTable'

 
hbase(main):022:0* get_server_group 'server'

ERROR: undefined local variable or method `s' for #<Hbase::GroupAdmin:0x64518270>

Here is some help for this command:
Get the group name the given region server is a member of.

  hbase> get_server_group 'server1:port1
{code}

5. {{move_group_servers}} and {{move_group_tables}} report the expected number of arguments as 1, although
it should be 2: 
{code}
hbase(main):033:0* move_group_servers 

ERROR: wrong number of arguments (0 for 1)

Here is some help for this command:
Reassign a region server from one group to another.

  hbase> move_group_servers 'dest',['server1:port','server2:port']
{code}

6. Adding a server without a port throws an error, but with no explanation (this one is minor, not
that important). 
{code}
hbase(main):070:0> move_group_servers 'group2', ['os-enis-hbase-oct27-a-3.novalocal'] 


ERROR: 

Here is some help for this command:
Reassign a region server from one group to another.

  hbase> move_group_servers 'dest',['server1:port','server2:port']
{code}
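
For item 6, a minimal sketch of server address parsing that explains what went wrong. This is an illustration only; the class name, the returned type and the messages are hypothetical and not taken from the patch:

```java
// Hypothetical helper that parses a 'host:port' server address.
class ServerAddressParser {
    final String host;
    final int port;

    private ServerAddressParser(String host, int port) {
        this.host = host;
        this.port = port;
    }

    static ServerAddressParser parse(String address) {
        int idx = (address == null) ? -1 : address.lastIndexOf(':');
        // No colon, an empty host, or an empty port are all reported explicitly.
        if (idx <= 0 || idx == address.length() - 1) {
            throw new IllegalArgumentException(
                "Expected server as 'host:port', got: " + address);
        }
        String host = address.substring(0, idx);
        int port;
        try {
            port = Integer.parseInt(address.substring(idx + 1));
        } catch (NumberFormatException e) {
            throw new IllegalArgumentException(
                "Invalid port in server address: " + address, e);
        }
        if (port < 0 || port > 65535) {
            throw new IllegalArgumentException(
                "Port out of range in server address: " + address);
        }
        return new ServerAddressParser(host, port);
    }
}
```

With something along these lines, a call such as move_group_servers with a bare hostname could report that the port is missing rather than printing an empty ERROR line.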

7. From all of the above, it is clear that we need unit tests for the new shell commands. 

Other than these, the feature is working as expected. Defining groups and moving servers and
tables both work. Regions get reassigned according to their groups. Restarting the cluster keeps
assignments, etc. 

Some more findings: 
Test 1: 
Killed the last regionserver of a group; all 15 regions went into FAILED_OPEN state. 
 - Restarted the master, regions still in FAILED_OPEN state (which is expected)
 - Added a new server to the group which had no remaining servers, regions still in FAILED_OPEN
state (this is probably due to how assignment works: we give up after 10 retries and wait
for manual assignment or master restart)
 - Started the region server that was killed before, regions still in FAILED_OPEN
 - Master restart reassigned these regions. 

Test 2: 
Tried to move all servers to a single group. This correctly handles the last server in the default
group by not allowing it to be moved. 

Test 3: 
Killed the last server in the default group, while all system tables were in the default group
(and hence on that server). 
 -> hbase:meta was always in PENDING_OPEN on the bogus server localhost,1,1. 
 -> Upon restarting the killed server, meta and other tables in the default group (including
the rsgroup table) got reassigned. 
 As a side note, not having enough servers in the group that has the meta or rsgroup table
seems like a very good way of shooting yourself in the foot. However, as discussed before,
this may be needed for strong isolation. 


- Adding a nonexistent server to a group is not allowed. 
- Checked JMX
- Adding group information for tables and regionservers to the master UI would be helpful.
We can leave this to a follow-up. 
- Obviously there should be a follow-up to add at least some basic documentation to the book on how to
enable, configure and use RS groups. 






> RegionServer Group based Assignment
> -----------------------------------
>
>                 Key: HBASE-6721
>                 URL: https://issues.apache.org/jira/browse/HBASE-6721
>             Project: HBase
>          Issue Type: New Feature
>            Reporter: Francis Liu
>            Assignee: Francis Liu
>              Labels: hbase-6721
>         Attachments: 6721-master-webUI.patch, HBASE-6721 GroupBasedLoadBalancer Sequence
Diagram.xml, HBASE-6721-DesigDoc.pdf, HBASE-6721-DesigDoc.pdf, HBASE-6721-DesigDoc.pdf, HBASE-6721-DesigDoc.pdf,
HBASE-6721_0.98_2.patch, HBASE-6721_10.patch, HBASE-6721_11.patch, HBASE-6721_12.patch, HBASE-6721_13.patch,
HBASE-6721_14.patch, HBASE-6721_15.patch, HBASE-6721_8.patch, HBASE-6721_9.patch, HBASE-6721_9.patch,
HBASE-6721_94.patch, HBASE-6721_94.patch, HBASE-6721_94_2.patch, HBASE-6721_94_3.patch, HBASE-6721_94_3.patch,
HBASE-6721_94_4.patch, HBASE-6721_94_5.patch, HBASE-6721_94_6.patch, HBASE-6721_94_7.patch,
HBASE-6721_98_1.patch, HBASE-6721_98_2.patch, HBASE-6721_hbase-6721_addendum.patch, HBASE-6721_trunk.patch,
HBASE-6721_trunk.patch, HBASE-6721_trunk.patch, HBASE-6721_trunk1.patch, HBASE-6721_trunk2.patch,
balanceCluster Sequence Diagram.svg, hbase-6721-v15-branch-1.1.patch, immediateAssignments
Sequence Diagram.svg, randomAssignment Sequence Diagram.svg, retainAssignment Sequence Diagram.svg,
roundRobinAssignment Sequence Diagram.svg
>
>
> In multi-tenant deployments of HBase, it is likely that a RegionServer will be serving
out regions from a number of different tables owned by various client applications. Being
able to group a subset of running RegionServers and assign specific tables to it, provides
a client application a level of isolation and resource allocation.
> The proposal essentially is to have an AssignmentManager which is aware of RegionServer
groups and assigns tables to region servers based on groupings. Load balancing will occur
on a per group basis as well. 
> This is essentially a simplification of the approach taken in HBASE-4120. See attached
document.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
