hbase-issues mailing list archives

From "Sergey Soldatov (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (HBASE-19805) NPE in HMaster while issuing a sequence of table splits
Date Tue, 16 Jan 2018 23:28:00 GMT

    [ https://issues.apache.org/jira/browse/HBASE-19805?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16327990#comment-16327990 ]

Sergey Soldatov edited comment on HBASE-19805 at 1/16/18 11:27 PM:
-------------------------------------------------------------------

Well, here is a short RCA:
The checkSplittable() method relies on Region.isSplittable, which simply checks that the region
is available (neither closing nor closed) and has no references. But the HRegion.closing flag
is set only when doClose() actually executes. At first glance, it would be reasonable to
add a check to checkSplittable() that the region state (RegionStateNode) is not CLOSING.
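
The proposed guard can be illustrated with a minimal, self-contained sketch. It does not use the real HBase classes: `MasterRegionState`, `RegionView`, and this `checkSplittable` signature are hypothetical stand-ins, chosen only to show why consulting the master-side state closes the window before doClose() runs.

```java
public class SplitGuardSketch {

    /** Hypothetical stand-in for the master-side state kept in RegionStateNode. */
    enum MasterRegionState { OPEN, CLOSING, CLOSED }

    /** Hypothetical stand-in for the region's own view, as consulted by Region.isSplittable. */
    static final class RegionView {
        final boolean closing;        // per the RCA, set only once doClose() actually runs
        final boolean closed;
        final boolean hasReferences;

        RegionView(boolean closing, boolean closed, boolean hasReferences) {
            this.closing = closing;
            this.closed = closed;
            this.hasReferences = hasReferences;
        }

        /** Available (neither closing nor closed) and without references. */
        boolean isSplittable() {
            return !closing && !closed && !hasReferences;
        }
    }

    /** Rejects the split if the master already considers the region CLOSING. */
    static boolean checkSplittable(RegionView region, MasterRegionState masterState) {
        if (masterState == MasterRegionState.CLOSING) {
            return false;
        }
        return region.isSplittable();
    }

    public static void main(String[] args) {
        // A close has been requested, but doClose() has not run yet,
        // so the region-side flags still look healthy.
        RegionView closeRequested = new RegionView(false, false, false);

        System.out.println(closeRequested.isSplittable());                              // true
        System.out.println(checkSplittable(closeRequested, MasterRegionState.CLOSING)); // false
        System.out.println(checkSplittable(closeRequested, MasterRegionState.OPEN));    // true
    }
}
```

In this sketch the region-side check alone accepts the split, while the combined check rejects it as soon as the master-side state is CLOSING.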


was (Author: sergey.soldatov):
well, here is a short RCA:
checkSplittable method relies on Region.isSplittable which is just a simple check that region
is available (not closing nor not closed) and has no references. But HRegion.closing flag
we set only when we actually execute doClose().  At first glance, it would be reasonable to
add a check that the region state (RegionStateNode) is not CLOSING. 

> NPE in HMaster while issuing a sequence of table splits
> -------------------------------------------------------
>
>                 Key: HBASE-19805
>                 URL: https://issues.apache.org/jira/browse/HBASE-19805
>             Project: HBase
>          Issue Type: Bug
>          Components: master
>    Affects Versions: 2.0.0-beta-1
>            Reporter: Josh Elser
>            Assignee: Sergey Soldatov
>            Priority: Critical
>             Fix For: 2.0.0-beta-2
>
>
> I wrote a toy program to test the client tarball in HBASE-19735. After the first few region splits, I see the following error in the Master log.
> {noformat}
> 2018-01-16 14:07:52,797 INFO  [RpcServer.default.FPBQ.Fifo.handler=28,queue=1,port=16000] master.HMaster: Client=jelser//192.168.1.23 split myTestTable,1,1516129669054.8313b755f74092118f9dd30a4190ee23.
> 2018-01-16 14:07:52,797 ERROR [RpcServer.default.FPBQ.Fifo.handler=28,queue=1,port=16000] ipc.RpcServer: Unexpected throwable object
> java.lang.NullPointerException
> 	at org.apache.hadoop.hbase.client.ConnectionUtils.getStubKey(ConnectionUtils.java:229)
> 	at org.apache.hadoop.hbase.client.ConnectionImplementation.getAdmin(ConnectionImplementation.java:1175)
> 	at org.apache.hadoop.hbase.client.ConnectionUtils$ShortCircuitingClusterConnection.getAdmin(ConnectionUtils.java:149)
> 	at org.apache.hadoop.hbase.master.assignment.Util.getRegionInfoResponse(Util.java:59)
> 	at org.apache.hadoop.hbase.master.assignment.SplitTableRegionProcedure.checkSplittable(SplitTableRegionProcedure.java:146)
> 	at org.apache.hadoop.hbase.master.assignment.SplitTableRegionProcedure.<init>(SplitTableRegionProcedure.java:103)
> 	at org.apache.hadoop.hbase.master.assignment.AssignmentManager.createSplitProcedure(AssignmentManager.java:761)
> 	at org.apache.hadoop.hbase.master.HMaster$2.run(HMaster.java:1626)
> 	at org.apache.hadoop.hbase.master.procedure.MasterProcedureUtil.submitProcedure(MasterProcedureUtil.java:134)
> 	at org.apache.hadoop.hbase.master.HMaster.splitRegion(HMaster.java:1618)
> 	at org.apache.hadoop.hbase.master.MasterRpcServices.splitRegion(MasterRpcServices.java:778)
> 	at org.apache.hadoop.hbase.shaded.protobuf.generated.MasterProtos$MasterService$2.callBlockingMethod(MasterProtos.java)
> 	at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:404)
> 	at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:130)
> 	at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:324)
> 	at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:304)
> {noformat}
> {code}
>   public static void main(String[] args) throws Exception {
>     Configuration conf = HBaseConfiguration.create();
>     try (Connection conn = ConnectionFactory.createConnection(conf);
>         Admin admin = conn.getAdmin()) {
>       final TableName tn = TableName.valueOf("myTestTable");
>       if (admin.tableExists(tn)) {
>         admin.disableTable(tn);
>         admin.deleteTable(tn);
>       }
>       final TableDescriptor desc = TableDescriptorBuilder.newBuilder(tn)
>           .addColumnFamily(ColumnFamilyDescriptorBuilder.newBuilder(Bytes.toBytes("f1")).build())
>           .build();
>       admin.createTable(desc);
>       List<String> splitPoints = new ArrayList<>(16);
>       for (int i = 1; i <= 16; i++) {
>         splitPoints.add(Integer.toString(i, 16));
>       }
>       
>       System.out.println("Splits: " + splitPoints);
>       int numRegions = admin.getRegions(tn).size();
>       for (String splitPoint : splitPoints) {
>         System.out.println("Splitting on " + splitPoint);
>         admin.split(tn, Bytes.toBytes(splitPoint));
>         Thread.sleep(200);
>         int newRegionSize = admin.getRegions(tn).size();
>         while (numRegions == newRegionSize) {
>           Thread.sleep(50);
>           newRegionSize = admin.getRegions(tn).size();
>         }
>       }
>     }
>   }
> {code}
> At a quick glance, it looks like {{Util.getRegionInfoResponse}} is to blame.
> {code}
>   static GetRegionInfoResponse getRegionInfoResponse(final MasterProcedureEnv env,
>       final ServerName regionLocation, final RegionInfo hri, boolean includeBestSplitRow)
>   throws IOException {
>     // TODO: There is no timeout on this controller. Set one!
>     HBaseRpcController controller = env.getMasterServices().getClusterConnection().
>         getRpcControllerFactory().newController();
>     final AdminService.BlockingInterface admin =
>         env.getMasterServices().getClusterConnection().getAdmin(regionLocation);
> {code}
> We don't validate that we have a non-null {{ServerName regionLocation}}.
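
The missing validation can be sketched in isolation as below. This is only an illustration under stated assumptions, not the actual fix: `requireRegionLocation` is a hypothetical helper, and plain strings stand in for `RegionInfo` and `ServerName`. The idea is to fail with a descriptive IOException before the location reaches `getAdmin(...)`, rather than letting a NullPointerException surface from deep inside the connection code.

```java
import java.io.IOException;

public class RegionLocationCheckSketch {

    /**
     * Hypothetical helper: returns the location unchanged if it is known,
     * otherwise throws an IOException that names the affected region.
     * Strings stand in for RegionInfo and ServerName.
     */
    static String requireRegionLocation(String regionName, String regionLocation)
            throws IOException {
        if (regionLocation == null) {
            throw new IOException("No server location known for region " + regionName
                + "; refusing to contact a region server");
        }
        return regionLocation;
    }

    public static void main(String[] args) {
        try {
            // Simulates the situation in the report: the region has no known location.
            requireRegionLocation("myTestTable,1,1516129669054", null);
        } catch (IOException e) {
            // The caller sees a clear, checked failure instead of an NPE.
            System.out.println("Split rejected: " + e.getMessage());
        }
    }
}
```

Since the method in the issue already declares `throws IOException`, a check of this shape would fit its signature without changing callers.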



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
