hbase-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "stack (JIRA)" <j...@apache.org>
Subject [jira] [Created] (HBASE-9779) IntegrationTestLoadAndVerify fails deleting IntegrationTestLoadAndVerify table
Date Wed, 16 Oct 2013 05:22:45 GMT
stack created HBASE-9779:
----------------------------

             Summary: IntegrationTestLoadAndVerify fails deleting IntegrationTestLoadAndVerify
table 
                 Key: HBASE-9779
                 URL: https://issues.apache.org/jira/browse/HBASE-9779
             Project: HBase
          Issue Type: Bug
          Components: test
    Affects Versions: 0.96.0
            Reporter: stack
            Assignee: stack


As part of the test, we want to delete the created table to restore cluster state.  Interestingly
we can disable the table successfully but then immediately after we fail the delete because
we cannot get the table descriptor -- getting the file descriptor is used to test if table
is present.

The test for getDescriptor is kinda broke because it throws base IOE which causes clients
to retry over and over again as though the descriptor was going to come back.

This bug is kinda ugly because in at least one case it caused our long-running hbase-it suite
run to fail so would be good to fix.

Here is sample from a test run:

{code}
Disabling table IntegrationTestLoadAndVerify 2013-10-11 18:27:53,485 INFO  [main] client.HBaseAdmin:
Started disable of IntegrationTestLoadAndVerify
2013-10-11 18:27:53,526 INFO  [main] zookeeper.ZooKeeper: Initiating client connection, connectString=a1805.halxg.cloudera.com:2181
sessionTimeout=90000 watcher=catalogtracker-on-hconnection-0x5a7e666f
2013-10-11 18:27:53,527 INFO  [main] zookeeper.RecoverableZooKeeper: Process identifier=catalogtracker-on-hconnection-0x5a7e666f
connecting to ZooKeeper ensemble=a1805.halxg.cloudera.com:2181
2013-10-11 18:27:53,527 INFO  [main-SendThread(a1805.halxg.cloudera.com:2181)] zookeeper.ClientCnxn:
Opening socket connection to server a1805.halxg.cloudera.com/10.20.200.105:2181. Will not
attempt to authenticate using SASL (unknown error)
2013-10-11 18:27:53,527 DEBUG [main] catalog.CatalogTracker: Starting catalog tracker org.apache.hadoop.hbase.catalog.CatalogTracker@4ace08a5
2013-10-11 18:27:53,529 INFO  [main-SendThread(a1805.halxg.cloudera.com:2181)] zookeeper.ClientCnxn:
Socket connection established to a1805.halxg.cloudera.com/10.20.200.105:2181, initiating session
2013-10-11 18:27:53,539 INFO  [main-SendThread(a1805.halxg.cloudera.com:2181)] zookeeper.ClientCnxn:
Session establishment complete on server a1805.halxg.cloudera.com/10.20.200.105:2181, sessionid
= 0x1412d47f53a5c70, negotiated timeout = 40000
2013-10-11 18:27:53,602 DEBUG [main] catalog.CatalogTracker: Stopping catalog tracker org.apache.hadoop.hbase.catalog.CatalogTracker@4ace08a5
2013-10-11 18:27:53,662 INFO  [main] zookeeper.ZooKeeper: Session: 0x1412d47f53a5c70 closed
2013-10-11 18:27:53,662 INFO  [main-EventThread] zookeeper.ClientCnxn: EventThread shut down
.2013-10-11 18:27:54,666 INFO  [main] zookeeper.ZooKeeper: Initiating client connection, connectString=a1805.halxg.cloudera.com:2181
sessionTimeout=90000 watcher=catalogtracker-on-hconnection-0x5a7e666f
2013-10-11 18:27:54,667 INFO  [main] zookeeper.RecoverableZooKeeper: Process identifier=catalogtracker-on-hconnection-0x5a7e666f
connecting to ZooKeeper ensemble=a1805.halxg.cloudera.com:2181
2013-10-11 18:27:54,667 INFO  [main-SendThread(a1805.halxg.cloudera.com:2181)] zookeeper.ClientCnxn:
Opening socket connection to server a1805.halxg.cloudera.com/10.20.200.105:2181. Will not
attempt to authenticate using SASL (unknown error)
2013-10-11 18:27:54,667 DEBUG [main] catalog.CatalogTracker: Starting catalog tracker org.apache.hadoop.hbase.catalog.CatalogTracker@692c0c5d
2013-10-11 18:27:54,667 INFO  [main-SendThread(a1805.halxg.cloudera.com:2181)] zookeeper.ClientCnxn:
Socket connection established to a1805.halxg.cloudera.com/10.20.200.105:2181, initiating session
2013-10-11 18:27:54,696 INFO  [main-SendThread(a1805.halxg.cloudera.com:2181)] zookeeper.ClientCnxn:
Session establishment complete on server a1805.halxg.cloudera.com/10.20.200.105:2181, sessionid
= 0x1412d47f53a5c71, negotiated timeout = 40000
2013-10-11 18:27:54,821 DEBUG [main] catalog.CatalogTracker: Stopping catalog tracker org.apache.hadoop.hbase.catalog.CatalogTracker@692c0c5d
2013-10-11 18:27:54,871 INFO  [main] zookeeper.ZooKeeper: Session: 0x1412d47f53a5c71 closed
2013-10-11 18:27:54,871 INFO  [main-EventThread] zookeeper.ClientCnxn: EventThread shut down
.2013-10-11 18:27:55,890 INFO  [main] zookeeper.ZooKeeper: Initiating client connection, connectString=a1805.halxg.cloudera.com:2181
sessionTimeout=90000 watcher=catalogtracker-on-hconnection-0x5a7e666f
2013-10-11 18:27:55,891 INFO  [main] zookeeper.RecoverableZooKeeper: Process identifier=catalogtracker-on-hconnection-0x5a7e666f
connecting to ZooKeeper ensemble=a1805.halxg.cloudera.com:2181
2013-10-11 18:27:55,891 INFO  [main-SendThread(a1805.halxg.cloudera.com:2181)] zookeeper.ClientCnxn:
Opening socket connection to server a1805.halxg.cloudera.com/10.20.200.105:2181. Will not
attempt to authenticate using SASL (unknown error)
2013-10-11 18:27:55,891 DEBUG [main] catalog.CatalogTracker: Starting catalog tracker org.apache.hadoop.hbase.catalog.CatalogTracker@74234057
2013-10-11 18:27:55,892 INFO  [main-SendThread(a1805.halxg.cloudera.com:2181)] zookeeper.ClientCnxn:
Socket connection established to a1805.halxg.cloudera.com/10.20.200.105:2181, initiating session
2013-10-11 18:27:55,896 INFO  [main-SendThread(a1805.halxg.cloudera.com:2181)] zookeeper.ClientCnxn:
Session establishment complete on server a1805.halxg.cloudera.com/10.20.200.105:2181, sessionid
= 0x1412d47f53a5c72, negotiated timeout = 40000
2013-10-11 18:27:55,908 DEBUG [main] catalog.CatalogTracker: Stopping catalog tracker org.apache.hadoop.hbase.catalog.CatalogTracker@74234057
2013-10-11 18:27:55,912 INFO  [main] zookeeper.ZooKeeper: Session: 0x1412d47f53a5c72 closed
2013-10-11 18:27:55,912 INFO  [main-EventThread] zookeeper.ClientCnxn: EventThread shut down
 2389 ms
Deleting table IntegrationTestLoadAndVerify 
2013-10-11 18:33:02,029 INFO  [main] hbase.HBaseCluster: Restoring cluster - started
2013-10-11 18:33:02,032 INFO  [main] hbase.HBaseCluster: Added new HBaseAdmin
2013-10-11 18:33:02,032 INFO  [main] hbase.HBaseCluster: Restoring cluster - done
2013-10-11 18:33:02,033 ERROR [main] util.AbstractHBaseTool: Error running command-line tool
org.apache.hadoop.hbase.client.RetriesExhaustedException: Failed after attempts=35, exceptions:
Fri Oct 11 18:27:55 PDT 2013, org.apache.hadoop.hbase.client.RpcRetryingCaller@48c6f1e8, org.apache.hadoop.hbase.ipc.RemoteWithExtrasException(java.io.IOException):
java.io.IOException: HTableDescriptor missing for IntegrationTestLoadAndVerify
	at org.apache.hadoop.hbase.master.handler.TableEventHandler.getTableDescriptor(TableEventHandler.java:231)
	at org.apache.hadoop.hbase.master.handler.DeleteTableHandler.prepareWithTableLock(DeleteTableHandler.java:58)
	at org.apache.hadoop.hbase.master.handler.TableEventHandler.prepare(TableEventHandler.java:93)
	at org.apache.hadoop.hbase.master.HMaster.deleteTable(HMaster.java:1816)
	at org.apache.hadoop.hbase.master.HMaster.deleteTable(HMaster.java:1826)
	at org.apache.hadoop.hbase.protobuf.generated.MasterProtos$MasterService$2.callBlockingMethod(MasterProtos.java:38213)
	at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2146)
	at org.apache.hadoop.hbase.ipc.RpcServer$Handler.run(RpcServer.java:1851)
....
{code}


Looking on master logs, it looks fine:

{code}
  1 2013-10-11 18:27:55,541 DEBUG [MASTER_TABLE_OPERATIONS-a1805:60000-0] handler.DisableTableHandler:
Disable waiting until done; 298996 ms remaining; []
  0 2013-10-11 18:27:55,546 INFO  [MASTER_TABLE_OPERATIONS-a1805:60000-0] handler.DisableTableHandler:
Disabled table is done=true
  1 2013-10-11 18:27:55,555 DEBUG [MASTER_TABLE_OPERATIONS-a1805:60000-0] lock.ZKInterProcessLockBase:
Released /hbase/table-lock/IntegrationTestLoadAndVerify/write-master:600000000000001
  2 2013-10-11 18:27:55,923 INFO  [RpcServer.handler=14,port=60000] master.HMaster: Client=hbase//10.20.200.112
delete IntegrationTestLoadAndVerify
  3 2013-10-11 18:27:55,931 DEBUG [RpcServer.handler=14,port=60000] lock.ZKInterProcessLockBase:
Acquired a lock for /hbase/table-lock/IntegrationTestLoadAndVerify/write-master:600000000000002
  4 2013-10-11 18:27:55,954 DEBUG [RpcServer.handler=14,port=60000] lock.ZKInterProcessLockBase:
Released /hbase/table-lock/IntegrationTestLoadAndVerify/write-master:600000000000002
  5 2013-10-11 18:27:56,167 INFO  [RpcServer.handler=15,port=60000] master.HMaster: Client=hbase//10.20.200.112
delete IntegrationTestLoadAndVerify
...
{code}

Oddly no exceptions other than corrupted snapshot because mssing hfile.  All seems to have
been operating fine.  Will look more.



--
This message was sent by Atlassian JIRA
(v6.1#6144)

Mime
View raw message