hbase-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "stack (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (HBASE-9779) IntegrationTestLoadAndVerify fails deleting IntegrationTestLoadAndVerify table
Date Wed, 16 Oct 2013 05:33:42 GMT

     [ https://issues.apache.org/jira/browse/HBASE-9779?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

stack updated HBASE-9779:
-------------------------

    Attachment: 9779part.txt

Fix so we don't retry if table doesn't exist.

If Elliott is reading, this failed the #36, #40 and half if not all of build #38. Let me make
it critical.

> IntegrationTestLoadAndVerify fails deleting IntegrationTestLoadAndVerify table 
> -------------------------------------------------------------------------------
>
>                 Key: HBASE-9779
>                 URL: https://issues.apache.org/jira/browse/HBASE-9779
>             Project: HBase
>          Issue Type: Bug
>          Components: test
>    Affects Versions: 0.96.0
>            Reporter: stack
>            Assignee: stack
>         Attachments: 9779part.txt
>
>
> As part of the test, we want to delete the created table to restore cluster state.  Interestingly
we can disable the table successfully but then immediately after we fail the delete because
we cannot get the table descriptor -- getting the file descriptor is used to test if table
is present.
> The test for getDescriptor is kinda broke because it throws base IOE which causes clients
to retry over and over again as though the descriptor was going to come back.
> This bug is kinda ugly because in at least one case it caused our long-running hbase-it
suite run to fail so would be good to fix.
> Here is sample from a test run:
> {code}
> Disabling table IntegrationTestLoadAndVerify 2013-10-11 18:27:53,485 INFO  [main] client.HBaseAdmin:
Started disable of IntegrationTestLoadAndVerify
> 2013-10-11 18:27:53,526 INFO  [main] zookeeper.ZooKeeper: Initiating client connection,
connectString=a1805.halxg.cloudera.com:2181 sessionTimeout=90000 watcher=catalogtracker-on-hconnection-0x5a7e666f
> 2013-10-11 18:27:53,527 INFO  [main] zookeeper.RecoverableZooKeeper: Process identifier=catalogtracker-on-hconnection-0x5a7e666f
connecting to ZooKeeper ensemble=a1805.halxg.cloudera.com:2181
> 2013-10-11 18:27:53,527 INFO  [main-SendThread(a1805.halxg.cloudera.com:2181)] zookeeper.ClientCnxn:
Opening socket connection to server a1805.halxg.cloudera.com/10.20.200.105:2181. Will not
attempt to authenticate using SASL (unknown error)
> 2013-10-11 18:27:53,527 DEBUG [main] catalog.CatalogTracker: Starting catalog tracker
org.apache.hadoop.hbase.catalog.CatalogTracker@4ace08a5
> 2013-10-11 18:27:53,529 INFO  [main-SendThread(a1805.halxg.cloudera.com:2181)] zookeeper.ClientCnxn:
Socket connection established to a1805.halxg.cloudera.com/10.20.200.105:2181, initiating session
> 2013-10-11 18:27:53,539 INFO  [main-SendThread(a1805.halxg.cloudera.com:2181)] zookeeper.ClientCnxn:
Session establishment complete on server a1805.halxg.cloudera.com/10.20.200.105:2181, sessionid
= 0x1412d47f53a5c70, negotiated timeout = 40000
> 2013-10-11 18:27:53,602 DEBUG [main] catalog.CatalogTracker: Stopping catalog tracker
org.apache.hadoop.hbase.catalog.CatalogTracker@4ace08a5
> 2013-10-11 18:27:53,662 INFO  [main] zookeeper.ZooKeeper: Session: 0x1412d47f53a5c70
closed
> 2013-10-11 18:27:53,662 INFO  [main-EventThread] zookeeper.ClientCnxn: EventThread shut
down
> .2013-10-11 18:27:54,666 INFO  [main] zookeeper.ZooKeeper: Initiating client connection,
connectString=a1805.halxg.cloudera.com:2181 sessionTimeout=90000 watcher=catalogtracker-on-hconnection-0x5a7e666f
> 2013-10-11 18:27:54,667 INFO  [main] zookeeper.RecoverableZooKeeper: Process identifier=catalogtracker-on-hconnection-0x5a7e666f
connecting to ZooKeeper ensemble=a1805.halxg.cloudera.com:2181
> 2013-10-11 18:27:54,667 INFO  [main-SendThread(a1805.halxg.cloudera.com:2181)] zookeeper.ClientCnxn:
Opening socket connection to server a1805.halxg.cloudera.com/10.20.200.105:2181. Will not
attempt to authenticate using SASL (unknown error)
> 2013-10-11 18:27:54,667 DEBUG [main] catalog.CatalogTracker: Starting catalog tracker
org.apache.hadoop.hbase.catalog.CatalogTracker@692c0c5d
> 2013-10-11 18:27:54,667 INFO  [main-SendThread(a1805.halxg.cloudera.com:2181)] zookeeper.ClientCnxn:
Socket connection established to a1805.halxg.cloudera.com/10.20.200.105:2181, initiating session
> 2013-10-11 18:27:54,696 INFO  [main-SendThread(a1805.halxg.cloudera.com:2181)] zookeeper.ClientCnxn:
Session establishment complete on server a1805.halxg.cloudera.com/10.20.200.105:2181, sessionid
= 0x1412d47f53a5c71, negotiated timeout = 40000
> 2013-10-11 18:27:54,821 DEBUG [main] catalog.CatalogTracker: Stopping catalog tracker
org.apache.hadoop.hbase.catalog.CatalogTracker@692c0c5d
> 2013-10-11 18:27:54,871 INFO  [main] zookeeper.ZooKeeper: Session: 0x1412d47f53a5c71
closed
> 2013-10-11 18:27:54,871 INFO  [main-EventThread] zookeeper.ClientCnxn: EventThread shut
down
> .2013-10-11 18:27:55,890 INFO  [main] zookeeper.ZooKeeper: Initiating client connection,
connectString=a1805.halxg.cloudera.com:2181 sessionTimeout=90000 watcher=catalogtracker-on-hconnection-0x5a7e666f
> 2013-10-11 18:27:55,891 INFO  [main] zookeeper.RecoverableZooKeeper: Process identifier=catalogtracker-on-hconnection-0x5a7e666f
connecting to ZooKeeper ensemble=a1805.halxg.cloudera.com:2181
> 2013-10-11 18:27:55,891 INFO  [main-SendThread(a1805.halxg.cloudera.com:2181)] zookeeper.ClientCnxn:
Opening socket connection to server a1805.halxg.cloudera.com/10.20.200.105:2181. Will not
attempt to authenticate using SASL (unknown error)
> 2013-10-11 18:27:55,891 DEBUG [main] catalog.CatalogTracker: Starting catalog tracker
org.apache.hadoop.hbase.catalog.CatalogTracker@74234057
> 2013-10-11 18:27:55,892 INFO  [main-SendThread(a1805.halxg.cloudera.com:2181)] zookeeper.ClientCnxn:
Socket connection established to a1805.halxg.cloudera.com/10.20.200.105:2181, initiating session
> 2013-10-11 18:27:55,896 INFO  [main-SendThread(a1805.halxg.cloudera.com:2181)] zookeeper.ClientCnxn:
Session establishment complete on server a1805.halxg.cloudera.com/10.20.200.105:2181, sessionid
= 0x1412d47f53a5c72, negotiated timeout = 40000
> 2013-10-11 18:27:55,908 DEBUG [main] catalog.CatalogTracker: Stopping catalog tracker
org.apache.hadoop.hbase.catalog.CatalogTracker@74234057
> 2013-10-11 18:27:55,912 INFO  [main] zookeeper.ZooKeeper: Session: 0x1412d47f53a5c72
closed
> 2013-10-11 18:27:55,912 INFO  [main-EventThread] zookeeper.ClientCnxn: EventThread shut
down
>  2389 ms
> Deleting table IntegrationTestLoadAndVerify 
> 2013-10-11 18:33:02,029 INFO  [main] hbase.HBaseCluster: Restoring cluster - started
> 2013-10-11 18:33:02,032 INFO  [main] hbase.HBaseCluster: Added new HBaseAdmin
> 2013-10-11 18:33:02,032 INFO  [main] hbase.HBaseCluster: Restoring cluster - done
> 2013-10-11 18:33:02,033 ERROR [main] util.AbstractHBaseTool: Error running command-line
tool
> org.apache.hadoop.hbase.client.RetriesExhaustedException: Failed after attempts=35, exceptions:
> Fri Oct 11 18:27:55 PDT 2013, org.apache.hadoop.hbase.client.RpcRetryingCaller@48c6f1e8,
org.apache.hadoop.hbase.ipc.RemoteWithExtrasException(java.io.IOException): java.io.IOException:
HTableDescriptor missing for IntegrationTestLoadAndVerify
> 	at org.apache.hadoop.hbase.master.handler.TableEventHandler.getTableDescriptor(TableEventHandler.java:231)
> 	at org.apache.hadoop.hbase.master.handler.DeleteTableHandler.prepareWithTableLock(DeleteTableHandler.java:58)
> 	at org.apache.hadoop.hbase.master.handler.TableEventHandler.prepare(TableEventHandler.java:93)
> 	at org.apache.hadoop.hbase.master.HMaster.deleteTable(HMaster.java:1816)
> 	at org.apache.hadoop.hbase.master.HMaster.deleteTable(HMaster.java:1826)
> 	at org.apache.hadoop.hbase.protobuf.generated.MasterProtos$MasterService$2.callBlockingMethod(MasterProtos.java:38213)
> 	at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2146)
> 	at org.apache.hadoop.hbase.ipc.RpcServer$Handler.run(RpcServer.java:1851)
> ....
> {code}
> Looking on master logs, it looks fine:
> {code}
>   1 2013-10-11 18:27:55,541 DEBUG [MASTER_TABLE_OPERATIONS-a1805:60000-0] handler.DisableTableHandler:
Disable waiting until done; 298996 ms remaining; []
>   0 2013-10-11 18:27:55,546 INFO  [MASTER_TABLE_OPERATIONS-a1805:60000-0] handler.DisableTableHandler:
Disabled table is done=true
>   1 2013-10-11 18:27:55,555 DEBUG [MASTER_TABLE_OPERATIONS-a1805:60000-0] lock.ZKInterProcessLockBase:
Released /hbase/table-lock/IntegrationTestLoadAndVerify/write-master:600000000000001
>   2 2013-10-11 18:27:55,923 INFO  [RpcServer.handler=14,port=60000] master.HMaster: Client=hbase//10.20.200.112
delete IntegrationTestLoadAndVerify
>   3 2013-10-11 18:27:55,931 DEBUG [RpcServer.handler=14,port=60000] lock.ZKInterProcessLockBase:
Acquired a lock for /hbase/table-lock/IntegrationTestLoadAndVerify/write-master:600000000000002
>   4 2013-10-11 18:27:55,954 DEBUG [RpcServer.handler=14,port=60000] lock.ZKInterProcessLockBase:
Released /hbase/table-lock/IntegrationTestLoadAndVerify/write-master:600000000000002
>   5 2013-10-11 18:27:56,167 INFO  [RpcServer.handler=15,port=60000] master.HMaster: Client=hbase//10.20.200.112
delete IntegrationTestLoadAndVerify
> ...
> {code}
> Oddly no exceptions other than corrupted snapshot because mssing hfile.  All seems to
have been operating fine.  Will look more.



--
This message was sent by Atlassian JIRA
(v6.1#6144)

Mime
View raw message