accumulo-notifications mailing list archives

From "Josh Elser (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (ACCUMULO-4456) Verify client functionality when active master failover
Date Wed, 14 Sep 2016 23:20:21 GMT

    [ https://issues.apache.org/jira/browse/ACCUMULO-4456?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15491731#comment-15491731 ]

Josh Elser commented on ACCUMULO-4456:
--------------------------------------

Alright, this is even trickier than I expected:

{noformat}
2016-09-14 19:16:55,274 [impl.MasterClient] DEBUG: Failed to connect to master=localhost:51105, will retry...
org.apache.thrift.transport.TTransportException: java.net.ConnectException: Connection refused
	at org.apache.thrift.transport.TSocket.open(TSocket.java:226)
	at org.apache.accumulo.core.rpc.ThriftUtil.createClientTransport(ThriftUtil.java:320)
	at org.apache.accumulo.core.client.impl.ThriftTransportPool.createNewTransport(ThriftTransportPool.java:478)
	at org.apache.accumulo.core.client.impl.ThriftTransportPool.getTransport(ThriftTransportPool.java:410)
	at org.apache.accumulo.core.client.impl.ThriftTransportPool.getTransport(ThriftTransportPool.java:388)
	at org.apache.accumulo.core.rpc.ThriftUtil.getClient(ThriftUtil.java:139)
	at org.apache.accumulo.core.rpc.ThriftUtil.getClientNoTimeout(ThriftUtil.java:106)
	at org.apache.accumulo.core.client.impl.MasterClient.getConnection(MasterClient.java:70)
	at org.apache.accumulo.core.client.impl.MasterClient.getConnectionWithRetry(MasterClient.java:47)
	at org.apache.accumulo.core.client.impl.TableOperationsImpl.finishFateOperation(TableOperationsImpl.java:279)
	at org.apache.accumulo.core.client.impl.TableOperationsImpl.doFateOperation(TableOperationsImpl.java:341)
	at org.apache.accumulo.core.client.impl.TableOperationsImpl.doFateOperation(TableOperationsImpl.java:293)
	at org.apache.accumulo.core.client.impl.TableOperationsImpl.doTableFateOperation(TableOperationsImpl.java:1465)
	at org.apache.accumulo.core.client.impl.TableOperationsImpl.delete(TableOperationsImpl.java:650)
	at com.github.joshelser.MasterInteraction.run(MasterInteraction.java:55)
	at com.github.joshelser.MasterInteraction.main(MasterInteraction.java:72)
Caused by: java.net.ConnectException: Connection refused
	at java.net.PlainSocketImpl.socketConnect(Native Method)
	at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:345)
	at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:206)
	at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:188)
	at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
	at java.net.Socket.connect(Socket.java:589)
	at org.apache.thrift.transport.TSocket.open(TSocket.java:221)
	... 15 more
2016-09-14 19:17:00,667 [zookeeper.ZooCache] WARN : Saw (possibly) transient exception communicating with ZooKeeper, will retry
org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode = Session expired for /accumulo/332fbfc6-a17b-4104-8e7c-17f05cc485c3
	at org.apache.zookeeper.KeeperException.create(KeeperException.java:127)
	at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
	at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:1045)
	at org.apache.accumulo.fate.zookeeper.ZooCache$2.run(ZooCache.java:319)
	at org.apache.accumulo.fate.zookeeper.ZooCache$2.run(ZooCache.java:295)
	at org.apache.accumulo.fate.zookeeper.ZooCache$ZooRunnable.retry(ZooCache.java:190)
	at org.apache.accumulo.fate.zookeeper.ZooCache.get(ZooCache.java:347)
	at org.apache.accumulo.fate.zookeeper.ZooCache.get(ZooCache.java:282)
	at org.apache.accumulo.core.client.ZooKeeperInstance.getInstanceID(ZooKeeperInstance.java:177)
	at org.apache.accumulo.core.zookeeper.ZooUtil.getRoot(ZooUtil.java:40)
	at org.apache.accumulo.core.client.ZooKeeperInstance.getMasterLocations(ZooKeeperInstance.java:188)
	at org.apache.accumulo.core.client.impl.MasterClient.getConnection(MasterClient.java:57)
	at org.apache.accumulo.core.client.impl.MasterClient.getConnectionWithRetry(MasterClient.java:47)
	at org.apache.accumulo.core.client.impl.TableOperationsImpl.finishFateOperation(TableOperationsImpl.java:279)
	at org.apache.accumulo.core.client.impl.TableOperationsImpl.doFateOperation(TableOperationsImpl.java:341)
	at org.apache.accumulo.core.client.impl.TableOperationsImpl.doFateOperation(TableOperationsImpl.java:293)
	at org.apache.accumulo.core.client.impl.TableOperationsImpl.doTableFateOperation(TableOperationsImpl.java:1465)
	at org.apache.accumulo.core.client.impl.TableOperationsImpl.delete(TableOperationsImpl.java:650)
	at com.github.joshelser.MasterInteraction.run(MasterInteraction.java:55)
	at com.github.joshelser.MasterInteraction.main(MasterInteraction.java:72)
2016-09-14 19:17:00,879 [impl.ThriftTransportPool] TRACE: Creating new connection to connection to localhost:51435
{noformat}

Had two masters and one client creating/deleting tables.

# SIGSTOP the client
# kill the active master
# let the standby master steal the lock
# re-start the killed master
# SIGCONT the client

In this case, the client loses its ZK session, recreates the ZK object, and then never hits
the wrong master. Going to try this approach some more.
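
For anyone following along, here is a rough sketch of the retry pattern the log above suggests: the first stack trace fails in MasterClient.getConnection, and the second shows the same method calling ZooKeeperInstance.getMasterLocations again before the new connection to localhost:51435 is created. This is not Accumulo's code. The class MasterRetrySketch, the Connector interface, the ConnectFailed exception, and the simulated lookup are all hypothetical stand-ins, and only the two addresses are taken from the log.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.List;
import java.util.function.Supplier;

/** Hypothetical sketch of "re-resolve the master on every retry"; not Accumulo's MasterClient. */
public class MasterRetrySketch {

  /** Stand-in for the TTransportException / ConnectException pair seen in the log. */
  public static class ConnectFailed extends Exception {
    public ConnectFailed(String address) {
      super("Connection refused: " + address);
    }
  }

  /** Stand-in for opening a Thrift transport to the given address. */
  public interface Connector<T> {
    T connect(String address) throws ConnectFailed;
  }

  /**
   * Looks up the master location again before every attempt instead of caching it,
   * so a client that wakes up after a failover picks up the new lock holder.
   */
  public static <T> T getConnectionWithRetry(Supplier<String> locateMaster,
      Connector<T> connector, int maxAttempts, long sleepMillis) throws InterruptedException {
    for (int attempt = 1; attempt <= maxAttempts; attempt++) {
      String address = locateMaster.get(); // re-resolved on each attempt
      try {
        return connector.connect(address);
      } catch (ConnectFailed e) {
        System.out.println("Failed to connect to master=" + address + ", will retry...");
        Thread.sleep(sleepMillis);
      }
    }
    throw new IllegalStateException("No master reachable after " + maxAttempts + " attempts");
  }

  public static void main(String[] args) throws Exception {
    // Simulated lookups: the first answer is the stale master, later answers are the new one.
    Deque<String> lookups = new ArrayDeque<>(List.of("localhost:51105", "localhost:51435"));
    String connected = getConnectionWithRetry(
        () -> lookups.size() > 1 ? lookups.poll() : lookups.peek(),
        address -> {
          // Only the second address accepts connections in this simulation.
          if (!address.equals("localhost:51435")) {
            throw new ConnectFailed(address);
          }
          return address;
        },
        5, 10L);
    System.out.println("Connected to " + connected);
  }
}
```

The point of the sketch is only that the lookup sits inside the loop. If the location were resolved once and cached, a client resumed after SIGCONT would keep retrying the old address.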

> Verify client functionality when active master failover
> -------------------------------------------------------
>
>                 Key: ACCUMULO-4456
>                 URL: https://issues.apache.org/jira/browse/ACCUMULO-4456
>             Project: Accumulo
>          Issue Type: Sub-task
>          Components: client, master
>            Reporter: Josh Elser
>            Assignee: Josh Elser
>             Fix For: 2.0.0
>
>
> [~kturner] asked me a good question: what does the client do when it tries to talk to a master which has recently lost its active status, and how does it handle the thrown exception?
> Should run a quick local test.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
