accumulo-notifications mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Josh Elser (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (ACCUMULO-3148) TabletServer didn't get Session expired in HalfDeadTServerIT
Date Fri, 19 Sep 2014 15:38:33 GMT

    [ https://issues.apache.org/jira/browse/ACCUMULO-3148?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14140748#comment-14140748
] 

Josh Elser commented on ACCUMULO-3148:
--------------------------------------

Snippet from the ZK log for more context. The session *did* expire and the tserver's lock
was also deleted. This happened about 20ms before the master saw that the tserver was "lost".

{noformat}
2014-09-15 09:40:16,000 [server.ZooKeeperServer] INFO : Expiring session 0x14878ae7b920006,
timeout of 15000ms exceeded
2014-09-15 09:40:16,001 [server.PrepRequestProcessor] INFO : Processed session termination
for sessionid: 0x14878ae7b920006
2014-09-15 09:40:16,001 [server.FinalRequestProcessor] DEBUG: Processing request:: sessionid:0x14878ae7b920006
type:closeSession cxid:0x0 zxid:0x181 txntype:-11 reqpath:n/a
2014-09-15 09:40:16,003 [server.DataTree] DEBUG: Deleting ephemeral node /accumulo/d0b9b8e7-9869-4b00-9ae7-317f5231f2c1/tservers/ip-172-31-33-94:40793/zlock-0000000000
for session 0x14878ae7b920006
2014-09-15 09:40:16,004 [server.NIOServerCnxn] INFO : Closed socket connection for client
/127.0.0.1:42897 which had sessionid 0x14878ae7b920006
{noformat}

> TabletServer didn't get Session expired in HalfDeadTServerIT
> ------------------------------------------------------------
>
>                 Key: ACCUMULO-3148
>                 URL: https://issues.apache.org/jira/browse/ACCUMULO-3148
>             Project: Accumulo
>          Issue Type: Bug
>          Components: test
>            Reporter: Josh Elser
>            Assignee: Josh Elser
>             Fix For: 1.6.1, 1.7.0
>
>
> Beening seeing spurious failures with HalfDeadTServerIT where it doesn't get the ZK session
expiration
> {noformat}
> 2014-09-15 09:39:59,201 [tserver.TabletServer] DEBUG: ScanSess tid 172.31.33.94:35957
!0 0 entries in 0.07 secs, nbTimes = [63 63 63.00 1] 
> sleeping
> sleeping
> sleeping
> sleeping
> sleeping
> sleeping
> sleeping
> sleeping
> sleeping
> sleeping
> sleeping
> sleeping
> sleeping
> sleeping
> sleeping
> sleeping
> sleeping
> sleeping
> sleeping
> sleeping
> sleeping
> sleeping
> sleeping
> sleeping
> sleeping
> sleeping
> sleeping
> sleeping
> sleeping
> sleeping
> sleeping
> sleeping
> sleeping
> sleeping
> sleeping
> sleeping
> sleeping
> sleeping
> sleeping
> sleeping
> sleeping
> sleeping
> sleeping
> sleeping
> sleeping
> sleeping
> sleeping
> sleeping
> sleeping
> sleeping
> sleeping
> sleeping
> sleeping
> sleeping
> sleeping
> sleeping
> sleeping
> sleeping
> sleeping
> sleeping
> sleeping
> sleeping
> sleeping
> sleeping
> sleeping
> sleeping
> sleeping
> sleeping
> sleeping
> sleeping
> sleeping
> sleeping
> sleeping
> sleeping
> sleeping
> 2014-09-15 09:40:20,088 [tserver.TabletServer] FATAL: Lost tablet server lock (reason
= LOCK_DELETED), exiting.
> 2014-09-15 09:40:20,088 [zookeeper.ZooCache] WARN : Zookeeper error, will retry
> org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss
for /accumulo/d0b9b8e7-9869-4b00-9ae7-317f5231f2c1/tables/1/conf/table.iterator.minc.vers.opt.maxVersions
> 	at org.apache.zookeeper.KeeperException.create(KeeperException.java:99)
> 	at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
> 	at org.apache.zookeeper.ZooKeeper.getData(ZooKeeper.java:1151)
> 	at org.apache.accumulo.fate.zookeeper.ZooCache$2.run(ZooCache.java:261)
> 	at org.apache.accumulo.fate.zookeeper.ZooCache.retry(ZooCache.java:153)
> 	at org.apache.accumulo.fate.zookeeper.ZooCache.get(ZooCache.java:277)
> 	at org.apache.accumulo.fate.zookeeper.ZooCache.get(ZooCache.java:224)
> 	at org.apache.accumulo.server.conf.ZooCachePropertyAccessor.get(ZooCachePropertyAccessor.java:114)
> 	at org.apache.accumulo.server.conf.ZooCachePropertyAccessor.getProperties(ZooCachePropertyAccessor.java:144)
> 	at org.apache.accumulo.server.conf.TableConfiguration.getProperties(TableConfiguration.java:108)
> 	at org.apache.accumulo.core.conf.AccumuloConfiguration.iterator(AccumuloConfiguration.java:69)
> 	at org.apache.accumulo.core.conf.ConfigSanityCheck.validate(ConfigSanityCheck.java:40)
> 	at org.apache.accumulo.server.conf.ServerConfigurationFactory.getTableConfiguration(ServerConfigurationFactory.java:155)
> 	at org.apache.accumulo.server.conf.ServerConfiguration.getTableConfiguration(ServerConfiguration.java:69)
> 	at org.apache.accumulo.tserver.TabletServer.getTableConfiguration(TabletServer.java:3983)
> 	at org.apache.accumulo.tserver.Tablet.<init>(Tablet.java:1277)
> 	at org.apache.accumulo.tserver.Tablet.<init>(Tablet.java:1256)
> 	at org.apache.accumulo.tserver.Tablet.<init>(Tablet.java:1112)
> 	at org.apache.accumulo.tserver.Tablet.<init>(Tablet.java:1089)
> 	at org.apache.accumulo.tserver.TabletServer$AssignmentHandler.run(TabletServer.java:2935)
> 	at org.apache.accumulo.core.util.LoggingRunnable.run(LoggingRunnable.java:34)
> 	at org.apache.accumulo.trace.instrument.TraceRunnable.run(TraceRunnable.java:47)
> 	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> 	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> 	at org.apache.accumulo.trace.instrument.TraceRunnable.run(TraceRunnable.java:47)
> 	at org.apache.accumulo.core.util.LoggingRunnable.run(LoggingRunnable.java:34)
> 	at java.lang.Thread.run(Thread.java:745)
> 2014-09-15 09:40:20,090 [tserver.TabletServer] WARN : Check for long GC pauses not called
in a timely fashion. Expected every 5.0 seconds but was 16.3 seconds since last check
> 2014-09-15 09:40:20,477 [datanode.DataNode] ERROR: 127.0.0.1:57185:DataXceiver error
processing WRITE_BLOCK operation  src: /127.0.0.1:42146 dst: /127.0.0.1:57185
> java.io.IOException: Premature EOF from inputStream
> 	at org.apache.hadoop.io.IOUtils.readFully(IOUtils.java:194)
> 	at org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.doReadFully(PacketReceiver.java:213)
> 	at org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.doRead(PacketReceiver.java:134)
> 	at org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.receiveNextPacket(PacketReceiver.java:109)
> 	at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receivePacket(BlockReceiver.java:467)
> 	at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receiveBlock(BlockReceiver.java:771)
> 	at org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:718)
> 	at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opWriteBlock(Receiver.java:126)
> 	at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:72)
> 	at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:225)
> 	at java.lang.Thread.run(Thread.java:745)
> {noformat}
> It looks like the tserver killed itself after the connection loss but before the tserver
retried to connect and got the session expiration.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Mime
View raw message