accumulo-notifications mailing list archives

From "Eric Newton (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (ACCUMULO-3148) TabletServer didn't get Session expired in HalfDeadTServerIT
Date Fri, 19 Sep 2014 15:20:34 GMT

    [ https://issues.apache.org/jira/browse/ACCUMULO-3148?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14140724#comment-14140724 ]

Eric Newton commented on ACCUMULO-3148:
---------------------------------------

You should see "unable to get tablet server status" 3x, and then it will ask the tserver to halt.
You should see "attempting to stop ".
After that is attempted, the lock is removed.
It should say "Removing zookeeper lock for " before it does so.
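
For anyone reading the logs, the expected order of messages above can be sketched roughly as follows. This is only an illustration of the sequence described in this comment, not the actual Accumulo code: the class `HaltSequenceSketch`, the method `monitor`, the constant `MAX_STATUS_FAILURES`, and the exact text after each quoted message prefix are all assumptions made up for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BooleanSupplier;

// Hypothetical sketch: models only the order of log messages described in the comment.
class HaltSequenceSketch {
    // Assumption taken from the comment: three failed status requests trigger the halt.
    static final int MAX_STATUS_FAILURES = 3;

    /** Returns the log lines, in order, that this sketch emits for one tablet server. */
    static List<String> monitor(String server, BooleanSupplier statusCheck) {
        List<String> log = new ArrayList<>();
        for (int failures = 0; failures < MAX_STATUS_FAILURES; failures++) {
            if (statusCheck.getAsBoolean()) {
                return log; // the server answered, so no halt is requested
            }
            log.add("unable to get tablet server status " + server);
        }
        // After the third failure: ask the server to halt, then remove its lock.
        log.add("attempting to stop " + server);
        log.add("Removing zookeeper lock for " + server);
        return log;
    }

    public static void main(String[] args) {
        // Simulate a tablet server that never answers a status request.
        for (String line : monitor("tserver-1", () -> false)) {
            System.out.println(line);
        }
    }
}
```

If the test log shows the lock being removed without the preceding lines, the sequence in the failing run differs from the one sketched here.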

This test never fails for me.  What's the environment?


> TabletServer didn't get Session expired in HalfDeadTServerIT
> ------------------------------------------------------------
>
>                 Key: ACCUMULO-3148
>                 URL: https://issues.apache.org/jira/browse/ACCUMULO-3148
>             Project: Accumulo
>          Issue Type: Bug
>          Components: test
>            Reporter: Josh Elser
>            Assignee: Josh Elser
>             Fix For: 1.6.1, 1.7.0
>
>
> Been seeing spurious failures with HalfDeadTServerIT where it doesn't get the ZK session expiration
> {noformat}
> 2014-09-15 09:39:59,201 [tserver.TabletServer] DEBUG: ScanSess tid 172.31.33.94:35957 !0 0 entries in 0.07 secs, nbTimes = [63 63 63.00 1]
> sleeping
> [... "sleeping" repeated 74 more times ...]
> 2014-09-15 09:40:20,088 [tserver.TabletServer] FATAL: Lost tablet server lock (reason = LOCK_DELETED), exiting.
> 2014-09-15 09:40:20,088 [zookeeper.ZooCache] WARN : Zookeeper error, will retry
> org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /accumulo/d0b9b8e7-9869-4b00-9ae7-317f5231f2c1/tables/1/conf/table.iterator.minc.vers.opt.maxVersions
> 	at org.apache.zookeeper.KeeperException.create(KeeperException.java:99)
> 	at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
> 	at org.apache.zookeeper.ZooKeeper.getData(ZooKeeper.java:1151)
> 	at org.apache.accumulo.fate.zookeeper.ZooCache$2.run(ZooCache.java:261)
> 	at org.apache.accumulo.fate.zookeeper.ZooCache.retry(ZooCache.java:153)
> 	at org.apache.accumulo.fate.zookeeper.ZooCache.get(ZooCache.java:277)
> 	at org.apache.accumulo.fate.zookeeper.ZooCache.get(ZooCache.java:224)
> 	at org.apache.accumulo.server.conf.ZooCachePropertyAccessor.get(ZooCachePropertyAccessor.java:114)
> 	at org.apache.accumulo.server.conf.ZooCachePropertyAccessor.getProperties(ZooCachePropertyAccessor.java:144)
> 	at org.apache.accumulo.server.conf.TableConfiguration.getProperties(TableConfiguration.java:108)
> 	at org.apache.accumulo.core.conf.AccumuloConfiguration.iterator(AccumuloConfiguration.java:69)
> 	at org.apache.accumulo.core.conf.ConfigSanityCheck.validate(ConfigSanityCheck.java:40)
> 	at org.apache.accumulo.server.conf.ServerConfigurationFactory.getTableConfiguration(ServerConfigurationFactory.java:155)
> 	at org.apache.accumulo.server.conf.ServerConfiguration.getTableConfiguration(ServerConfiguration.java:69)
> 	at org.apache.accumulo.tserver.TabletServer.getTableConfiguration(TabletServer.java:3983)
> 	at org.apache.accumulo.tserver.Tablet.<init>(Tablet.java:1277)
> 	at org.apache.accumulo.tserver.Tablet.<init>(Tablet.java:1256)
> 	at org.apache.accumulo.tserver.Tablet.<init>(Tablet.java:1112)
> 	at org.apache.accumulo.tserver.Tablet.<init>(Tablet.java:1089)
> 	at org.apache.accumulo.tserver.TabletServer$AssignmentHandler.run(TabletServer.java:2935)
> 	at org.apache.accumulo.core.util.LoggingRunnable.run(LoggingRunnable.java:34)
> 	at org.apache.accumulo.trace.instrument.TraceRunnable.run(TraceRunnable.java:47)
> 	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> 	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> 	at org.apache.accumulo.trace.instrument.TraceRunnable.run(TraceRunnable.java:47)
> 	at org.apache.accumulo.core.util.LoggingRunnable.run(LoggingRunnable.java:34)
> 	at java.lang.Thread.run(Thread.java:745)
> 2014-09-15 09:40:20,090 [tserver.TabletServer] WARN : Check for long GC pauses not called in a timely fashion. Expected every 5.0 seconds but was 16.3 seconds since last check
> 2014-09-15 09:40:20,477 [datanode.DataNode] ERROR: 127.0.0.1:57185:DataXceiver error processing WRITE_BLOCK operation  src: /127.0.0.1:42146 dst: /127.0.0.1:57185
> java.io.IOException: Premature EOF from inputStream
> 	at org.apache.hadoop.io.IOUtils.readFully(IOUtils.java:194)
> 	at org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.doReadFully(PacketReceiver.java:213)
> 	at org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.doRead(PacketReceiver.java:134)
> 	at org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.receiveNextPacket(PacketReceiver.java:109)
> 	at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receivePacket(BlockReceiver.java:467)
> 	at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receiveBlock(BlockReceiver.java:771)
> 	at org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:718)
> 	at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opWriteBlock(Receiver.java:126)
> 	at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:72)
> 	at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:225)
> 	at java.lang.Thread.run(Thread.java:745)
> {noformat}
> It looks like the tserver killed itself after the connection loss but before the tserver retried to connect and got the session expiration.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)