cassandra-commits mailing list archives

From "Nick Bailey (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (CASSANDRA-6373) describe_ring hangs with hsha thrift server
Date Fri, 10 Jan 2014 19:09:59 GMT

    [ https://issues.apache.org/jira/browse/CASSANDRA-6373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13868146#comment-13868146 ]

Nick Bailey commented on CASSANDRA-6373:
----------------------------------------

Actually I do see an error in the log:

{noformat}
ERROR [main] 2014-01-10 19:06:51,179 CassandraDaemon.java (line 478) Exception encountered during startup
java.lang.NoClassDefFoundError: com/lmax/disruptor/EventTranslator
        at com.thinkaurelius.thrift.TDisruptorServer.<init>(TDisruptorServer.java:192)
        at org.apache.cassandra.thrift.THsHaDisruptorServer.<init>(THsHaDisruptorServer.java:46)
        at org.apache.cassandra.thrift.THsHaDisruptorServer$Factory.buildTServer(THsHaDisruptorServer.java:90)
        at org.apache.cassandra.thrift.TServerCustomFactory.buildTServer(TServerCustomFactory.java:56)
        at org.apache.cassandra.thrift.ThriftServer$ThriftServerThread.<init>(ThriftServer.java:130)
        at org.apache.cassandra.thrift.ThriftServer.start(ThriftServer.java:56)
        at org.apache.cassandra.service.CassandraDaemon.start(CassandraDaemon.java:414)
        at org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:474)
        at org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:504)
Caused by: java.lang.ClassNotFoundException: com.lmax.disruptor.EventTranslator
        at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
        at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
        at java.security.AccessController.doPrivileged(Native Method)
        at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
        at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
        ... 9 more
 INFO [StorageServiceShutdownHook] 2014-01-10 19:06:51,220 Gossiper.java (line 1238) Announcing shutdown
DEBUG [GossipTasks:1] 2014-01-10 19:06:51,722 DebuggableThreadPoolExecutor.java (line 245) Task cancelled
java.util.concurrent.CancellationException
        at java.util.concurrent.FutureTask.report(FutureTask.java:121)
        at java.util.concurrent.FutureTask.get(FutureTask.java:188)
        at org.apache.cassandra.concurrent.DebuggableThreadPoolExecutor.extractThrowable(DebuggableThreadPoolExecutor.java:237)
        at org.apache.cassandra.concurrent.DebuggableThreadPoolExecutor.logExceptionsAfterExecute(DebuggableThreadPoolExecutor.java:201)
        at org.apache.cassandra.concurrent.DebuggableScheduledThreadPoolExecutor.afterExecute(DebuggableScheduledThreadPoolExecutor.java:46)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1153)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:744)
{noformat}

I copied your new jar over the existing disruptor jar:

{noformat}
cp disruptor-thrift-server-0.3.3-SNAPSHOT.jar lib/disruptor-3.0.1.jar
{noformat}
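
The NoClassDefFoundError in the log above means that com.lmax.disruptor.EventTranslator could not be loaded from the classpath at startup. One way to check whether a particular jar actually contains a given class is to look for its entry in the archive. The following is a minimal sketch, assuming the jar is an ordinary zip archive; the helper name jar_contains_class is hypothetical and not part of Cassandra or either disruptor library.

```python
import zipfile


def jar_contains_class(jar_path, class_name):
    """Return True if the jar at jar_path has a .class entry for class_name.

    class_name is a fully qualified, dot-separated Java class name.
    (Hypothetical helper for illustration only.)
    """
    # Java stores com.example.Foo as the archive entry com/example/Foo.class
    entry = class_name.replace(".", "/") + ".class"
    with zipfile.ZipFile(jar_path) as jar:
        return entry in jar.namelist()
```

For example, jar_contains_class("lib/disruptor-3.0.1.jar", "com.lmax.disruptor.EventTranslator") reports whether the file at that path provides the class named in the stack trace.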

> describe_ring hangs with hsha thrift server
> -------------------------------------------
>
>                 Key: CASSANDRA-6373
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-6373
>             Project: Cassandra
>          Issue Type: Bug
>            Reporter: Nick Bailey
>            Assignee: Pavel Yaskevich
>             Fix For: 2.0.5
>
>         Attachments: describe_ring_failure.patch, jstack.txt, jstack2.txt
>
>
> There is a strange bug with the thrift hsha server in 2.0 (we switched to the lmax disruptor server).
> The bug is that the first call to describe_ring from a connection will hang indefinitely when the client is not connecting from localhost (or at least when it looks like the client is not on the same host). Additionally, the cluster must be using vnodes. When connecting from localhost, the first call works as expected, and in either case subsequent calls from the same connection work as expected. According to git bisect, the bad commit is the switch to the lmax disruptor server:
> https://github.com/apache/cassandra/commit/98eec0a223251ecd8fec7ecc9e46b05497d631c6
> I've attached the patch I used to reproduce the error in the unit tests. The command to reproduce is:
> {noformat}
> PYTHONPATH=test nosetests --tests=system.test_thrift_server:TestMutations.test_describe_ring
> {noformat}
> I reproduced on a single EC2 machine by having the server bind to the private IP and the client connect to the public IP (so it appears as if the client is non-local). I've also reproduced with two different VMs, though.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)
