incubator-cassandra-user mailing list archives

From David McNelis <dmcne...@gmail.com>
Subject Failed shuffle
Date Thu, 18 Apr 2013 05:06:58 GMT
I had a situation earlier where my shuffle failed after a hard disk drive
filled up.  I went through and disabled shuffle on the machines while
trying to get the situation resolved.  Now, although I can re-enable shuffle
on the machines, I get a timeout whenever I try to do an ls.

Looking at the cassandra-shuffle code, it is trying to execute this query:

SELECT token_bytes,requested_at FROM system.range_xfers

which is throwing the following error in my logs:

java.lang.AssertionError: [min(-1),max(-219851097003960625)]
        at org.apache.cassandra.dht.Bounds.<init>(Bounds.java:41)
        at org.apache.cassandra.dht.Bounds.<init>(Bounds.java:34)
        at org.apache.cassandra.dht.Bounds.withNewRight(Bounds.java:121)
        at org.apache.cassandra.service.StorageProxy.getRangeSlice(StorageProxy.java:1172)
        at org.apache.cassandra.cql3.statements.SelectStatement.execute(SelectStatement.java:132)
        at org.apache.cassandra.cql3.statements.SelectStatement.execute(SelectStatement.java:62)
        at org.apache.cassandra.cql3.QueryProcessor.processStatement(QueryProcessor.java:132)
        at org.apache.cassandra.cql3.QueryProcessor.process(QueryProcessor.java:143)
        at org.apache.cassandra.thrift.CassandraServer.execute_cql3_query(CassandraServer.java:1726)
        at org.apache.cassandra.thrift.Cassandra$Processor$execute_cql3_query.getResult(Cassandra.java:4074)
        at org.apache.cassandra.thrift.Cassandra$Processor$execute_cql3_query.getResult(Cassandra.java:4062)
        at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:32)
        at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:34)
        at org.apache.cassandra.thrift.CustomTThreadPoolServer$WorkerProcess.run(CustomTThreadPoolServer.java:199)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1146)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:679)
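
The assertion message suggests that the left end of the interval sorts
after the right end, which a non-wrapping bounds interval would not allow.
Below is a minimal Python sketch of that kind of check. It assumes signed
64-bit tokens, and the names MIN_TOKEN and bounds_are_valid are
hypothetical. It is not the Cassandra source, and it ignores the min/max
key-bound wrappers that appear in the message.

```python
# Hypothetical sketch, not the Cassandra source: models the invariant that
# a non-wrapping interval [left, right] is expected to satisfy.
MIN_TOKEN = -(2 ** 63)  # assumed minimum token for a signed 64-bit token space


def bounds_are_valid(left: int, right: int) -> bool:
    """Return True if left sorts at or before right, or right is the minimum token."""
    return left <= right or right == MIN_TOKEN


# The pair reported in the AssertionError: -1 sorts after the right end.
print(bounds_are_valid(-1, -219851097003960625))  # False
# The same two values in the opposite order would pass the check.
print(bounds_are_valid(-219851097003960625, -1))  # True
```

Under these assumptions, the values in the error fail the check because
-1 is greater than -219851097003960625 and the right end is not the
minimum token.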


This causes me two major issues.  First, I can't restart my dead node,
because it ends up with a Concurrency exception while trying to find
relocating tokens during StorageService initialization.  Second, I can't
clear the moves, because nothing is able to read what is in the range_xfers
table (I was not able to read it through cqlsh either).

I thought I could recreate the table, but system is a restricted keyspace
and it looks like I can't drop and recreate that table.  CQL also requires
a key for a delete, and I can't get the key without hitting the error
above.

Is there something simple I can do that I'm just missing?  Right now I
can't restart nodes because of this, nor successfully add new nodes to
my ring.
