incubator-cassandra-user mailing list archives

From aaron morton <aa...@thelastpickle.com>
Subject Re: Failed migration from 1.1.6 to 1.2.2
Date Thu, 14 Mar 2013 13:52:12 GMT
> ERROR [RequestResponseStage:2] 2013-03-14 09:53:25,750 AbstractCassandraDaemon.java (line 135) Exception in thread Thread[RequestResponseStage:2,5,main]
> java.io.IOError: java.io.EOFException
>         at org.apache.cassandra.service.AbstractRowResolver.preprocess(AbstractRowResolver.java:71)
>         at org.apache.cassandra.service.ReadCallback.response(ReadCallback.java:155)
>         at org.apache.cassandra.net.ResponseVerbHandler.doVerb(ResponseVerbHandler.java:45)
>         at org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:59)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>         at java.lang.Thread.run(Thread.java:662)
> Caused by: java.io.EOFException
>         at java.io.DataInputStream.readFully(DataInputStream.java:180)
>         at org.apache.cassandra.db.ReadResponseSerializer.deserialize(ReadResponse.java:100)
>         at org.apache.cassandra.db.ReadResponseSerializer.deserialize(ReadResponse.java:81)
>         at org.apache.cassandra.service.AbstractRowResolver.preprocess(AbstractRowResolver.java:64)
>         ... 6 more
The only thing close to that is https://issues.apache.org/jira/browse/CASSANDRA-3585

Please file a new ticket for this issue. 


> ERROR [MutationStage:8] 2013-03-14 10:21:01,288 AbstractCassandraDaemon.java (line 135) Exception in thread Thread[MutationStage:8,5,main]
> java.lang.AssertionError
>         at org.apache.cassandra.locator.TokenMetadata.getToken(TokenMetadata.java:304)
>         at org.apache.cassandra.service.StorageProxy$5.runMayThrow(StorageProxy.java:371)
>         at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:30)
>         at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:439)
>         at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
>         at java.util.concurrent.FutureTask.run(FutureTask.java:138)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>         at java.lang.Thread.run(Thread.java:662)
This is caused by looking up a node by IP address when that IP is not a member of the cluster.
Obviously this should not happen.
You can try to reset the cluster's ring state with a rolling restart, passing -Dcassandra.load_ring_state=false
as a JVM parameter in cassandra-env.sh.
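The cassandra-env.sh change can be scripted when you roll it across several nodes. A minimal sketch in Python, assuming the option is added with the usual JVM_OPTS="$JVM_OPTS ..." idiom found in cassandra-env.sh; the helper names are hypothetical and not part of Cassandra:

```python
# Hypothetical helpers that edit the *text* of a cassandra-env.sh file.
# Assumption: JVM options are appended with the JVM_OPTS="$JVM_OPTS ..." idiom.
FLAG = "-Dcassandra.load_ring_state=false"
LINE = f'JVM_OPTS="$JVM_OPTS {FLAG}"'


def add_load_ring_state_flag(env_sh: str) -> str:
    """Append the flag line unless the flag is already present (idempotent)."""
    if FLAG in env_sh:
        return env_sh  # already set: leave the file unchanged
    if env_sh and not env_sh.endswith("\n"):
        env_sh += "\n"  # make sure the new line starts on its own line
    return env_sh + LINE + "\n"


def remove_load_ring_state_flag(env_sh: str) -> str:
    """Drop exactly the line added above, if you want the option gone again later."""
    kept = [line for line in env_sh.splitlines() if line.strip() != LINE]
    return "\n".join(kept) + ("\n" if kept else "")
```

For the rolling restart itself you would apply this on one node, restart it, and wait until nodetool ring shows it Up and Normal before moving on to the next node.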

I would try to stabilise the cluster at 1.1.6, then upgrade to the latest 1.1, then try again. Do the
rolling restart above, check the output of nodetool ring and nodetool gossipinfo, and look for IP addresses that should
not be there. Add the missing node back as a 1.1.6 node, then upgrade all nodes to 1.1.10.
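Looking for IP addresses that should not be there amounts to a set difference between the addresses that show up in the nodetool output and the addresses you expect. A rough sketch in Python; the function name is hypothetical, and it simply greps for IPv4 dotted quads rather than relying on any particular nodetool output layout:

```python
import re
from typing import List, Set

# Dotted-quad IPv4 addresses anywhere in the text (octet ranges are not validated).
IPV4 = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")


def unexpected_ips(nodetool_output: str, expected: Set[str]) -> List[str]:
    """Return, sorted, the IPs found in the output that are not in `expected`."""
    seen = set(IPV4.findall(nodetool_output))
    return sorted(seen - expected)
```

You could feed it the captured text of nodetool ring and nodetool gossipinfo together with the list of nodes you believe are in the cluster; anything it returns is a candidate stale entry.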

You should be able to upgrade directly from 1.1.6, but since that did not work, it's a good idea to get to the latest
1.1 first.
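The "latest 1.1 first" advice can be written down as a small planning rule. A hypothetical sketch in Python (not a Cassandra tool), with the 1.1 to 1.1.10 mapping taken from the advice above; it also shows why versions must be compared numerically, since as strings "1.1.10" sorts before "1.1.6":

```python
from typing import Dict, List, Tuple


def parse(version: str) -> Tuple[int, ...]:
    """'1.1.6' -> (1, 1, 6) so versions compare numerically, not as strings."""
    return tuple(int(part) for part in version.split("."))


def upgrade_path(current: str, target: str, latest_in_series: Dict[str, str]) -> List[str]:
    """Versions to move through: first the newest release of the current
    major.minor series (if not already on it and older than target), then target."""
    steps = []
    series = ".".join(current.split(".")[:2])  # e.g. "1.1"
    latest = latest_in_series.get(series)
    if latest and parse(current) < parse(latest) < parse(target):
        steps.append(latest)
    steps.append(target)
    return steps
```

With {"1.1": "1.1.10"}, a cluster on 1.1.6 targeting 1.2.2 gets the two-step path 1.1.10 then 1.2.2, while a cluster already on 1.1.10 goes straight to 1.2.2.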

Cheers


-----------------
Aaron Morton
Freelance Cassandra Consultant
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 14/03/2013, at 5:31 AM, Alain RODRIGUEZ <arodrime@gmail.com> wrote:

> We have it set to 0.0.0.0, but anyway, as I said before, I don't think our problem comes from this bug.
> 
> 
> 2013/3/14 Michal Michalski <michalm@opera.com>
> 
> It will happen if your rpc_address is set to 0.0.0.0.
> 
> Oops, that's not what I meant ;-)
> It will happen if your rpc_address is set to an IP that is not defined in your cluster's config (e.g. in cassandra-topology.properties for PropertyFileSnitch).
> 
> 
> M.
> 
> 
> M.
> 
> On 14.03.2013 13:03, Alain RODRIGUEZ wrote:
> Thanks for this pointer, but I don't think this is the source of our problem since we use 1 data center and Ec2Snitch.
> 
> 
> 
> 2013/3/14 Jean-Armel Luce <jaluce06@gmail.com>
> 
> Hi Alain,
> 
> Maybe it is due to https://issues.apache.org/jira/browse/CASSANDRA-5299
> 
> A patch is provided with this ticket.
> 
> Regards.
> 
> Jean Armel
> 
> 
> 2013/3/14 Alain RODRIGUEZ <arodrime@gmail.com>
> 
> Hi
> 
> We just tried to migrate our production cluster from C* 1.1.6 to 1.2.2.
> 
> This has been a disaster. I just switched one node to 1.2.2, updated its configuration (cassandra.yaml / cassandra-env.sh) and restarted it.
> 
> It resulted in errors on all 5 remaining 1.1.6 nodes:
> 
> ERROR [RequestResponseStage:2] 2013-03-14 09:53:25,750 AbstractCassandraDaemon.java (line 135) Exception in thread Thread[RequestResponseStage:2,5,main]
> java.io.IOError: java.io.EOFException
>         at org.apache.cassandra.service.AbstractRowResolver.preprocess(AbstractRowResolver.java:71)
>         at org.apache.cassandra.service.ReadCallback.response(ReadCallback.java:155)
>         at org.apache.cassandra.net.ResponseVerbHandler.doVerb(ResponseVerbHandler.java:45)
>         at org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:59)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>         at java.lang.Thread.run(Thread.java:662)
> Caused by: java.io.EOFException
>         at java.io.DataInputStream.readFully(DataInputStream.java:180)
>         at org.apache.cassandra.db.ReadResponseSerializer.deserialize(ReadResponse.java:100)
>         at org.apache.cassandra.db.ReadResponseSerializer.deserialize(ReadResponse.java:81)
>         at org.apache.cassandra.service.AbstractRowResolver.preprocess(AbstractRowResolver.java:64)
>         ... 6 more
> 
> I got this error many times, and the entire cluster became unreachable for our 4 clients (phpCassa, Hector, Cassie, Helenus).
> 
> I decommissioned the 1.2.2 node to get our cluster answering queries. It worked.
> 
> Then I tried to replace this node with a new C* 1.1.6 node using the same token as the decommissioned node. The node joined the ring and switched to Normal status before receiving any data.
> 
> On all the other nodes I got:
> 
> ERROR [MutationStage:8] 2013-03-14 10:21:01,288 AbstractCassandraDaemon.java (line 135) Exception in thread Thread[MutationStage:8,5,main]
> java.lang.AssertionError
>         at org.apache.cassandra.locator.TokenMetadata.getToken(TokenMetadata.java:304)
>         at org.apache.cassandra.service.StorageProxy$5.runMayThrow(StorageProxy.java:371)
>         at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:30)
>         at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:439)
>         at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
>         at java.util.concurrent.FutureTask.run(FutureTask.java:138)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>         at java.lang.Thread.run(Thread.java:662)
> 
> So I decommissioned this new 1.1.6 node, and we are now running with 5 servers, unbalanced around the ring, with no way to add nodes or upgrade the C* version.
> 
> We are quite desperate over here.
> 
> If anyone has any idea of what could have happened and how to stabilize the cluster, it would be much appreciated.
> 
> It's quite an emergency since we can't add nodes and are under heavy load.
> 
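Michal's point in the quoted thread, that the error appears when rpc_address is set to an IP not defined in cassandra-topology.properties for PropertyFileSnitch, can be checked mechanically. Alain's cluster uses Ec2Snitch, so this only applies to PropertyFileSnitch setups. A minimal sketch in Python with hypothetical helper names, assuming simple IP=DC:RACK lines plus an optional default=DC:RACK entry; it does not handle the backslash escapes the Java properties format allows (e.g. in IPv6 addresses):

```python
from typing import Dict, Optional, Tuple


def parse_topology(text: str) -> Tuple[Dict[str, str], Optional[str]]:
    """Parse 'IP=DC:RACK' lines and an optional 'default=DC:RACK' line.
    Blank lines and lines starting with '#' are skipped."""
    nodes: Dict[str, str] = {}
    default: Optional[str] = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if not sep:
            continue  # not a key=value line
        key, value = key.strip(), value.strip()
        if key == "default":
            default = value
        else:
            nodes[key] = value
    return nodes, default


def is_listed(address: str, topology_text: str) -> bool:
    """True only if `address` has its own explicit entry (the default is ignored)."""
    nodes, _ = parse_topology(topology_text)
    return address in nodes
```

Running is_listed on each node's rpc_address against the file shipped to the cluster would flag any address that is missing an explicit entry.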

