incubator-cassandra-user mailing list archives

From Michal Michalski <mich...@opera.com>
Subject Re: Failed migration from 1.1.6 to 1.2.2
Date Thu, 14 Mar 2013 12:24:24 GMT

> It will happen if your rpc_address is set to 0.0.0.0.

Oops, that's not what I meant ;-)
It will happen if your rpc_address is set to an IP that is not defined in 
your cluster's config (e.g. in cassandra-topology.properties for 
PropertyFileSnitch).
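
As a made-up illustration of the above (the IP, data center and rack names 
below are invented, not taken from any real cluster):

    # cassandra.yaml on the node
    rpc_address: 10.0.0.12

    # cassandra-topology.properties (the PropertyFileSnitch mapping)
    # <node IP>=<data center>:<rack>
    10.0.0.12=DC1:RAC1
    default=DC1:RAC1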

M.

>
> M.
>
> On 14.03.2013 13:03, Alain RODRIGUEZ wrote:
>> Thanks for this pointer, but I don't think this is the source of our
>> problem, since we use a single data center and Ec2Snitch.
>>
>>
>>
>> 2013/3/14 Jean-Armel Luce <jaluce06@gmail.com>
>>
>>> Hi Alain,
>>>
>>> Maybe it is due to https://issues.apache.org/jira/browse/CASSANDRA-5299
>>>
>>> A patch is provided with this ticket.
>>>
>>> Regards.
>>>
>>> Jean Armel
>>>
>>>
>>> 2013/3/14 Alain RODRIGUEZ <arodrime@gmail.com>
>>>
>>>> Hi
>>>>
>>>> We just tried to migrate our production cluster from C* 1.1.6 to 1.2.2.
>>>>
>>>> This has been a disaster. I just switched one node to 1.2.2, updated its
>>>> configuration (cassandra.yaml / cassandra-env.sh) and restarted it.
>>>>
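>>>> For reference, the steps on that node were roughly the following (the
>>>> package and service commands below are only illustrative, not the exact
>>>> ones we ran):
>>>>
>>>>     nodetool drain                        # flush and stop accepting traffic
>>>>     sudo service cassandra stop
>>>>     sudo apt-get install cassandra=1.2.2  # illustrative package/version pin
>>>>     # merge our settings into the new cassandra.yaml / cassandra-env.sh
>>>>     sudo service cassandra start
>>>>     tail -f /var/log/cassandra/system.log # watch for errors on startup
>>>>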
>>>> It resulted in errors on all 5 remaining 1.1.6 nodes:
>>>>
>>>> ERROR [RequestResponseStage:2] 2013-03-14 09:53:25,750
>>>> AbstractCassandraDaemon.java (line 135) Exception in thread
>>>> Thread[RequestResponseStage:2,5,main]
>>>> java.io.IOError: java.io.EOFException
>>>>         at org.apache.cassandra.service.AbstractRowResolver.preprocess(AbstractRowResolver.java:71)
>>>>         at org.apache.cassandra.service.ReadCallback.response(ReadCallback.java:155)
>>>>         at org.apache.cassandra.net.ResponseVerbHandler.doVerb(ResponseVerbHandler.java:45)
>>>>         at org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:59)
>>>>         at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>>>>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>>>>         at java.lang.Thread.run(Thread.java:662)
>>>> Caused by: java.io.EOFException
>>>>         at java.io.DataInputStream.readFully(DataInputStream.java:180)
>>>>         at org.apache.cassandra.db.ReadResponseSerializer.deserialize(ReadResponse.java:100)
>>>>         at org.apache.cassandra.db.ReadResponseSerializer.deserialize(ReadResponse.java:81)
>>>>         at org.apache.cassandra.service.AbstractRowResolver.preprocess(AbstractRowResolver.java:64)
>>>>         ... 6 more
>>>>
>>>> This happened many times, and my entire cluster wasn't reachable by our
>>>> 4 clients (phpCassa, Hector, Cassie, Helenus).
>>>>
>>>> I decommissioned the 1.2.2 node to get our cluster answering queries.
>>>> It worked.
>>>>
>>>> Then I tried to replace this node with a new C* 1.1.6 one using the same
>>>> token as the previously decommissioned node. The node joined the ring and,
>>>> before getting any data, switched to normal status.
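>>>>
>>>> To be clear about "the same token": that normally means setting
>>>> initial_token in cassandra.yaml on the replacement node, e.g. (the value
>>>> shown is only an example, not our real token):
>>>>
>>>>     # cassandra.yaml on the replacement 1.1.6 node
>>>>     initial_token: 56713727820156410577229101238628035242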
>>>>
>>>> On all the other nodes I got:
>>>>
>>>> ERROR [MutationStage:8] 2013-03-14 10:21:01,288
>>>> AbstractCassandraDaemon.java (line 135) Exception in thread
>>>> Thread[MutationStage:8,5,main]
>>>> java.lang.AssertionError
>>>>         at org.apache.cassandra.locator.TokenMetadata.getToken(TokenMetadata.java:304)
>>>>         at org.apache.cassandra.service.StorageProxy$5.runMayThrow(StorageProxy.java:371)
>>>>         at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:30)
>>>>         at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:439)
>>>>         at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
>>>>         at java.util.concurrent.FutureTask.run(FutureTask.java:138)
>>>>         at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>>>>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>>>>         at java.lang.Thread.run(Thread.java:662)
>>>>
>>>> So I decommissioned this new 1.1.6 node, and we are now running with 5
>>>> servers, unbalanced along the ring, with no way to add nodes or upgrade
>>>> the C* version.
>>>>
>>>> We are quite desperate over here.
>>>>
>>>> If someone has any idea of what could have happened and how to stabilize
>>>> the cluster, it would be very much appreciated.
>>>>
>>>> It's quite an emergency since we can't add nodes and are under heavy
>>>> load.
>>>>
>>>>
>>>
>>

