incubator-cassandra-user mailing list archives

From Michal Michalski <mich...@opera.com>
Subject Re: Failed migration from 1.1.6 to 1.2.2
Date Thu, 14 Mar 2013 12:12:13 GMT
Just to make it clear: this bug will occur in a single-DC configuration too.

In our case it resulted in an exception like this at the very end of node 
startup:

ERROR [WRITE-/<SOME-IP>] 2013-02-27 12:14:55,433 CassandraDaemon.java 
(line 133) Exception in thread Thread[WRITE-/<SOME-IP>,5,main]
java.lang.RuntimeException: Unknown host /0.0.0.0 with no default configured

It will happen if your rpc_address is set to 0.0.0.0.
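
Until the patch from CASSANDRA-5299 is applied, binding the RPC interface 
to a concrete address should avoid it; a minimal cassandra.yaml sketch 
(the address below is a placeholder for the node's actual interface 
address):

# rpc_address: 0.0.0.0 triggers the "Unknown host /0.0.0.0" error above
rpc_address: 10.0.0.12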

M.

On 14.03.2013 at 13:03, Alain RODRIGUEZ wrote:
> Thanks for this pointer, but I don't think this is the source of our
> problem, since we use 1 data center and Ec2Snitch.
>
>
>
> 2013/3/14 Jean-Armel Luce <jaluce06@gmail.com>
>
>> Hi Alain,
>>
>> Maybe it is due to https://issues.apache.org/jira/browse/CASSANDRA-5299
>>
>> A patch is provided with this ticket.
>>
>> Regards.
>>
>> Jean Armel
>>
>>
>> 2013/3/14 Alain RODRIGUEZ <arodrime@gmail.com>
>>
>>> Hi
>>>
>>> We just tried to migrate our production cluster from C* 1.1.6 to 1.2.2.
>>>
>>> This has been a disaster. I just switched one node to 1.2.2, updated its
>>> configuration (cassandra.yaml / cassandra-env.sh) and restarted it.
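>>>
>>> (Roughly the per-node steps, sketched here with example service
>>> commands; our exact invocations differed:
>>>
>>> nodetool drain               # flush memtables, stop accepting writes
>>> sudo service cassandra stop
>>> # install the 1.2.2 binaries, then merge our settings into the new
>>> # cassandra.yaml and cassandra-env.sh
>>> sudo service cassandra start)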
>>>
>>> It resulted in errors on all 5 remaining 1.1.6 nodes:
>>>
>>> ERROR [RequestResponseStage:2] 2013-03-14 09:53:25,750
>>> AbstractCassandraDaemon.java (line 135) Exception in thread
>>> Thread[RequestResponseStage:2,5,main]
>>> java.io.IOError: java.io.EOFException
>>>          at
>>> org.apache.cassandra.service.AbstractRowResolver.preprocess(AbstractRowResolver.java:71)
>>>          at
>>> org.apache.cassandra.service.ReadCallback.response(ReadCallback.java:155)
>>>          at
>>> org.apache.cassandra.net.ResponseVerbHandler.doVerb(ResponseVerbHandler.java:45)
>>>          at
>>> org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:59)
>>>          at
>>> java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>>>          at
>>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>>>          at java.lang.Thread.run(Thread.java:662)
>>> Caused by: java.io.EOFException
>>>          at java.io.DataInputStream.readFully(DataInputStream.java:180)
>>>          at
>>> org.apache.cassandra.db.ReadResponseSerializer.deserialize(ReadResponse.java:100)
>>>          at
>>> org.apache.cassandra.db.ReadResponseSerializer.deserialize(ReadResponse.java:81)
>>>          at
>>> org.apache.cassandra.service.AbstractRowResolver.preprocess(AbstractRowResolver.java:64)
>>>          ... 6 more
>>>
>>> I got this many times, and my entire cluster wasn't reachable by any of
>>> our 4 clients (phpCassa, Hector, Cassie, Helenus).
>>>
>>> I decommissioned the 1.2.2 node to get our cluster answering queries. It
>>> worked.
>>>
>>> Then I tried to replace this node with a new C* 1.1.6 one using the same
>>> token as the decommissioned node. The node joined the ring and, before
>>> getting any data, switched to normal status.
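>>>
>>> (Concretely, a sketch of what this meant in the new node's
>>> cassandra.yaml, with <token> standing for the decommissioned node's
>>> token:
>>>
>>> initial_token: <token>)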
>>>
>>> On all the other nodes I got:
>>>
>>> ERROR [MutationStage:8] 2013-03-14 10:21:01,288
>>> AbstractCassandraDaemon.java (line 135) Exception in thread
>>> Thread[MutationStage:8,5,main]
>>> java.lang.AssertionError
>>>          at
>>> org.apache.cassandra.locator.TokenMetadata.getToken(TokenMetadata.java:304)
>>>          at
>>> org.apache.cassandra.service.StorageProxy$5.runMayThrow(StorageProxy.java:371)
>>>          at
>>> org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:30)
>>>          at
>>> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:439)
>>>          at
>>> java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
>>>          at java.util.concurrent.FutureTask.run(FutureTask.java:138)
>>>          at
>>> java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>>>          at
>>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>>>          at java.lang.Thread.run(Thread.java:662)
>>>
>>> So I decommissioned this new 1.1.6 node, and we are now running with 5
>>> servers, unbalanced along the ring, with no possibility of adding nodes
>>> or upgrading the C* version.
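>>>
>>> (To see the imbalance: in 1.1, nodetool ring lists each node's token
>>> and ownership, e.g.
>>>
>>> nodetool -h <any-node-ip> ring)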
>>>
>>> We are quite desperate over here.
>>>
>>> If someone has any idea of what could have happened and how to stabilize
>>> the cluster, it would be very much appreciated.
>>>
>>> It's quite an emergency since we can't add nodes and are under heavy load.
>>>
>>>
>>
>

