incubator-cassandra-user mailing list archives

From aaron morton <aa...@thelastpickle.com>
Subject Re: How to solve this kind of schema disagreement...
Date Wed, 10 Aug 2011 09:47:02 GMT
I don't have time to look into the reasons for that error, but it does not sound good. It sounds like there are multiple migration chains out there in the cluster. This can happen when schema changes are applied to different nodes at the same time.

Is this a prod system? If not, I would shut it down, wipe all the Schema and Migration SSTables, and then apply the schema again one CF at a time (it will take time to read the data).

If it's a prod system it may need some delicate surgery on the Migrations and Schema CFs.
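For the non-prod case, the wipe step can be sketched roughly as below. This is a sketch only: `DATA_DIR` is an assumption standing in for whatever `data_file_directories` points at in your cassandra.yaml, and the file-name patterns match the 0.8 `Schema-g-*` / `Migrations-g-*` layout mentioned later in this thread.

```shell
#!/bin/sh
# Sketch: wiping the schema/migration system sstables on a 0.8-era node.
# DATA_DIR is an assumption -- use the directory set by data_file_directories
# in cassandra.yaml on your nodes.
DATA_DIR="${DATA_DIR:-/var/lib/cassandra/data}"

# With Cassandra stopped on the node, remove only the Schema and Migrations
# sstables from the system keyspace; leave the other system sstables
# (LocationInfo, HintsColumnFamily, ...) in place.
rm -f "$DATA_DIR"/system/Schema-*
rm -f "$DATA_DIR"/system/Migrations-*

# After restarting, connect with bin/cassandra-cli and run "describe cluster;"
# to check that the node has converged on a single schema version.
```

Note the globs deliberately match only the two column families being reset; deleting anything else under system/ would lose the node's token and ring state.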


Cheers
-----------------
Aaron Morton
Freelance Cassandra Developer
@aaronmorton
http://www.thelastpickle.com

On 10 Aug 2011, at 15:41, Dikang Gu wrote:

> And a lot of "not apply" logs.
> 
> DEBUG [MigrationStage:1] 2011-08-10 11:36:29,376 DefinitionsUpdateVerbHandler.java (line 70) Applying AddColumnFamily from /192.168.1.9
> DEBUG [MigrationStage:1] 2011-08-10 11:36:29,376 DefinitionsUpdateVerbHandler.java (line 80) Migration not applied Previous version mismatch. cannot apply.
> DEBUG [MigrationStage:1] 2011-08-10 11:36:29,379 DefinitionsUpdateVerbHandler.java (line 70) Applying AddColumnFamily from /192.168.1.9
> DEBUG [MigrationStage:1] 2011-08-10 11:36:29,379 DefinitionsUpdateVerbHandler.java (line 80) Migration not applied Previous version mismatch. cannot apply.
> DEBUG [MigrationStage:1] 2011-08-10 11:36:29,382 DefinitionsUpdateVerbHandler.java (line 70) Applying AddColumnFamily from /192.168.1.9
> DEBUG [MigrationStage:1] 2011-08-10 11:36:29,382 DefinitionsUpdateVerbHandler.java (line 80) Migration not applied Previous version mismatch. cannot apply.
> 
> -- 
> Dikang Gu
> 0086 - 18611140205
> On Wednesday, August 10, 2011 at 11:35 AM, Dikang Gu wrote:
> 
>> Hi Aaron,
>> 
>> I set the log level to be DEBUG, and find a lot of forceFlush debug info in the log:
>> 
>> DEBUG [StreamStage:1] 2011-08-10 11:31:56,345 ColumnFamilyStore.java (line 725) forceFlush requested but everything is clean
>> DEBUG [StreamStage:1] 2011-08-10 11:31:56,345 ColumnFamilyStore.java (line 725) forceFlush requested but everything is clean
>> DEBUG [StreamStage:1] 2011-08-10 11:31:56,345 ColumnFamilyStore.java (line 725) forceFlush requested but everything is clean
>> DEBUG [StreamStage:1] 2011-08-10 11:31:56,345 ColumnFamilyStore.java (line 725) forceFlush requested but everything is clean
>> 
>> What does this mean?
>> 
>> Thanks.
>>  
>> 
>> -- 
>> Dikang Gu
>> 0086 - 18611140205
>> On Wednesday, August 10, 2011 at 6:42 AM, aaron morton wrote:
>> 
>>> um. There has got to be something stopping the migration from completing. 
>>> 
>>> Turn the logging up to DEBUG before starting and look for messages from MigrationManager.java
>>> 
>>> Provide all the log messages from Migration.java on the 1.27 node
>>> 
>>> Cheers
>>> 
>>> 
>>> -----------------
>>> Aaron Morton
>>> Freelance Cassandra Developer
>>> @aaronmorton
>>> http://www.thelastpickle.com
>>> 
>>> On 8 Aug 2011, at 15:52, Dikang Gu wrote:
>>> 
>>>> Hi Aaron, 
>>>> 
>>>> I repeat the whole procedure:
>>>> 
>>>> 1. kill the cassandra instance on 1.27.
>>>> 2. rm the data/system/Migrations-g-*
>>>> 3. rm the data/system/Schema-g-*
>>>> 4. bin/cassandra to start the cassandra.
>>>> 
>>>> Now, the migration seems to have stopped, and I do not find any errors in the system.log yet.
>>>> 
>>>> The ring looks good:
>>>> [root@yun-phy2 apache-cassandra-0.8.1]# bin/nodetool -h192.168.1.27 -p8090 ring
>>>> Address         DC          Rack        Status State   Load            Owns    Token
>>>>                                                                                127605887595351923798765477786913079296
>>>> 192.168.1.28    datacenter1 rack1       Up     Normal  8.38 GB         25.00%  1
>>>> 192.168.1.25    datacenter1 rack1       Up     Normal  8.54 GB         34.01%  57856537434773737201679995572503935972
>>>> 192.168.1.27    datacenter1 rack1       Up     Normal  1.78 GB         24.28%  99165710459060760249270263771474737125
>>>> 192.168.1.9     datacenter1 rack1       Up     Normal  8.75 GB         16.72%  127605887595351923798765477786913079296
>>>> 
>>>> But the schema is still not correct:
>>>> Cluster Information:
>>>>    Snitch: org.apache.cassandra.locator.SimpleSnitch
>>>>    Partitioner: org.apache.cassandra.dht.RandomPartitioner
>>>>    Schema versions: 
>>>> 	75eece10-bf48-11e0-0000-4d205df954a7: [192.168.1.28, 192.168.1.9, 192.168.1.25]
>>>> 	5a54ebd0-bd90-11e0-0000-9510c23fceff: [192.168.1.27]
>>>> 
>>>> The 5a54ebd0-bd90-11e0-0000-9510c23fceff is the same as last time…
>>>> 
>>>> And in the log, the last Migration.java log is:
>>>>  INFO [MigrationStage:1] 2011-08-08 11:41:30,293 Migration.java (line 116) Applying migration 5a54ebd0-bd90-11e0-0000-9510c23fceff Add keyspace: SimpleDB_4E38DAA64894A9146100000500000000rep strategy:SimpleStrategy{}durable_writes: true
>>>> 
>>>> Could you explain this? 
>>>> 
>>>> If I change the token given to 1.27 to another one, will it help?
>>>> 
>>>> Thanks.
>>>> 
>>>> -- 
>>>> Dikang Gu
>>>> 0086 - 18611140205
>>>> On Sunday, August 7, 2011 at 4:14 PM, aaron morton wrote:
>>>> 
>>>>> did you check the logs in 1.27 for errors ? 
>>>>> 
>>>>> Could you be seeing this ? https://issues.apache.org/jira/browse/CASSANDRA-2867
>>>>> 
>>>>> Cheers
>>>>> 
>>>>> -----------------
>>>>> Aaron Morton
>>>>> Freelance Cassandra Developer
>>>>> @aaronmorton
>>>>> http://www.thelastpickle.com
>>>>> 
>>>>> On 7 Aug 2011, at 16:24, Dikang Gu wrote:
>>>>> 
>>>>>> I stopped both nodes, deleted the schema* and migration* sstables, and restarted them.
>>>>>> 
>>>>>> The current cluster looks like this:
>>>>>> [default@unknown] describe cluster;         
>>>>>> Cluster Information:
>>>>>>    Snitch: org.apache.cassandra.locator.SimpleSnitch
>>>>>>    Partitioner: org.apache.cassandra.dht.RandomPartitioner
>>>>>>    Schema versions: 
>>>>>> 	75eece10-bf48-11e0-0000-4d205df954a7: [192.168.1.28, 192.168.1.9, 192.168.1.25]
>>>>>> 	5a54ebd0-bd90-11e0-0000-9510c23fceff: [192.168.1.27]
>>>>>> 
>>>>>> the 1.28 looks good, but the 1.27 still cannot get to schema agreement...
>>>>>> 
>>>>>> I have tried several times, even deleting all the data on 1.27 and rejoining it as a new node, but it is still unhappy.
>>>>>> 
>>>>>> And the ring looks like this: 
>>>>>> 
>>>>>> Address         DC          Rack        Status State   Load            Owns    Token
>>>>>>                                                                                127605887595351923798765477786913079296
>>>>>> 192.168.1.28    datacenter1 rack1       Up     Normal  8.38 GB         25.00%  1
>>>>>> 192.168.1.25    datacenter1 rack1       Up     Normal  8.55 GB         34.01%  57856537434773737201679995572503935972
>>>>>> 192.168.1.27    datacenter1 rack1       Up     Joining 1.81 GB         24.28%  99165710459060760249270263771474737125
>>>>>> 192.168.1.9     datacenter1 rack1       Up     Normal  8.75 GB         16.72%  127605887595351923798765477786913079296
>>>>>> 
>>>>>> The 1.27 node seems unable to join the cluster; it just hangs there...
>>>>>> 
>>>>>> Any suggestions?
>>>>>> 
>>>>>> Thanks.
>>>>>> 
>>>>>> 
>>>>>> On Sun, Aug 7, 2011 at 10:01 AM, aaron morton <aaron@thelastpickle.com> wrote:
>>>>>> After the restart, what was in the logs for the 1.27 machine from the Migration.java logger? Some of the messages will start with "Applying migration"
>>>>>> 
>>>>>> You should have shut down both of the nodes, then deleted the schema* and migration* system sstables, then restarted one of them and watched to see if it got to schema agreement.
>>>>>> 
>>>>>> Cheers
>>>>>>   
>>>>>> -----------------
>>>>>> Aaron Morton
>>>>>> Freelance Cassandra Developer
>>>>>> @aaronmorton
>>>>>> http://www.thelastpickle.com
>>>>>> 
>>>>>> On 6 Aug 2011, at 22:56, Dikang Gu wrote:
>>>>>> 
>>>>>>> I have tried this, but the schema still does not agree in the cluster:
>>>>>>> 
>>>>>>> [default@unknown] describe cluster;
>>>>>>> Cluster Information:
>>>>>>>    Snitch: org.apache.cassandra.locator.SimpleSnitch
>>>>>>>    Partitioner: org.apache.cassandra.dht.RandomPartitioner
>>>>>>>    Schema versions: 
>>>>>>> 	UNREACHABLE: [192.168.1.28]
>>>>>>> 	75eece10-bf48-11e0-0000-4d205df954a7: [192.168.1.9, 192.168.1.25]
>>>>>>> 	5a54ebd0-bd90-11e0-0000-9510c23fceff: [192.168.1.27]
>>>>>>> 
>>>>>>> Any other suggestions to solve this?
>>>>>>> 
>>>>>>> I have some production data saved in the cassandra cluster, so I cannot afford data loss...
>>>>>>> 
>>>>>>> Thanks.
>>>>>>> 
>>>>>>> On Fri, Aug 5, 2011 at 8:55 PM, Benoit Perroud <benoit@noisette.ch> wrote:
>>>>>>>> Based on http://wiki.apache.org/cassandra/FAQ#schema_disagreement,
>>>>>>>> 75eece10-bf48-11e0-0000-4d205df954a7 owns the majority, so shut down and
>>>>>>>> remove the schema* and migration* sstables from both 192.168.1.28 and
>>>>>>>> 192.168.1.27
>>>>>>>> 
>>>>>>>> 
>>>>>>>> 2011/8/5 Dikang Gu <dikang85@gmail.com>:
>>>>>>>> > [default@unknown] describe cluster;
>>>>>>>> > Cluster Information:
>>>>>>>> >    Snitch: org.apache.cassandra.locator.SimpleSnitch
>>>>>>>> >    Partitioner: org.apache.cassandra.dht.RandomPartitioner
>>>>>>>> >    Schema versions:
>>>>>>>> > 743fe590-bf48-11e0-0000-4d205df954a7: [192.168.1.28]
>>>>>>>> > 75eece10-bf48-11e0-0000-4d205df954a7: [192.168.1.9, 192.168.1.25]
>>>>>>>> > 06da9aa0-bda8-11e0-0000-9510c23fceff: [192.168.1.27]
>>>>>>>> >
>>>>>>>> >  three different schema versions in the cluster...
>>>>>>>> > --
>>>>>>>> > Dikang Gu
>>>>>>>> > 0086 - 18611140205
>>>>>>>> >
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> -- 
>>>>>>> Dikang Gu
>>>>>>> 
>>>>>>> 0086 - 18611140205
>>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> -- 
>>>>>> Dikang Gu
>>>>>> 
>>>>>> 0086 - 18611140205
>>>>>> 
>>>>> 
>>>> 
>>> 
>> 
> 

