incubator-cassandra-user mailing list archives

From Alain RODRIGUEZ <arodr...@gmail.com>
Subject Re: Multiple counters value after restart
Date Fri, 02 Nov 2012 14:26:37 GMT
I ran the same CQL query against my 3 nodes (after adding the third and
repairing each of them):

On the new node:

cqlsh:mykeyspace> select '20121029#myevent' from 'mycf' where key =
'887#day';

 20121029#myevent
-------------------
              4983

On the 2 others (old nodes):

cqlsh:mykeyspace> select '20121029#myevent' from 'mycf' where key =
'887#day';
 20121029#myevent
-------------------
              4254

And the value read at CL.QUORUM is 4943, which is the correct value.

How is it possible that QUORUM reads 4943 when only 1 node out of 3
reports that count?
How could a new node get a value that none of the other existing nodes has?
Is there a way to fix the data (isn't repair supposed to do it)?
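
Could the counter shard merge explain it? As I understand the pre-2.1
counter design, each replica stores per-node "shards" of a counter, and a
QUORUM read merges the freshest shard from every replica it contacts, so the
merged total can be a value that no single node reports locally. A toy
sketch in Python (node ids, clocks and counts are made up, not taken from my
cluster):

```python
def local_read(replica):
    """Value a single node reports: the sum of the shard counts it holds."""
    return sum(count for _clock, count in replica.values())

def quorum_read(*replicas):
    """Merge shards across replicas: per shard owner, keep the entry with
    the highest clock, then sum the surviving counts."""
    merged = {}
    for replica in replicas:
        for owner, (clock, count) in replica.items():
            if owner not in merged or clock > merged[owner][0]:
                merged[owner] = (clock, count)
    return sum(count for _clock, count in merged.values())

# Two replicas, each missing the other's freshest shard:
replica1 = {"node_a": (2, 3000), "node_b": (1, 1000)}   # local read: 4000
replica2 = {"node_a": (1, 2500), "node_b": (2, 1943)}   # local read: 4443

print(local_read(replica1))             # 4000
print(local_read(replica2))             # 4443
print(quorum_read(replica1, replica2))  # 4943 -- a total neither node holds
```

If each replica is missing the other's freshest shard, a single-node read
under-reports while the merged quorum read does not, which would also
explain why CL.ONE bounced between several values.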

Alain



2012/11/1 Alain RODRIGUEZ <arodrime@gmail.com>

> "Can you try it thought, or run a repair ?"
>
> Repairing didn't help
>
> "My first thought is to use QUOURM"
>
> This fixes the problem. However, my data is probably still inconsistent,
> even if I now always read the same value. The point is that with CL.QUORUM
> I can't handle a crash; I can't even restart a node...
>
> I will add a third server.
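
For the record, the arithmetic behind adding a third server: a quorum is a
strict majority of the replicas, so with RF=2 a quorum is 2 of 2 and any
restarting node blocks CL.QUORUM, while with RF=3 it is 2 of 3 and one node
can be down. A generic sketch, not tied to this particular cluster:

```python
def quorum(rf):
    # A quorum is a strict majority of the RF replicas.
    return rf // 2 + 1

def quorum_available(rf, live):
    # CL.QUORUM succeeds only if a majority of replicas is reachable.
    return live >= quorum(rf)

# RF=2: quorum needs 2 of 2, so one restarting node blocks QUORUM.
print(quorum(2), quorum_available(2, live=1))   # 2 False
# RF=3: quorum needs 2 of 3, so one node may be down or restarting.
print(quorum(3), quorum_available(3, live=2))   # 2 True
```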
>
>   "But isn't Cassandra suppose to handle a server crash ? When a server
> crashes I guess it don't drain before..."
>
> "I was asking to understand how you did the upgrade."
>
> OK. On my side I am just concerned about whether I can use counters
> with CL.ONE and correctly handle a crash or a restart without a drain.
>
> Alain
>
>
>
> 2012/11/1 aaron morton <aaron@thelastpickle.com>
>
>> "What CL are you using ?"
>>
>> I think this can be what causes the issue. I'm writing and reading at CL
>> ONE. I didn't drain before stopping Cassandra, and this may have produced a
>> failure in the current counters (those that were being written when I stopped
>> a server).
>>
>> My first thought is to use QUORUM. But with only two nodes it's hard to
>> get strong consistency using QUORUM.
>> Can you try it though, or run a repair?
>>
>> But isn't Cassandra supposed to handle a server crash? When a server
>> crashes I guess it doesn't drain first...
>>
>> I was asking to understand how you did the upgrade.
>>
>> Cheers
>>
>>   -----------------
>> Aaron Morton
>> Freelance Developer
>> @aaronmorton
>> http://www.thelastpickle.com
>>
>> On 1/11/2012, at 11:39 AM, Alain RODRIGUEZ <arodrime@gmail.com> wrote:
>>
>> "What version of cassandra are you using ?"
>>
>> 1.1.2
>>
>> "Can you explain this further?"
>>
>> I had an unexplained number of reads (up to 1800 r/s and 90 MB/s) on one
>> server, while the other was doing about 200 r/s and 5 MB/s max. I fixed it
>> by rebooting the server. This server is dedicated to Cassandra. I can't
>> tell you more about it because I don't understand it... but a simple
>> Cassandra restart wasn't enough.
>>
>> "Was something writing to the cluster ?"
>>
>> Yes, we have some activity and perform about 600 w/s.
>>
>> "Did you drain for the upgrade ?"
>>
>> We upgraded to 1.1.2 a long time ago. This warning is about version
>> 1.1.6.
>>
>> "What changes did you make ?"
>>
>> In cassandra.yaml I just changed the "compaction_throughput_mb_per_sec"
>> property to slow down my compactions a bit. I don't think the problem
>> comes from there.
>>
>> "Are you saying that a particular counter column is giving different
>> values for different reads ?"
>>
>> Yes, this is exactly what I was saying. Sorry if something is wrong with
>> my English, it's not my mother tongue.
>>
>> "What CL are you using ?"
>>
>> I think this can be what causes the issue. I'm writing and reading at CL
>> ONE. I didn't drain before stopping Cassandra, and this may have produced a
>> failure in the current counters (those that were being written when I stopped
>> a server).
>>
>> But isn't Cassandra supposed to handle a server crash? When a server
>> crashes I guess it doesn't drain first...
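
One way an unclean stop can hurt counters: increments are not idempotent,
so an increment that is replayed (e.g. from the commit log) or retried after
a lost ack gets counted twice. A minimal generic illustration, nothing
Cassandra-specific:

```python
# An increment applied twice (replay or client retry after a lost ack)
# makes the stored counter drift away from the intended count.
counter = 0      # what the store holds
true_count = 0   # what the application meant to record

def increment(delta):
    global counter
    counter += delta

# Normal write:
increment(10); true_count += 10

# Write that "times out" after actually being applied, so it is retried:
increment(10)    # applied, but the ack was lost
increment(10)    # retry double-counts the same logical increment
true_count += 10 # the application only meant one increment

print(counter, true_count)   # 30 20 -- the counter has drifted
```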
>>
>> Thank you for your time Aaron, once again.
>>
>> Alain
>>
>>
>>
>> 2012/10/31 aaron morton <aaron@thelastpickle.com>
>>
>>> What version of cassandra are you using ?
>>>
>>>  I finally restarted Cassandra. It didn't solve the problem, so I stopped
>>>> Cassandra again on that node and restarted my EC2 server. This solved the
>>>> issue (1800 r/s to 100 r/s).
>>>
>>> Can you explain this further?
>>> Was something writing to the cluster ?
>>> Did you drain for the upgrade ?
>>> https://github.com/apache/cassandra/blob/cassandra-1.1/NEWS.txt#L17
>>>
>>> Today I changed my cassandra.yaml and restarted this same server to apply
>>>> my conf.
>>>
>>> What changes did you make ?
>>>
>>> I just noticed that my homepage (which uses a Cassandra counter and
>>>> refreshes every sec) shows me 4 different values: 2 of them repeatedly
>>>> (5000 and 4000) and the other 2 more rarely (5500 and 3800)
>>>
>>> Are you saying that a particular counter column is giving different
>>> values for different reads ?
>>> What CL are you using ?
>>>
>>> Cheers
>>>
>>>   -----------------
>>> Aaron Morton
>>> Freelance Developer
>>> @aaronmorton
>>> http://www.thelastpickle.com
>>>
>>> On 31/10/2012, at 3:39 AM, Jason Wee <peichieh@gmail.com> wrote:
>>>
>>> Maybe enable debug logging in log4j-server.properties and go through the
>>> log to see what actually happens?
>>>
>>> On Tue, Oct 30, 2012 at 7:31 PM, Alain RODRIGUEZ <arodrime@gmail.com>wrote:
>>>
>>>> Hi,
>>>>
>>>> I have an issue with counters. Yesterday I had an inexplicable number of
>>>> reads/sec on one server. I finally restarted Cassandra. It didn't solve
>>>> the problem, so I stopped Cassandra again on that node and restarted my
>>>> EC2 server. This solved the issue (1800 r/s down to 100 r/s).
>>>>
>>>> Today I changed my cassandra.yaml and restarted this same server to apply
>>>> my conf.
>>>>
>>>> I just noticed that my homepage (which uses a Cassandra counter and
>>>> refreshes every sec) shows me 4 different values: 2 of them repeatedly
>>>> (5000 and 4000) and the other 2 more rarely (5500 and 3800)
>>>>
>>>> Only the counters written today and yesterday are affected.
>>>>
>>>> I performed a repair without success. This data is the heart of our
>>>> business, so if anyone has a clue, I would be really grateful...
>>>>
>>>> The sooner the better; I am in production with these random counters.
>>>>
>>>> Alain
>>>>
>>>> INFO:
>>>>
>>>> My environment is 2 nodes (EC2 large), RF 2, CL.ONE (R & W), Random
>>>> Partitioner.
>>>>
>>>> xxx.xxx.xxx.241    eu-west     1b          Up     Normal  151.95 GB
>>>>   50.00%              0
>>>> xxx.xxx.xxx.109    eu-west     1b          Up     Normal  117.71 GB
>>>>   50.00%              85070591730234615865843651857942052864
>>>>
>>>> Here is my conf: http://pastebin.com/5cMuBKDt
>>>>
>>>>
>>>>
>>>
>>>
>>
>>
>
