incubator-cassandra-user mailing list archives

From rohit bhatia <rohit2...@gmail.com>
Subject Re: are counters stable enough for production?
Date Tue, 18 Sep 2012 09:07:54 GMT
@Robin
I'm pretty sure the GC issue is due to counters alone, since our
traffic consists only of write-heavy counter increments.
GC frequency also increases linearly with write load.

@Bartlomiej
Under stress testing, we see GC frequency increase and, as a
consequence, write latency rise to several milliseconds.
At 50k qps we had GC running every 1-2 seconds. Since each ParNew
collection takes around 100ms, we were spending roughly 5-10% of each
server's time in GC.
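
As a rough check of that overhead figure, here is a minimal sketch of the
arithmetic. It assumes every young-generation pause lasts about 100ms and
that pauses are evenly spaced; the function name is only illustrative.

```python
def gc_time_fraction(pause_ms: float, interval_s: float) -> float:
    """Fraction of wall-clock time spent in GC pauses.

    Assumes one pause of pause_ms milliseconds every interval_s seconds.
    """
    return (pause_ms / 1000.0) / interval_s


# One ~100ms ParNew pause every 1 to 2 seconds, as reported above.
for interval_s in (1.0, 2.0):
    frac = gc_time_fraction(100.0, interval_s)
    print(f"GC every {interval_s:.0f}s -> {frac:.0%} of time in GC")
```

With a 1 second interval this gives the 10% quoted above; with a 2 second
interval it is closer to 5%.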

Also, we don't have persistent connections, but testing with
persistent connections gives roughly the same result.

At a traffic of roughly 20k qps for 8 nodes with RF 2, we have Young
Gen GC running on each node every 4 seconds (approximately).
We have a young generation size of 3200M, which is already too large
by any standard.
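
For reference, one way to turn these numbers into a garbage-per-increment
estimate is sketched below. This is only an upper bound under loud
assumptions: the whole 3200M young generation is treated as eden (survivor
spaces make the real eden smaller), all allocation is attributed to counter
writes, traffic is spread evenly, and each increment is applied on RF
replicas. The thread does not say how the 50-100 KB figure was derived, so
this is not necessarily the same calculation.

```python
MIB = 1024 * 1024


def replica_writes_per_node(cluster_qps: float, nodes: int, rf: int) -> float:
    """Replica-level writes per second landing on one node (even spread assumed)."""
    return cluster_qps * rf / nodes


def garbage_per_write_kib(young_gen_mib: float, gc_interval_s: float,
                          writes_per_s: float) -> float:
    """Upper-bound KiB allocated per write, assuming the whole young
    generation fills exactly once between two collections."""
    bytes_per_s = young_gen_mib * MIB / gc_interval_s
    return bytes_per_s / writes_per_s / 1024


# Figures from this message: 20k qps, 8 nodes, RF 2, 3200M young gen, GC every ~4s.
writes = replica_writes_per_node(20_000, 8, 2)
estimate = garbage_per_write_kib(3200, 4, writes)
print(f"{writes:.0f} replica writes/s per node")
print(f"<= {estimate:.0f} KiB allocated per write")
```

Because the real eden is smaller than the full young generation, the true
value should come out lower than this bound.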

Also, decreasing the replication factor from 2 to 1 reduced the GC
frequency by a factor of 5-6.

Any advice?

Also, our traffic is evenly
On Tue, Sep 18, 2012 at 1:36 PM, Robin Verlangen <robin@us2.nl> wrote:
> We've not been trying to create inconsistencies as you describe above. But
> it seems legit that those situations cause problems.
>
> Sometimes you can see log messages that indicate that counters are out of
> sync in the cluster and they get "repaired". My guess would be that the
> repairs actually corrupt them; however, I have no knowledge of the
> underlying techniques. I think this because those read repairs happen
> a lot (as you mention: lots of reads) and the counters might get
> over-repaired or something. However, this is all just a guess. I hope
> someone with a lot of knowledge about Cassandra internals can shed some
> light on this.
>
> Best regards,
>
> Robin Verlangen
> Software engineer
>
> W http://www.robinverlangen.nl
> E robin@us2.nl
>
> Disclaimer: The information contained in this message and attachments is
> intended solely for the attention and use of the named addressee and may be
> confidential. If you are not the intended recipient, you are reminded that
> the information remains the property of the sender. You must not use,
> disclose, distribute, copy, print or rely on this e-mail. If you have
> received this message in error, please contact the sender immediately and
> irrevocably delete this message and any copies.
>
>
>
> 2012/9/18 Bartłomiej Romański <br@sentia.pl>
>>
>> Garbage is one more issue we are having with counters. We are
>> operating under very heavy load. Counters are spread over 7 nodes with
>> SSD drives and we often see CPU usage between 90-100%. We are doing
>> mostly reads. Latency is very important for us, so GC pauses taking
>> longer than 10ms (often around 50-100ms) are very annoying.
>>
>> I don't have actual numbers right now, but we've also got the
>> impression that Cassandra generates "too much" garbage. Is it
>> possible that counters are somehow responsible?
>>
>> @Rohit: Did you try something more stressful? Like sending more
>> traffic to a node than it can actually handle, turning nodes up and
>> down, changing the topology (moving/adding nodes)? I believe our
>> problems come from very high load and some operations like this
>> (adding new nodes, replacing dead ones, etc.). I was expecting that
>> Cassandra would fail some requests, lose consistency temporarily or
>> something like that in such cases, but generating highly incorrect
>> values was very disappointing.
>>
>> Thanks,
>> Bartek
>>
>>
>> On Tue, Sep 18, 2012 at 9:30 AM, Robin Verlangen <robin@us2.nl> wrote:
>> > @Rohit: We also use counters quite a lot (let's say 2000 increments /
>> > sec), but don't see the 50-100KB of garbage per increment. Are you sure
>> > that memory is coming from your counters?
>> >
>> > Best regards,
>> >
>> > Robin Verlangen
>> > Software engineer
>> >
>> > W http://www.robinverlangen.nl
>> > E robin@us2.nl
>> >
>> >
>> >
>> >
>> > 2012/9/18 rohit bhatia <rohit2412@gmail.com>
>> >>
>> >> We use counters in an 8-node cluster with RF 2 in Cassandra 1.0.5.
>> >> We use phpcassa and execute CQL queries through Thrift to work with
>> >> composite types.
>> >>
>> >> We do not have any problem of overcounts as we tally with RDBMS daily.
>> >>
>> >> It works fine but we are having some GC pressure in the young generation.
>> >> By my calculation, around 50-100 KB of garbage is generated for every
>> >> counter increment.
>> >> Is this memory usage expected of counters?
>> >>
>> >> On Tue, Sep 18, 2012 at 7:16 AM, Bartłomiej Romański <br@sentia.pl>
>> >> wrote:
>> >> > Hi,
>> >> >
>> >> > Does anyone have any experience with using Cassandra counters in
>> >> > production?
>> >> >
>> >> > We rely heavily on them and recently we've had a few very serious
>> >> > problems. Our counter values suddenly became a few times higher than
>> >> > expected. From the business point of view this is a disaster :/ Also,
>> >> > there are a few open major bugs related to them, some of which have
>> >> > been open for quite a long time (months).
>> >> >
>> >> > We are seriously considering going back to other solutions (e.g. SQL
>> >> > databases). We simply cannot afford incorrect counter values. We can
>> >> > tolerate losing a few increments from time to time, but we cannot
>> >> > tolerate having counters suddenly 3 times higher or lower than the
>> >> > expected values.
>> >> >
>> >> > What is the current status of counters? Should I consider them a
>> >> > production-ready feature and assume we've just had some bad luck? Or
>> >> > should I rather consider them an experimental feature and look for
>> >> > some other solutions?
>> >> >
>> >> > Do you have any experience with them? Any comments would be very
>> >> > helpful for us!
>> >> >
>> >> > Thanks,
>> >> > Bartek
>> >
>> >
>
>
