cassandra-user mailing list archives

From Benjamin Roth <benjamin.r...@jaumo.com>
Subject Re: Can a Select Count(*) Affect Writes in Cassandra?
Date Thu, 10 Nov 2016 17:47:32 GMT
... sorry for the short reply. To be a bit more detailed:

1. You can lower the read repair probability on that table to avoid the
writes that read repair triggers. But be aware that inconsistencies then
won't be repaired on reads either.
2. Maybe you should run a repair on that table to get it in sync and reduce
the impact of read repairs. By the way, you should run repairs on a regular
basis anyway, but that is a different, very extensive topic that is
documented in many places.
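
To make point 1 concrete: in the 2.x line the relevant table options are
read_repair_chance and dclocal_read_repair_chance. The helper below is only
a rough sketch that builds the CQL text. The function name and the keyspace
and table names (my_ks, my_table) are placeholders I made up, not anything
from this thread, and you would still need to run the statement in cqlsh.

```python
def read_repair_cql(keyspace, table, chance=0.0, dclocal_chance=0.0):
    """Build an ALTER TABLE statement that sets both read repair chances.

    Hypothetical helper: it only formats a string, it does not talk to a cluster.
    """
    for value in (chance, dclocal_chance):
        if not 0.0 <= value <= 1.0:
            raise ValueError("read repair chance must be between 0 and 1")
    return (
        f"ALTER TABLE {keyspace}.{table} "
        f"WITH read_repair_chance = {chance} "
        f"AND dclocal_read_repair_chance = {dclocal_chance};"
    )


# Placeholder names; substitute the table that receives the count(*) query.
print(read_repair_cql("my_ks", "my_table"))
```

Setting both values to 0 disables the probabilistic read repair on that
table, with the trade-off described in point 1.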

2016-11-10 17:44 GMT+00:00 Benjamin Roth <benjamin.roth@jaumo.com>:

> There you go :)
>
> 2016-11-10 17:24 GMT+00:00 Shalom Sagges <shaloms@liveperson.com>:
>
>> That's a possibility I didn't think of...
>>
>> This is what I see from
>> org.apache.cassandra.metrics:type=ReadRepair,name=RepairedBackground:
>> [image: Inline image 1]
>>
>>
>> and from
>> org.apache.cassandra.metrics:type=ReadRepair,name=RepairedBlocking:
>> [image: Inline image 2]
>>
>>
>> Shalom Sagges
>> DBA
>> T: +972-74-700-4035
>> <http://www.linkedin.com/company/164748> <http://twitter.com/liveperson>
>> <http://www.facebook.com/LivePersonInc> We Create Meaningful Connections
>>
>> <https://engage.liveperson.com/idc-mobile-first-consumer/?utm_medium=email&utm_source=mkto&utm_campaign=idcsig>
>>
>>
>> On Thu, Nov 10, 2016 at 7:16 PM, Shalom Sagges <shaloms@liveperson.com>
>> wrote:
>>
>>> Yes, it's occurring on the table that receives the count(*) query.
>>>
>>>
>>> On Thu, Nov 10, 2016 at 5:05 PM, Alexander Dejanovski <
>>> alex@thelastpickle.com> wrote:
>>>
>>>> So is the huge write count occurring on the table that receives the
>>>> count(*) query, or on another table?
>>>>
>>>> On Thu, Nov 10, 2016 at 4:02 PM Shalom Sagges <shaloms@liveperson.com>
>>>> wrote:
>>>>
>>>>> Tracing is off and so is TracingProbability.
>>>>> Just to elaborate, the huge write count occurs on only a single column
>>>>> family, which is not in the system_traces keyspace.
>>>>>
>>>>> I also want to thank you guys for your persistent help, regardless of
>>>>> whether the root cause is found or not. You're the best!!
>>>>>
>>>>> On Thu, Nov 10, 2016 at 4:41 PM, Alexander Dejanovski <
>>>>> alex@thelastpickle.com> wrote:
>>>>>
>>>>> Shalom,
>>>>>
>>>>> you may have a high trace probability, which could explain what you're
>>>>> observing:
>>>>> https://docs.datastax.com/en/cassandra/2.0/cassandra/tools/toolsSetTraceProbability.html
>>>>>
>>>>> On Thu, Nov 10, 2016 at 3:37 PM Chris Lohfink <clohfink85@gmail.com>
>>>>> wrote:
>>>>>
>>>>> count(*) actually pages through all the data, so a select count(*)
>>>>> without a limit would be expected to cause a lot of load on the system.
>>>>> The hit is more than just I/O and CPU load: it also creates a lot of
>>>>> garbage, which can cause GC pauses that slow down the entire JVM. Some
>>>>> details here:
>>>>> http://www.datastax.com/dev/blog/counting-keys-in-cassandra
>>>>> <http://planetcassandra.org/blog/counting-key-in-cassandra/>
>>>>>
>>>>> You may want to consider maintaining the count yourself, using Spark,
>>>>> or if you just want a ballpark number you can grab it from JMX.
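
To illustrate why a paged count touches every row, here is a toy sketch in
plain Python. It does not talk to Cassandra at all: the list stands in for a
table, and the function name and page size are assumptions picked purely for
illustration.

```python
def count_by_paging(rows, page_size=5000):
    """Count rows the way a paged full scan does: one page at a time.

    Returns (total_rows, pages_fetched). Every row is materialised once,
    which is where the I/O, CPU and garbage come from.
    """
    total = 0
    pages = 0
    for start in range(0, len(rows), page_size):
        page = rows[start:start + page_size]  # stands in for one page fetch
        total += len(page)
        pages += 1
    return total, pages


rows = list(range(12_345))  # stand-in for a table's rows
print(count_by_paging(rows, page_size=5000))  # (12345, 3)
```

The work grows linearly with the number of rows, which is why a counter
maintained at write time or a ballpark estimate is usually cheaper.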
>>>>>
>>>>> > Cassandra writes (mutations) are INSERTs, UPDATEs or DELETEs, it
>>>>> > actually has nothing to do with flushes. A flush is the operation of
>>>>> > moving data from memory (memtable) to disk (SSTable).
>>>>>
>>>>> FWIW, in 2.0 that's not completely accurate. Before 2.1, the process of
>>>>> memtable flushing acquired a switchlock that blocks mutations during
>>>>> the flush (the "pending task" metric is the measure of how many
>>>>> mutations are blocked by this lock).
>>>>>
>>>>> Chris
>>>>>
>>>>> On Thu, Nov 10, 2016 at 8:10 AM, Shalom Sagges
>>>>> <shaloms@liveperson.com> wrote:
>>>>>
>>>>> Hi Alexander,
>>>>>
>>>>> I'm referring to Writes Count generated from JMX:
>>>>> [image: Inline image 1]
>>>>>
>>>>> The higher curve shows the total write count per second for all nodes
>>>>> in the cluster and the lower curve is the average write count per second
>>>>> per node.
>>>>> The drop at the end is the result of shutting down one application
>>>>> node that performed this kind of query (we still haven't removed the
>>>>> query itself in this cluster).
>>>>>
>>>>>
>>>>> On a different cluster, where we already removed the "select count(*)"
>>>>> query completely, we can see that the issue was resolved (also verified
>>>>> by running nodetool cfstats a few times and checking the write count
>>>>> difference):
>>>>> [image: Inline image 2]
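
The cfstats check described above boils down to simple arithmetic on two
samples of the write count. A minimal sketch follows; the function name and
the two sample counts are made up for illustration, and only the 12-node,
~12,500 writes/s per node figures come from elsewhere in this thread.

```python
def writes_per_second(count_before, count_after, seconds_between):
    """Average write rate between two samples of a cumulative write count."""
    if seconds_between <= 0:
        raise ValueError("seconds_between must be positive")
    return (count_after - count_before) / seconds_between


# Made-up samples taken 60 seconds apart on one node.
per_node = writes_per_second(1_000_000, 1_750_000, 60)
print(per_node)       # 12500.0 writes/s on this node
print(per_node * 12)  # 150000.0 writes/s across a 12-node DC
```

If the rate drops to the normal baseline after the query is removed, that
points at the query as the trigger.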
>>>>>
>>>>>
>>>>> Naturally I asked how a select query can affect the write count of a
>>>>> node, but weird as it seems, the issue was resolved once the query was
>>>>> removed from the code.
>>>>>
>>>>> Another side note: one of our developers, who wrote the query, thought
>>>>> it would be nice to limit the query results to 560,000,000.
>>>>> Perhaps the ridiculously high limit might have caused this?
>>>>>
>>>>> Thanks!
>>>>>
>>>>> On Thu, Nov 10, 2016 at 3:21 PM, Alexander Dejanovski <
>>>>> alex@thelastpickle.com> wrote:
>>>>>
>>>>> Hi Shalom,
>>>>>
>>>>> Cassandra writes (mutations) are INSERTs, UPDATEs or DELETEs, it
>>>>> actually has nothing to do with flushes. A flush is the operation of
>>>>> moving data from memory (memtable) to disk (SSTable).
>>>>>
>>>>> The Cassandra write path and read path are two different things and,
>>>>> as far as I know, I see no way for a select count(*) to increase your
>>>>> write count (if you are indeed talking about actual Cassandra writes,
>>>>> and not I/O operations).
>>>>>
>>>>> Cheers,
>>>>>
>>>>> On Thu, Nov 10, 2016 at 1:21 PM Shalom Sagges <shaloms@liveperson.com>
>>>>> wrote:
>>>>>
>>>>> Yes, I know it's obsolete, but unfortunately this takes time.
>>>>> We're in the process of upgrading to 2.2.8 and 3.0.9 in our clusters.
>>>>>
>>>>> Thanks!
>>>>>
>>>>> On Thu, Nov 10, 2016 at 1:31 PM, Vladimir Yudovin <
>>>>> vladyu@winguzone.com> wrote:
>>>>>
>>>>> As I said, I'm not sure about it, but it would be interesting to check
>>>>> the heap memory state with any JMX tool, e.g.
>>>>> https://github.com/patric-r/jvmtop
>>>>>
>>>>> By the way, why Cassandra 2.0.14? It's quite an old and unsupported
>>>>> version. Even in the 2.0 branch there is 2.0.17 available.
>>>>>
>>>>> Best regards, Vladimir Yudovin,
>>>>>
>>>>> *Winguzone <https://winguzone.com?from=list> - Hosted Cloud
>>>>> Cassandra. Launch your cluster in minutes.*
>>>>>
>>>>>
>>>>> ---- On Thu, 10 Nov 2016 05:47:37 -0500 Shalom Sagges
>>>>> <shaloms@liveperson.com> wrote ----
>>>>>
>>>>> Thanks for the quick reply, Vladimir.
>>>>> Is it really possible that ~12,500 writes per second (per node in a
>>>>> 12-node DC) are caused by memory flushes?
>>>>>
>>>>> On Thu, Nov 10, 2016 at 11:02 AM, Vladimir Yudovin <
>>>>> vladyu@winguzone.com> wrote:
>>>>>
>>>>>
>>>>>
>>>>> This message may contain confidential and/or privileged information.
>>>>> If you are not the addressee or authorized to receive this on behalf
>>>>> of the addressee you must not use, copy, disclose or take action based
>>>>> on this message or any information herein.
>>>>> If you have received this message in error, please advise the sender
>>>>> immediately by reply email and delete this message. Thank you.
>>>>>
>>>>>
>>>>> Hi Shalom,
>>>>>
>>>>> I'm not sure, but excessive memory consumption by this SELECT probably
>>>>> causes C* to flush memtables to free memory.
>>>>>
>>>>>
>>>>> ---- On Thu, 10 Nov 2016 03:36:59 -0500 Shalom Sagges
>>>>> <shaloms@liveperson.com> wrote ----
>>>>>
>>>>> Hi There!
>>>>>
>>>>> I'm using C* 2.0.14.
>>>>> I experienced a scenario where a "select count(*)" that ran every
>>>>> minute on a table with practically no results limit (yes, this should
>>>>> definitely be avoided), caused a huge increase in Cassandra writes to
>>>>> around 150 thousand writes per second for that particular table.
>>>>>
>>>>> Can anyone explain this behavior? Why would a Select query
>>>>> significantly increase write count in Cassandra?
>>>>>
>>>>> Thanks!
>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> -----------------
>>>>> Alexander Dejanovski
>>>>> France
>>>>> @alexanderdeja
>>>>>
>>>>> Consultant
>>>>> Apache Cassandra Consulting
>>>>> http://www.thelastpickle.com
>>>>>
>>>>>
>>>>
>>>
>>>
>>
>>
>
>
>



-- 
Benjamin Roth
Prokurist

Jaumo GmbH · www.jaumo.com
Wehrstraße 46 · 73035 Göppingen · Germany
Phone +49 7161 304880-6 · Fax +49 7161 304880-1
AG Ulm · HRB 731058 · Managing Director: Jens Kammerer
