cassandra-user mailing list archives

From Chris Lohfink <clohfin...@gmail.com>
Subject Re: Can a Select Count(*) Affect Writes in Cassandra?
Date Thu, 10 Nov 2016 17:08:11 GMT
I actually read this completely wrong. Can you check the
org.apache.cassandra.metrics:type=ReadRepair,name=RepairedBackground and
org.apache.cassandra.metrics:type=ReadRepair,name=RepairedBlocking metrics?
Perhaps reading all the data to service the count(*) is causing a lot of
read repairs if your data is inconsistent.
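
To illustrate that hypothesis, here is a toy Python model. It is not Cassandra code: the `Replica` class, `read_with_repair`, `count_all`, and `demo` are made-up names for this sketch. It assumes every read contacts all replicas and repairs any stale one, which is a simplification. In a real cluster, how many replicas are compared depends on the consistency level and the table's read repair settings. Under that assumption, a full scan over out-of-sync replicas shows up as writes on the replicas that were behind:

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    """Toy replica: maps key -> (timestamp, value) and counts applied writes."""
    data: dict = field(default_factory=dict)
    write_count: int = 0

    def write(self, key, timestamp, value):
        self.data[key] = (timestamp, value)
        self.write_count += 1


def read_with_repair(replicas, key):
    """Read one key from every replica and push the newest version to stale ones.

    Assumption of this sketch: all replicas are always contacted.
    """
    versions = [r.data.get(key) for r in replicas]
    present = [v for v in versions if v is not None]
    if not present:
        return None
    newest = max(present, key=lambda v: v[0])  # highest timestamp wins
    for replica, version in zip(replicas, versions):
        if version != newest:
            # The "repair" is an ordinary write on the stale replica.
            replica.write(key, newest[0], newest[1])
    return newest[1]


def count_all(replicas):
    """Count keys by reading every one of them, the way a full scan would."""
    keys = set()
    for r in replicas:
        keys.update(r.data)
    return sum(1 for k in sorted(keys) if read_with_repair(replicas, k) is not None)


def demo():
    a, b, c = Replica(), Replica(), Replica()
    for i in range(1000):
        a.write(i, 1, "v")
        b.write(i, 1, "v")
        if i % 10 >= 3:  # replica c missed 3 out of every 10 keys
            c.write(i, 1, "v")
    replicas = [a, b, c]
    before = sum(r.write_count for r in replicas)
    total = count_all(replicas)
    after_first = sum(r.write_count for r in replicas)
    count_all(replicas)
    after_second = sum(r.write_count for r in replicas)
    return total, after_first - before, after_second - after_first


if __name__ == "__main__":
    print(demo())  # (1000, 300, 0)
```

In this sketch the first scan triggers one repair write for each of the 300 keys the third replica had missed, and a second scan triggers none. If something similar is happening on your cluster, the two ReadRepair metrics should climb while the count(*) runs.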

Chris

On Thu, Nov 10, 2016 at 9:33 AM, Benjamin Roth <benjamin.roth@jaumo.com>
wrote:

> Or a read repair probability setting combined with a lot of out-of-sync data?
>
> On 10.11.2016 at 14:42, "Alexander Dejanovski" <alex@thelastpickle.com>
> wrote:
>
>> Shalom,
>>
>> you may have a high trace probability, which could explain what you're
>> observing:
>> https://docs.datastax.com/en/cassandra/2.0/cassandra/tools/toolsSetTraceProbability.html
>>
>> On Thu, Nov 10, 2016 at 3:37 PM Chris Lohfink <clohfink85@gmail.com>
>> wrote:
>>
>>> count(*) actually pages through all the data. So a select count(*) without
>>> a limit would be expected to cause a lot of load on the system. The hit is
>>> more than just IO load and CPU, it also creates a lot of garbage that can
>>> cause pauses slowing down the entire JVM. Some details here:
>>> http://www.datastax.com/dev/blog/counting-keys-in-cassandra
>>> <http://planetcassandra.org/blog/counting-key-in-cassandra/>
>>>
>>> You may want to consider maintaining the count yourself, using Spark, or
>>> if you just want a ballpark number you can grab it from JMX.
>>>
>>> > Cassandra writes (mutations) are INSERTs, UPDATEs or DELETEs, it
>>> > actually has nothing to do with flushes. A flush is the operation of moving
>>> > data from memory (memtable) to disk (SSTable).
>>>
>>> FWIW, in 2.0 that's not completely accurate. Before 2.1, the process of
>>> memtable flushing acquired a switchlock that blocks mutations during the
>>> flush (the "pending task" metric is the measure of how many mutations are
>>> blocked by this lock).
>>>
>>> Chris
>>>
>>> On Thu, Nov 10, 2016 at 8:10 AM, Shalom Sagges <shaloms@liveperson.com>
>>> wrote:
>>>
>>> Hi Alexander,
>>>
>>> I'm referring to Writes Count generated from JMX:
>>> [image: Inline image 1]
>>>
>>> The higher curve shows the total write count per second for all nodes in
>>> the cluster and the lower curve is the average write count per second per
>>> node.
>>> The drop at the end is the result of shutting down one application node
>>> that performed this kind of query (we still haven't removed the query
>>> itself in this cluster).
>>>
>>>
>>> On a different cluster, where we already removed the "select count(*)"
>>> query completely, we can see that the issue was resolved (we also verified
>>> this by running nodetool cfstats a few times and checking the write count
>>> difference):
>>> [image: Inline image 2]
>>>
>>>
>>> Naturally, I asked how a select query can affect the write count of a
>>> node, but weird as it seems, the issue was resolved once the query was
>>> removed from the code.
>>>
>>> Another side note: one of our developers, who wrote the query in the
>>> code, thought it would be nice to limit the query results to 560,000,000.
>>> Perhaps the ridiculously high limit might have caused this?
>>>
>>> Thanks!
>>>
>>>
>>>
>>> Shalom Sagges
>>> DBA
>>> T: +972-74-700-4035
>>> <http://www.linkedin.com/company/164748> <http://twitter.com/liveperson>
>>> <http://www.facebook.com/LivePersonInc> We Create Meaningful Connections
>>>
>>> <https://engage.liveperson.com/idc-mobile-first-consumer/?utm_medium=email&utm_source=mkto&utm_campaign=idcsig>
>>>
>>>
>>> On Thu, Nov 10, 2016 at 3:21 PM, Alexander Dejanovski
>>> <alex@thelastpickle.com> wrote:
>>>
>>> Hi Shalom,
>>>
>>> Cassandra writes (mutations) are INSERTs, UPDATEs or DELETEs, it
>>> actually has nothing to do with flushes. A flush is the operation of moving
>>> data from memory (memtable) to disk (SSTable).
>>>
>>> The Cassandra write path and read path are two different things and, as
>>> far as I know, I see no way for a select count(*) to increase your write
>>> count (if you are indeed talking about actual Cassandra writes, and not I/O
>>> operations).
>>>
>>> Cheers,
>>>
>>> On Thu, Nov 10, 2016 at 1:21 PM Shalom Sagges <shaloms@liveperson.com>
>>> wrote:
>>>
>>> Yes, I know it's obsolete, but unfortunately this takes time.
>>> We're in the process of upgrading to 2.2.8 and 3.0.9 in our clusters.
>>>
>>> Thanks!
>>>
>>>
>>>
>>> Shalom Sagges
>>>
>>>
>>> On Thu, Nov 10, 2016 at 1:31 PM, Vladimir Yudovin <vladyu@winguzone.com>
>>> wrote:
>>>
>>> As I said, I'm not sure about it, but it would be interesting to check
>>> the memory heap state with any JMX tool, e.g.
>>> https://github.com/patric-r/jvmtop
>>>
>>> By the way, why Cassandra 2.0.14? It's quite an old and unsupported version.
>>> Even in the 2.0 branch, 2.0.17 is available.
>>>
>>> Best regards, Vladimir Yudovin,
>>>
>>> Winguzone <https://winguzone.com?from=list> - Hosted Cloud Cassandra
>>> Launch your cluster in minutes.
>>>
>>>
>>> ---- On Thu, 10 Nov 2016 05:47:37 -0500 Shalom Sagges
>>> <shaloms@liveperson.com> wrote ----
>>>
>>> Thanks for the quick reply Vladimir.
>>> Is it really possible that ~12,500 writes per second (per node in a
>>> 12-node DC) are caused by memory flushes?
>>>
>>>
>>>
>>>
>>>
>>>
>>> Shalom Sagges
>>>
>>>
>>>
>>> On Thu, Nov 10, 2016 at 11:02 AM, Vladimir Yudovin
>>> <vladyu@winguzone.com> wrote:
>>>
>>>
>>>
>>> This message may contain confidential and/or privileged information.
>>> If you are not the addressee or authorized to receive this on behalf of
>>> the addressee you must not use, copy, disclose or take action based on this
>>> message or any information herein.
>>> If you have received this message in error, please advise the sender
>>> immediately by reply email and delete this message. Thank you.
>>>
>>>
>>> Hi Shalom,
>>>
>>> I'm not sure, but excessive memory consumption by this SELECT probably
>>> causes C* to flush tables to free memory.
>>>
>>> Best regards, Vladimir Yudovin,
>>>
>>>
>>>
>>> ---- On Thu, 10 Nov 2016 03:36:59 -0500 Shalom Sagges
>>> <shaloms@liveperson.com> wrote ----
>>>
>>> Hi There!
>>>
>>> I'm using C* 2.0.14.
>>> I experienced a scenario where a "select count(*)" that ran every minute
>>> on a table with practically no results limit (yes, this should definitely
>>> be avoided), caused a huge increase in Cassandra writes to around 150
>>> thousand writes per second for that particular table.
>>>
>>> Can anyone explain this behavior? Why would a Select query significantly
>>> increase write count in Cassandra?
>>>
>>> Thanks!
>>>
>>>
>>> Shalom Sagges
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>> --
>>> -----------------
>>> Alexander Dejanovski
>>> France
>>> @alexanderdeja
>>>
>>> Consultant
>>> Apache Cassandra Consulting
>>> http://www.thelastpickle.com
>>>
>>>
>>>
>>>
>>>
>>
>
