cassandra-user mailing list archives

From Benjamin Roth <benjamin.r...@jaumo.com>
Subject Re: Can a Select Count(*) Affect Writes in Cassandra?
Date Thu, 10 Nov 2016 15:33:01 GMT
Or a high read repair probability combined with a lot of out-of-sync replicas?
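
To make the read repair idea concrete, here is a toy Python sketch of how a read could end up generating writes. It is only a model, not Cassandra code: the names Replica, read_with_repair and scan_all are made up for illustration, and it repairs on every mismatch, whereas a real cluster decides per read (read repair chance, consistency level). Whether a count(*) range scan in 2.0 actually triggers this is not something this thread has confirmed.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    """Toy replica: maps key -> (timestamp, value) and counts applied writes."""
    data: dict = field(default_factory=dict)
    write_count: int = 0

    def apply(self, key, timestamp, value):
        # Every applied mutation, including a repair mutation, bumps the counter.
        self.data[key] = (timestamp, value)
        self.write_count += 1


def read_with_repair(replicas, key):
    """Return the newest value for key and push it to replicas that lag behind."""
    versions = [r.data.get(key) for r in replicas]
    present = [v for v in versions if v is not None]
    if not present:
        return None
    newest = max(present, key=lambda v: v[0])  # highest timestamp wins
    for replica, version in zip(replicas, versions):
        if version != newest:
            replica.apply(key, *newest)  # a write caused purely by a read
    return newest[1]


def scan_all(replicas):
    """Read every known key once, the way a full-table count touches every row."""
    keys = set().union(*(r.data.keys() for r in replicas))
    for key in keys:
        read_with_repair(replicas, key)
    return len(keys)
```

In this model, if one replica is missing half of the rows, a single full scan applies one repair write per missing row on that replica, so a scan repeated every minute over badly out-of-sync data would show up as write load even though the client only issued reads.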

On 10.11.2016 at 14:42, "Alexander Dejanovski" <alex@thelastpickle.com> wrote:

> Shalom,
>
> you may have a high trace probability which could explain what you're
> observing:
> https://docs.datastax.com/en/cassandra/2.0/cassandra/tools/toolsSetTraceProbability.html
>
> On Thu, Nov 10, 2016 at 3:37 PM Chris Lohfink <clohfink85@gmail.com>
> wrote:
>
>> count(*) actually pages through all the data. So a select count(*) without
>> a limit would be expected to cause a lot of load on the system. The hit is
>> more than just IO load and CPU, it also creates a lot of garbage that can
>> cause pauses slowing down the entire JVM. Some details here:
>> http://www.datastax.com/dev/blog/counting-keys-in-cassandra
>> <http://planetcassandra.org/blog/counting-key-in-cassandra/>
>>
>> You may want to consider maintaining the count yourself, using Spark, or
>> if you just want a ball park number you can grab it from JMX.
>>
>> > Cassandra writes (mutations) are INSERTs, UPDATEs or DELETEs, it
>> > actually has nothing to do with flushes. A flush is the operation of moving
>> > data from memory (memtable) to disk (SSTable).
>>
>> FWIW, in 2.0 that's not completely accurate. Before 2.1, the process of
>> memtable flushing acquired a switch lock that blocks mutations during the
>> flush (the "pending task" metric is the measure of how many mutations are
>> blocked by this lock).
>>
>> Chris
>>
>> On Thu, Nov 10, 2016 at 8:10 AM, Shalom Sagges <shaloms@liveperson.com>
>> wrote:
>>
>> Hi Alexander,
>>
>> I'm referring to Writes Count generated from JMX:
>> [image: Inline image 1]
>>
>> The higher curve shows the total write count per second for all nodes in
>> the cluster and the lower curve is the average write count per second per
>> node.
>> The drop in the end is the result of shutting down one application node
>> that performed this kind of query (we still haven't removed the query
>> itself in this cluster).
>>
>>
>> On a different cluster, where we already removed the "select count(*)"
>> query completely, we can see that the issue was resolved (we also verified
>> this by running nodetool cfstats a few times and checking the write count
>> difference):
>> [image: Inline image 2]
>>
>>
>> Naturally, I asked how a select query can affect the write count of a node,
>> but weird as it seems, the issue was resolved once the query was removed
>> from the code.
>>
>> Another side note: one of our developers, who wrote the query in the
>> code, thought it would be nice to limit the query results to 560,000,000.
>> Perhaps the ridiculously high limit might have caused this?
>>
>> Thanks!
>>
>>
>>
>> Shalom Sagges
>> DBA
>> T: +972-74-700-4035
>> <http://www.linkedin.com/company/164748> <http://twitter.com/liveperson>
>> <http://www.facebook.com/LivePersonInc> We Create Meaningful Connections
>>
>> <https://engage.liveperson.com/idc-mobile-first-consumer/?utm_medium=email&utm_source=mkto&utm_campaign=idcsig>
>>
>>
>> On Thu, Nov 10, 2016 at 3:21 PM, Alexander Dejanovski <
>> alex@thelastpickle.com> wrote:
>>
>> Hi Shalom,
>>
>> Cassandra writes (mutations) are INSERTs, UPDATEs or DELETEs, it actually
>> has nothing to do with flushes. A flush is the operation of moving data
>> from memory (memtable) to disk (SSTable).
>>
>> The Cassandra write path and read path are two different things and, as
>> far as I know, I see no way for a select count(*) to increase your write
>> count (if you are indeed talking about actual Cassandra writes, and not I/O
>> operations).
>>
>> Cheers,
>>
>> On Thu, Nov 10, 2016 at 1:21 PM Shalom Sagges <shaloms@liveperson.com>
>> wrote:
>>
>> Yes, I know it's obsolete, but unfortunately this takes time.
>> We're in the process of upgrading to 2.2.8 and 3.0.9 in our clusters.
>>
>> Thanks!
>>
>>
>>
>> Shalom Sagges
>>
>>
>> On Thu, Nov 10, 2016 at 1:31 PM, Vladimir Yudovin <vladyu@winguzone.com>
>> wrote:
>>
>> As I said, I'm not sure about it, but it would be interesting to check the
>> heap memory state with any JMX tool, e.g.
>> https://github.com/patric-r/jvmtop
>>
>> By the way, why Cassandra 2.0.14? It's quite an old and unsupported version.
>> Even in the 2.0 branch there is 2.0.17 available.
>>
>> Best regards, Vladimir Yudovin,
>>
>> *Winguzone <https://winguzone.com?from=list> - Hosted Cloud Cassandra.
>> Launch your cluster in minutes.*
>>
>>
>> ---- On Thu, 10 Nov 2016 05:47:37 -0500 Shalom Sagges
>> <shaloms@liveperson.com> wrote ----
>>
>> Thanks for the quick reply Vladimir.
>> Is it really possible that ~12,500 writes per second (per node in a
>> 12-node DC) are caused by memory flushes?
>>
>>
>>
>>
>>
>>
>> Shalom Sagges
>>
>>
>>
>> On Thu, Nov 10, 2016 at 11:02 AM, Vladimir Yudovin <vladyu@winguzone.com>
>> wrote:
>>
>>
>>
>> This message may contain confidential and/or privileged information.
>> If you are not the addressee or authorized to receive this on behalf of
>> the addressee you must not use, copy, disclose or take action based on this
>> message or any information herein.
>> If you have received this message in error, please advise the sender
>> immediately by reply email and delete this message. Thank you.
>>
>>
>> Hi Shalom,
>>
>> I'm not sure, but excessive memory consumption by this SELECT probably
>> causes C* to flush memtables to free memory.
>>
>> Best regards, Vladimir Yudovin,
>>
>>
>>
>> ---- On Thu, 10 Nov 2016 03:36:59 -0500 Shalom Sagges
>> <shaloms@liveperson.com> wrote ----
>>
>> Hi There!
>>
>> I'm using C* 2.0.14.
>> I experienced a scenario where a "select count(*)" that ran every minute
>> on a table with practically no results limit (yes, this should definitely
>> be avoided) caused a huge increase in Cassandra writes, to around 150
>> thousand writes per second for that particular table.
>>
>> Can anyone explain this behavior? Why would a Select query significantly
>> increase write count in Cassandra?
>>
>> Thanks!
>>
>>
>> Shalom Sagges
>>
>>
>> --
>> -----------------
>> Alexander Dejanovski
>> France
>> @alexanderdeja
>>
>> Consultant
>> Apache Cassandra Consulting
>> http://www.thelastpickle.com
>>
>>
>>
>>
>>
