cassandra-user mailing list archives

From Shalom Sagges <shal...@liveperson.com>
Subject Re: Can a Select Count(*) Affect Writes in Cassandra?
Date Thu, 10 Nov 2016 18:33:59 GMT
Got it. Thanks!!


Shalom Sagges
DBA
T: +972-74-700-4035
<http://www.linkedin.com/company/164748> <http://twitter.com/liveperson>
<http://www.facebook.com/LivePersonInc> We Create Meaningful Connections
<https://engage.liveperson.com/idc-mobile-first-consumer/?utm_medium=email&utm_source=mkto&utm_campaign=idcsig>


On Thu, Nov 10, 2016 at 8:06 PM, Chris Lohfink <clohfink85@gmail.com> wrote:

> It can be a tad confusing...
>
> The background metric corresponds to a digest mismatch that occurred after
> a completed read, outside of the client read. It will happen if the number
> of nodes queried at the requested consistency level was not all of the
> replicas, so the repair was kicked off after the read (this is based on the
> read repair chance, and is the "attempted" metric).
>
> The blocking metric corresponds to the number of times there was a digest
> mismatch within the requested consistency level and a full data read was
> started within the client read.
>
> The two combined show you have a lot of read repairs happening, possibly
> due just to latency in the initial mutations (not really a problem). If
> that's the case and you have faith in your offline repairs, you could just
> set read repair chance to 0 to reduce the load of resending mutations that
> will become consistent eventually anyway.
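[Editorial note: the two metrics Chris describes can be sketched as a toy classifier. This is illustrative Python only, not Cassandra's actual code; the function name and the digest representation are invented for the example.]

```python
def classify_read_repair(digests_within_cl, digests_background):
    """Toy model of the two ReadRepair metrics described above.

    digests_within_cl: digests returned by the replicas contacted to
    satisfy the consistency level. A mismatch here blocks the client
    read while a full data read resolves it (RepairedBlocking).
    digests_background: digests from extra replicas queried because of
    read_repair_chance. A mismatch involving only these is repaired
    after the client read has completed (RepairedBackground).
    """
    if len(set(digests_within_cl)) > 1:
        return "blocking"
    if digests_background and len(set(digests_within_cl + digests_background)) > 1:
        return "background"
    return None

print(classify_read_repair(["a", "b"], []))      # mismatch within CL -> blocking
print(classify_read_repair(["a", "a"], ["b"]))   # extra replica differs -> background
print(classify_read_repair(["a", "a"], ["a"]))   # all agree -> None
```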
>
> Chris
>
> On Thu, Nov 10, 2016 at 11:52 AM, Shalom Sagges <shaloms@liveperson.com>
> wrote:
>
>> Thanks a lot for helping on this one.
>> Just one more question... I'm not familiar with the above read repair
>> metrics.
>> Can you please explain what caught your eye there?
>>
>>
>>
>>
>> On Thu, Nov 10, 2016 at 7:47 PM, Benjamin Roth <benjamin.roth@jaumo.com>
>> wrote:
>>
>>> ... sorry for the short reply. To be a bit more detailed:
>>>
>>> 1. You can lower the read repair probability on that table to avoid the
>>> writes. But be aware that inconsistencies then also won't be repaired on reads.
>>> 2. Maybe you should run a repair on that table to get it in sync and
>>> reduce the impact of read repairs. By the way, you should run repairs on a
>>> regular basis anyway, but that is a different topic, very extensive and
>>> documented in many places.
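[Editorial note: Benjamin's point 1 works because the extra read repair is probabilistic per read. A minimal sketch of the sampling, with an invented function name and no claim about Cassandra's internals:]

```python
import random

def triggers_background_read_repair(read_repair_chance, rng=random.random):
    """Per read, decide whether to also query the remaining replicas just
    so their data can be compared and repaired (toy version of the
    table-level read_repair_chance option)."""
    return rng() < read_repair_chance

# chance = 0 never triggers it, which is why setting it to 0 stops the
# repair-driven writes; chance = 1.0 triggers it on every read.
print(any(triggers_background_read_repair(0.0) for _ in range(10_000)))  # False
print(all(triggers_background_read_repair(1.0) for _ in range(1_000)))   # True
```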
>>>
>>> 2016-11-10 17:44 GMT+00:00 Benjamin Roth <benjamin.roth@jaumo.com>:
>>>
>>>> There you go :)
>>>>
>>>> 2016-11-10 17:24 GMT+00:00 Shalom Sagges <shaloms@liveperson.com>:
>>>>
>>>>> That's a possibility I didn't think of...
>>>>>
>>>>> This is what I see from
>>>>> org.apache.cassandra.metrics:type=ReadRepair,name=RepairedBackground:
>>>>> [image: Inline image 1]
>>>>>
>>>>>
>>>>> and from
>>>>> org.apache.cassandra.metrics:type=ReadRepair,name=RepairedBlocking:
>>>>> [image: Inline image 2]
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> On Thu, Nov 10, 2016 at 7:16 PM, Shalom Sagges <shaloms@liveperson.com
>>>>> > wrote:
>>>>>
>>>>>> Yes, it's occurring on the table that receives the count(*) query.
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Thu, Nov 10, 2016 at 5:05 PM, Alexander Dejanovski <
>>>>>> alex@thelastpickle.com> wrote:
>>>>>>
>>>>>>> So the huge write count is occurring on the table that receives the
>>>>>>> count(*) query, or on another table?
>>>>>>>
>>>>>>> On Thu, Nov 10, 2016 at 4:02 PM Shalom Sagges <
>>>>>>> shaloms@liveperson.com> wrote:
>>>>>>>
>>>>>>>> Tracing is off and so is TracingProbability.
>>>>>>>> Just to elaborate, the huge write count occurs only on a single column
>>>>>>>> family, which is not part of the system_traces keyspace.
>>>>>>>>
>>>>>>>> I also want to thank you guys for your persistent help, regardless of
>>>>>>>> whether the root cause is found or not. You're the best!!
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> On Thu, Nov 10, 2016 at 4:41 PM, Alexander Dejanovski <
>>>>>>>> alex@thelastpickle.com> wrote:
>>>>>>>>
>>>>>>>> Shalom,
>>>>>>>>
>>>>>>>> you may have a high trace probability, which could explain what
>>>>>>>> you're observing:
>>>>>>>> https://docs.datastax.com/en/cassandra/2.0/cassandra/tools/toolsSetTraceProbability.html
>>>>>>>>
>>>>>>>> On Thu, Nov 10, 2016 at 3:37 PM Chris Lohfink <clohfink85@gmail.com>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>> count(*) actually pages through all the data, so a select count(*)
>>>>>>>> without a limit would be expected to cause a lot of load on the
>>>>>>>> system. The hit is more than just IO load and CPU; it also creates a
>>>>>>>> lot of garbage that can cause pauses, slowing down the entire JVM.
>>>>>>>> Some details here:
>>>>>>>> http://www.datastax.com/dev/blog/counting-keys-in-cassandra
>>>>>>>> <http://planetcassandra.org/blog/counting-key-in-cassandra/>
>>>>>>>>
>>>>>>>> You may want to consider maintaining the count yourself, using
>>>>>>>> Spark, or if you just want a ballpark number you can grab it from JMX.
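[Editorial note: the cost follows from the paging Chris mentions; an unbounded count(*) still streams every row page by page. A rough client-side sketch in Python, where `fetch_page` is an invented stand-in for a real driver call:]

```python
def count_all_rows(fetch_page, page_size=5000):
    """Count rows the way an unbounded SELECT count(*) effectively does:
    page through the entire table and sum the page sizes.
    fetch_page(offset, limit) returns the next batch, or [] when done."""
    total = offset = 0
    while True:
        page = fetch_page(offset, page_size)
        if not page:
            return total
        total += len(page)
        offset += len(page)

# Simulate a table with 12,345 rows: three pages must be fetched in full.
table = list(range(12_345))
print(count_all_rows(lambda off, n: table[off:off + n]))  # 12345
```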
>>>>>>>>
>>>>>>>> > Cassandra writes (mutations) are INSERTs, UPDATEs or DELETEs; it
>>>>>>>> actually has nothing to do with flushes. A flush is the operation of
>>>>>>>> moving data from memory (memtable) to disk (SSTable).
>>>>>>>>
>>>>>>>> FWIW, in 2.0 that's not completely accurate. Before 2.1, the process
>>>>>>>> of memtable flushing acquired a switchLock that blocks mutations
>>>>>>>> during the flush (the "pending tasks" metric is the measure of how
>>>>>>>> many mutations are blocked by this lock).
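[Editorial note: the pre-2.1 behaviour can be pictured as one lock shared by the flush and all writers. A simplified threading sketch; the lock name and counters are invented, not the real switchLock code:]

```python
import threading
import time

flush_lock = threading.Lock()  # stand-in for the pre-2.1 switchLock
completed = []

def mutate(key):
    # Every mutation must take the lock; while a flush holds it, writes
    # simply queue up, which is what the "pending tasks" metric counts.
    with flush_lock:
        completed.append(key)

flush_lock.acquire()  # a memtable flush begins and takes the lock
writers = [threading.Thread(target=mutate, args=(i,)) for i in range(5)]
for t in writers:
    t.start()
time.sleep(0.2)  # give the writers time to block on the lock
pending_during_flush = len(writers) - len(completed)
flush_lock.release()  # flush completes; blocked writers drain
for t in writers:
    t.join()

print(pending_during_flush, len(completed))  # 5 5
```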
>>>>>>>>
>>>>>>>> Chris
>>>>>>>>
>>>>>>>> On Thu, Nov 10, 2016 at 8:10 AM, Shalom Sagges <
>>>>>>>> shaloms@liveperson.com> wrote:
>>>>>>>>
>>>>>>>> Hi Alexander,
>>>>>>>>
>>>>>>>> I'm referring to the Writes Count generated from JMX:
>>>>>>>> [image: Inline image 1]
>>>>>>>>
>>>>>>>> The higher curve shows the total write count per second for all
>>>>>>>> nodes in the cluster, and the lower curve is the average write count
>>>>>>>> per second per node.
>>>>>>>> The drop at the end is the result of shutting down one application
>>>>>>>> node that performed this kind of query (we still haven't removed the
>>>>>>>> query itself in this cluster).
>>>>>>>>
>>>>>>>>
>>>>>>>> On a different cluster, where we already removed the "select
>>>>>>>> count(*)" query completely, we can see that the issue was resolved
>>>>>>>> (also verified this by running nodetool cfstats a few times and
>>>>>>>> checking the write count difference):
>>>>>>>> [image: Inline image 2]
>>>>>>>>
>>>>>>>>
>>>>>>>> Naturally I asked how a select query can affect the write count of
>>>>>>>> a node, but weird as it seems, the issue was resolved once the query
>>>>>>>> was removed from the code.
>>>>>>>>
>>>>>>>> Another side note: one of our developers who wrote the query in the
>>>>>>>> code thought it would be nice to limit the query results to
>>>>>>>> 560,000,000. Perhaps the ridiculously high limit might have caused
>>>>>>>> this?
>>>>>>>>
>>>>>>>> Thanks!
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> On Thu, Nov 10, 2016 at 3:21 PM, Alexander Dejanovski <
>>>>>>>> alex@thelastpickle.com> wrote:
>>>>>>>>
>>>>>>>> Hi Shalom,
>>>>>>>>
>>>>>>>> Cassandra writes (mutations) are INSERTs, UPDATEs or DELETEs; it
>>>>>>>> actually has nothing to do with flushes. A flush is the operation of
>>>>>>>> moving data from memory (memtable) to disk (SSTable).
>>>>>>>>
>>>>>>>> The Cassandra write path and read path are two different things
>>>>>>>> and, as far as I know, I see no way for a select count(*) to
>>>>>>>> increase your write count (if you are indeed talking about actual
>>>>>>>> Cassandra writes, and not I/O operations).
>>>>>>>>
>>>>>>>> Cheers,
>>>>>>>>
>>>>>>>> On Thu, Nov 10, 2016 at 1:21 PM Shalom Sagges <
>>>>>>>> shaloms@liveperson.com> wrote:
>>>>>>>>
>>>>>>>> Yes, I know it's obsolete, but unfortunately this takes time.
>>>>>>>> We're in the process of upgrading to 2.2.8 and 3.0.9 in our
>>>>>>>> clusters.
>>>>>>>>
>>>>>>>> Thanks!
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> On Thu, Nov 10, 2016 at 1:31 PM, Vladimir Yudovin <
>>>>>>>> vladyu@winguzone.com> wrote:
>>>>>>>>
>>>>>>>> As I said, I'm not sure about it, but it would be interesting to
>>>>>>>> check the memory heap state with any JMX tool, e.g.
>>>>>>>> https://github.com/patric-r/jvmtop
>>>>>>>>
>>>>>>>> By the way, why Cassandra 2.0.14? It's quite an old and unsupported
>>>>>>>> version. Even in the 2.0 branch, 2.0.17 is available.
>>>>>>>>
>>>>>>>> Best regards, Vladimir Yudovin,
>>>>>>>>
>>>>>>>> *Winguzone <https://winguzone.com?from=list> - Hosted Cloud
>>>>>>>> Cassandra. Launch your cluster in minutes.*
>>>>>>>>
>>>>>>>>
>>>>>>>> ---- On Thu, 10 Nov 2016 05:47:37 -0500, Shalom Sagges
>>>>>>>> <shaloms@liveperson.com> wrote ----
>>>>>>>>
>>>>>>>> Thanks for the quick reply, Vladimir.
>>>>>>>> Is it really possible that ~12,500 writes per second (per node in a
>>>>>>>> 12-node DC) are caused by memory flushes?
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> On Thu, Nov 10, 2016 at 11:02 AM, Vladimir Yudovin <
>>>>>>>> vladyu@winguzone.com> wrote:
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> Hi Shalom,
>>>>>>>>
>>>>>>>> I'm not so sure, but probably excessive memory consumption by this
>>>>>>>> SELECT causes C* to flush tables to free memory.
>>>>>>>>
>>>>>>>> Best regards, Vladimir Yudovin,
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> ---- On Thu, 10 Nov 2016 03:36:59 -0500, Shalom Sagges
>>>>>>>> <shaloms@liveperson.com> wrote ----
>>>>>>>>
>>>>>>>> Hi There!
>>>>>>>>
>>>>>>>> I'm using C* 2.0.14.
>>>>>>>> I experienced a scenario where a "select count(*)" that ran every
>>>>>>>> minute on a table, with practically no limit on the results (yes,
>>>>>>>> this should definitely be avoided), caused a huge increase in
>>>>>>>> Cassandra writes, to around 150 thousand writes per second for that
>>>>>>>> particular table.
>>>>>>>>
>>>>>>>> Can anyone explain this behavior? Why would a select query
>>>>>>>> significantly increase the write count in Cassandra?
>>>>>>>>
>>>>>>>> Thanks!
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> --
>>>>>>>> -----------------
>>>>>>>> Alexander Dejanovski
>>>>>>>> France
>>>>>>>> @alexanderdeja
>>>>>>>>
>>>>>>>> Consultant
>>>>>>>> Apache Cassandra Consulting
>>>>>>>> http://www.thelastpickle.com
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>
>>>>
>>>>
>>>> --
>>>> Benjamin Roth
>>>> Prokurist
>>>>
>>>> Jaumo GmbH · www.jaumo.com
>>>> Wehrstraße 46 · 73035 Göppingen · Germany
>>>> Phone +49 7161 304880-6 · Fax +49 7161 304880-1
>>>> AG Ulm · HRB 731058 · Managing Director: Jens Kammerer
>>>>
>>>
>>>
>>>
>>>
>>
>>
>>
>
>

-- 
This message may contain confidential and/or privileged information. 
If you are not the addressee or authorized to receive this on behalf of the 
addressee you must not use, copy, disclose or take action based on this 
message or any information herein. 
If you have received this message in error, please advise the sender 
immediately by reply email and delete this message. Thank you.
