cassandra-user mailing list archives

From "ZAIDI, ASAD A" <az1...@att.com>
Subject RE: MUTATION messages were dropped in last 5000 ms for cross node timeout
Date Thu, 03 Aug 2017 17:41:20 GMT
Hi Akhil,

Thank you for your reply.

I kept testing different timeout values over the last week and eventually settled on setting the
*_request_timeout_in_ms parameters to 1.5 minutes for the coordinator wait time. That is the value
at which I do not see any dropped mutations.

I also asked the developers to revise the data model, where we saw a number of tables with very large
partitions, some with a partition size of around ~6.6GB. We’re now working to reduce
the partition size of those tables. I am hoping the corrected data model will let us reduce the coordinator
wait time again (back to the default value!).

Thanks again/Asad

From: Akhil Mehra [mailto:akhilmehra@gmail.com]
Sent: Friday, July 21, 2017 4:24 PM
To: user@cassandra.apache.org
Subject: Re: MUTATION messages were dropped in last 5000 ms for cross node timeout

Hi Asad,

The 5000 ms is not configurable (https://github.com/apache/cassandra/blob/8b3a60b9a7dbefeecc06bace617279612ec7092d/src/java/org/apache/cassandra/net/MessagingService.java#L423).
This is just the interval after which the number of dropped messages is reported. Thus dropped messages
are reported every 5000 ms.

If you are looking to tweak the number of ms after which a message is considered dropped, then
you need to use write_request_timeout_in_ms. The write_request_timeout_in_ms setting (http://docs.datastax.com/en/cassandra/2.1/cassandra/configuration/configCassandra_yaml_r.html)
can be used to increase the mutation timeout. By default it is set to 2000 ms.
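
As a rough illustration of how the timeout decides whether a mutation is dropped, here is a minimal Python sketch. It is a simplified model, not Cassandra's actual code: the function name and arguments are hypothetical, and it assumes that with cross_node_timeout enabled a message's age is counted from the coordinator's send time, and otherwise from its arrival at the replica.

```python
def is_dropped(now_ms, sent_at_ms, received_at_ms,
               timeout_ms=2000, cross_node_timeout=False):
    """Hypothetical model of the drop check (all times in milliseconds).

    timeout_ms stands in for write_request_timeout_in_ms (default 2000).
    With cross_node_timeout on, the age is measured from the coordinator's
    send timestamp, which only makes sense if node clocks are synchronized.
    """
    start_ms = sent_at_ms if cross_node_timeout else received_at_ms
    return now_ms - start_ms > timeout_ms


# A mutation sent at t=0 that reaches the replica at t=1500 and is
# picked up for processing at t=2500.
print(is_dropped(2500, 0, 1500, cross_node_timeout=True))
print(is_dropped(2500, 0, 1500, cross_node_timeout=False))
```

In this model the same mutation is shed when the time spent in transit is counted and kept when it is not, which is why raising the timeout reduces the reported drops.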

I hope that helps.

Regards,
Akhil


On 22/07/2017, at 2:46 AM, ZAIDI, ASAD A <az192g@att.com> wrote:

Hi Akhil,

Thank you for your reply. Previously, I did ‘tune’ various timeouts (basically increased
them a bit), but none of the parameters listed in the link matches that “were dropped
in last 5000 ms” value.
I was wondering where that [5000ms] number comes from when, like I mentioned before,
none of the timeout parameter settings matches that number!

Load is intermittently high, but again the CPU queue length never goes beyond medium depth. I wonder
if there is some internal limit that I’m still not aware of.

Thanks/Asad


From: Akhil Mehra [mailto:akhilmehra@gmail.com]
Sent: Thursday, July 20, 2017 3:47 PM
To: user@cassandra.apache.org
Subject: Re: MUTATION messages were dropped in last 5000 ms for cross node timeout

Hi Asad,

http://cassandra.apache.org/doc/latest/faq/index.html#why-message-dropped

As mentioned in the link above this is a load shedding mechanism used by Cassandra.

Is your cluster under heavy load?

Regards,
Akhil


On 21/07/2017, at 3:27 AM, ZAIDI, ASAD A <az192g@att.com> wrote:

Hello Folks –

I’m using apache-cassandra 2.2.8.

I see many messages like the one below in my system.log file. In the cassandra.yaml file, [cross_node_timeout:
true] is set, and an NTP server is also running to correct clock drift on the 16-node cluster. I do
not see pending or blocked HintedHandoff in the tpstats output, though a number of dropped MUTATIONs
are observed.

<start timeout message >
INFO  [ScheduledTasks:1] 2017-07-20 08:02:52,511 MessagingService.java:946 - MUTATION messages
were dropped in last 5000 ms: 822 for internal timeout and 2152 for cross node timeout
<end timeout message >
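
To track these counts over time, the summary line above can be pulled apart with a short script. This is a minimal sketch in Python: the regular expression is written against the exact wording quoted in this message (Cassandra 2.2.8), the function name is hypothetical, and other versions may phrase the line differently.

```python
import re

# Pattern for the dropped-message summary line as quoted in this thread.
DROPPED_RE = re.compile(
    r"(?P<verb>[A-Z_]+) messages were dropped in last (?P<interval_ms>\d+) ms: "
    r"(?P<internal>\d+) for internal timeout and "
    r"(?P<cross_node>\d+) for cross node timeout"
)


def parse_dropped(line):
    """Return the counts from one summary line, or None if it does not match."""
    # Collapse runs of whitespace so a wrapped or padded line still matches.
    match = DROPPED_RE.search(" ".join(line.split()))
    if match is None:
        return None
    return {
        "verb": match.group("verb"),
        "interval_ms": int(match.group("interval_ms")),
        "internal": int(match.group("internal")),
        "cross_node": int(match.group("cross_node")),
    }


sample = (
    "INFO  [ScheduledTasks:1] 2017-07-20 08:02:52,511 MessagingService.java:946 - "
    "MUTATION messages were dropped in last 5000 ms: 822 for internal timeout "
    "and 2152 for cross node timeout"
)
print(parse_dropped(sample))
```

Splitting the internal and cross node counts makes it easier to see whether drops are dominated by time spent between nodes or by queueing on the replica itself.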

I’m seeking help here: please let me know what I need to check in order to address
these cross node timeouts.

Thank you,
Asad
