cassandra-commits mailing list archives

From "Paulo Motta (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (CASSANDRA-10580) On dropped mutations, more details should be logged.
Date Sun, 13 Dec 2015 15:06:46 GMT

    [ https://issues.apache.org/jira/browse/CASSANDRA-10580?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15055007#comment-15055007 ]

Paulo Motta commented on CASSANDRA-10580:
-----------------------------------------

Looking good. A few comments:
* I think you're fine using {{System.currentTimeMillis()}} instead of {{ApproximateTime}}
to avoid imprecisions.
* Rename {{TimeTaken}} to {{DroppedLatency}} to be consistent with other similar metric names
(https://wiki.apache.org/cassandra/Metrics).
** Actually, I think it's better to have 2 metrics {{InternalDroppedLatency}} and {{CrossNodeDroppedLatency}},
since they will be quite different (see CASSANDRA-9793 for more information).
* Add tests to check metrics are correct, probably on {{MessagingServiceTests}}
* You'll probably also want to verify that the metrics work by bringing up a cluster
manually or with ccm, stressing it with cassandra-stress, and checking via JMX with VisualVM
that the new metrics are being recorded correctly.
* The latest patch does not apply; it fails with {{fatal: corrupt patch at line 125}}. I don't
know exactly what that is. I wonder if it's a cross-platform thing. Are you able to apply it locally?
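
To illustrate the suggestion of two separate metrics, here is a minimal sketch of keeping internal and cross-node dropped latency apart. It is only an assumption of how the bookkeeping could look: the class {{DroppedLatencySketch}} and its methods are hypothetical, it uses plain JDK counters rather than the metrics library, and it is not taken from {{MessagingService}}.

```java
import java.util.concurrent.atomic.LongAdder;

// Hypothetical sketch: not Cassandra's MessagingService, just the bookkeeping idea.
final class DroppedLatencySketch {
    // One count/total pair per origin so the two populations never mix.
    private final LongAdder internalCount = new LongAdder();
    private final LongAdder internalTotalMs = new LongAdder();
    private final LongAdder crossNodeCount = new LongAdder();
    private final LongAdder crossNodeTotalMs = new LongAdder();

    // createdAtMs and droppedAtMs are wall-clock timestamps in milliseconds.
    void recordDropped(long createdAtMs, long droppedAtMs, boolean crossNode) {
        // Clamp negative values, e.g. if the two timestamps come from different clocks.
        long elapsedMs = Math.max(0L, droppedAtMs - createdAtMs);
        if (crossNode) {
            crossNodeCount.increment();
            crossNodeTotalMs.add(elapsedMs);
        } else {
            internalCount.increment();
            internalTotalMs.add(elapsedMs);
        }
    }

    long count(boolean crossNode) {
        return (crossNode ? crossNodeCount : internalCount).sum();
    }

    double meanMs(boolean crossNode) {
        long n = count(crossNode);
        long total = (crossNode ? crossNodeTotalMs : internalTotalMs).sum();
        return n == 0 ? 0.0 : (double) total / n;
    }

    public static void main(String[] args) {
        DroppedLatencySketch sketch = new DroppedLatencySketch();
        long now = System.currentTimeMillis();
        sketch.recordDropped(now - 2500, now, false); // dropped internal message
        sketch.recordDropped(now - 4000, now, true);  // dropped cross-node message
        System.out.println("internal mean ms:   " + sketch.meanMs(false));
        System.out.println("cross-node mean ms: " + sketch.meanMs(true));
    }
}
```

In the real patch the two values would presumably be registered as {{InternalDroppedLatency}} and {{CrossNodeDroppedLatency}} timers instead of hand-rolled counters.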

Answering your questions:

bq. Also, a question: It appears that Timer.Update appends entries to the metric (which is
what we want). Do you know at what point it starts dropping new appends / starts giving up?
I wonder, if there is a huge number of dropped mutations, will the timeTaken metric get messed up?

I think the metrics package already handles that. I think {{Timer}} metrics keep running
averages rather than the actual values, so there is no need to clean up, afaik.
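
To show why a running average does not grow with the number of updates, here is a minimal sketch of an exponentially weighted moving average. It is an illustration only and not the metrics library's {{Timer}} implementation; the class {{RunningAverageSketch}} and the smoothing factor {{alpha}} are assumptions made for the example.

```java
// Hypothetical sketch: constant-memory running average, not the metrics library's code.
final class RunningAverageSketch {
    private final double alpha; // weight given to the newest value, in (0, 1]
    private double average;     // current running average
    private long count;         // number of updates seen so far

    RunningAverageSketch(double alpha) {
        if (alpha <= 0.0 || alpha > 1.0) {
            throw new IllegalArgumentException("alpha must be in (0, 1]");
        }
        this.alpha = alpha;
    }

    // Folds the new value into the average; no per-update storage is kept.
    synchronized void update(long valueMs) {
        average = (count == 0) ? valueMs : alpha * valueMs + (1.0 - alpha) * average;
        count++;
    }

    synchronized double average() {
        return average;
    }

    synchronized long count() {
        return count;
    }

    public static void main(String[] args) {
        RunningAverageSketch avg = new RunningAverageSketch(0.5);
        for (int i = 0; i < 1_000_000; i++) {
            avg.update(50); // a large burst of dropped mutations
        }
        System.out.println(avg.count() + " updates, average " + avg.average());
    }
}
```

The state is two numbers no matter how many values are recorded, which is the property the question above is asking about.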

bq. To make this work for CF, I will probably pass the mutation to MessagingService.LogDroppedMessages
(maybe through an overload) and update the metrics on appropriate CF. Does that make sense?

Sounds good.

bq. If this change looks good, I am more inclined towards making this work for CF before making
up patches for old branches. Let me know if that's okay.

Sure. Watch out for additional details with CF metrics, such as cleaning up the metrics if a
CF is dropped. You'll probably want to integrate this with the {{TableMetrics}} class.
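
As an illustration of the cleanup concern, here is a minimal sketch of per-table dropped-mutation counters that are released when a table is dropped. It does not reflect how {{TableMetrics}} is actually implemented; the class {{PerTableDroppedSketch}} and its method names are hypothetical.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.atomic.LongAdder;

// Hypothetical sketch: per-table counters with explicit cleanup, not Cassandra's TableMetrics.
final class PerTableDroppedSketch {
    private final ConcurrentMap<String, LongAdder> droppedByTable = new ConcurrentHashMap<>();

    private static String key(String keyspace, String table) {
        return keyspace + "." + table;
    }

    // Called when a mutation for the given table is dropped.
    void markDropped(String keyspace, String table) {
        droppedByTable.computeIfAbsent(key(keyspace, table), k -> new LongAdder()).increment();
    }

    long dropped(String keyspace, String table) {
        LongAdder counter = droppedByTable.get(key(keyspace, table));
        return counter == null ? 0L : counter.sum();
    }

    // Called when the table itself is dropped, so stale counters do not accumulate.
    void release(String keyspace, String table) {
        droppedByTable.remove(key(keyspace, table));
    }

    int trackedTables() {
        return droppedByTable.size();
    }

    public static void main(String[] args) {
        PerTableDroppedSketch metrics = new PerTableDroppedSketch();
        metrics.markDropped("ks", "users");
        metrics.markDropped("ks", "users");
        System.out.println("dropped for ks.users: " + metrics.dropped("ks", "users"));
        metrics.release("ks", "users");
        System.out.println("tracked tables after release: " + metrics.trackedTables());
    }
}
```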

> On dropped mutations, more details should be logged.
> ----------------------------------------------------
>
>                 Key: CASSANDRA-10580
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-10580
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Coordination
>         Environment: Production
>            Reporter: Anubhav Kale
>            Assignee: Anubhav Kale
>            Priority: Minor
>             Fix For: 3.2, 2.2.x
>
>         Attachments: 10580-Metrics.patch, 10580.patch, CASSANDRA-10580-Head.patch, Trunk.patch
>
>
> In our production cluster, we are seeing a large number of dropped mutations. At a minimum,
> we should print the time the thread took to get scheduled thereby dropping the mutation (We
> should also print the Message / Mutation so it helps in figuring out which column family was
> affected). This will help find the right tuning parameter for write_timeout_in_ms.
> The change is small and is in StorageProxy.java and MessagingTask.java. I will submit
> a patch shortly.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
