aurora-issues mailing list archives

From "Bill Farner (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (AURORA-178) Log/observe snapshot operations
Date Fri, 25 Apr 2014 04:39:15 GMT

    [ https://issues.apache.org/jira/browse/AURORA-178?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13980684#comment-13980684 ]

Bill Farner commented on AURORA-178:
------------------------------------

https://reviews.apache.org/r/20469/

> Log/observe snapshot operations
> -------------------------------
>
>                 Key: AURORA-178
>                 URL: https://issues.apache.org/jira/browse/AURORA-178
>             Project: Aurora
>          Issue Type: Task
>          Components: Scheduler
>            Reporter: Jonathan Boulle
>            Priority: Minor
>              Labels: newbie
>
> Currently, snapshot operations of excessive duration aren't necessarily obvious in e.g.
> the scheduler logs or dashboards. Since a snapshot is a potentially critical/dangerous
> operation (in some cases leading to ZooKeeper timeouts + scheduler suicide), it would be
> prudent to expose relevant information more readily (e.g. when the operations
> commence/complete, timing, etc.).
> From Zameer:
> {quote}The doSnapshot method of LogStorage is timed with the key "scheduler_log_snapshot".
> These are the stats it produces:
> scheduler_log_snapshot_events 19
> scheduler_log_snapshot_events_per_sec 0.0
> scheduler_log_snapshot_nanos_per_event 0.0
> scheduler_log_snapshot_nanos_total 373115257383
> scheduler_log_snapshot_nanos_total_per_sec 0.0
> scheduler_log_snapshot_persist_events 19
> scheduler_log_snapshot_persist_events_per_sec 0.0
> scheduler_log_snapshot_persist_nanos_per_event 0.0
> scheduler_log_snapshot_persist_nanos_total 339151517713
> scheduler_log_snapshot_persist_nanos_total_per_sec 0.0
> scheduler_log_snapshots 19
> Which metric should be tracked in our dashboard?
> {quote}
> From Bill F:
> {quote}A very long snapshot might never be reflected there if a suicide happens mid-way
> through. The minimal fix would be to just LOG when a snapshot is about to commence.{quote}
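
For illustration, the minimal fix described in the issue (log when a snapshot is about to commence, then log the outcome and timing) could look roughly like the sketch below. This is not the change in the linked review and not Aurora's LogStorage code: the class LoggedSnapshot, the method run, and the injectable nano clock are hypothetical names, and java.util.logging is used only to keep the example self-contained.

```java
import java.util.concurrent.TimeUnit;
import java.util.function.LongSupplier;
import java.util.logging.Logger;

/** Hypothetical wrapper for illustration; not the actual Aurora LogStorage code. */
public final class LoggedSnapshot {
  private static final Logger LOG = Logger.getLogger(LoggedSnapshot.class.getName());

  private LoggedSnapshot() {}

  /**
   * Runs the snapshot work, logging before it starts and after it finishes or fails.
   *
   * @param snapshotWork the snapshot operation (assumed to be a plain Runnable here)
   * @param nanoClock    time source in nanoseconds, e.g. System::nanoTime
   * @return elapsed time of a successful snapshot, in nanoseconds
   */
  public static long run(Runnable snapshotWork, LongSupplier nanoClock) {
    // Logged first, so the line is present even if the process dies mid-snapshot.
    LOG.info("Snapshot starting");
    long start = nanoClock.getAsLong();
    try {
      snapshotWork.run();
    } catch (RuntimeException e) {
      long failedAfterMs = TimeUnit.NANOSECONDS.toMillis(nanoClock.getAsLong() - start);
      LOG.warning("Snapshot failed after " + failedAfterMs + " ms");
      throw e;
    }
    long elapsedNanos = nanoClock.getAsLong() - start;
    LOG.info("Snapshot completed in "
        + TimeUnit.NANOSECONDS.toMillis(elapsedNanos) + " ms");
    return elapsedNanos;
  }
}
```

A caller would pass the real operation and clock, e.g. LoggedSnapshot.run(() -> doTheSnapshot(), System::nanoTime), where doTheSnapshot is a placeholder. A "starting" line with no matching "completed" or "failed" line before a restart then points at a snapshot that never finished, which is the case the timing stats alone cannot show.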



--
This message was sent by Atlassian JIRA
(v6.2#6252)
