aurora-issues mailing list archives

From "Jonathan Boulle (JIRA)" <>
Subject [jira] [Created] (AURORA-178) Log/observe snapshot operations
Date Tue, 04 Feb 2014 00:27:10 GMT
Jonathan Boulle created AURORA-178:

             Summary: Log/observe snapshot operations
                 Key: AURORA-178
             Project: Aurora
          Issue Type: Task
          Components: Scheduler
            Reporter: Jonathan Boulle
            Priority: Minor

Currently, snapshot operations that run for an excessively long time are not necessarily obvious in,
for example, the scheduler logs or dashboards. Since snapshotting is a potentially critical and
dangerous operation (in some cases leading to ZooKeeper timeouts and scheduler suicide), it would be
prudent to expose relevant information more readily: for example, when the operation starts and
completes, how long it takes, etc.
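One way to expose "when the operations commence/complete" to a dashboard is a gauge that reports how long the current snapshot has been running. The following is a minimal sketch under that assumption; `SnapshotProgress`, `begin`, `end` and `inProgressNanos` are hypothetical names, not part of Aurora's API, and wiring the value into the scheduler's stats system is left out.

```java
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.LongSupplier;

// Hypothetical tracker (not an Aurora class): records when the current
// snapshot began so a dashboard gauge can show how long it has been running,
// even before it finishes.
public class SnapshotProgress {
  // null means no snapshot is in progress.
  private final AtomicReference<Long> startNanos = new AtomicReference<>(null);
  private final LongSupplier clock;

  public SnapshotProgress(LongSupplier clock) {
    this.clock = clock;
  }

  // Call immediately before the snapshot work starts.
  public void begin() {
    startNanos.set(clock.getAsLong());
  }

  // Call once the snapshot has finished (successfully or not).
  public void end() {
    startNanos.set(null);
  }

  /** Nanoseconds the in-flight snapshot has been running, or 0 when idle. */
  public long inProgressNanos() {
    Long start = startNanos.get();
    return start == null ? 0L : clock.getAsLong() - start;
  }

  public static void main(String[] args) throws InterruptedException {
    SnapshotProgress progress = new SnapshotProgress(System::nanoTime);
    progress.begin();
    Thread.sleep(10); // stand-in for snapshot work
    System.out.println("in progress for " + progress.inProgressNanos() + " ns");
    progress.end();
    System.out.println("idle: " + progress.inProgressNanos());
  }
}
```

Unlike a timer that only records on completion, a dashboard sampling this value would see it climb while a long snapshot is still running. The injectable clock is a design choice to make the class testable without sleeping.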

From Zameer:
{quote}The doSnapshot method of LogStorage is timed with the key "scheduler_log_snapshot".
These are the stats it produces:

scheduler_log_snapshot_events 19
scheduler_log_snapshot_events_per_sec 0.0
scheduler_log_snapshot_nanos_per_event 0.0
scheduler_log_snapshot_nanos_total 373115257383
scheduler_log_snapshot_nanos_total_per_sec 0.0
scheduler_log_snapshot_persist_events 19
scheduler_log_snapshot_persist_events_per_sec 0.0
scheduler_log_snapshot_persist_nanos_per_event 0.0
scheduler_log_snapshot_persist_nanos_total 339151517713
scheduler_log_snapshot_persist_nanos_total_per_sec 0.0
scheduler_log_snapshots 19

Which metric should be tracked in our dashboard?{quote}
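If the `_nanos_total` counters above are cumulative over all recorded snapshots (an assumption; the thread does not say so explicitly), dividing each by its event count gives a rough mean duration per snapshot. A minimal sketch; `SnapshotStats` and `meanSeconds` are hypothetical names used only for illustration.

```java
import java.util.Locale;

// Hypothetical helper: derives the mean duration of one snapshot from the
// cumulative counters quoted above, assuming *_nanos_total is the sum of
// the durations of all *_events.
public class SnapshotStats {
  static double meanSeconds(long nanosTotal, long events) {
    if (events <= 0) {
      throw new IllegalArgumentException("events must be positive");
    }
    return nanosTotal / (double) events / 1_000_000_000.0;
  }

  public static void main(String[] args) {
    double snapshot = meanSeconds(373_115_257_383L, 19); // scheduler_log_snapshot
    double persist = meanSeconds(339_151_517_713L, 19); // scheduler_log_snapshot_persist
    System.out.println(String.format(Locale.ROOT, "mean snapshot: %.2f s", snapshot));
    System.out.println(String.format(Locale.ROOT, "mean persist:  %.2f s", persist));
    System.out.println(String.format(Locale.ROOT, "persist share: %.0f%%", 100 * persist / snapshot));
  }
}
```

Under that assumption, the quoted numbers work out to roughly 19.6 s per snapshot on average, about 91% of which is spent in the persist step.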

From Bill F:
{quote}a very long snapshot might never be reflected there if a suicide happens mid-way through.
The minimal fix would be to just LOG when a snapshot is about to commence.{quote}
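Bill's minimal fix (log before the snapshot commences) could look roughly like the sketch below. It assumes `java.util.logging` and models the snapshot work as a plain `Runnable`; `LoggedSnapshot` and `timedSnapshot` are hypothetical names, not Aurora's actual `LogStorage` code.

```java
import java.util.concurrent.TimeUnit;
import java.util.logging.Logger;

// Hypothetical wrapper illustrating "log before the snapshot commences".
// The Runnable stands in for the real snapshot work (e.g. LogStorage's doSnapshot).
public class LoggedSnapshot {
  private static final Logger LOG = Logger.getLogger(LoggedSnapshot.class.getName());

  /** Runs the snapshot and returns its duration in milliseconds. */
  public static long timedSnapshot(Runnable snapshot) {
    // Logged up front so that a scheduler that dies mid-snapshot still
    // leaves evidence that a snapshot was in progress.
    LOG.info("Snapshot starting");
    long startNanos = System.nanoTime();
    try {
      snapshot.run();
    } catch (RuntimeException e) {
      LOG.warning("Snapshot failed after " + elapsedMillis(startNanos) + " ms: " + e);
      throw e;
    }
    long elapsed = elapsedMillis(startNanos);
    LOG.info("Snapshot completed in " + elapsed + " ms");
    return elapsed;
  }

  private static long elapsedMillis(long startNanos) {
    return TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - startNanos);
  }

  public static void main(String[] args) {
    long ms = timedSnapshot(() -> {
      try {
        Thread.sleep(20); // simulate a short snapshot
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
      }
    });
    System.out.println("snapshot took " + ms + " ms");
  }
}
```

The "starting" line is written before any work begins, so it survives even if the process is killed mid-snapshot; the completion line (and the elapsed time) only appears if the snapshot actually returns.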

This message was sent by Atlassian JIRA
