reef-dev mailing list archives

From "ASF GitHub Bot (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (REEF-1732) Build Metrics System
Date Tue, 05 Jun 2018 16:56:00 GMT

    [ https://issues.apache.org/jira/browse/REEF-1732?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16502105#comment-16502105
] 

ASF GitHub Bot commented on REEF-1732:
--------------------------------------

singlis commented on issue #1460: [REEF-1732] Build Metrics System
URL: https://github.com/apache/reef/pull/1460#issuecomment-394783512
 
 
   Thanks Mandy for the information!
   
   EventCounters are cross-platform, which is why they are recommended over Performance Counters.
They do have roots in Performance Counters, which can be a bit confusing.
Please refer to the reply here by vancem:
   https://github.com/dotnet/corefx/blob/master/src/System.Diagnostics.Tracing/documentation/EventCounterTutorial.md
   
   As for PerfView, yes, that is my understanding of the current toolset. On Windows, you
may also be able to use Performance Monitor, but I haven't been able to confirm that. There
should be a .NET API that allows for collection of these values; please see the EventListener:
   https://msdn.microsoft.com/en-us/library/system.diagnostics.tracing.eventlistener(v=vs.110).aspx
   Its OnEventWritten method is called whenever an event is written.
   
   Also, EventSource appears to support different field formats, judging by the EventFieldFormat
enum:
   https://msdn.microsoft.com/en-us/library/system.diagnostics.tracing.eventfieldformat(v=vs.110).aspx
   
   OK, looking at this more: if you use EventCounters, it looks like the EventListener notifies
you whenever an event counter updates. So this is a push model. You can collect this data on
the evaluator, as you are doing now, and submit it via the heartbeat.
The driver would then aggregate this data.
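
To make the flow above concrete, here is a minimal, language-neutral sketch in Python (not .NET or REEF code). Every name in it is hypothetical: `CounterSource` stands in for an EventCounter-style source, `EvaluatorCollector.on_event_written` for a listener callback, and `DriverAggregator` for whatever the driver does with heartbeat payloads. It only illustrates the shape of a push model with heartbeat batching, under the assumption that every update reaches the listener.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, List, Tuple

CounterUpdate = Tuple[str, float]  # (counter name, value)


class CounterSource:
    """Hypothetical stand-in for a counter source that pushes updates to listeners."""

    def __init__(self) -> None:
        self._listeners: List[Callable[[str, float], None]] = []

    def subscribe(self, callback: Callable[[str, float], None]) -> None:
        self._listeners.append(callback)

    def write(self, name: str, value: float) -> None:
        # Push model: every write is delivered to every listener immediately.
        for callback in self._listeners:
            callback(name, value)


class EvaluatorCollector:
    """Evaluator side: buffers every pushed update until the next heartbeat."""

    def __init__(self) -> None:
        self._pending: List[CounterUpdate] = []

    def on_event_written(self, name: str, value: float) -> None:
        self._pending.append((name, value))

    def drain_for_heartbeat(self) -> List[CounterUpdate]:
        # Hand over the buffered updates and start a fresh buffer.
        batch, self._pending = self._pending, []
        return batch


class DriverAggregator:
    """Driver side: aggregates the batches carried by heartbeats."""

    def __init__(self) -> None:
        self.totals: DefaultDict[str, float] = defaultdict(float)
        self.counts: DefaultDict[str, int] = defaultdict(int)

    def on_heartbeat(self, batch: List[CounterUpdate]) -> None:
        for name, value in batch:
            self.totals[name] += value
            self.counts[name] += 1


def run_demo() -> DriverAggregator:
    source = CounterSource()
    collector = EvaluatorCollector()
    driver = DriverAggregator()
    source.subscribe(collector.on_event_written)

    # Three updates arrive between two heartbeats; none of them is lost.
    for value in (1.0, 2.0, 3.0):
        source.write("iterations", value)
    driver.on_heartbeat(collector.drain_for_heartbeat())

    source.write("iterations", 4.0)
    driver.on_heartbeat(collector.drain_for_heartbeat())
    return driver
```

In this sketch the driver sees all four updates even though only two heartbeats were sent. A timer that polled the latest value once per heartbeat would have seen only two of them.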
   
   Given this information, would this work? It sounds like it matches the requirements, in that
you would not miss a data point as you could with the pull method. What I also like about
EventCounters is that they use the .NET Core API.
   
   Let me know -- I'm happy to also review what you have.
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


> Build Metrics System
> --------------------
>
>                 Key: REEF-1732
>                 URL: https://issues.apache.org/jira/browse/REEF-1732
>             Project: REEF
>          Issue Type: New Feature
>          Components: IMRU, REEF
>            Reporter: Julia
>            Assignee: Julia
>            Priority: Major
>         Attachments: IMRU Metrics System.docx
>
>
> The IMRU Metrics system provides metrics data so that it can be shown to the user for
> monitoring or diagnosis. The goal is to build an E2E flow with simple/basic metrics data.
> We can then add more data later.
> * IMetricsProvider - there are multiple sources of metrics data:
>   1. Task metrics. These are specific to IMRU tasks, such as current iteration and progress.
>      Each task can send its task state back to the driver and let the driver aggregate it.
>      Alternatively, as UpdateTask knows the current iteration and progress, to start with we
>      can just get task status from the update task. The task metrics can be provided by a task
>      function like IUpdateFunction and sent to the driver by the task host as a TaskMessage
>      with the heartbeat.
>   2. Driver metrics – For the IMRU driver, this can be system state such as
>      WaitingForEvaluator or TasksRunning, the current retry number, etc. These driver states
>      are maintained inside the IMRU driver.
>   3. IMRUDriver will implement IMetricsProvider and supply metrics data.
> * IMetricsSink – the metrics data will be output somewhere so that it can be consumed by a
>   monitoring tool. An interface IMetricsSink will be defined to sink metrics data. An
>   implementation of the interface can store the data to remote storage. Multiple sinks can
>   be injected.
> * MetricsManager – It schedules a timer to get metrics from IMetricsProviders and outputs
>   the metrics data with IMetricsSinks.
> Attached file shows the diagram of the design.
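
The provider/sink/manager split described above can be sketched roughly as follows. This is an illustration in Python, not the REEF.NET interfaces: the method names `get_metrics`, `sink`, and `collect_once`, the `InMemorySink` class, and the metric keys are all assumptions made for the example. The timer is left out so the sketch stays deterministic; a real manager would presumably call `collect_once` on a schedule.

```python
from abc import ABC, abstractmethod
from typing import Dict, List


class MetricsProvider(ABC):
    """Hypothetical counterpart of IMetricsProvider: a source of metrics data."""

    @abstractmethod
    def get_metrics(self) -> Dict[str, object]:
        ...


class MetricsSink(ABC):
    """Hypothetical counterpart of IMetricsSink: a destination for metrics data."""

    @abstractmethod
    def sink(self, metrics: Dict[str, object]) -> None:
        ...


class DriverMetricsProvider(MetricsProvider):
    """Driver-side state exposed as metrics (state names taken from the description)."""

    def __init__(self) -> None:
        self.system_state = "WaitingForEvaluator"
        self.retry_number = 0
        self.current_iteration = 0

    def get_metrics(self) -> Dict[str, object]:
        return {
            "system_state": self.system_state,
            "retry_number": self.retry_number,
            "current_iteration": self.current_iteration,
        }


class InMemorySink(MetricsSink):
    """Illustrative sink that keeps snapshots in a list instead of remote storage."""

    def __init__(self) -> None:
        self.records: List[Dict[str, object]] = []

    def sink(self, metrics: Dict[str, object]) -> None:
        self.records.append(dict(metrics))  # copy so later changes do not leak in


class MetricsManager:
    """Pulls from every provider and forwards the merged snapshot to every sink."""

    def __init__(self, providers: List[MetricsProvider], sinks: List[MetricsSink]) -> None:
        self._providers = providers
        self._sinks = sinks

    def collect_once(self) -> Dict[str, object]:
        merged: Dict[str, object] = {}
        for provider in self._providers:
            merged.update(provider.get_metrics())
        for sink in self._sinks:
            sink.sink(merged)
        return merged
```

A possible usage: build one `DriverMetricsProvider`, inject two `InMemorySink` instances, and call `collect_once` whenever the timer fires; each sink then receives the same snapshot.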



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
