openwhisk-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Martin Henke <martin.he...@web.de>
Subject Re: Replace log markers with metrics sent to statsd
Date Thu, 28 Sep 2017 11:43:48 GMT
Tyson,

thank you for your comments.
Since Vadim is away today I will try to answer.

I fully agree that we should keep the logging back-end discussion 
separate. Therefore I am with you to tackle this in a first PR that 
changes the logging implementation to Logback (with the console appender 
configured, leaving the asynchronicity solely to Akka).
Interested parties could easily adapt the log configuration if needed.

Orthogonal to this discussion is the need to reduce log output.
While I mostly agree with your comments above I see the need to move
(most) metrics out of the log path as a an integral (or even 
prerequisite) part of the 'log reduction' work that has to be tackled
early. Tightly coupled part is to adjust log levels carefully to allow a 
more restricted 'info' level logging (we probably even have to add 
messages back that were written as metrics before)

My take on the metrics vs tracing discussion is that metrics fulfill the
need to consistently watch the behavior of the system on a more general 
level in opposite be able to deep dive in the behavior of single work 
threads of the systems (at best to be switched on and off as needed).
As you said, most counter related metrics fall in the first category but
also timing information might fall in that category if the aggregation 
level is 'high enough' (e.g to monitor "the average request time of one 
controller").

Having that said (please excuse the length of this answer) I propose to 
follow this path:
- replace logging implementation with Logback
- reduce log generation by
   - write metrics with Kamon
   - adjust and reduce existing log levels and messages
- look into tracing as follow-up

Kind Regards,
Martin


On 27.09.17 19:44, Tyson Norris wrote:
> Thanks Vadmin!
> 
> A few comments:
> 
> - RE: async logging:  I thought the akka logger was already async? This is different
than the logback async appender. I’m OK with using the async appender, but people should
know that it by default is configured to ignore events once its buffer is full, so its possible
to get into a situation where log messages “go missing” if faced with sufficient traffic.
> 
> - RE: kamon: It’s not clear to me that kamon supports timing data for general methods,
although it looks like it keeps timing data for akka message processing.
> Currently the log markers in OW keep timing data for arbitrary blocks of code (unrelated
to akka). It looks like to get “time in method” metrics you will still need to measure
the time, and then record that time using kamon.
> 
> I know Sandeep is looking into using the timing/tracing aspect as part of this PR https://github.com/apache/incubator-openwhisk/pull/2282.
> 
> It may be interesting to consider kamon TraceContext for this, but it is also somewhat
invasive - seems like it requires wrapping all Future usage as in:
> Tracer.withNewContext("sample-trace") {
>      Future {}
> }
> 
> Is there a way to use this without this wrapping?
> 
> BTW - I agree that any “counter” type metrics should be replaced and kamon seems
good for this, I’m just not sure about the timing markers.
> 
> - RE: first steps: Does changing log library or appender config imply the need to change
the current marker/timing impl? I think a few logical steps are:
> - replace logging impl (with logback)
> - reduce log generation
> - replace log marker counter impl (with kamon?)
> - replace logging approach (with TraceContext or similar, if an option even exists, with
the goal being to NOT pass whisk Logging around everywhere)
> 
> (I’m really not sure about the last one, but it would be nice to a less invasive way
to emit logs with context)
> 
> Thanks
> Tyson
> 
> 
> 
> 
> 
> 
> On Sep 27, 2017, at 7:39 AM, Vadim Raskin <raskinvadim@gmail.com<mailto:raskinvadim@gmail.com>>
wrote:
> 
> Hi OpenWhisk community,
> 
> while doing our profiling tests on the controller we have found that the
> way we log and the logging per se make a huge impact on the controller
> thoughput. In a short, the more parallel activations you have, the more
> blocking threads waiting for the logger will be there. After doing several
> experiments we found a couple of action items:
> * replace the logging library (e.g. to logback), use an async appender and
> increase the size of the LogAppender buffer.
> * reduce the number of logs that we generate.
> 
> Since the first action item will only provide a partial relief and postpone
> the point when we start observing the degradation, reducing the number of
> logs seems to be a more plausible way to go.
> 
> One of the first points I’d like to address are the metrics that we append
> to log messages in a form of markers.
> 
> The initial idea is to send them directly to statsd or some other backend.
> For example Kamon library is a good candidate: https://na01.safelinks.protection.outlook.com/?url=http%3A%2F%2Fkamon.io&data=02%7C01%7C%7Cfa62bc6102964a52294908d505b59165%7Cfa7b1b5a7b34438794aed2c178decee1%7C0%7C0%7C636421199745973146&sdata=K8azfpdoqTeRzQrhB1LhBY76z5DrTSGyg7DuBmLMJZ8%3D&reserved=0
> 
> Here is the issue I’ve opened some time ago.
> https://na01.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgithub.com%2Fapache%2Fincubator-openwhisk%2Fissues%2F2780&data=02%7C01%7C%7Cfa62bc6102964a52294908d505b59165%7Cfa7b1b5a7b34438794aed2c178decee1%7C0%7C0%7C636421199745973146&sdata=QTe%2BNE3YuQpULGcZit%2FNPMAZ0EU9oJ56Yojgyrrs3rU%3D&reserved=0
> 
> Do you have any suggestion/concerns regarding that?
> 
> Thank you.
> 
> regards, Vadim.
> 


Mime
View raw message