flink-dev mailing list archives

From Piotr Nowojski <pi...@data-artisans.com>
Subject Re: System resource logger
Date Wed, 11 Oct 2017 14:08:08 GMT
I have decided to drop the static, logged-once part. That information is static, and users
can obtain it in more conventional ways.

For now I have kept the CPU, memory, swap and network interface stats.

Piotrek

> On 5 Oct 2017, at 18:45, Bowen Li <bowen.li@offerupnow.com> wrote:
> 
> System and processor info, marked as 'logged once' in gist shared by Piotr,
> should still be logged instead of registered as metrics, right?
> 
> On Thu, Oct 5, 2017 at 2:38 AM, Till Rohrmann <trohrmann@apache.org> wrote:
> 
>> Thanks for the proposal Piotr. I like it a lot since it will help people to
>> better understand their system. I would also be in favour of adding them to
>> the system metrics. I think o.a.f.runtime.metrics.util.MetricUtils is the
>> right place to start. Given the small dependency footprint and the
>> compatible license, I would be in favour of option 1.
>> 
>> Cheers,
>> Till
>>
>> 
>> On Thu, Oct 5, 2017 at 11:19 AM, Piotr Nowojski <piotr@data-artisans.com>
>> wrote:
>> 
>>> +1 thanks for pointing this out. It makes sense to just expand those
>>> system metrics (I was not aware of them).
>>> 
>>>> On Oct 4, 2017, at 6:07 PM, Greg Hogan <code@greghogan.com> wrote:
>>>> 
>>>> What if we added these as system metrics and added a way to write
>>>> metrics to a (separate?) log file?
>>>> 
>>>> 
>>>>> On Oct 4, 2017, at 10:13 AM, Piotr Nowojski <piotr@data-artisans.com>
>>>>> wrote:
>>>>> 
>>>>> Hi,
>>>>> 
>>>>> Lately I was debugging some weird test failures on Travis and I needed
>>>>> to look into metrics like:
>>>>> - User, System, IOWait, IRQ CPU usages (based on CPU ticks since
>>>>>   previous check)
>>>>> - System wide memory consumption (including making sure that swap was
>>>>>   disabled)
>>>>> - network usage
>>>>> - etc…
>>>>> 
>>>>> I had to do this without access to the machines themselves. For this
>>>>> purpose I implemented a periodic daemon thread logger. Log output looked
>>>>> like this:
>>>>> 
>>>>> https://gist.github.com/pnowojski/8b863abb0fb08ac75b62627feadbd2f7
>>>>> 
>>>>> I think it would be nice to add this feature to Flink itself, by
>>>>> extending the existing MemoryLogger. The same lack of information that I
>>>>> had with Travis could easily happen in production environments. The
>>>>> problem is that there is no easy way to obtain this kind of information
>>>>> without using an external library (think about cross-platform support).
>>>>> For that I have used:
>>>>> 
>>>>> https://github.com/oshi/oshi
>>>>> 
>>>>> It has some minimal additional dependencies. One thing worth noting is
>>>>> JNA: its JAR weighs ~1MB. We would have two options to add this
>>>>> feature:
>>>>> 
>>>>> 1. Include this oshi dependency in flink-runtime
>>>>> 2. Wrap oshi into a flink-contrib/flink-resource-logger module and make
>>>>>    this new module an optional, dynamically loaded dependency of
>>>>>    flink-runtime (used only if the user manually copies
>>>>>    flink-resource-logger.jar to the classpath).
>>>>> 
>>>>> I would lean toward 1., since that’s a powerful tool and its
>>>>> dependencies are pretty minimal (except for the size of the JNA JAR).
>>>>> What do you think?
>>>>> 
>>>>> Piotrek
>>> 
>>> 
>> 
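
For readers following the thread, the periodic daemon thread logger described in the original proposal could look roughly like the sketch below. This is not the code behind the gist and it does not use oshi: it relies only on the JDK, so it covers just load average, swap and JVM heap. The per-state CPU ticks and network interface stats mentioned in the thread are not exposed by the JDK, which is the gap the oshi dependency was proposed to fill. The class name SystemResourceLogger, the method names and the log line format are all hypothetical, and the swap figures are only available on JVMs that provide com.sun.management.OperatingSystemMXBean.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.OperatingSystemMXBean;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

/** Hypothetical sketch of a periodic resource logger; not Flink's implementation. */
public class SystemResourceLogger {

    /** Builds one log line from values the JDK exposes without extra libraries. */
    public static String snapshot() {
        OperatingSystemMXBean os = ManagementFactory.getOperatingSystemMXBean();
        Runtime rt = Runtime.getRuntime();
        StringBuilder sb = new StringBuilder();
        sb.append("cpus=").append(os.getAvailableProcessors());
        // Returns a negative value on platforms where the load average is unavailable.
        sb.append(" loadAvg=").append(os.getSystemLoadAverage());
        sb.append(" jvmHeapUsedBytes=").append(rt.totalMemory() - rt.freeMemory());
        // Swap totals are a JDK-specific extension, so guard the cast.
        if (os instanceof com.sun.management.OperatingSystemMXBean) {
            com.sun.management.OperatingSystemMXBean ext =
                    (com.sun.management.OperatingSystemMXBean) os;
            sb.append(" swapTotalBytes=").append(ext.getTotalSwapSpaceSize());
            sb.append(" swapFreeBytes=").append(ext.getFreeSwapSpaceSize());
        }
        return sb.toString();
    }

    /** Starts a daemon thread that passes a snapshot to the sink at a fixed rate. */
    public static ScheduledExecutorService start(long periodMillis, Consumer<String> sink) {
        ScheduledExecutorService executor = Executors.newSingleThreadScheduledExecutor(r -> {
            Thread t = new Thread(r, "system-resource-logger");
            t.setDaemon(true); // must not keep the JVM alive on shutdown
            return t;
        });
        executor.scheduleAtFixedRate(
                () -> sink.accept(snapshot()), 0, periodMillis, TimeUnit.MILLISECONDS);
        return executor;
    }

    public static void main(String[] args) throws InterruptedException {
        ScheduledExecutorService executor = start(1000, System.out::println);
        Thread.sleep(3500); // let a few samples print
        executor.shutdownNow();
    }
}
```

Passing the sink as a Consumer keeps the sketch independent of any logging framework. The same snapshot values could instead be registered as gauges, which is the direction the thread settles on.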

