ignite-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Andrey Gura <ag...@apache.org>
Subject Re: [DISCUSSION][IEP-35] Metrics configuration
Date Tue, 06 Aug 2019 12:07:02 GMT
Good example of proper buckets configuration. Such configuration is
suitable for may cases. See attached screenshot (I hope it will not be
reject by mail system or forum).


On Tue, Aug 6, 2019 at 2:45 PM Andrey Gura <agura@apache.org> wrote:
>
> > What do you mean by "exponential bounds"?
>
> Something like this if we talk about latency in ms for example: 5, 10,
> 25, 50, 100, 200, 500, ...
>
> > Thanks, for the feedback, appreciate you ownesty.
>
> Nothing personal. It is just about functionality from user's stand point.
>
> > What is your proposal?
> > How metrics configuration should work?
>
> My proposal is simple: just drop this change. We don't need the
> configuration. Metric owner (developer) defines buckets' bounds for
> each particular case (it could be done uniformly or exponentially, it
> depends on metric and problem definition).
>
> On Mon, Aug 5, 2019 at 6:36 PM Nikolay Izhikov <nizhikov@apache.org> wrote:
> >
> > Hello, Andrey.
> >
> > > Not necessary if we have exponential bounds' values for histograms.
> >
> > What do you mean by "exponential bounds"?
> >
> > > Anyway, in current solution it looks ugly and not usable.
> >
> > Thanks, for the feedback, appreciate you ownesty.
> >
> > > No. But we should admit that this is bad decision and do not include this change
to the code base.
> >
> > What is your proposal?
> > How metrics configuration should work?
> >
> > > Yes. But it still will not give enough accuracy.
> >
> > Enough for what?
> >
> > В Пн, 05/08/2019 в 18:29 +0300, Andrey Gura пишет:
> > > > > - metric configuration is node local (not cluster wide).
> > > > This issue is easy to solve on the user-side and in Ignite core.
> > >
> > > It's imaginary simplicity. The first, you need some additional
> > > automation on user-side in order to configure all nodes of the
> > > cluster. The second, new nodes can join to the cluster and
> > > configuration will be different on new node and on other nodes of the
> > > cluster. This leads to complication whole functionality. Anyway, I
> > > don't like such simplified solution because at the moment it brings
> > > more problems than value.
> > >
> > > > The easiest solution was implemented.
> > > > Do we want to make it more complex right now :)?
> > >
> > > No. But we should admit that this is bad decision and do not include
> > > this change to the code base.
> > >
> > > > The reason it exists in PR - we already have this parameter in DataStorageConfiguration#getMetricsSubIntervalCount
> > >
> > > I believe this method should be deprecated and removed in major release.
> > >
> > > > I think the user should be able to configure buckets for histogram and
rateTimeInterval for hitrate.
> > >
> > > Not necessary if we have exponential bounds' values for histograms.
> > > Anyway, in current solution it looks ugly and not usable.
> > >
> > > > Ignite has dozens of use-cases and deployment modes, seems,
> > > > we can't cover it all with the single predefined buckets/rateTimeInterval
set.
> > >
> > > Yes. But it still will not give enough accuracy.
> > >
> > > On Mon, Aug 5, 2019 at 5:25 PM Nikolay Izhikov <nizhikov@apache.org>
wrote:
> > > >
> > > > Hello, Andrey.
> > > >
> > > > > - metric configuration is node local (not cluster wide).
> > > >
> > > > This issue is easy to solve on the user-side and in Ignite core.
> > > >
> > > > > - metric configuration doesn't survive node restart.
> > > >
> > > > We decide to go with the simplest solution, for now.
> > > > The easiest solution was implemented.
> > > > Do we want to make it more complex right now :)?
> > > >
> > > > > - User shouldn't configure hit rate metrics at runtime in most cases.
> > > >
> > > > I agree with you - the size of the counters array looks odd as a configuration
parameter.
> > > > The reason it exists in PR - we already have this parameter in DataStorageConfiguration#getMetricsSubIntervalCount
> > > >
> > > > > - May be it is enough for user to have histograms with pre-configured
buckets
> > > > > So I think we should drop this change and idea about runtime histrogram
and hit rate configuration.
> > > >
> > > > I think the user should be able to configure buckets for histogram and
rateTimeInterval for hitrate.
> > > >
> > > > Ignite has dozens of use-cases and deployment modes, seems,
> > > > we can't cover it all with the single predefined buckets/rateTimeInterval
set.
> > > >
> > > > В Пн, 05/08/2019 в 16:59 +0300, Andrey Gura пишет:
> > > > > Igniters,
> > > > >
> > > > > I've took a look to the PR and I want follow up this discussion again.
> > > > >
> > > > > Proposed solution has a couple of significant drawbacks:
> > > > >
> > > > > - metric configuration is node local (not cluster wide).
> > > > > - metric configuration doesn't survive node restart.
> > > > >
> > > > > This drawbacks make configuration complex, annoying and useless in
most cases.
> > > > >
> > > > > Moreover, I think that:
> > > > >
> > > > > - User shouldn't configure hit rate metrics at runtime in most cases.
> > > > > Especially HitRateMetric.size because it's just details of
> > > > > implementation. Purpose of size is plots smoothing and this parameter
> > > > > could be fixed (e.g. 16 is enough). HitRate metric is just LongMetric
> > > > > but with additional feature.
> > > > > - May be it is enough for user to have histograms with pre-configured
> > > > > buckets. The trick here is properly chosen bounds. It seems that
> > > > > exponentially chosen values will fit for most cases. So we can avoid
> > > > > runtime configuration for histograms.
> > > > > - We can also provide percentile metric for more accurate
> > > > > measurements. Yes, it will bring additional performance impact and
> > > > > accuracy will not be the best. But it will more clearly and will
not
> > > > > require configuration.
> > > > >
> > > > > So I think we should drop this change and idea about runtime
> > > > > histrogram and hit rate configuration.
> > > > >
> > > > > Thoughts?
> > > > >
> > > > > On Tue, Jul 9, 2019 at 2:06 PM Nikolay Izhikov <nizhikov@apache.org>
wrote:
> > > > > >
> > > > > > Igniters,
> > > > > >
> > > > > > I made a PR for metrics configuration.
> > > > > >
> > > > > > Please, review
> > > > > >
> > > > > > https://github.com/apache/ignite/pull/6676/files
> > > > > >
> > > > > > В Вт, 09/07/2019 в 12:27 +0300, Nikolay Izhikov пишет:
> > > > > > > Hello, Alex.
> > > > > > >
> > > > > > > OK, Let's go with the simplest solution.
> > > > > > > I will provide API and JMX method for metrics configuration
shortly.
> > > > > > >
> > > > > > > В Пн, 08/07/2019 в 18:23 +0300, Alexey Goncharuk пишет:
> > > > > > > > Nikolay,
> > > > > > > >
> > > > > > > > To me a separate metrics configuration file seems
to be not very
> > > > > > > > user-friendly. First of all, it does not allow to
configure the system only
> > > > > > > > from Java code. Second, having multiple configuration
files seem to be
> > > > > > > > quite confusing for end users (judging by the logging
configuration
> > > > > > > > questions).
> > > > > > > >
> > > > > > > > Perhaps, we will still end up with the configuration
file - but for now I
> > > > > > > > would put this aside for a more thorough brainstorm
and added the JMX and
> > > > > > > > internal API for changing metrics configuration.
> > > > > > > >
> > > > > > > > пт, 5 июл. 2019 г. в 14:17, Seliverstov Igor
<gvvinblade@gmail.com>:
> > > > > > > >
> > > > > > > > > Igniters,
> > > > > > > > >
> > > > > > > > > One more question on topic.
> > > > > > > > >
> > > > > > > > > Should we preserve metrics configuration on restart?
(I think we should)
> > > > > > > > >
> > > > > > > > > If so, which configuration use after restart?
Defined in config file or
> > > > > > > > > saved in config storage? (I guess, saved configuration
should have a
> > > > > > > > > priority)
> > > > > > > > >
> > > > > > > > > So, how to tell users that any changes in configuration
file have no
> > > > > > > > > effect on Ignite configuration after first start?
> > > > > > > > >
> > > > > > > > > I think there are too many open questions and
(at least at now) we should
> > > > > > > > > provide only JMX API until all of the questions
are clarified.
> > > > > > > > >
> > > > > > > > > Regards,
> > > > > > > > > Igor
> > > > > > > > >
> > > > > > > > > > 4 июля 2019 г., в 19:55, Nikolay Izhikov
<nizhikov@apache.org>
> > > > > > > > >
> > > > > > > > > написал(а):
> > > > > > > > > >
> > > > > > > > > > Hello, Andrey.
> > > > > > > > > >
> > > > > > > > > > > 3. I can't imagine that adequate values
will be chosen on project
> > > > > > > > > > > setup stage.
> > > > > > > > > >
> > > > > > > > > > Configuration file required in the case
we adds new node or replace
> > > > > > > > >
> > > > > > > > > existing to the cluster.
> > > > > > > > > > Use can have parameters similar to Ignite
configuration, log
> > > > > > > > >
> > > > > > > > > configuration files.
> > > > > > > > > >
> > > > > > > > > > > My proposal is adding API for boundaries
configuration to the metrics
> > > > > > > > > > > framework and expose it via JMX
> > > > > > > > > >
> > > > > > > > > > Agree. I think we should have both:
> > > > > > > > > >
> > > > > > > > > > 1. Configuration file.
> > > > > > > > > > 2. JMX API to change bounaries of histogram
*and HitRateMetric params*.
> > > > > > > > > >
> > > > > > > > > > But, if you and other community member are
against config file, let's
> > > > > > > > >
> > > > > > > > > have only JMX.
> > > > > > > > > > Seems, JMX will provide required level of
configurability for metrics.
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > В Чт, 04/07/2019 в 17:53 +0300, Andrey
Gura пишет:
> > > > > > > > > > > Igniters,
> > > > > > > > > > >
> > > > > > > > > > > I rethought the issue and I see some
problems:
> > > > > > > > > > >
> > > > > > > > > > > 1. It seems that in most cases bucket
boundaries configuration will be
> > > > > > > > > > > problem for user. Absolute values for
latency boundaries it is very
> > > > > > > > > > > odd choice.
> > > > > > > > > > > 2. Also seems that latency for most
caches (if we configure cache
> > > > > > > > > > > metrics fro example) will be similar.
> > > > > > > > > > > 3. I can't imagine that adequate values
will be chosen on project
> > > > > > > > > > > setup stage. So chosen values should
be changed in the future.
> > > > > > > > > > >
> > > > > > > > > > > Solution with configuration file looks
unnatural and creates more
> > > > > > > > > > > problems than could solve.
> > > > > > > > > > >
> > > > > > > > > > > My proposal is adding API for boundaries
configuration to the metrics
> > > > > > > > > > > framework and expose it via JMX (at
this step). It still provides
> > > > > > > > > > > configuration possibility but don't
force user to do it.
> > > > > > > > > > >
> > > > > > > > > > > Also we should chose default values
for bucket boundaries. And it is
> > > > > > > > > > > most complex problem at the moment
:) Let's discuss it.
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > On Wed, Jul 3, 2019 at 4:49 PM Andrey
Gura <agura@apache.org> wrote:
> > > > > > > > > > > >
> > > > > > > > > > > > Nikolai,
> > > > > > > > > > > >
> > > > > > > > > > > > Metric is disabled if it doesn't
allocate any memory and doesn't
> > > > > > > > > > > > update any variable because doesn't
have any value. Ideally disabling
> > > > > > > > > > > > metrics for some cache should
be equal to cache stopping.
> > > > > > > > > > > >
> > > > > > > > > > > > On Fri, Jun 28, 2019 at 1:02 PM
Nikolay Izhikov <nizhikov@apache.org>
> > > > > > > > >
> > > > > > > > > wrote:
> > > > > > > > > > > > >
> > > > > > > > > > > > > Hello, Alexey.
> > > > > > > > > > > > >
> > > > > > > > > > > > > Thanks for the feedback!
> > > > > > > > > > > > >
> > > > > > > > > > > > > > My only concert is that
we should have the metrics framework
> > > > > > > > >
> > > > > > > > > configuration
> > > > > > > > > > > > > > as the first-citizen
of the framework itself
> > > > > > > > > > > > >
> > > > > > > > > > > > > Yes. I planned to add `void
configure(String param)` method to the
> > > > > > > > >
> > > > > > > > > metric API.
> > > > > > > > > > > > >
> > > > > > > > > > > > > > but change the metrics
parameters in
> > > > > > > > > > > > > > runtime from JMX or
command-line, etc.
> > > > > > > > > > > > >
> > > > > > > > > > > > > I've add requirement of JMX
method to the ticket:
> > > > > > > > > > > > >
> > > > > > > > > > > > > https://issues.apache.org/jira/browse/IGNITE-11927
> > > > > > > > > > > > >
> > > > > > > > > > > > > > Another concern is to
have an
> > > > > > > > > > > > > > ability to disable/enable
metrics per metrics group/prefix.
> > > > > > > > > > > > >
> > > > > > > > > > > > > Yes, we discusss it.
> > > > > > > > > > > > > But, let's make it clear:
> > > > > > > > > > > > >
> > > > > > > > > > > > > *What is disabling metric?*
> > > > > > > > > > > > >
> > > > > > > > > > > > > Looks like exporter filter
solve this task.
> > > > > > > > > > > > >
> > > > > > > > > > > > > В Чт, 27/06/2019 в 16:24
+0300, Alexey Goncharuk пишет:
> > > > > > > > > > > > > > Nikolay,
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > My only concert is that
we should have the metrics framework
> > > > > > > > >
> > > > > > > > > configuration
> > > > > > > > > > > > > > as the first-citizen
of the framework itself. This way, we can
> > > > > > > > >
> > > > > > > > > configure
> > > > > > > > > > > > > > the metrics not only
from file, but change the metrics parameters in
> > > > > > > > > > > > > > runtime from JMX or
command-line, etc. Another concern is to have an
> > > > > > > > > > > > > > ability to disable/enable
metrics per metrics group/prefix.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > The logger-like configuration
meets these suggestions given that the
> > > > > > > > > > > > > > configuration is generalized
into the metrics framework.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > What do you think?
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > чт, 27 июн. 2019
г. в 12:30, Nikolay Izhikov <nizhikov@apache.org>:
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Hello, Igniters.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > As you may know,
I've contributed Phase1 [1] for IEP-35 [2].
> > > > > > > > > > > > > > > Now we have metrics
subsystem and can create and export any metrics
> > > > > > > > >
> > > > > > > > > from
> > > > > > > > > > > > > > > Ignite.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > I think user(administrator
of Ignite) should be able to configure
> > > > > > > > >
> > > > > > > > > some
> > > > > > > > > > > > > > > metrics params
in a common way [3]
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > I propose to use
the same way from logging frameworks.
> > > > > > > > > > > > > > > We should define
some file format Ignite can understand.
> > > > > > > > > > > > > > > An administrator
fills configuration file to configure one or
> > > > > > > > >
> > > > > > > > > several
> > > > > > > > > > > > > > > metrics.
> > > > > > > > > > > > > > > Ignite will analyze
the file and use provided params during metrics
> > > > > > > > > > > > > > > creation.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > For now, we have
2 types of metrics that should be configured:
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > >        *      
HistrogramMetric [4]
> > > > > > > > > > > > > > >               
This metric is a count of measurement that falls into
> > > > > > > > > > > > > > > predefined intervals.
> > > > > > > > > > > > > > >               
An example is "Request processing time distribution".
> > > > > > > > > > > > > > >               
We want to calculate a count of requests processed
> > > > > > > > >
> > > > > > > > > quicker
> > > > > > > > > > > > > > > then 50ms, 50-100,
100-250, 250-500 and slower.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > >        *      
HitRateMetric [5]
> > > > > > > > > > > > > > >               
This metric is a count of events in the last time
> > > > > > > > >
> > > > > > > > > interval.
> > > > > > > > > > > > > > >               
An example is the "Count of requests processed in
> > > > > > > > >
> > > > > > > > > the last
> > > > > > > > > > > > > > > 5 seconds".
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Example of file
content:
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > ````
> > > > > > > > > > > > > > > cache.my-cahe.GetLatency=50,100,250,500
#Params for the histogram
> > > > > > > > >
> > > > > > > > > metric
> > > > > > > > > > > > > > > with the name `cache.my-cahe.get`
> > > > > > > > > > > > > > > cache.my-cache.RebalancingKeysRate=60000
#Param for existing
> > > > > > > > >
> > > > > > > > > HitRateMetric
> > > > > > > > > > > > > > > that hold "Estimated
rebalancing speed in keys".
> > > > > > > > > > > > > > > ````
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Please, share your
vision.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > [1]
> > > > > > > > > > > > > > >
> > > > > > > > >
> > > > > > > > > https://github.com/apache/ignite/commit/fdaa310430aefff07994eb35510d3416886b5bbe
> > > > > > > > > > > > > > > [2]
> > > > > > > > > > > > > > >
> > > > > > > > >
> > > > > > > > > https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=112820392
> > > > > > > > > > > > > > > [3] https://issues.apache.org/jira/browse/IGNITE-11927
> > > > > > > > > > > > > > > [4]
> > > > > > > > > > > > > > >
> > > > > > > > >
> > > > > > > > > https://github.com/apache/ignite/blob/master/modules/core/src/main/java/org/apache/ignite/internal/processors/metric/impl/HistogramMetric.java
> > > > > > > > > > > > > > > [5]
> > > > > > > > > > > > > > >
> > > > > > > > >
> > > > > > > > > https://github.com/apache/ignite/blob/master/modules/core/src/main/java/org/apache/ignite/internal/processors/metric/impl/HitRateMetric.java
> > > > > > > > > > > > > > >
> > > > > > > > >
> > > > > > > > >

Mime
  • Unnamed multipart/mixed (inline, None, 0 bytes)
View raw message