ignite-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Andrey Gura <ag...@apache.org>
Subject Re: [IEP-35] Monitoring & Profiling. Phase 2
Date Fri, 13 Sep 2019 12:13:47 GMT
Nikolay,

thanks a lot for clarification! I added some comments to Upsource review [1].

Here I want to discuss some high-level issues.

1. Naming

"There are only two hard things in Computer Science: cache
invalidation and naming things."
-- Phil Karlton

I really don't like name MonitoringList. First of all because it isn't
about monitoring at all while can be useful for monitoring purposes.

We already have SQL system views and I think that system view is good
candidate for naming of new entity. As result we will have consistent
naming which better describes domain.

I think akso that GridMetricManager is bad candidate for lists (system
views) management. Because it isn't about metrics. May be new
SystemViewManager will better fit to this purposes.

2. Management

Lists (aka system views) have life cycle now. I believe that it is
redundant functionality. There is no any reason for enabling/disabling
lists. There is no any interaction with lists on hot path of code flow
and there is no any performance impact.

So lists management can be reduced to lists creation and registration
operations (which executes only on node start).

3. Code generation

Code generation for walkers is also redundant. Amount of system views
in the system is strongly limited (units not dozens) so it is easier
to change walker by hand literally than navigate to code generator and
run it. Moreover, first you should add Order annotation in the proper
place and it make generator practically useless.

If you still see benefit that can bring Order annotation you can use
reflection. Motivation is simple, system views are on not hot path and
I expected that API for system views will not called frequently.

4. Export

I really don't understand why we should export system views content
(especially periodically). Real life use case is take view content on
demand. So we should have public API for it, SQL API and JMX. There is
no need any exporters.


What do you think about it? Also it would be great to involve more
people to this discussion.

[1] https://reviews.ignite.apache.org/ignite/review/IGNT-CR-1065

On Wed, Sep 11, 2019 at 6:24 PM Nikolay Izhikov <nizhikov@apache.org> wrote:
>
> Hello, Andrey.
>
> Thanks, for joining the review.
>
> Basic interface for objects list is `MonitoringList`. It provides the following features:
>         * name.
>         * description.
>         * row class.
>         * size.
>         * iterator for the list content.
>         * attribute walker (described below).
>
> `MonitoringRow` is a marker interface for classes that can be used as a monitoring list
content.
>
> Internally, there is only one implementation of `MonitoringList`, for now, `MonitoringListAdapter`.
> It adapts the content of some `ConcurrentMap` which uses widely in Ignite internals.
> I think, will be another implementation in the follow-up PRs.
>
> Public API changes:
>
> * New registry created `ReadOnlyMonitoringListRegistry` It provides access:
>         * To all lists that exist in the Ignite.
>         * Ability to subscribe to the list creation/removal events.
>
> * `MetricExporterSpi` changes:
>         * `setMonitoringListRegistry` method added
>         * `setMonitoringListExportFilter` method added.
>
> `MonitoringRowAttributeWalker` is a helper class for exporter implementations.
> Usually, exporter SPI iterates on `MonitoringRow` attributes.
> `SqlViewExporterSpi`, `JmxMetricExporterSpi` can be taken as an example.
> It can be implemented with Java reflection API, but I use more quick approach.
> `MonitoringRowAttributeWalker` can visit each attribute of the MonitoringRow implementation.
> It's also, preserves, the order provided by the MonitoringRow implementation author.
> It provides 2 main methods:
>         * `visitAll(AttributeVisitor visitor);` - visits each attribute of the some monitoring
row class. Provides index, name and class of attribute to the consumer.
>         * `visitAll(R row, AttributeWithValueVisitor visitor)` - visits each attribute
of some monitoring row instance. Provides index, name, class, value of attribute to the consumer.
>
>
> В Ср, 11/09/2019 в 16:30 +0300, Andrey Gura пишет:
> > Nikolai,
> >
> > I'm trying to review this PR but it is too large.
> >
> > Could you please describe problem and design of implemented solution?
> > Also javadocs for base interfaces aren't clear, too brief and doesn't
> > give any imagine about whole picture.
> >
> > At present it is very hard to understand the purposes of new
> > interfaces and walker generator, and design itself.
> >
> > On Fri, Sep 6, 2019 at 3:16 PM Nikolay Izhikov <nizhikov@apache.org> wrote:
> > >
> > > Hello, Igniters.
> > >
> > > IEP-35. Monitoring&Profiling. Phase2 is ready [1]
> > > Please, join to the review!
> > >
> > > I've implemented:
> > >
> > > * Monitoring list engine.
> > > * Following list implemented:
> > >     * Cache list
> > >     * Cache group list
> > >     * Compute task list
> > >     * Service list.
> > >
> > > Engine details:
> > >
> > > * `MonitoringList` added to store list data.
> > > * Base interface `MonitoringRow` for list data created.
> > > * Corresponding method added to `MetricExporterSpi`
> > > * `JmxMetricExporterSpi`, `SqlViewExporterSpi`, `LogExporterSpi` updated to
> > > support list export.
> > > * JMX, SQL and other column-oriented SPI uses
> > > `MonitoringRowAttributeWalker` to quickly traverse all list row attributes.
> > > * Implementation of `MonitoringRowAttributeWalkerfor specificMonitoringRow`
> > > can be generated with `MonitoringRowAttributeWalkerGenerator`
> > >
> > > I prepare follow-up PR [2], also.
> > > Following lists implemented:
> > >
> > > * SQL tables
> > > * SQL indexes
> > > * SQL schemas
> > > * SQL queries
> > > * Continuous queries
> > > * Text queries
> > > * Transactions
> > > * Cluster nodes
> > > * Client connections(JDBC, ODBC, Thin)
> > >
> > > [1] https://github.com/apache/ignite/pull/6845
> > > [2] https://github.com/apache/ignite/pull/6790
> > >
> > >
> > >
> > > пн, 10 июн. 2019 г. в 13:49, Nikolay Izhikov <nizhikov@apache.org>:
> > >
> > > > Hello, Igniters.
> > > >
> > > > Since Phase 1 will be merged in master soon I've created the ticket [1]
> > > > for Phase 2.
> > > >
> > > > Scope of Phase 2(copy-paste from the ticket)
> > > >
> > > > Ability to collect lists of some internal object Ignite manage.
> > > > Examples of such objects:
> > > >
> > > >   * Caches
> > > >   * Queries (including continuous queries)
> > > >   * Services
> > > >   * Compute tasks
> > > >   * Distributed Data Structures
> > > >   * etc...
> > > >
> > > >
> > > > 1. Fields for each list(that doesn't currently exists in Ignite) will
be
> > > > discussed in separate tickets
> > > > 2. Metric Exporters (optionally) can support list export.
> > > >
> > > > [1] https://issues.apache.org/jira/browse/IGNITE-11905
> > > >
> > > >
> > > > В Вт, 14/05/2019 в 16:42 +0300, Nikolay Izhikov пишет:
> > > > > Ticket for IEP.Phase1 created -
> > > >
> > > > https://issues.apache.org/jira/browse/IGNITE-11848
> > > > >
> > > > >
> > > > > В Пн, 13/05/2019 в 18:06 +0300, Nikolay Izhikov пишет:
> > > > > > Hello, Igniters.
> > > > > >
> > > > > > We have discussed this IEP [1] with Alexey Goncharyuk, Anton
> > > >
> > > > Vinogradov, Andrey Gura, Alexey Scherbakov and Pavel Kovalenko.
> > > > > >
> > > > > > Issues to address:
> > > > > >
> > > > > > 1. Study experience of following libs, tools:
> > > > > >     * OpenTracing
> > > > > >     * OpenSensus
> > > > > >     * DropWizard
> > > > > >
> > > > > > 2. Support histogram sensor: Sensor that collects values that
gets
> > > >
> > > > into predefined segments
> > > > > >
> > > > > > 3. Use more widely used naming(like in OpenSensus?)
> > > > > >
> > > > > > 4. Consider the usage of OpenSensus as a default implementation
for
> > > >
> > > > local metric storage.
> > > > > >
> > > > > > 5. To measure the performance penalty for metrics for 5_000
caches.
> > > > > >
> > > > > > 6. Some metrics should be part of public API and others are
not(may be
> > > >
> > > > changed/removed in release without warnings).
> > > > > >
> > > > > > My plan for Phase #1 is the following:
> > > > > >
> > > > > > 1. Address the issues.
> > > > > > 2. Prepare public API
> > > > > > 3. Prepare PR for monitoring subsystem + existing metrics rewritten
> > > >
> > > > with it.
> > > > > > 4. Prepare a PR with lists of each user API.
> > > > > > 5. Collect feedback for a #4.
> > > > > > 6. Design a log exposer. Consider the usage of JFR format or
some
> > > >
> > > > other widely used, tool compatible format.
> > > > > >
> > > > > > [1]
> > > >
> > > > https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=112820392
> > > > > >
> > > > > > В Чт, 02/05/2019 в 14:02 +0300, Nikolay Izhikov пишет:
> > > > > > > Hello, Maxim.
> > > > > > >
> > > > > > > > How will be recorded throughput sensor values which
will require
> > > >
> > > > an interval for the rate calculations?
> > > > > > >
> > > > > > > I answered to this question in IEP "Design principles":
> > > > > > >
> > > > > > > ```
> > > > > > > Sensors should contain only raw values. No aggregation
of numeric
> > > >
> > > > metrics on Ignite side.
> > > > > > > Min, max, avg and other functions are the matter of an
external
> > > >
> > > > monitoring system.
> > > > > > > ```
> > > > > > >
> > > > > > > Throughput is a function `(S(t2) - S(t1))/(t2-t1)`
> > > > > > > where S(t) is the sensor value in some point of time t.
> > > > > > >
> > > > > > > Seems, throughput calculation is a responsibility of an
external
> > > >
> > > > system.
> > > > > > >
> > > > > > > What do you think?
> > > > > > >
> > > > > > > > It seems to me that we can add an additional parameter
of
> > > >
> > > > `sensitivityLevel` to provide for the user a flexible sensor control (e.g.,
> > > > INFO, WARN, NOTICE, DEBUG).
> > > > > > >
> > > > > > > For now, I think that all sensors and lists will be very(very!)
> > > >
> > > > lightweight.
> > > > > > > So, we should be able to disable/enable it's, for sure.
> > > > > > >
> > > > > > > But, we should turn off and turn on the whole Ignite subsystem
> > > > > > > for the case we have strong performance limitations for
a particular
> > > >
> > > > workload.
> > > > > > >
> > > > > > > So, we have two "level" of monitoring - INFO and DEBUG(for
> > > >
> > > > profiling: IEP-35 - Phase 3).
> > > > > > > For example, AFAIK we can't disable current SQL system
views(Why
> > > >
> > > > should we?)
> > > > > > >
> > > > > > > В Вт, 30/04/2019 в 14:33 +0300, Maxim Muzafarov пишет:
> > > > > > > > Hello Nikolay,
> > > > > > > >
> > > > > > > > I've looked through your PRs changes.
> > > > > > > >
> > > > > > > > > Sensors
> > > > > > > >
> > > > > > > > How will be recorded throughput sensor values which
will require an
> > > > > > > > interval for the rate calculations? Do we have such
an example? For
> > > > > > > > instance, getAllocationRate() or getEvictionRate().
These metrics
> > > >
> > > > are
> > > > > > > > out of the scope of current PoC and IEP as they are
not related to
> > > >
> > > > the
> > > > > > > > user metrics, but it is a good example of a particular
metric type.
> > > > > > > >
> > > > > > > > It seems to me that we can add an additional parameter
of
> > > > > > > > `sensitivityLevel` to provide for the user a flexible
sensor
> > > >
> > > > control
> > > > > > > > (e.g., INFO, WARN, NOTICE, DEBUG).
> > > > > > > >
> > > > > > > > It also seems that for the sensors getValue() the
completely
> > > > > > > > functional java approach can be used. Am I right?
> > > > > > > >
> > > > > > > > On Mon, 29 Apr 2019 at 11:44, Nikolay Izhikov <nizhikov@apache.org>
> > > >
> > > > wrote:
> > > > > > > > >
> > > > > > > > > Hello, Vyacheslav.
> > > > > > > > >
> > > > > > > > > Thanks for the feedback!
> > > > > > > > >
> > > > > > > > > > HttpExposer with Jetty's dependencies should
be detached> from
> > > >
> > > > the core module.
> > > > > > > > >
> > > > > > > > > Agreed. module hierarchy is the essence of the
next steps.
> > > > > > > > > For now it just a proof of my ideas for Ignite
monitoring we can
> > > >
> > > > discuss.
> > > > > > > > >
> > > > > > > > > > I like your approach with 'wrapper' for
monitored objects,
> > > >
> > > > like don't like using 'ServiceConfiguration' directly as a monitored object
> > > > for services
> > > > > > > > >
> > > > > > > > > Agreed in general.
> > > > > > > > > Seems, choosing the right data to expose is the
matter of
> > > >
> > > > separate discussion for each Ignite entities.
> > > > > > > > > I've planned to file tickets for each entity
so anyone
> > > >
> > > > interested can share his vision in it.
> > > > > > > > >
> > > > > > > > > > In my opinion, each sensor should have a
timestamp.
> > > > > > > > >
> > > > > > > > > I'm not sure that *every* sensor should have
directly associated
> > > >
> > > > timestamp.
> > > > > > > > > Seems, we should support sensors without timestamp
for a current
> > > >
> > > > monitoring numbers at least.
> > > > > > > > >
> > > > > > > > > > Also, it'd be great to have an ability to
store a list of a
> > > >
> > > > fixed size> of last N sensors
> > > > > > > > >
> > > > > > > > > What use-cases do you know for such sensors?
> > > > > > > > > We have plans to support fixed size lists to
show "Last N SQL
> > > >
> > > > queries" or similar data.
> > > > > > > > > Essentially, a sensor is just a single value
with the name and
> > > >
> > > > known meaning.
> > > > > > > > >
> > > > > > > > > > It'd be great if you provide a more extended
test to show the
> > > >
> > > > work of> the system.
> > > > > > > > >
> > > > > > > > > Sorry, for that :)
> > > > > > > > > When you run 'MonitoringSelfTest' you should
open
> > > >
> > > > http://localhost:8080/ignite/monitoring to view exposed info.
> > > > > > > > > I provide this info in gist -
> > > >
> > > > https://gist.github.com/nizhikov/aa1e6222e6a3456472b881b8deb0e24d
> > > > > > > > >
> > > > > > > > > I will extend this test to print results to console
in the next
> > > >
> > > > iterations - stay tuned :)
> > > > > > > > >
> > > > > > > > > В Вс, 28/04/2019 в 23:35 +0300, Vyacheslav
Daradur пишет:
> > > > > > > > > > Hi, Nikolay,
> > > > > > > > > >
> > > > > > > > > > I looked through PR and IEP, and I have
some comments:
> > > > > > > > > >
> > > > > > > > > > It would be better to implement it as a
separate module, I
> > > >
> > > > can't say
> > > > > > > > > > if it is possible for the main part of monitoring
or not, but I
> > > > > > > > > > believe that HttpExposer with Jetty's dependencies
should be
> > > >
> > > > detached
> > > > > > > > > > from the core module.
> > > > > > > > > >
> > > > > > > > > > I like your approach with 'wrapper' for
monitored objects, like
> > > > > > > > > > 'ComputeTaskInfo' in PR, and don't like
using
> > > >
> > > > 'ServiceConfiguration'
> > > > > > > > > > directly as a monitored object for services.
I believe we
> > > >
> > > > shouldn't
> > > > > > > > > > mix approaches. It'd be better always use
some kind of
> > > >
> > > > container with
> > > > > > > > > > monitored object's information to work with
such data.
> > > > > > > > > >
> > > > > > > > > > In my opinion, each sensor should have a
timestamp. Usually
> > > >
> > > > monitoring
> > > > > > > > > > systems aggregate data and build graphics
according to sensors
> > > > > > > > > > timestamp.
> > > > > > > > > >
> > > > > > > > > > Also, it'd be great to have an ability to
store a list of a
> > > >
> > > > fixed size
> > > > > > > > > > of last N sensors, not to miss them without
pushing to an
> > > >
> > > > external
> > > > > > > > > > monitoring system.
> > > > > > > > > >
> > > > > > > > > > It'd be great if you provide a more extended
test to show the
> > > >
> > > > work of
> > > > > > > > > > the system. Everybody who looks to PR needs
to run the test
> > > >
> > > > and get
> > > > > > > > > > the info manually to see the completeness
of sensors, this
> > > >
> > > > might be
> > > > > > > > > > simplified by proper test.
> > > > > > > > > >
> > > > > > > > > > Thank you!
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > On Fri, Apr 26, 2019 at 5:56 PM Nikolay
Izhikov <
> > > >
> > > > nizhikov@apache.org> wrote:
> > > > > > > > > > >
> > > > > > > > > > > Hello, Igniters.
> > > > > > > > > > >
> > > > > > > > > > > I've prepared Proof of Concept for
IEP-35 [1]
> > > > > > > > > > > PR can be found here -
> > > >
> > > > https://github.com/apache/ignite/pull/6510
> > > > > > > > > > >
> > > > > > > > > > > I've done following changes:
> > > > > > > > > > >
> > > > > > > > > > >         1. `GridMonitoringManager`
 [2] - simple
> > > >
> > > > implementation of manager to store all monitoring info
> > > > > > > > > > >         2. `HttpPullExposerSpi` [3]
- pull exposer
> > > >
> > > > implementation that can respond with JSON from
> > > > http://localhost:8080/ignite/monitoring. JSON content can be veiwed in
> > > > gist [4]
> > > > > > > > > > >         3. Compute task start and finish
monitoring in
> > > >
> > > > "compute" list [5]
> > > > > > > > > > >         4. Service registration are
monitored in "service"
> > > >
> > > > list - [6]
> > > > > > > > > > >         5. Current `IgniteSpiMBeanAdapter`
rewritten using
> > > >
> > > > `GridMonitoringManager` [7]
> > > > > > > > > > >
> > > > > > > > > > > Design principles, monitoring subsystem
details and new
> > > >
> > > > Ignite entities can be found in IEP [1].
> > > > > > > > > > >
> > > > > > > > > > > My next steps will be:
> > > > > > > > > > >
> > > > > > > > > > >         1. Implementation of JMX exposer
> > > > > > > > > > >         2. Registration of all "lists"
and "sensor groups"
> > > >
> > > > as a SQL System view.
> > > > > > > > > > >         3. Add monitoring for all unmonitoring
Ignite API.
> > > >
> > > > (described in IEP).
> > > > > > > > > > >         4. Rewrite existing jmx metrics
using
> > > >
> > > > GridMonitoringManager.
> > > > > > > > > > >
> > > > > > > > > > > Please, share you thoughts.
> > > > > > > > > > >
> > > > > > > > > > > Part of JSON file:
> > > > > > > > > > > ```
> > > > > > > > > > >     "COMPUTE": {
> > > > > > > > > > >       "tasks": {
> > > > > > > > > > >         "name": "tasks",
> > > > > > > > > > >         "rows": [
> > > > > > > > > > >           {
> > > > > > > > > > >             "id": "0798817a-eeec-4386-9af7-94edb39ffced",
> > > > > > > > > > >             "sessionId":
> > > >
> > > > "a1814f95a61-912451ff-ca7b-4764-a7fd-728f6a900000",
> > > > > > > > > > >             "data": {
> > > > > > > > > > >               "taskClasName":
> > > >
> > > > "org.apache.ignite.monitoring.MonitoringSelfTest$$Lambda$145/1500885480",
> > > > > > > > > > >               "startTime": 1556287337944,
> > > > > > > > > > >               "timeout": 9223372036854776000,
> > > > > > > > > > >               "execName": null
> > > > > > > > > > >             },
> > > > > > > > > > >             "name": "anotherBroadcast"
> > > > > > > > > > >           }
> > > > > > > > > > > ```
> > > > > > > > > > >
> > > > > > > > > > > [1]
> > > >
> > > > https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=112820392
> > > > > > > > > > > [2]
> > > >
> > > > https://github.com/apache/ignite/pull/6510/files#diff-ec7d5cf5e35b99303deb9accee153c50R34
> > > > > > > > > > > [3]
> > > >
> > > > https://github.com/apache/ignite/pull/6510/files#diff-32239c45e0ae3b692af2eae7078e1436R47
> > > > > > > > > > > [4]
> > > >
> > > > https://gist.github.com/nizhikov/aa1e6222e6a3456472b881b8deb0e24d
> > > > > > > > > > > [5]
> > > >
> > > > https://github.com/apache/ignite/pull/6510/files#diff-d651ed29d07bd0c5ce291654a3254cc0R749
> > > > > > > > > > > [6]
> > > >
> > > > https://github.com/apache/ignite/pull/6510/files#diff-0b4e54fbda2b0da1c10eff48416336f6R1606
> > > > > > > > > > > [7]
> > > >
> > > > https://github.com/apache/ignite/pull/6510/files#diff-4398bf118150500e059069b3a1638ec7R61
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > >

Mime
View raw message