ignite-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Andrey Gura <ag...@apache.org>
Subject Re: [IEP-35] Monitoring & Profiling. Phase 2
Date Wed, 11 Sep 2019 13:30:11 GMT
Nikolai,

I'm trying to review this PR but it is too large.

Could you please describe problem and design of implemented solution?
Also javadocs for base interfaces aren't clear, too brief and doesn't
give any imagine about whole picture.

At present it is very hard to understand the purposes of new
interfaces and walker generator, and design itself.

On Fri, Sep 6, 2019 at 3:16 PM Nikolay Izhikov <nizhikov@apache.org> wrote:
>
> Hello, Igniters.
>
> IEP-35. Monitoring&Profiling. Phase2 is ready [1]
> Please, join to the review!
>
> I've implemented:
>
> * Monitoring list engine.
> * Following list implemented:
>     * Cache list
>     * Cache group list
>     * Compute task list
>     * Service list.
>
> Engine details:
>
> * `MonitoringList` added to store list data.
> * Base interface `MonitoringRow` for list data created.
> * Corresponding method added to `MetricExporterSpi`
> * `JmxMetricExporterSpi`, `SqlViewExporterSpi`, `LogExporterSpi` updated to
> support list export.
> * JMX, SQL and other column-oriented SPI uses
> `MonitoringRowAttributeWalker` to quickly traverse all list row attributes.
> * Implementation of `MonitoringRowAttributeWalkerfor specificMonitoringRow`
> can be generated with `MonitoringRowAttributeWalkerGenerator`
>
> I prepare follow-up PR [2], also.
> Following lists implemented:
>
> * SQL tables
> * SQL indexes
> * SQL schemas
> * SQL queries
> * Continuous queries
> * Text queries
> * Transactions
> * Cluster nodes
> * Client connections(JDBC, ODBC, Thin)
>
> [1] https://github.com/apache/ignite/pull/6845
> [2] https://github.com/apache/ignite/pull/6790
>
>
>
> пн, 10 июн. 2019 г. в 13:49, Nikolay Izhikov <nizhikov@apache.org>:
>
> > Hello, Igniters.
> >
> > Since Phase 1 will be merged in master soon I've created the ticket [1]
> > for Phase 2.
> >
> > Scope of Phase 2(copy-paste from the ticket)
> >
> > Ability to collect lists of some internal object Ignite manage.
> > Examples of such objects:
> >
> >   * Caches
> >   * Queries (including continuous queries)
> >   * Services
> >   * Compute tasks
> >   * Distributed Data Structures
> >   * etc...
> >
> >
> > 1. Fields for each list(that doesn't currently exists in Ignite) will be
> > discussed in separate tickets
> > 2. Metric Exporters (optionally) can support list export.
> >
> > [1] https://issues.apache.org/jira/browse/IGNITE-11905
> >
> >
> > В Вт, 14/05/2019 в 16:42 +0300, Nikolay Izhikov пишет:
> > > Ticket for IEP.Phase1 created -
> > https://issues.apache.org/jira/browse/IGNITE-11848
> > >
> > >
> > > В Пн, 13/05/2019 в 18:06 +0300, Nikolay Izhikov пишет:
> > > > Hello, Igniters.
> > > >
> > > > We have discussed this IEP [1] with Alexey Goncharyuk, Anton
> > Vinogradov, Andrey Gura, Alexey Scherbakov and Pavel Kovalenko.
> > > >
> > > > Issues to address:
> > > >
> > > > 1. Study experience of following libs, tools:
> > > >     * OpenTracing
> > > >     * OpenSensus
> > > >     * DropWizard
> > > >
> > > > 2. Support histogram sensor: Sensor that collects values that gets
> > into predefined segments
> > > >
> > > > 3. Use more widely used naming(like in OpenSensus?)
> > > >
> > > > 4. Consider the usage of OpenSensus as a default implementation for
> > local metric storage.
> > > >
> > > > 5. To measure the performance penalty for metrics for 5_000 caches.
> > > >
> > > > 6. Some metrics should be part of public API and others are not(may be
> > changed/removed in release without warnings).
> > > >
> > > > My plan for Phase #1 is the following:
> > > >
> > > > 1. Address the issues.
> > > > 2. Prepare public API
> > > > 3. Prepare PR for monitoring subsystem + existing metrics rewritten
> > with it.
> > > > 4. Prepare a PR with lists of each user API.
> > > > 5. Collect feedback for a #4.
> > > > 6. Design a log exposer. Consider the usage of JFR format or some
> > other widely used, tool compatible format.
> > > >
> > > > [1]
> > https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=112820392
> > > >
> > > > В Чт, 02/05/2019 в 14:02 +0300, Nikolay Izhikov пишет:
> > > > > Hello, Maxim.
> > > > >
> > > > > > How will be recorded throughput sensor values which will require
> > an interval for the rate calculations?
> > > > >
> > > > > I answered to this question in IEP "Design principles":
> > > > >
> > > > > ```
> > > > > Sensors should contain only raw values. No aggregation of numeric
> > metrics on Ignite side.
> > > > > Min, max, avg and other functions are the matter of an external
> > monitoring system.
> > > > > ```
> > > > >
> > > > > Throughput is a function `(S(t2) - S(t1))/(t2-t1)`
> > > > > where S(t) is the sensor value in some point of time t.
> > > > >
> > > > > Seems, throughput calculation is a responsibility of an external
> > system.
> > > > >
> > > > > What do you think?
> > > > >
> > > > > > It seems to me that we can add an additional parameter of
> > `sensitivityLevel` to provide for the user a flexible sensor control (e.g.,
> > INFO, WARN, NOTICE, DEBUG).
> > > > >
> > > > > For now, I think that all sensors and lists will be very(very!)
> > lightweight.
> > > > > So, we should be able to disable/enable it's, for sure.
> > > > >
> > > > > But, we should turn off and turn on the whole Ignite subsystem
> > > > > for the case we have strong performance limitations for a particular
> > workload.
> > > > >
> > > > > So, we have two "level" of monitoring - INFO and DEBUG(for
> > profiling: IEP-35 - Phase 3).
> > > > > For example, AFAIK we can't disable current SQL system views(Why
> > should we?)
> > > > >
> > > > > В Вт, 30/04/2019 в 14:33 +0300, Maxim Muzafarov пишет:
> > > > > > Hello Nikolay,
> > > > > >
> > > > > > I've looked through your PRs changes.
> > > > > >
> > > > > > > Sensors
> > > > > >
> > > > > > How will be recorded throughput sensor values which will require
an
> > > > > > interval for the rate calculations? Do we have such an example?
For
> > > > > > instance, getAllocationRate() or getEvictionRate(). These metrics
> > are
> > > > > > out of the scope of current PoC and IEP as they are not related
to
> > the
> > > > > > user metrics, but it is a good example of a particular metric
type.
> > > > > >
> > > > > > It seems to me that we can add an additional parameter of
> > > > > > `sensitivityLevel` to provide for the user a flexible sensor
> > control
> > > > > > (e.g., INFO, WARN, NOTICE, DEBUG).
> > > > > >
> > > > > > It also seems that for the sensors getValue() the completely
> > > > > > functional java approach can be used. Am I right?
> > > > > >
> > > > > > On Mon, 29 Apr 2019 at 11:44, Nikolay Izhikov <nizhikov@apache.org>
> > wrote:
> > > > > > >
> > > > > > > Hello, Vyacheslav.
> > > > > > >
> > > > > > > Thanks for the feedback!
> > > > > > >
> > > > > > > > HttpExposer with Jetty's dependencies should be detached>
from
> > the core module.
> > > > > > >
> > > > > > > Agreed. module hierarchy is the essence of the next steps.
> > > > > > > For now it just a proof of my ideas for Ignite monitoring
we can
> > discuss.
> > > > > > >
> > > > > > > > I like your approach with 'wrapper' for monitored
objects,
> > like don't like using 'ServiceConfiguration' directly as a monitored object
> > for services
> > > > > > >
> > > > > > > Agreed in general.
> > > > > > > Seems, choosing the right data to expose is the matter
of
> > separate discussion for each Ignite entities.
> > > > > > > I've planned to file tickets for each entity so anyone
> > interested can share his vision in it.
> > > > > > >
> > > > > > > > In my opinion, each sensor should have a timestamp.
> > > > > > >
> > > > > > > I'm not sure that *every* sensor should have directly associated
> > timestamp.
> > > > > > > Seems, we should support sensors without timestamp for
a current
> > monitoring numbers at least.
> > > > > > >
> > > > > > > > Also, it'd be great to have an ability to store a
list of a
> > fixed size> of last N sensors
> > > > > > >
> > > > > > > What use-cases do you know for such sensors?
> > > > > > > We have plans to support fixed size lists to show "Last
N SQL
> > queries" or similar data.
> > > > > > > Essentially, a sensor is just a single value with the name
and
> > known meaning.
> > > > > > >
> > > > > > > > It'd be great if you provide a more extended test
to show the
> > work of> the system.
> > > > > > >
> > > > > > > Sorry, for that :)
> > > > > > > When you run 'MonitoringSelfTest' you should open
> > http://localhost:8080/ignite/monitoring to view exposed info.
> > > > > > > I provide this info in gist -
> > https://gist.github.com/nizhikov/aa1e6222e6a3456472b881b8deb0e24d
> > > > > > >
> > > > > > > I will extend this test to print results to console in
the next
> > iterations - stay tuned :)
> > > > > > >
> > > > > > > В Вс, 28/04/2019 в 23:35 +0300, Vyacheslav Daradur
пишет:
> > > > > > > > Hi, Nikolay,
> > > > > > > >
> > > > > > > > I looked through PR and IEP, and I have some comments:
> > > > > > > >
> > > > > > > > It would be better to implement it as a separate module,
I
> > can't say
> > > > > > > > if it is possible for the main part of monitoring
or not, but I
> > > > > > > > believe that HttpExposer with Jetty's dependencies
should be
> > detached
> > > > > > > > from the core module.
> > > > > > > >
> > > > > > > > I like your approach with 'wrapper' for monitored
objects, like
> > > > > > > > 'ComputeTaskInfo' in PR, and don't like using
> > 'ServiceConfiguration'
> > > > > > > > directly as a monitored object for services. I believe
we
> > shouldn't
> > > > > > > > mix approaches. It'd be better always use some kind
of
> > container with
> > > > > > > > monitored object's information to work with such data.
> > > > > > > >
> > > > > > > > In my opinion, each sensor should have a timestamp.
Usually
> > monitoring
> > > > > > > > systems aggregate data and build graphics according
to sensors
> > > > > > > > timestamp.
> > > > > > > >
> > > > > > > > Also, it'd be great to have an ability to store a
list of a
> > fixed size
> > > > > > > > of last N sensors, not to miss them without pushing
to an
> > external
> > > > > > > > monitoring system.
> > > > > > > >
> > > > > > > > It'd be great if you provide a more extended test
to show the
> > work of
> > > > > > > > the system. Everybody who looks to PR needs to run
the test
> > and get
> > > > > > > > the info manually to see the completeness of sensors,
this
> > might be
> > > > > > > > simplified by proper test.
> > > > > > > >
> > > > > > > > Thank you!
> > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > > On Fri, Apr 26, 2019 at 5:56 PM Nikolay Izhikov <
> > nizhikov@apache.org> wrote:
> > > > > > > > >
> > > > > > > > > Hello, Igniters.
> > > > > > > > >
> > > > > > > > > I've prepared Proof of Concept for IEP-35 [1]
> > > > > > > > > PR can be found here -
> > https://github.com/apache/ignite/pull/6510
> > > > > > > > >
> > > > > > > > > I've done following changes:
> > > > > > > > >
> > > > > > > > >         1. `GridMonitoringManager`  [2] - simple
> > implementation of manager to store all monitoring info
> > > > > > > > >         2. `HttpPullExposerSpi` [3] - pull exposer
> > implementation that can respond with JSON from
> > http://localhost:8080/ignite/monitoring. JSON content can be veiwed in
> > gist [4]
> > > > > > > > >         3. Compute task start and finish monitoring
in
> > "compute" list [5]
> > > > > > > > >         4. Service registration are monitored
in "service"
> > list - [6]
> > > > > > > > >         5. Current `IgniteSpiMBeanAdapter` rewritten
using
> > `GridMonitoringManager` [7]
> > > > > > > > >
> > > > > > > > > Design principles, monitoring subsystem details
and new
> > Ignite entities can be found in IEP [1].
> > > > > > > > >
> > > > > > > > > My next steps will be:
> > > > > > > > >
> > > > > > > > >         1. Implementation of JMX exposer
> > > > > > > > >         2. Registration of all "lists" and "sensor
groups"
> > as a SQL System view.
> > > > > > > > >         3. Add monitoring for all unmonitoring
Ignite API.
> > (described in IEP).
> > > > > > > > >         4. Rewrite existing jmx metrics using
> > GridMonitoringManager.
> > > > > > > > >
> > > > > > > > > Please, share you thoughts.
> > > > > > > > >
> > > > > > > > > Part of JSON file:
> > > > > > > > > ```
> > > > > > > > >     "COMPUTE": {
> > > > > > > > >       "tasks": {
> > > > > > > > >         "name": "tasks",
> > > > > > > > >         "rows": [
> > > > > > > > >           {
> > > > > > > > >             "id": "0798817a-eeec-4386-9af7-94edb39ffced",
> > > > > > > > >             "sessionId":
> > "a1814f95a61-912451ff-ca7b-4764-a7fd-728f6a900000",
> > > > > > > > >             "data": {
> > > > > > > > >               "taskClasName":
> > "org.apache.ignite.monitoring.MonitoringSelfTest$$Lambda$145/1500885480",
> > > > > > > > >               "startTime": 1556287337944,
> > > > > > > > >               "timeout": 9223372036854776000,
> > > > > > > > >               "execName": null
> > > > > > > > >             },
> > > > > > > > >             "name": "anotherBroadcast"
> > > > > > > > >           }
> > > > > > > > > ```
> > > > > > > > >
> > > > > > > > > [1]
> > https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=112820392
> > > > > > > > > [2]
> > https://github.com/apache/ignite/pull/6510/files#diff-ec7d5cf5e35b99303deb9accee153c50R34
> > > > > > > > > [3]
> > https://github.com/apache/ignite/pull/6510/files#diff-32239c45e0ae3b692af2eae7078e1436R47
> > > > > > > > > [4]
> > https://gist.github.com/nizhikov/aa1e6222e6a3456472b881b8deb0e24d
> > > > > > > > > [5]
> > https://github.com/apache/ignite/pull/6510/files#diff-d651ed29d07bd0c5ce291654a3254cc0R749
> > > > > > > > > [6]
> > https://github.com/apache/ignite/pull/6510/files#diff-0b4e54fbda2b0da1c10eff48416336f6R1606
> > > > > > > > > [7]
> > https://github.com/apache/ignite/pull/6510/files#diff-4398bf118150500e059069b3a1638ec7R61
> > > > > > > >
> > > > > > > >
> > > > > > > >
> >

Mime
View raw message