ignite-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Nikolay Izhikov <nizhi...@apache.org>
Subject Re: [IEP-35] Monitoring & Profiling. Phase 2
Date Wed, 11 Sep 2019 15:28:01 GMT
Hello, Andrey.

Thanks, for joining the review.

Basic interface for objects list is `MonitoringList`. It provides the following features:
	* name.
	* description.
	* row class.
	* size.
	* iterator for the list content.
	* attribute walker (described below).

`MonitoringRow` is a marker interface for classes that can be used as a monitoring list content.

Internally, there is only one implementation of `MonitoringList`, for now, `MonitoringListAdapter`.
It adapts the content of some `ConcurrentMap` which uses widely in Ignite internals.
I think, will be another implementation in the follow-up PRs.

Public API changes:

* New registry created `ReadOnlyMonitoringListRegistry` It provides access:
	* To all lists that exist in the Ignite.
	* Ability to subscribe to the list creation/removal events.

* `MetricExporterSpi` changes:
	* `setMonitoringListRegistry` method added
	* `setMonitoringListExportFilter` method added.

`MonitoringRowAttributeWalker` is a helper class for exporter implementations.
Usually, exporter SPI iterates on `MonitoringRow` attributes.
`SqlViewExporterSpi`, `JmxMetricExporterSpi` can be taken as an example.
It can be implemented with Java reflection API, but I use more quick approach.
`MonitoringRowAttributeWalker` can visit each attribute of the MonitoringRow implementation.
It's also, preserves, the order provided by the MonitoringRow implementation author.
It provides 2 main methods:
	* `visitAll(AttributeVisitor visitor);` - visits each attribute of the some monitoring row
class. Provides index, name and class of attribute to the consumer.
	* `visitAll(R row, AttributeWithValueVisitor visitor)` - visits each attribute of some monitoring
row instance. Provides index, name, class, value of attribute to the consumer.


В Ср, 11/09/2019 в 16:30 +0300, Andrey Gura пишет:
> Nikolai,
> 
> I'm trying to review this PR but it is too large.
> 
> Could you please describe problem and design of implemented solution?
> Also javadocs for base interfaces aren't clear, too brief and doesn't
> give any imagine about whole picture.
> 
> At present it is very hard to understand the purposes of new
> interfaces and walker generator, and design itself.
> 
> On Fri, Sep 6, 2019 at 3:16 PM Nikolay Izhikov <nizhikov@apache.org> wrote:
> > 
> > Hello, Igniters.
> > 
> > IEP-35. Monitoring&Profiling. Phase2 is ready [1]
> > Please, join to the review!
> > 
> > I've implemented:
> > 
> > * Monitoring list engine.
> > * Following list implemented:
> >     * Cache list
> >     * Cache group list
> >     * Compute task list
> >     * Service list.
> > 
> > Engine details:
> > 
> > * `MonitoringList` added to store list data.
> > * Base interface `MonitoringRow` for list data created.
> > * Corresponding method added to `MetricExporterSpi`
> > * `JmxMetricExporterSpi`, `SqlViewExporterSpi`, `LogExporterSpi` updated to
> > support list export.
> > * JMX, SQL and other column-oriented SPI uses
> > `MonitoringRowAttributeWalker` to quickly traverse all list row attributes.
> > * Implementation of `MonitoringRowAttributeWalkerfor specificMonitoringRow`
> > can be generated with `MonitoringRowAttributeWalkerGenerator`
> > 
> > I prepare follow-up PR [2], also.
> > Following lists implemented:
> > 
> > * SQL tables
> > * SQL indexes
> > * SQL schemas
> > * SQL queries
> > * Continuous queries
> > * Text queries
> > * Transactions
> > * Cluster nodes
> > * Client connections(JDBC, ODBC, Thin)
> > 
> > [1] https://github.com/apache/ignite/pull/6845
> > [2] https://github.com/apache/ignite/pull/6790
> > 
> > 
> > 
> > пн, 10 июн. 2019 г. в 13:49, Nikolay Izhikov <nizhikov@apache.org>:
> > 
> > > Hello, Igniters.
> > > 
> > > Since Phase 1 will be merged in master soon I've created the ticket [1]
> > > for Phase 2.
> > > 
> > > Scope of Phase 2(copy-paste from the ticket)
> > > 
> > > Ability to collect lists of some internal object Ignite manage.
> > > Examples of such objects:
> > > 
> > >   * Caches
> > >   * Queries (including continuous queries)
> > >   * Services
> > >   * Compute tasks
> > >   * Distributed Data Structures
> > >   * etc...
> > > 
> > > 
> > > 1. Fields for each list(that doesn't currently exists in Ignite) will be
> > > discussed in separate tickets
> > > 2. Metric Exporters (optionally) can support list export.
> > > 
> > > [1] https://issues.apache.org/jira/browse/IGNITE-11905
> > > 
> > > 
> > > В Вт, 14/05/2019 в 16:42 +0300, Nikolay Izhikov пишет:
> > > > Ticket for IEP.Phase1 created -
> > > 
> > > https://issues.apache.org/jira/browse/IGNITE-11848
> > > > 
> > > > 
> > > > В Пн, 13/05/2019 в 18:06 +0300, Nikolay Izhikov пишет:
> > > > > Hello, Igniters.
> > > > > 
> > > > > We have discussed this IEP [1] with Alexey Goncharyuk, Anton
> > > 
> > > Vinogradov, Andrey Gura, Alexey Scherbakov and Pavel Kovalenko.
> > > > > 
> > > > > Issues to address:
> > > > > 
> > > > > 1. Study experience of following libs, tools:
> > > > >     * OpenTracing
> > > > >     * OpenSensus
> > > > >     * DropWizard
> > > > > 
> > > > > 2. Support histogram sensor: Sensor that collects values that gets
> > > 
> > > into predefined segments
> > > > > 
> > > > > 3. Use more widely used naming(like in OpenSensus?)
> > > > > 
> > > > > 4. Consider the usage of OpenSensus as a default implementation for
> > > 
> > > local metric storage.
> > > > > 
> > > > > 5. To measure the performance penalty for metrics for 5_000 caches.
> > > > > 
> > > > > 6. Some metrics should be part of public API and others are not(may
be
> > > 
> > > changed/removed in release without warnings).
> > > > > 
> > > > > My plan for Phase #1 is the following:
> > > > > 
> > > > > 1. Address the issues.
> > > > > 2. Prepare public API
> > > > > 3. Prepare PR for monitoring subsystem + existing metrics rewritten
> > > 
> > > with it.
> > > > > 4. Prepare a PR with lists of each user API.
> > > > > 5. Collect feedback for a #4.
> > > > > 6. Design a log exposer. Consider the usage of JFR format or some
> > > 
> > > other widely used, tool compatible format.
> > > > > 
> > > > > [1]
> > > 
> > > https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=112820392
> > > > > 
> > > > > В Чт, 02/05/2019 в 14:02 +0300, Nikolay Izhikov пишет:
> > > > > > Hello, Maxim.
> > > > > > 
> > > > > > > How will be recorded throughput sensor values which will
require
> > > 
> > > an interval for the rate calculations?
> > > > > > 
> > > > > > I answered to this question in IEP "Design principles":
> > > > > > 
> > > > > > ```
> > > > > > Sensors should contain only raw values. No aggregation of numeric
> > > 
> > > metrics on Ignite side.
> > > > > > Min, max, avg and other functions are the matter of an external
> > > 
> > > monitoring system.
> > > > > > ```
> > > > > > 
> > > > > > Throughput is a function `(S(t2) - S(t1))/(t2-t1)`
> > > > > > where S(t) is the sensor value in some point of time t.
> > > > > > 
> > > > > > Seems, throughput calculation is a responsibility of an external
> > > 
> > > system.
> > > > > > 
> > > > > > What do you think?
> > > > > > 
> > > > > > > It seems to me that we can add an additional parameter
of
> > > 
> > > `sensitivityLevel` to provide for the user a flexible sensor control (e.g.,
> > > INFO, WARN, NOTICE, DEBUG).
> > > > > > 
> > > > > > For now, I think that all sensors and lists will be very(very!)
> > > 
> > > lightweight.
> > > > > > So, we should be able to disable/enable it's, for sure.
> > > > > > 
> > > > > > But, we should turn off and turn on the whole Ignite subsystem
> > > > > > for the case we have strong performance limitations for a particular
> > > 
> > > workload.
> > > > > > 
> > > > > > So, we have two "level" of monitoring - INFO and DEBUG(for
> > > 
> > > profiling: IEP-35 - Phase 3).
> > > > > > For example, AFAIK we can't disable current SQL system views(Why
> > > 
> > > should we?)
> > > > > > 
> > > > > > В Вт, 30/04/2019 в 14:33 +0300, Maxim Muzafarov пишет:
> > > > > > > Hello Nikolay,
> > > > > > > 
> > > > > > > I've looked through your PRs changes.
> > > > > > > 
> > > > > > > > Sensors
> > > > > > > 
> > > > > > > How will be recorded throughput sensor values which will
require an
> > > > > > > interval for the rate calculations? Do we have such an
example? For
> > > > > > > instance, getAllocationRate() or getEvictionRate(). These
metrics
> > > 
> > > are
> > > > > > > out of the scope of current PoC and IEP as they are not
related to
> > > 
> > > the
> > > > > > > user metrics, but it is a good example of a particular
metric type.
> > > > > > > 
> > > > > > > It seems to me that we can add an additional parameter
of
> > > > > > > `sensitivityLevel` to provide for the user a flexible sensor
> > > 
> > > control
> > > > > > > (e.g., INFO, WARN, NOTICE, DEBUG).
> > > > > > > 
> > > > > > > It also seems that for the sensors getValue() the completely
> > > > > > > functional java approach can be used. Am I right?
> > > > > > > 
> > > > > > > On Mon, 29 Apr 2019 at 11:44, Nikolay Izhikov <nizhikov@apache.org>
> > > 
> > > wrote:
> > > > > > > > 
> > > > > > > > Hello, Vyacheslav.
> > > > > > > > 
> > > > > > > > Thanks for the feedback!
> > > > > > > > 
> > > > > > > > > HttpExposer with Jetty's dependencies should
be detached> from
> > > 
> > > the core module.
> > > > > > > > 
> > > > > > > > Agreed. module hierarchy is the essence of the next
steps.
> > > > > > > > For now it just a proof of my ideas for Ignite monitoring
we can
> > > 
> > > discuss.
> > > > > > > > 
> > > > > > > > > I like your approach with 'wrapper' for monitored
objects,
> > > 
> > > like don't like using 'ServiceConfiguration' directly as a monitored object
> > > for services
> > > > > > > > 
> > > > > > > > Agreed in general.
> > > > > > > > Seems, choosing the right data to expose is the matter
of
> > > 
> > > separate discussion for each Ignite entities.
> > > > > > > > I've planned to file tickets for each entity so anyone
> > > 
> > > interested can share his vision in it.
> > > > > > > > 
> > > > > > > > > In my opinion, each sensor should have a timestamp.
> > > > > > > > 
> > > > > > > > I'm not sure that *every* sensor should have directly
associated
> > > 
> > > timestamp.
> > > > > > > > Seems, we should support sensors without timestamp
for a current
> > > 
> > > monitoring numbers at least.
> > > > > > > > 
> > > > > > > > > Also, it'd be great to have an ability to store
a list of a
> > > 
> > > fixed size> of last N sensors
> > > > > > > > 
> > > > > > > > What use-cases do you know for such sensors?
> > > > > > > > We have plans to support fixed size lists to show
"Last N SQL
> > > 
> > > queries" or similar data.
> > > > > > > > Essentially, a sensor is just a single value with
the name and
> > > 
> > > known meaning.
> > > > > > > > 
> > > > > > > > > It'd be great if you provide a more extended
test to show the
> > > 
> > > work of> the system.
> > > > > > > > 
> > > > > > > > Sorry, for that :)
> > > > > > > > When you run 'MonitoringSelfTest' you should open
> > > 
> > > http://localhost:8080/ignite/monitoring to view exposed info.
> > > > > > > > I provide this info in gist -
> > > 
> > > https://gist.github.com/nizhikov/aa1e6222e6a3456472b881b8deb0e24d
> > > > > > > > 
> > > > > > > > I will extend this test to print results to console
in the next
> > > 
> > > iterations - stay tuned :)
> > > > > > > > 
> > > > > > > > В Вс, 28/04/2019 в 23:35 +0300, Vyacheslav Daradur
пишет:
> > > > > > > > > Hi, Nikolay,
> > > > > > > > > 
> > > > > > > > > I looked through PR and IEP, and I have some
comments:
> > > > > > > > > 
> > > > > > > > > It would be better to implement it as a separate
module, I
> > > 
> > > can't say
> > > > > > > > > if it is possible for the main part of monitoring
or not, but I
> > > > > > > > > believe that HttpExposer with Jetty's dependencies
should be
> > > 
> > > detached
> > > > > > > > > from the core module.
> > > > > > > > > 
> > > > > > > > > I like your approach with 'wrapper' for monitored
objects, like
> > > > > > > > > 'ComputeTaskInfo' in PR, and don't like using
> > > 
> > > 'ServiceConfiguration'
> > > > > > > > > directly as a monitored object for services.
I believe we
> > > 
> > > shouldn't
> > > > > > > > > mix approaches. It'd be better always use some
kind of
> > > 
> > > container with
> > > > > > > > > monitored object's information to work with such
data.
> > > > > > > > > 
> > > > > > > > > In my opinion, each sensor should have a timestamp.
Usually
> > > 
> > > monitoring
> > > > > > > > > systems aggregate data and build graphics according
to sensors
> > > > > > > > > timestamp.
> > > > > > > > > 
> > > > > > > > > Also, it'd be great to have an ability to store
a list of a
> > > 
> > > fixed size
> > > > > > > > > of last N sensors, not to miss them without pushing
to an
> > > 
> > > external
> > > > > > > > > monitoring system.
> > > > > > > > > 
> > > > > > > > > It'd be great if you provide a more extended
test to show the
> > > 
> > > work of
> > > > > > > > > the system. Everybody who looks to PR needs to
run the test
> > > 
> > > and get
> > > > > > > > > the info manually to see the completeness of
sensors, this
> > > 
> > > might be
> > > > > > > > > simplified by proper test.
> > > > > > > > > 
> > > > > > > > > Thank you!
> > > > > > > > > 
> > > > > > > > > 
> > > > > > > > > 
> > > > > > > > > On Fri, Apr 26, 2019 at 5:56 PM Nikolay Izhikov
<
> > > 
> > > nizhikov@apache.org> wrote:
> > > > > > > > > > 
> > > > > > > > > > Hello, Igniters.
> > > > > > > > > > 
> > > > > > > > > > I've prepared Proof of Concept for IEP-35
[1]
> > > > > > > > > > PR can be found here -
> > > 
> > > https://github.com/apache/ignite/pull/6510
> > > > > > > > > > 
> > > > > > > > > > I've done following changes:
> > > > > > > > > > 
> > > > > > > > > >         1. `GridMonitoringManager`  [2]
- simple
> > > 
> > > implementation of manager to store all monitoring info
> > > > > > > > > >         2. `HttpPullExposerSpi` [3] - pull
exposer
> > > 
> > > implementation that can respond with JSON from
> > > http://localhost:8080/ignite/monitoring. JSON content can be veiwed in
> > > gist [4]
> > > > > > > > > >         3. Compute task start and finish
monitoring in
> > > 
> > > "compute" list [5]
> > > > > > > > > >         4. Service registration are monitored
in "service"
> > > 
> > > list - [6]
> > > > > > > > > >         5. Current `IgniteSpiMBeanAdapter`
rewritten using
> > > 
> > > `GridMonitoringManager` [7]
> > > > > > > > > > 
> > > > > > > > > > Design principles, monitoring subsystem
details and new
> > > 
> > > Ignite entities can be found in IEP [1].
> > > > > > > > > > 
> > > > > > > > > > My next steps will be:
> > > > > > > > > > 
> > > > > > > > > >         1. Implementation of JMX exposer
> > > > > > > > > >         2. Registration of all "lists" and
"sensor groups"
> > > 
> > > as a SQL System view.
> > > > > > > > > >         3. Add monitoring for all unmonitoring
Ignite API.
> > > 
> > > (described in IEP).
> > > > > > > > > >         4. Rewrite existing jmx metrics
using
> > > 
> > > GridMonitoringManager.
> > > > > > > > > > 
> > > > > > > > > > Please, share you thoughts.
> > > > > > > > > > 
> > > > > > > > > > Part of JSON file:
> > > > > > > > > > ```
> > > > > > > > > >     "COMPUTE": {
> > > > > > > > > >       "tasks": {
> > > > > > > > > >         "name": "tasks",
> > > > > > > > > >         "rows": [
> > > > > > > > > >           {
> > > > > > > > > >             "id": "0798817a-eeec-4386-9af7-94edb39ffced",
> > > > > > > > > >             "sessionId":
> > > 
> > > "a1814f95a61-912451ff-ca7b-4764-a7fd-728f6a900000",
> > > > > > > > > >             "data": {
> > > > > > > > > >               "taskClasName":
> > > 
> > > "org.apache.ignite.monitoring.MonitoringSelfTest$$Lambda$145/1500885480",
> > > > > > > > > >               "startTime": 1556287337944,
> > > > > > > > > >               "timeout": 9223372036854776000,
> > > > > > > > > >               "execName": null
> > > > > > > > > >             },
> > > > > > > > > >             "name": "anotherBroadcast"
> > > > > > > > > >           }
> > > > > > > > > > ```
> > > > > > > > > > 
> > > > > > > > > > [1]
> > > 
> > > https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=112820392
> > > > > > > > > > [2]
> > > 
> > > https://github.com/apache/ignite/pull/6510/files#diff-ec7d5cf5e35b99303deb9accee153c50R34
> > > > > > > > > > [3]
> > > 
> > > https://github.com/apache/ignite/pull/6510/files#diff-32239c45e0ae3b692af2eae7078e1436R47
> > > > > > > > > > [4]
> > > 
> > > https://gist.github.com/nizhikov/aa1e6222e6a3456472b881b8deb0e24d
> > > > > > > > > > [5]
> > > 
> > > https://github.com/apache/ignite/pull/6510/files#diff-d651ed29d07bd0c5ce291654a3254cc0R749
> > > > > > > > > > [6]
> > > 
> > > https://github.com/apache/ignite/pull/6510/files#diff-0b4e54fbda2b0da1c10eff48416336f6R1606
> > > > > > > > > > [7]
> > > 
> > > https://github.com/apache/ignite/pull/6510/files#diff-4398bf118150500e059069b3a1638ec7R61
> > > > > > > > > 
> > > > > > > > > 
> > > > > > > > > 

Mime
View raw message