From: Nikolay Izhikov
Message-ID:
<429e9594c7b26ad3feff794847180199f2fc242f.camel@gmail.com>
Subject: Re: [IEP-35] Monitoring & Profiling. Proof of concept
To: dev@ignite.apache.org
Date: Mon, 13 May 2019 18:06:15 +0300
References: <50abfcdc9f59ae510bff1e325c77ab06036cc0d9.camel@gmail.com>

Hello, Igniters.

We have discussed this IEP [1] with Alexey Goncharyuk, Anton Vinogradov, Andrey Gura, Alexey Scherbakov and Pavel Kovalenko.

Issues to address:

1. Study the experience of the following libraries and tools:
    * OpenTracing
    * OpenCensus
    * Dropwizard
2. Support a histogram sensor: a sensor that collects values falling into predefined segments.
3. Use more widely used naming (like in OpenCensus?).
4. Consider using OpenCensus as the default implementation for local metric storage.
5. Measure the performance penalty of metrics for 5_000 caches.
6. Some metrics should be part of the public API and others should not (they may be changed or removed in a release without warning).

My plan for Phase #1 is the following:

1. Address the issues.
2. Prepare the public API.
3. Prepare a PR for the monitoring subsystem, with the existing metrics rewritten on top of it.
4. Prepare a PR with lists for each user API.
5. Collect feedback on #4.
6. Design a log exposer. Consider using the JFR format or some other widely used, tool-compatible format.

[1] https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=112820392

On Thu, 02/05/2019 at 14:02 +0300, Nikolay Izhikov wrote:
> Hello, Maxim.
>
> > How will throughput sensor values, which require an interval for the rate calculations, be recorded?
>
> I answered this question in the IEP section "Design principles":
>
> ```
> Sensors should contain only raw values. No aggregation of numeric metrics on Ignite side.
> Min, max, avg and other functions are the matter of an external monitoring system.
> ```
>
> Throughput is a function `(S(t2) - S(t1))/(t2-t1)`
> where S(t) is the sensor value at some point in time t.
>
> It seems that throughput calculation is the responsibility of an external system.
>
> What do you think?
>
> > It seems to me that we can add an additional parameter of `sensitivityLevel` to provide for the user a flexible sensor control (e.g., INFO, WARN, NOTICE, DEBUG).
>
> For now, I think that all sensors and lists will be very (very!) lightweight.
> So, we should be able to disable/enable them, for sure.
>
> But we should turn the whole Ignite subsystem off and on
> for the case where we have strict performance limitations for a particular workload.
>
> So, we have two "levels" of monitoring - INFO and DEBUG (for profiling: IEP-35 - Phase 3).
> For example, AFAIK we can't disable the current SQL system views (why should we?).
>
> On Tue, 30/04/2019 at 14:33 +0300, Maxim Muzafarov wrote:
> > Hello Nikolay,
> >
> > I've looked through your PR's changes.
> >
> > > Sensors
> >
> > How will throughput sensor values, which require an interval for the
> > rate calculations, be recorded? Do we have such an example? For
> > instance, getAllocationRate() or getEvictionRate(). These metrics are
> > out of the scope of the current PoC and IEP as they are not related to
> > the user metrics, but they are a good example of a particular metric type.
> >
> > It seems to me that we can add an additional parameter of
> > `sensitivityLevel` to provide for the user a flexible sensor control
> > (e.g., INFO, WARN, NOTICE, DEBUG).
> >
> > It also seems that for the sensors' getValue() a completely
> > functional Java approach can be used. Am I right?
> >
> > On Mon, 29 Apr 2019 at 11:44, Nikolay Izhikov wrote:
> > >
> > > Hello, Vyacheslav.
> > >
> > > Thanks for the feedback!
> > >
> > > > HttpExposer with Jetty's dependencies should be detached from the core module.
> > >
> > > Agreed. The module hierarchy is the subject of the next steps.
> > > For now, it is just a proof of my ideas for Ignite monitoring that we can discuss.
> > >
> > > > I like your approach with 'wrapper' for monitored objects, and don't like using 'ServiceConfiguration' directly as a monitored object for services
> > >
> > > Agreed in general.
> > > It seems that choosing the right data to expose is a matter for a separate discussion for each Ignite entity.
> > > I plan to file a ticket for each entity so that anyone interested can share their vision there.
> > >
> > > > In my opinion, each sensor should have a timestamp.
> > >
> > > I'm not sure that *every* sensor should have a directly associated timestamp.
> > > It seems we should support sensors without a timestamp, at least for current monitoring numbers.
> > >
> > > > Also, it'd be great to have an ability to store a list of a fixed size of last N sensors
> > >
> > > What use cases do you know for such sensors?
> > > We have plans to support fixed-size lists to show "Last N SQL queries" or similar data.
> > > Essentially, a sensor is just a single value with a name and a known meaning.
> > >
> > > > It'd be great if you provide a more extended test to show the work of the system.
> > >
> > > Sorry for that :)
> > > When you run 'MonitoringSelfTest' you should open http://localhost:8080/ignite/monitoring to view the exposed info.
> > > I provide this info in a gist - https://gist.github.com/nizhikov/aa1e6222e6a3456472b881b8deb0e24d
> > >
> > > I will extend this test to print results to the console in the next iterations - stay tuned :)
> > >
> > > On Sun, 28/04/2019 at 23:35 +0300, Vyacheslav Daradur wrote:
> > > > Hi, Nikolay,
> > > >
> > > > I looked through the PR and the IEP, and I have some comments:
> > > >
> > > > It would be better to implement it as a separate module. I can't say
> > > > if it is possible for the main part of monitoring or not, but I
> > > > believe that HttpExposer with Jetty's dependencies should be detached
> > > > from the core module.
> > > >
> > > > I like your approach with 'wrapper' for monitored objects, like
> > > > 'ComputeTaskInfo' in the PR, and don't like using 'ServiceConfiguration'
> > > > directly as a monitored object for services. I believe we shouldn't
> > > > mix approaches. It'd be better to always use some kind of container with
> > > > the monitored object's information to work with such data.
> > > >
> > > > In my opinion, each sensor should have a timestamp. Usually monitoring
> > > > systems aggregate data and build graphs according to the sensors'
> > > > timestamps.
> > > >
> > > > Also, it'd be great to have an ability to store a list of a fixed size
> > > > of last N sensors, so that they are not lost without pushing to an
> > > > external monitoring system.
> > > >
> > > > It'd be great if you provide a more extended test to show the work of
> > > > the system. Everybody who looks at the PR needs to run the test and get
> > > > the info manually to see the completeness of sensors; this might be
> > > > simplified by a proper test.
> > > >
> > > > Thank you!
> > > >
> > > >
> > > >
> > > > On Fri, Apr 26, 2019 at 5:56 PM Nikolay Izhikov wrote:
> > > > >
> > > > > Hello, Igniters.
> > > > >
> > > > > I've prepared a Proof of Concept for IEP-35 [1].
> > > > > The PR can be found here - https://github.com/apache/ignite/pull/6510
> > > > >
> > > > > I've made the following changes:
> > > > >
> > > > > 1. `GridMonitoringManager` [2] - a simple implementation of a manager that stores all monitoring info
> > > > > 2. `HttpPullExposerSpi` [3] - a pull exposer implementation that can respond with JSON from http://localhost:8080/ignite/monitoring. The JSON content can be viewed in the gist [4]
> > > > > 3. Compute task start and finish are monitored in the "compute" list [5]
> > > > > 4. Service registrations are monitored in the "service" list - [6]
> > > > > 5. The current `IgniteSpiMBeanAdapter` is rewritten using `GridMonitoringManager` [7]
> > > > >
> > > > > Design principles, monitoring subsystem details and new Ignite entities can be found in the IEP [1].
> > > > >
> > > > > My next steps will be:
> > > > >
> > > > > 1. Implementation of a JMX exposer.
> > > > > 2. Registration of all "lists" and "sensor groups" as SQL system views.
> > > > > 3. Add monitoring for all unmonitored Ignite APIs (described in the IEP).
> > > > > 4. Rewrite the existing JMX metrics using GridMonitoringManager.
> > > > >
> > > > > Please share your thoughts.
> > > > >
> > > > > Part of JSON file:
> > > > > ```
> > > > > "COMPUTE": {
> > > > >   "tasks": {
> > > > >     "name": "tasks",
> > > > >     "rows": [
> > > > >       {
> > > > >         "id": "0798817a-eeec-4386-9af7-94edb39ffced",
> > > > >         "sessionId": "a1814f95a61-912451ff-ca7b-4764-a7fd-728f6a900000",
> > > > >         "data": {
> > > > >           "taskClasName": "org.apache.ignite.monitoring.MonitoringSelfTest$$Lambda$145/1500885480",
> > > > >           "startTime": 1556287337944,
> > > > >           "timeout": 9223372036854776000,
> > > > >           "execName": null
> > > > >         },
> > > > >         "name": "anotherBroadcast"
> > > > >       }
> > > > > ```
> > > > >
> > > > > [1] https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=112820392
> > > > > [2] https://github.com/apache/ignite/pull/6510/files#diff-ec7d5cf5e35b99303deb9accee153c50R34
> > > > > [3] https://github.com/apache/ignite/pull/6510/files#diff-32239c45e0ae3b692af2eae7078e1436R47
> > > > > [4] https://gist.github.com/nizhikov/aa1e6222e6a3456472b881b8deb0e24d
> > > > > [5] https://github.com/apache/ignite/pull/6510/files#diff-d651ed29d07bd0c5ce291654a3254cc0R749
> > > > > [6] https://github.com/apache/ignite/pull/6510/files#diff-0b4e54fbda2b0da1c10eff48416336f6R1606
> > > > > [7] https://github.com/apache/ignite/pull/6510/files#diff-4398bf118150500e059069b3a1638ec7R61
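
The throughput formula quoted in this thread, `(S(t2) - S(t1))/(t2-t1)`, can be evaluated entirely outside Ignite from two raw sensor readings. Below is a minimal sketch of that calculation, assuming a counter-like sensor sampled together with a millisecond timestamp. `ThroughputSketch`, `Sample` and `ratePerSecond` are hypothetical names used only for illustration; they are not part of the Ignite API or of the PR.

```java
/** Hypothetical helper for an external monitoring system; not Ignite code. */
final class ThroughputSketch {
    /** One raw sensor reading: the value S(t) and the time t (in ms) at which it was taken. */
    static final class Sample {
        final long value;
        final long timestampMillis;

        Sample(long value, long timestampMillis) {
            this.value = value;
            this.timestampMillis = timestampMillis;
        }
    }

    /** Computes (S(t2) - S(t1)) / (t2 - t1), expressed in units per second. */
    static double ratePerSecond(Sample earlier, Sample later) {
        long elapsedMillis = later.timestampMillis - earlier.timestampMillis;

        // The formula is undefined for a zero or negative interval.
        if (elapsedMillis <= 0)
            throw new IllegalArgumentException("The later sample must be newer than the earlier one");

        // Multiply by 1000.0 to convert "per millisecond" into "per second".
        return (later.value - earlier.value) * 1000.0 / elapsedMillis;
    }
}
```

For example, a reading of 100 at t = 1000 ms and a reading of 350 at t = 3000 ms give (350 - 100) / 2 s = 125 per second.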