From: Alex Plehanov
Date: Wed, 24 Jan 2018 17:29:43 +0300
Subject: Re: Ignite diagnostic (SQL system views)
To: dev@ignite.apache.org

1) It's not a principal point; I can change the schema. INFORMATION_SCHEMA
was used because it already exists and is usually used for metadata tables
and views. Your proposal is to use the schema "IGNITE", do I understand you
right? BTW, for now we can't query the other (H2) meta tables from
INFORMATION_SCHEMA, so the Ignite system views are the only views available
to query from this schema.

2) Exactly for this reason the IGNITE_INSTANCE view is useful: to determine
which node we are connected to.

3) As a first phase, in my opinion, local views will be enough. Performance
and caching of distributed views should be discussed in later phases, when
the distributed views implementation is planned. In the current
implementation I tried to use indexing for local views wherever possible.

4) I don't think that JVM info is more critical than, for example, caches
or nodes information. When are authorization capabilities planned to be
implemented?

About local data: yes, we can rename all currently implemented views for
local node data as LOCAL_..., and create (someday) new whole-cluster views
(which use distributed requests) without a prefix or, for example, with a
CLUSTER_ prefix. But some views can show whole-cluster information using
only local node data, without distributed requests (for example
IGNITE_NODE_METRICS, IGNITE_PART_ASSIGNMENT, IGNITE_PART_ALLOCATION,
IGNITE_NODES, etc.). Are they local or cluster views in this concept? Which
prefix should be used? And what about caches? Are they local or cluster?
On a local node we can see cluster-wide caches (replicated and distributed)
and caches that exist on the current node only. The local cache list may
differ from node to node. Which prefix should be used for this view? And
one more thing: for some views there is no sense in making them
cluster-wide (for example IGNITE_INSTANCE). Should we name it
LOCAL_INSTANCE without creating an INSTANCE view?

So, next steps: split the PR, change the schema name (IGNITE?), change the
view name for caches (CACHES, LOCAL_CACHES?).

2018-01-24 13:03 GMT+03:00 Vladimir Ozerov:

> Hi Alex,
>
> System views could be an extremely valuable addition to Ignite. Ideally,
> a user should be able to monitor and manage the state of the whole
> cluster with a single SQL command line. We have had plans to implement
> this for a very long time. However, it is a very sensitive task which
> should take a lot of moving pieces into account, such as usability,
> consistency, performance, security, etc.
>
> Let me point out several major concerns I see at the moment:
>
> 1) Usability: INFORMATION_SCHEMA
> This schema is part of the ANSI SQL standard. When creating system views,
> some vendors prefer to store them in a completely different predefined
> schema (Oracle, MS SQL). Others prefer to keep them in INFORMATION_SCHEMA
> directly. Both approaches could work. However, the latter breaks
> separation of concerns: we store typical metadata next to possibly
> sensitive system data. It also makes security management more complex.
> System data is very sensitive, and we can no longer simply grant a user
> access to INFORMATION_SCHEMA; instead, we have to grant access on a
> per-view basis. For this reason my preference is to store system tables
> in a separate schema, not in INFORMATION_SCHEMA.
>
> 2) Consistency: local data
> One of the implemented views is GridH2SysViewImplInstance. Normally SQL
> users communicate with Ignite through JDBC/ODBC drivers. These drivers
> are connected to a single node, typically a client node.
> Moreover, we will introduce a high-availability feature where drivers are
> able to connect to any address from a predefined list. This renders the
> view useless, as you do not know which node you are connected to. Also,
> local-only data cannot be joined in the general case: you will receive
> different results on different nodes. The same goes for transactions, JVM
> info, etc.
>
> 3) Performance
> Suppose we fixed the consistency of transactions and now this view shows
> transactions in the whole cluster with the possibility to filter them by
> nodes, which is what a user would expect out of the box. Then another
> problem appears: performance. How would we collect the necessary data?
> How would we handle joins, when a particular view could be scanned
> multiple times during query execution? How do we achieve sensible
> consistency? Most probably we would collect remote data once when the
> query starts, cache it somehow at the query session level, and then reuse
> it during joins. But again, this should be discussed separately.
>
> 4) Security: JVM info
> We should define clear boundaries of what info is exposed. JVM data along
> with running threads is critically sensitive information. We should not
> expose it until we have authorization capabilities.
>
> In order to start moving this code from a prototype to a production state
> we should start with the most simple and consistent views, e.g.
> IGNITE_CACHES. Let's move it to a separate PR, review the infrastructure
> code, review the view implementation, agree on proper naming and
> placement, and merge it. Then each and every view (or group of related
> views) should be discussed and reviewed separately.
>
> As for the node-local stuff, maybe we should move it to a separate
> schema, or mark it with a special prefix: e.g. "IGNITE.TRANSACTIONS" for
> all transactions in the cluster, "IGNITE.LOCAL_TRANSACTIONS" for
> transactions on the local node.
> In this case we will be able to merge the "local" stuff shortly, and
> implement the more complex but at the same time much more useful
> distributed stuff later on.
>
> Makes sense?
>
> Vladimir.
>
> On Tue, Jan 23, 2018 at 8:30 PM, Alex Plehanov wrote:
>
> > Hello, Igniters!
> >
> > For Ignite diagnostics it's usually helpful to get some Ignite
> > internals information. But currently, in my opinion, there are no
> > convenient tools for this purpose:
> >
> > · Some issues can be solved by analyzing log files. Log files are
> > useful for dumps, but sometimes they are difficult to read. Also,
> > interesting metrics can't be requested at runtime; we need to wait
> > until Ignite writes these metrics on a timeout or other event.
> >
> > · JMX is useful for scalar metrics. Complex and tabular data can also
> > be received, but it's difficult to read, filter and sort it without
> > processing by specialized external tools. For the most frequently used
> > cases, almost-duplicate metrics are created to show the data in an
> > easy-to-read form.
> >
> > · Web Console is able to show tabular and complex data. Perhaps
> > someday Web Console will contain all the necessary dashboards for most
> > problem investigations, but some non-trivial queries will not be
> > covered anyway. Also, Web Console needs additional infrastructure to
> > work.
> >
> > · External "home-made" tools can be used for non-trivial cases. They
> > cover highly specialized cases and usually can't be used as
> > general-purpose tools.
> >
> > Sometimes we are forced to use more than one tool and join the data by
> > hand (for example, a current thread dump and data from logs).
> >
> > An RDBMS often provides system views for diagnostic purposes (for
> > example, DBA_% and V$% in Oracle), which can be queried by SQL.
> > This solution makes all internal diagnostic information available in a
> > readable form (with all possible filters and projections) without
> > using any other internal or external tools. My proposal is to create
> > similar system views in Ignite.
> >
> > I have implemented a working prototype (PR: [1]). It contains the
> > following views:
> >
> > IGNITE_SYSTEM_VIEWS           Registered system views
> > IGNITE_INSTANCE               Ignite instance
> > IGNITE_JVM_THREADS            JVM threads
> > IGNITE_JVM_RUNTIME            JVM runtime
> > IGNITE_JVM_OS                 JVM operating system
> > IGNITE_CACHES                 Ignite caches
> > IGNITE_CACHE_CLUSTER_METRICS  Ignite cache cluster metrics
> > IGNITE_CACHE_NODE_METRICS     Ignite cache node metrics
> > IGNITE_CACHE_GROUPS           Cache groups
> > IGNITE_NODES                  Nodes in topology
> > IGNITE_NODE_HOSTS             Node hosts
> > IGNITE_NODE_ADDRESSES         Node addresses
> > IGNITE_NODE_ATTRIBUTES        Node attributes
> > IGNITE_NODE_METRICS           Node metrics
> > IGNITE_TRANSACTIONS           Active transactions
> > IGNITE_TRANSACTION_ENTRIES    Cache entries used by a transaction
> > IGNITE_TASKS                  Active tasks
> > IGNITE_PART_ASSIGNMENT        Partition assignment map
> > IGNITE_PART_ALLOCATION        Partition allocation map
> >
> > Many more useful views could be implemented (executor diagnostics, SPI
> > diagnostics, etc.).
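> > (The registered views are themselves queryable through the
> > IGNITE_SYSTEM_VIEWS view. A minimal sketch, assuming the prototype's
> > current INFORMATION_SCHEMA placement; the column set is whatever the
> > PR defines, so the example avoids assuming specific column names:)

```sql
-- List every system view registered by the prototype.
-- The schema name follows the current prototype and may change
-- per the discussion in this thread.
SELECT * FROM INFORMATION_SCHEMA.IGNITE_SYSTEM_VIEWS
```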
> > Some usage examples:
> >
> > Cache groups and their partitions that are used by transactions
> > running for more than 5 minutes:
> >
> > SELECT cg.CACHE_OR_GROUP_NAME, te.KEY_PARTITION, count(*) AS ENTITIES_CNT
> > FROM INFORMATION_SCHEMA.IGNITE_TRANSACTIONS t
> > JOIN INFORMATION_SCHEMA.IGNITE_TRANSACTION_ENTRIES te ON t.XID = te.XID
> > JOIN INFORMATION_SCHEMA.IGNITE_CACHES c ON te.CACHE_NAME = c.NAME
> > JOIN INFORMATION_SCHEMA.IGNITE_CACHE_GROUPS cg ON c.GROUP_ID = cg.ID
> > WHERE t.START_TIME < TIMESTAMPADD('MINUTE', -5, NOW())
> > GROUP BY cg.CACHE_OR_GROUP_NAME, te.KEY_PARTITION
> >
> > Average CPU load on server nodes grouped by operating system:
> >
> > SELECT na.VALUE, COUNT(n.ID), AVG(nm.AVG_CPU_LOAD) AVG_CPU_LOAD
> > FROM INFORMATION_SCHEMA.IGNITE_NODES n
> > JOIN INFORMATION_SCHEMA.IGNITE_NODE_ATTRIBUTES na ON na.NODE_ID = n.ID
> >   AND na.NAME = 'os.name'
> > JOIN INFORMATION_SCHEMA.IGNITE_NODE_METRICS nm ON nm.NODE_ID = n.ID
> > WHERE n.IS_CLIENT = false
> > GROUP BY na.VALUE
> >
> > Top 5 nodes by puts to cache 'cache':
> >
> > SELECT cm.NODE_ID, cm.CACHE_PUTS
> > FROM INFORMATION_SCHEMA.IGNITE_CACHE_NODE_METRICS cm
> > WHERE cm.CACHE_NAME = 'cache'
> > ORDER BY cm.CACHE_PUTS DESC
> > LIMIT 5
> >
> > Is this implementation interesting to anyone else? Maybe some views
> > are redundant? Which additional first-priority views should be
> > implemented? Any other thoughts or proposals?
> >
> > [1] https://github.com/apache/ignite/pull/3413
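As an illustration of the naming scheme discussed in this thread, a sketch
of how the proposed split might look in SQL. This is hypothetical: the
IGNITE schema name and the LOCAL_ prefix are proposals under discussion,
not an implemented API; the XID and START_TIME columns are taken from the
prototype's IGNITE_TRANSACTIONS view.

```sql
-- Proposed: all transactions in the whole cluster (distributed view).
SELECT XID, START_TIME FROM IGNITE.TRANSACTIONS;

-- Proposed: transactions only on the node this driver is connected to.
SELECT XID, START_TIME FROM IGNITE.LOCAL_TRANSACTIONS;
```

Either spelling would let the "local" views merge early while the
distributed variants are designed separately, as Vladimir suggests.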