ignite-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Denis Magda <dma...@apache.org>
Subject Re: Ignite diagnostic (SQL system views)
Date Tue, 13 Feb 2018 01:10:07 GMT
Alex P, sounds like a good plan for me.

Vladimir, do you have any suggestions or corrections?

—
Denis

> On Feb 12, 2018, at 4:57 AM, Alex Plehanov <plehanov.alex@gmail.com> wrote:
> 
> The views engine and the first view are almost ready to merge (review
> comments are resolved). Which views should we take next? My proposal -
> NODES, NODE_ATTRIBUTES, NODE_METRICS, NODE_HOSTS and NODE_ADDRESSES, since
> these views are clear and all topology data available on each node.
> Any objections?
> 
> 2018-01-25 16:27 GMT+03:00 Alex Plehanov <plehanov.alex@gmail.com>:
> 
>> Anton, Vladimir, I've made some fixes. There is only one view left and
>> it's renamed to 'IGNITE.LOCAL_TRANSACTIONS'.
>> 
>> High level design of solution:
>> When IgniteH2Indexing is starting, it create and start
>> new GridH2SysViewProcessor, which create and register in H2 (via its own
>> table engine) all implementations of system views. Each system view
>> implementation extends base abstract class GridH2SysView. View
>> implementation describes columns, their types and indexes in constructor
>> and must override method getRows for data retrieval (this method called by
>> H2-compatible table and index implementations for ignite system views).
>> Almost no fixes to existing parsing engine was made, except some places,
>> where GridH2Table instance was expected, but for system views there is
>> another class.
>> 
>> New PR: [1].  Please have a look.
>> 
>> [1] https://github.com/apache/ignite/pull/3433
>> 
>> 2018-01-24 19:12 GMT+03:00 Anton Vinogradov <avinogradov@gridgain.com>:
>> 
>>> I've created IEP-13 [1] to cover all cases.
>>> Feel free to create issues.
>>> 
>>> [1]
>>> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=75962769
>>> 
>>> On Wed, Jan 24, 2018 at 6:10 PM, Vladimir Ozerov <vozerov@gridgain.com>
>>> wrote:
>>> 
>>>> Let's start with a single and the most simple view, e.g.
>>>> LOCAL_TRANSACTIONS. We will review and merge it along with necessary
>>>> infrastructure. Then will handle the rest view in separate tickets and
>>>> separate focused discussions.
>>>> 
>>>> On Wed, Jan 24, 2018 at 5:29 PM, Alex Plehanov <plehanov.alex@gmail.com
>>>> 
>>>> wrote:
>>>> 
>>>>> 1) It’s not a principal point, I can change schema. The
>>>> INFORMATION_SCHEMA
>>>>> was used because it’s already exists and usually used for metadata
>>> tables
>>>>> and views. Your proposal is to use schema “IGNITE”, am I understand
>>> you
>>>>> right? BTW, for now, we can’t query another (H2) meta tables from the
>>>>> INFORMATION_SCHEMA, so, “Ignite system views” is only available views
>>> to
>>>>> query from this schema.
>>>>> 2) Exactly for this reason the IGNITE_INSTANCE view is useful: to
>>>> determine
>>>>> which node we are connected to.
>>>>> 3) As the first phase, in my opinion, local views will be enough.
>>>>> Performance and caching of distributed views should be discussed at
>>> next
>>>>> phases, when distributed views implementation will be planned. In
>>> current
>>>>> implementation I tried to use indexing for local views wherever it’s
>>>>> possible.
>>>>> 4) I don’t think, that JVM info is more critical information than,
for
>>>>> example, caches or nodes information. When authorization capabilities
>>>>> planned to implement?
>>>>> 
>>>>> About local data: yes, we can rename all currently implemented views
>>> for
>>>>> the local node data as LOCAL_..., and create (someday) new whole
>>> cluster
>>>>> views (which use distributed requests) without prefix or, for example,
>>>> with
>>>>> CLUSTER_ prefix. But some views can show all cluster information using
>>>> only
>>>>> local node data, without distributed requests (for example
>>>>> IGNITE_NODE_METRICS, IGNITE_PART_ASSIGNMENT, IGNITE_PART_ALLOCATION,
>>>>> IGNITE_NODES, etc). Are they local or cluster views in this concept?
>>>> Which
>>>>> prefix should be used? And what about caches? Are they local or
>>> cluster?
>>>> On
>>>>> local node we can see cluster wide caches (replicated and distributed)
>>>> and
>>>>> caches for current node only. Local caches list may differ from node
>>> to
>>>>> node. Which prefix should be used for this view? And one more, there
>>> is
>>>> no
>>>>> sense for some views to make them cluster wide (for example
>>>>> INGNITE_INSTANCE). Should we name it LOCAL_INSTANCE without creating
>>>>> INSTANCE view?
>>>>> 
>>>>> So, next steps: split PR, change schema name (IGNITE?), change view
>>> name
>>>>> for caches (CACHES, LOCAL_CACHES?)
>>>>> 
>>>>> 
>>>>> 2018-01-24 13:03 GMT+03:00 Vladimir Ozerov <vozerov@gridgain.com>:
>>>>> 
>>>>>> Hi Alex,
>>>>>> 
>>>>>> System views could be extremely valuable addition for Ignite.
>>> Ideally,
>>>>> user
>>>>>> should be able to monitor and manage state of the whole cluster
>>> with a
>>>>>> single SQL command line. We have plans to implement it for a very
>>> long
>>>>>> time. However, this is very sensitive task which should take a lot
>>> of
>>>>>> moving pieces in count, such as usability, consistency, performance,
>>>>>> security, etc..
>>>>>> 
>>>>>> Let me point several major concerns I see at the moment:
>>>>>> 
>>>>>> 1) Usability: INFORMATION_SCHEMA
>>>>>> This schema is part of SQL ANSI standard. When creating system
>>> views,
>>>>> some
>>>>>> vendors prefer to store them in completely different predefined
>>> schema
>>>>>> (Oracle, MS SQL). Others prefer to keep them in INFORMATION_SCHEMA
>>>>>> directly. Both approaches could work. However, the latter breaks
>>>>> separation
>>>>>> of concerns - we store typical metadata near to possibly sensitive
>>>> system
>>>>>> data. Also it makes security management more complex - system data
>>> is
>>>>> very
>>>>>> sensitive, and now we cannot simply grant access
>>> INFORMATIONAL_SCHEMA
>>>> to
>>>>>> user. Instead, we have to grant that access on per-view basis. For
>>> this
>>>>>> reason my preference is to store system tables in separate schema,
>>> not
>>>> in
>>>>>> INFORMATION_SCHEMA
>>>>>> 
>>>>>> 2) Consistency: local data
>>>>>> One of implemented view GridH2SysViewImplInstance. Normally SQL
>>> users
>>>>>> communicate with Ignite through JDBC/ODBC drivers. These drivers
are
>>>>>> connected to a single node, typically client node. Moreover, we will
>>>>>> introduce high-availability feature when drivers were able to
>>> connect
>>>> to
>>>>>> any address from a predefined list. It renders this view useless,
as
>>>> you
>>>>> do
>>>>>> not know which node you connected to. Also, local-only data cannot
>>> be
>>>>>> joined in general case - you will receive different results on
>>>> different
>>>>>> nodes. The same goes for transactions, JVM info, etc.
>>>>>> 
>>>>>> 3) Performance
>>>>>> Suppose we fixed consistency of transactions and now this view shows
>>>>>> transactions in the whole cluster with possibility to filter them
by
>>>>> nodes
>>>>>> - this is what user would expect out of the box. Another problem
>>>> appears
>>>>>> then - performance. How would we collect necessary data? How would
>>> we
>>>>>> handle joins, when particular view could be scanned multiple times
>>>> during
>>>>>> query execution? How we achieve sensible consistency? Most probably
>>> we
>>>>>> would collect remote data once when query is started, cache it
>>> somehow
>>>> on
>>>>>> query session level, and then re-use during joins. But again, this
>>>> should
>>>>>> be discussed separately.
>>>>>> 
>>>>>> 4) Security: JVM info
>>>>>> We should define clear boundaries of what info is exposed. JVM data
>>>> along
>>>>>> with running threads is critically sensitive information. We should
>>> not
>>>>>> expose it until we have authorization capabilities.
>>>>>> 
>>>>>> In order to start moving this code from prototype to production
>>> state
>>>> we
>>>>>> should start with the most simple and consistent views. E.g.
>>>>> IGNITE_CACHES.
>>>>>> Let's move it to a separate PR, review infrastructure code, review
>>> view
>>>>>> implementation, agree on proper naming and placement, and merge it.
>>>> Then
>>>>>> each and every view (or group of related views) should be discussed
>>> and
>>>>>> reviewed separately.
>>>>>> 
>>>>>> As far as node-local stuff, may be we should move it to a separate
>>>>> schema,
>>>>>> or mark with special prefix. E.g. "IGNITE.TRANSACTIONS" - all
>>>>> transactions
>>>>>> in the cluster, "IGNITE.LOCAL_TRANSACTIONS" - transactions on the
>>> local
>>>>>> node. In this case we will be able to merge "local" stuff shortly,
>>> and
>>>>>> implement more complex but at the same time much more useful
>>>> distributed
>>>>>> stuff later on.
>>>>>> 
>>>>>> Makes sense?
>>>>>> 
>>>>>> Vladimir.
>>>>>> 
>>>>>> 
>>>>>> On Tue, Jan 23, 2018 at 8:30 PM, Alex Plehanov <
>>>> plehanov.alex@gmail.com>
>>>>>> wrote:
>>>>>> 
>>>>>>> Hello, Igniters!
>>>>>>> 
>>>>>>> For Ignite diagnostic usually it’s helpful to get some Ignite
>>>> internals
>>>>>>> information. But currently, in my opinion, there are no convenient
>>>>> tools
>>>>>>> for this purpose:
>>>>>>> 
>>>>>>> ·        Some issues can be solved by analyzing log files. Log
>>> files
>>>>> are
>>>>>>> useful for dumps, but sometimes they are difficult to read. Also
>>>>>>> interesting metrics can’t be received runtime by request, we
need
>>> to
>>>>> wait
>>>>>>> until Ignite will write these metrics by timeout or other events.
>>>>>>> 
>>>>>>> ·        JMX is useful for scalar metrics. Complex and table
data
>>> can
>>>>>> also
>>>>>>> be received, but it’s difficult to read, filter and sort them
>>> without
>>>>>>> processing by specialized external tools. For most frequently
used
>>>>> cases
>>>>>>> almost duplicating metrics are created to show data in an
>>>> easy-to-read
>>>>>>> form.
>>>>>>> 
>>>>>>> ·        Web-console is able to show table and complex data.
>>> Perhaps,
>>>>>>> someday  web-console will contain all necessary dashboards for
>>> most
>>>>>> problem
>>>>>>> investigation, but some non-trivial queries will not be covered
>>>> anyway.
>>>>>>> Also web-console needs additional infrastructure to work.
>>>>>>> 
>>>>>>> ·        External “home-made” tools can be used for non-trivial
>>>> cases.
>>>>>> They
>>>>>>> cover highly specialized cases and usually can’t be used as
>>> general
>>>>>> purpose
>>>>>>> tools.
>>>>>>> 
>>>>>>> Sometimes we are forced to use more than one tool and join data
by
>>>>> hands
>>>>>>> (for example, current thread dump and data from logs).
>>>>>>> 
>>>>>>> Often RDBMS for diagnostic purposes provides system views (for
>>>> example,
>>>>>>> DBA_% and V$% in Oracle), which can be queried by SQL. This
>>> solution
>>>>>> makes
>>>>>>> all internal diagnostic information available in a readable form
>>>> (with
>>>>>> all
>>>>>>> possible filters and projections) without using any other
>>> internal or
>>>>>>> external tools. My proposal is to create similar system views
in
>>>>> Ignite.
>>>>>>> 
>>>>>>> I implement working prototype (PR: [1]). It contains views:
>>>>>>> 
>>>>>>> IGNITE_SYSTEM_VIEWS
>>>>>>> 
>>>>>>> Registered system views
>>>>>>> 
>>>>>>> IGNITE_INSTANCE
>>>>>>> 
>>>>>>> Ignite instance
>>>>>>> 
>>>>>>> IGNITE_JVM_THREADS
>>>>>>> 
>>>>>>> JVM threads
>>>>>>> 
>>>>>>> IGNITE_JVM_RUNTIME
>>>>>>> 
>>>>>>> JVM runtime
>>>>>>> 
>>>>>>> IGNITE_JVM_OS
>>>>>>> 
>>>>>>> JVM operating system
>>>>>>> 
>>>>>>> IGNITE_CACHES
>>>>>>> 
>>>>>>> Ignite caches
>>>>>>> 
>>>>>>> IGNITE_CACHE_CLUSTER_METRICS
>>>>>>> 
>>>>>>> Ignite cache cluster metrics
>>>>>>> 
>>>>>>> IGNITE_CACHE_NODE_METRICS
>>>>>>> 
>>>>>>> Ignite cache node metrics
>>>>>>> 
>>>>>>> IGNITE_CACHE_GROUPS
>>>>>>> 
>>>>>>> Cache groups
>>>>>>> 
>>>>>>> IGNITE_NODES
>>>>>>> 
>>>>>>> Nodes in topology
>>>>>>> 
>>>>>>> IGNITE_NODE_HOSTS
>>>>>>> 
>>>>>>> Node hosts
>>>>>>> 
>>>>>>> IGNITE_NODE_ADDRESSES
>>>>>>> 
>>>>>>> Node addresses
>>>>>>> 
>>>>>>> IGNITE_NODE_ATTRIBUTES
>>>>>>> 
>>>>>>> Node attributes
>>>>>>> 
>>>>>>> IGNITE_NODE_METRICS
>>>>>>> 
>>>>>>> Node metrics
>>>>>>> 
>>>>>>> IGNITE_TRANSACTIONS
>>>>>>> 
>>>>>>> Active transactions
>>>>>>> 
>>>>>>> IGNITE_TRANSACTION_ENTRIES
>>>>>>> 
>>>>>>> Cache entries used by transaction
>>>>>>> 
>>>>>>> IGNITE_TASKS
>>>>>>> 
>>>>>>> Active tasks
>>>>>>> 
>>>>>>> IGNITE_PART_ASSIGNMENT
>>>>>>> 
>>>>>>> Partition assignment map
>>>>>>> 
>>>>>>> IGNITE_PART_ALLOCATION
>>>>>>> 
>>>>>>> Partition allocation map
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> There are much more useful views can be implemented (executors
>>>>>> diagnostic,
>>>>>>> SPIs diagnostic, etc).
>>>>>>> 
>>>>>>> Some usage examples:
>>>>>>> 
>>>>>>> Cache groups and their partitions, which used by transaction
more
>>>> than
>>>>> 5
>>>>>>> minutes long:
>>>>>>> 
>>>>>>> SELECT cg.CACHE_OR_GROUP_NAME, te.KEY_PARTITION, count(*) AS
>>>>> ENTITIES_CNT
>>>>>>> FROM INFORMATION_SCHEMA.IGNITE_TRANSACTIONS t
>>>>>>> JOIN INFORMATION_SCHEMA.IGNITE_TRANSACTION_ENTRIES te ON t.XID
=
>>>>> te.XID
>>>>>>> JOIN INFORMATION_SCHEMA.IGNITE_CACHES c ON te.CACHE_NAME = c.NAME
>>>>>>> JOIN INFORMATION_SCHEMA.IGNITE_CACHE_GROUPS cg ON c.GROUP_ID
=
>>> cg.ID
>>>>>>> WHERE t.START_TIME < TIMESTAMPADD('MINUTE', -5, NOW())
>>>>>>> GROUP BY cg.CACHE_OR_GROUP_NAME, te.KEY_PARTITION
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> Average CPU load on server nodes grouped by operating system:
>>>>>>> 
>>>>>>> SELECT na.VALUE, COUNT(n.ID), AVG(nm.AVG_CPU_LOAD) AVG_CPU_LOAD
>>>>>>> FROM INFORMATION_SCHEMA.IGNITE_NODES n
>>>>>>> JOIN INFORMATION_SCHEMA.IGNITE_NODE_ATTRIBUTES na ON na.NODE_ID
=
>>>> n.ID
>>>>>> AND
>>>>>>> na.NAME = 'os.name'
>>>>>>> JOIN INFORMATION_SCHEMA.IGNITE_NODE_METRICS nm ON nm.NODE_ID
=
>>> n.ID
>>>>>>> WHERE n.IS_CLIENT = false
>>>>>>> GROUP BY na.VALUE
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> Top 5 nodes by puts to cache ‘cache’:
>>>>>>> 
>>>>>>> SELECT cm.NODE_ID, cm.CACHE_PUTS FROM
>>>>>>> INFORMATION_SCHEMA.IGNITE_CACHE_NODE_METRICS cm
>>>>>>> WHERE cm.CACHE_NAME = 'cache'
>>>>>>> ORDER BY cm.CACHE_PUTS DESC
>>>>>>> LIMIT 5
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> Does this implementation interesting to someone else? Maybe any
>>> views
>>>>> are
>>>>>>> redundant? Which additional first-priority views must be
>>> implemented?
>>>>> Any
>>>>>>> other thoughts or proposal?
>>>>>>> 
>>>>>>> [1] https://github.com/apache/ignite/pull/3413
>>>>>>> 
>>>>>> 
>>>>> 
>>>> 
>>> 
>> 
>> 


Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message