Return-Path: X-Original-To: archive-asf-public-internal@cust-asf2.ponee.io Delivered-To: archive-asf-public-internal@cust-asf2.ponee.io Received: from cust-asf.ponee.io (cust-asf.ponee.io [163.172.22.183]) by cust-asf2.ponee.io (Postfix) with ESMTP id D98F8200C2B for ; Thu, 2 Mar 2017 23:00:48 +0100 (CET) Received: by cust-asf.ponee.io (Postfix) id D820F160B6F; Thu, 2 Mar 2017 22:00:48 +0000 (UTC) Delivered-To: archive-asf-public@cust-asf.ponee.io Received: from mail.apache.org (hermes.apache.org [140.211.11.3]) by cust-asf.ponee.io (Postfix) with SMTP id 05186160B6A for ; Thu, 2 Mar 2017 23:00:47 +0100 (CET) Received: (qmail 65599 invoked by uid 500); 2 Mar 2017 22:00:47 -0000 Mailing-List: contact dev-help@ignite.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Reply-To: dev@ignite.apache.org Delivered-To: mailing list dev@ignite.apache.org Received: (qmail 65588 invoked by uid 99); 2 Mar 2017 22:00:47 -0000 Received: from mail-relay.apache.org (HELO mail-relay.apache.org) (140.211.11.15) by apache.org (qpsmtpd/0.29) with ESMTP; Thu, 02 Mar 2017 22:00:47 +0000 Received: from [192.168.75.113] (c-73-222-138-29.hsd1.ca.comcast.net [73.222.138.29]) by mail-relay.apache.org (ASF Mail Server at mail-relay.apache.org) with ESMTPSA id E06461A02FC for ; Thu, 2 Mar 2017 22:00:46 +0000 (UTC) From: Denis Magda Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: quoted-printable Mime-Version: 1.0 (Mac OS X Mail 10.2 \(3259\)) Subject: Re: Additional SQL metrics Date: Thu, 2 Mar 2017 14:00:46 -0800 References: <0309ABC9-F8A4-4B77-AED5-A7770D712D78@apache.org> <941C2CB5-193F-4864-8268-6BE9C8B19EE9@apache.org> To: dev@ignite.apache.org In-Reply-To: Message-Id: <5539B2E2-1CEF-4092-AD7B-D5ECFDDD60AF@apache.org> X-Mailer: Apple Mail (2.3259) archived-at: Thu, 02 Mar 2017 22:00:49 -0000 Vovan, When I=E2=80=99m speaking of JOIN metrics I=E2=80=99m simply assume that = we need to add metrics relevant for queries with joins, metrics that = will help us get more insights on non-collocated and collocated joins = execution flow. > 1) Query exec count > 2) Query exec time (first define what "time" means) - min, max, avg Total query execution time might not be helpful in the trickiest cases. = What if you have multiple joins in your query? How do I know which one = contributes to the execution most? So, I would do a breakdown having total time, map time, per-join time, = reduce time. Hope it=E2=80=99s possible. If it=E2=80=99s unclear how to = support everything at the first place then it=E2=80=99s a different = question. Let=E2=80=99s create several tickets and start implementing = everything gracefully. > 3) Number of bytes exchanged between nodes during query execution It will be really helpful to make a breakdown showing a number of bytes = exchanged per-join (physical join). Again, if you believe it makes sense = to do the breakdown later then let=E2=80=99s create an additional ticket = for the 2 tier metrics then.=20 > 4) Number of returned rows - min, max, avg Plus, let=E2=80=99s add the following to the list: 5) Collocated: yes or no =E2=80=94 Denis > On Mar 2, 2017, at 1:32 PM, Vladimir Ozerov = wrote: >=20 > Denis, >=20 > The main problem with suggested metrics is that they implies that = ceratin > internal mechanics work in predefined way. For example, what is JOIN > metrics? There are no guarantees that JOIN in user's query will be > translated to a real physical join. What if several different query > execution pieces happen in parallel? What if we rework our distributed > query engine from pull to push approach for performance reasons and = there > will be no JOINs in classical sense? >=20 > This is why I think that we should start with very basic things. = Something > like: > 1) Query exec count > 2) Query exec time (first define what "time" means) - min, max, avg > 3) Number of bytes exchanged between nodes during query execution > 4) Number of returned rows - min, max, avg >=20 > Once we have base numbers in place, we can think of carefully = integrating > and enhancing all pieces of query execution into more verbose formats, > similar to query plans with relative weights in classical RDBMS = systems. >=20 > Thoughts? >=20 >=20 >=20 > On Thu, Mar 2, 2017 at 11:31 PM, Denis Magda = wrote: >=20 >> Vovan, >>=20 >> Your metrics make perfect sense to me. However, I see a high demand = for >> JOINs based metrics especially from those who give a try to = non-collocated >> joins in production and want to measure them somehow. This is why, >> personally, I prefer to see the metrics below in the top priority = list as >> well: >>=20 >> if a query was executed in the collocated or non-collocated mode. = Three >> results are valid: collocated, non-collocated, simple query (no = joins). >> non-collocated query: size of the data exchanged between the nodes to >> complete a specific join. If there are multiple joins in the query we = need >> to provide this metric for every of them. >> non-collocated and collocated query: a part of the time spent joining = the >> data. If there are multiple joins in the query we need to provide = this >> metric for every of them. >>=20 >> As for =E2=80=9Cunicast=E2=80=9D and =E2=80=9Cbroadcast=E2=80=9D, = agree, let=E2=80=99s ignore it for now. >>=20 >> In any case, can we include timing information (map phase, reduce = phase, >> join phase) into an execution plan produced by H2? Are there any >> implementation hooks? >>=20 >> =E2=80=94 >> Denis >>=20 >>=20 >>> On Mar 2, 2017, at 12:02 PM, Dmitriy Setrakyan = >> wrote: >>>=20 >>> I think some of the metrics specified by Denis also make sense, so I >> would >>> add them as well. See below... >>>=20 >>> On Thu, Mar 2, 2017 at 12:36 AM, Vladimir Ozerov = > > >>> wrote: >>>=20 >>>> Denis, >>>>=20 >>>> Query execution is complex process involving different stages which = are >> not >>>> very easy to match with each other. Especially provided that any = node >> can >>>> leave topology at any time. Another problem is that engine evolves = and >>>> metrics like "did a query do broadcast or unicast" may easily = become >>>> useless at some point, because for example there will be neither >> unicast, >>>> nor broadast, but something different. On the other hand I = completely >> agree >>>> that performance monitoring is essential part of any mature DBMS. >>>>=20 >>>> I would start with metrics which are both very basic and easy to >> implement >>>> at the same time. For example we can add fingerprint (hash) to = every >> query >>>> which will be used to join "map" and "reduce" parts with each other = and >> add >>>> the following basic metrics: >>>> 1) Execution count for particular query >>>> 2) Number of map nodes - min, max, avg >>>>=20 >>>=20 >>> (1) and (2) makes sense >>>=20 >>>=20 >>>> 3) Map step duration (if applicable) - min, max, >>>=20 >>> 4) Reduce step duration (if applicable) - min, max, avg >>>>=20 >>>=20 >>> Not sure if (3) and (4) are needed. I would only add them if they = are >> easy >>> to implement. >>>=20 >>> I would also add these: >>>=20 >>> 5) Collocated: yes/no >>> 6) last execution time >>> 7) min/max/average execution duration >>>=20 >>>=20 >>>>=20 >>>> Once done users will be able to get statistics for particular = queries. >>>>=20 >>>> Vladimir. >>>>=20 >>>>=20 >>>> On Tue, Feb 28, 2017 at 3:12 AM, Denis Magda = wrote: >>>>=20 >>>>> BTW, >>>>>=20 >>>>> What if we expose per-query metrics below as a part of EXPLAIN = ANALYZE? >>>>> Sergi, is this feasible? >>>>>=20 >>>>> =E2=80=94 >>>>> Denis >>>>>=20 >>>>>> On Feb 27, 2017, at 2:35 PM, Denis Magda = wrote: >>>>>>=20 >>>>>> Igniters, >>>>>>=20 >>>>>> Let=E2=80=99s shed more light on SQL query execution internals = introducing a >>>> set >>>>> of useful metrics = (https://issues.apache.org/jira/browse/IGNITE-4757). >>>>>>=20 >>>>>> Per-query metrics. Total history size is defined by >>>> *CacheConfiguration. >>>>> getQueryDetailMetricsSize*: >>>>>> * if a query was executed in the collocated or non-collocated = mode. >>>>> Three results are valid: collocated, non-collocated, simple query = (no >>>>> joins). >>>>>> * non-collocated query: size of the data exchanged between the = nodes >> to >>>>> complete a join. >>>>>> * non-collocated query: did a query do broadcast or unicast to = get >> data >>>>> needed to complete a join. >>>>>> * non-collocated and collocated query: a part of the time spent >> joining >>>>> the data. >>>>>>=20 >>>>>> CacheMetrics: >>>>>> * an average number of executed SQL queries (collocated, >>>> non-collocated, >>>>> simple query (no joins)). >>>>>>=20 >>>>>> Please don=E2=80=99t hesitate do share suggest another metrics or = improve >>>>> proposed ones. >>>>>>=20 >>>>>> =E2=80=94 >>>>>> Denis >>=20 >>=20