Mailing-List: contact dev-help@ignite.apache.org; run by ezmlm
Precedence: bulk
Reply-To: dev@ignite.apache.org
From: Denis Magda <dmagda@apache.org>
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: quoted-printable
Mime-Version: 1.0 (Mac OS X Mail 10.2 \(3259\))
Subject: Re: Additional SQL metrics
Date: Thu, 2 Mar 2017 14:00:46 -0800
References: <F0D85A28-3F93-43EC-809D-1E6F07BCDF8C@apache.org>
 <0309ABC9-F8A4-4B77-AED5-A7770D712D78@apache.org>
 <CAGt6o8=59iJEYZknUOiSgF5GeAH85h3fm0WU=zvmtKKrhYR6TA@mail.gmail.com>
 <CA+0=VoUgnTsAfiSoZO3E7uN8G2okRkWMEQSBfUiDfX3QOHHjSQ@mail.gmail.com>
 <941C2CB5-193F-4864-8268-6BE9C8B19EE9@apache.org>
 <CAGt6o8=-nKymi4_Zv9k0vza9Rig5RbuVhJD5etTW+Y=Ey4g-hA@mail.gmail.com>
To: dev@ignite.apache.org
In-Reply-To: <CAGt6o8=-nKymi4_Zv9k0vza9Rig5RbuVhJD5etTW+Y=Ey4g-hA@mail.gmail.com>
Message-Id: <5539B2E2-1CEF-4092-AD7B-D5ECFDDD60AF@apache.org>
archived-at: Thu, 02 Mar 2017 22:00:49 -0000

Vovan,

When I=E2=80=99m speaking of JOIN metrics I=E2=80=99m simply assume that =
we need to add metrics relevant for queries with joins, metrics that =
will help us get more insights on non-collocated and collocated joins =
execution flow.

> 1) Query exec count
> 2) Query exec time (first define what "time" means) - min, max, avg

Total query execution time might not be helpful in the trickiest cases. =
What if you have multiple joins in your query? How do I know which one =
contributes to the execution most?

So, I would do a breakdown having total time, map time, per-join time, =
reduce time. Hope it=E2=80=99s possible. If it=E2=80=99s unclear how to =
support everything at the first place then it=E2=80=99s a different =
question. Let=E2=80=99s create several tickets and start implementing =
everything gracefully.

> 3) Number of bytes exchanged between nodes during query execution

It will be really helpful to make a breakdown showing a number of bytes =
exchanged per-join (physical join). Again, if you believe it makes sense =
to do the breakdown later then let=E2=80=99s create an additional ticket =
for the 2 tier metrics then.=20

> 4) Number of returned rows - min, max, avg


Plus, let=E2=80=99s add the following to the list:

5) Collocated: yes or no

=E2=80=94
Denis

> On Mar 2, 2017, at 1:32 PM, Vladimir Ozerov <vozerov@gridgain.com> =
wrote:
>=20
> Denis,
>=20
> The main problem with suggested metrics is that they implies that =
ceratin
> internal mechanics work in predefined way. For example, what is JOIN
> metrics? There are no guarantees that JOIN in user's query will be
> translated to a real physical join. What if several different query
> execution pieces happen in parallel? What if we rework our distributed
> query engine from pull to push approach for performance reasons and =
there
> will be no JOINs in classical sense?
>=20
> This is why I think that we should start with very basic things. =
Something
> like:
> 1) Query exec count
> 2) Query exec time (first define what "time" means) - min, max, avg
> 3) Number of bytes exchanged between nodes during query execution
> 4) Number of returned rows - min, max, avg
>=20
> Once we have base numbers in place, we can think of carefully =
integrating
> and enhancing all pieces of query execution into more verbose formats,
> similar to query plans with relative weights in classical RDBMS =
systems.
>=20
> Thoughts?
>=20
>=20
>=20
> On Thu, Mar 2, 2017 at 11:31 PM, Denis Magda <dmagda@apache.org> =
wrote:
>=20
>> Vovan,
>>=20
>> Your metrics make perfect sense to me. However, I see a high demand =
for
>> JOINs based metrics especially from those who give a try to =
non-collocated
>> joins in production  and want to measure them somehow. This is why,
>> personally, I prefer to see the metrics below in the top priority =
list as
>> well:
>>=20
>> if a query was executed in the collocated or non-collocated mode. =
Three
>> results are valid: collocated, non-collocated, simple query (no =
joins).
>> non-collocated query: size of the data exchanged between the nodes to
>> complete a specific join. If there are multiple joins in the query we =
need
>> to provide this metric for every of them.
>> non-collocated and collocated query: a part of the time spent joining =
the
>> data. If there are multiple joins in the query we need to provide =
this
>> metric for every of them.
>>=20
>> As for =E2=80=9Cunicast=E2=80=9D and =E2=80=9Cbroadcast=E2=80=9D, =
agree, let=E2=80=99s ignore it for now.
>>=20
>> In any case, can we include timing information (map phase, reduce =
phase,
>> join phase) into an execution plan produced by H2? Are there any
>> implementation hooks?
>>=20
>> =E2=80=94
>> Denis
>>=20
>>=20
>>> On Mar 2, 2017, at 12:02 PM, Dmitriy Setrakyan =
<dsetrakyan@apache.org>
>> wrote:
>>>=20
>>> I think some of the metrics specified by Denis also make sense, so I
>> would
>>> add them as well. See below...
>>>=20
>>> On Thu, Mar 2, 2017 at 12:36 AM, Vladimir Ozerov =
<vozerov@gridgain.com
>> <mailto:vozerov@gridgain.com>>
>>> wrote:
>>>=20
>>>> Denis,
>>>>=20
>>>> Query execution is complex process involving different stages which =
are
>> not
>>>> very easy to match with each other. Especially provided that any =
node
>> can
>>>> leave topology at any time. Another problem is that engine evolves =
and
>>>> metrics like "did a query do broadcast or unicast" may easily =
become
>>>> useless at some point, because for example there will be neither
>> unicast,
>>>> nor broadast, but something different. On the other hand I =
completely
>> agree
>>>> that performance monitoring is essential part of any mature DBMS.
>>>>=20
>>>> I would start with metrics which are both very basic and easy to
>> implement
>>>> at the same time. For example we can add fingerprint (hash) to =
every
>> query
>>>> which will be used to join "map" and "reduce" parts with each other =
and
>> add
>>>> the following basic metrics:
>>>> 1) Execution count for particular query
>>>> 2) Number of map nodes - min, max, avg
>>>>=20
>>>=20
>>> (1) and (2) makes sense
>>>=20
>>>=20
>>>> 3) Map step duration (if applicable) - min, max,
>>>=20
>>> 4) Reduce step duration (if applicable) - min, max, avg
>>>>=20
>>>=20
>>> Not sure if (3) and (4) are needed. I would only add them if they =
are
>> easy
>>> to implement.
>>>=20
>>> I would also add these:
>>>=20
>>> 5) Collocated: yes/no
>>> 6) last execution time
>>> 7) min/max/average execution duration
>>>=20
>>>=20
>>>>=20
>>>> Once done users will be able to get statistics for particular =
queries.
>>>>=20
>>>> Vladimir.
>>>>=20
>>>>=20
>>>> On Tue, Feb 28, 2017 at 3:12 AM, Denis Magda <dmagda@apache.org> =
wrote:
>>>>=20
>>>>> BTW,
>>>>>=20
>>>>> What if we expose per-query metrics below as a part of EXPLAIN =
ANALYZE?
>>>>> Sergi, is this feasible?
>>>>>=20
>>>>> =E2=80=94
>>>>> Denis
>>>>>=20
>>>>>> On Feb 27, 2017, at 2:35 PM, Denis Magda <dmagda@apache.org> =
wrote:
>>>>>>=20
>>>>>> Igniters,
>>>>>>=20
>>>>>> Let=E2=80=99s shed more light on SQL query execution internals =
introducing a
>>>> set
>>>>> of useful metrics =
(https://issues.apache.org/jira/browse/IGNITE-4757).
>>>>>>=20
>>>>>> Per-query metrics. Total history size is defined by
>>>> *CacheConfiguration.
>>>>> getQueryDetailMetricsSize*:
>>>>>> * if a query was executed in the collocated or non-collocated =
mode.
>>>>> Three results are valid: collocated, non-collocated, simple query =
(no
>>>>> joins).
>>>>>> * non-collocated query: size of the data exchanged between the =
nodes
>> to
>>>>> complete a join.
>>>>>> * non-collocated query: did a query do broadcast or unicast to =
get
>> data
>>>>> needed to complete a join.
>>>>>> * non-collocated and collocated query: a part of the time spent
>> joining
>>>>> the data.
>>>>>>=20
>>>>>> CacheMetrics:
>>>>>> * an average number of executed SQL queries (collocated,
>>>> non-collocated,
>>>>> simple query (no joins)).
>>>>>>=20
>>>>>> Please don=E2=80=99t hesitate do share suggest another metrics or =
improve
>>>>> proposed ones.
>>>>>>=20
>>>>>> =E2=80=94
>>>>>> Denis
>>=20
>>=20