ignite-dev mailing list archives

From Юрий <jury.gerzhedow...@gmail.com>
Subject Re: Query history statistics API
Date Fri, 21 Dec 2018 16:23:48 GMT
Vladimir, thanks for your expert opinion.

I have some thoughts about point 5.
I tried to find out how this works in Oracle and PostgreSQL (PG):

*PG*: keeps 1000 statements by default (configurable) and discards the
least-executed ones. Updating statistics is an asynchronous process, so
the statistics may lag.

*Oracle*: uses the shared pool for historical data and can evict the records
with the oldest last-execution time when the shared pool does not have enough
free space for other data, which may be unrelated to historical statistics.
So this also seems to be a separate asynchronous process (there is very
little information about it).


Unfortunately, I could not find information about how these databases handle
heavy workloads. However, we can see that both vendors process statistics
asynchronously.


I see a few variants of how we can handle a very high workload.

The first group of variants uses an asynchronous model with a separate thread
that takes elements from a queue to update the stats:
1) Block when the queue is over its limit and wait until there is enough
capacity to put a new element.

+ We keep all statistics
- The end of query execution can be blocked.

2) Discard statistics for the finished query if the queue is full.

+ Very fast for the current query
- We lose part of the statistics.

3) Do a full cleanup of the statistics queue.

+ Fast, and frees space for further elements
- We lose a large number of statistics elements.
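
The three queue variants above can be compared with a minimal sketch. It
assumes a bounded ArrayBlockingQueue between the query threads and one
draining thread; the class, enum and method names are hypothetical and do not
exist in Ignite.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Hypothetical sketch: none of these names exist in Ignite.
class QueryHistoryQueueSketch {
    /** What the query thread does when the queue is full. */
    enum OverflowPolicy { BLOCK, DISCARD_NEW, CLEAR_ALL }

    /** One finished query execution (assumed shape). */
    static final class Event {
        final String qry;
        final long durationMs;

        Event(String qry, long durationMs) {
            this.qry = qry;
            this.durationMs = durationMs;
        }
    }

    private final BlockingQueue<Event> queue;
    private final OverflowPolicy policy;

    /** Per-query stats {count, minMs, maxMs}; touched only by the draining thread. */
    private final Map<String, long[]> history = new HashMap<>();

    QueryHistoryQueueSketch(int capacity, OverflowPolicy policy) {
        this.queue = new ArrayBlockingQueue<>(capacity);
        this.policy = policy;
    }

    /** Called when a query ends. Returns false if its statistics were dropped. */
    boolean onQueryFinished(String qry, long durationMs) {
        Event evt = new Event(qry, durationMs);

        switch (policy) {
            case BLOCK: // Variant 1: wait for free capacity, nothing is lost.
                try {
                    queue.put(evt);
                    return true;
                }
                catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return false;
                }
            case DISCARD_NEW: // Variant 2: never wait, drop this event if full.
                return queue.offer(evt);
            case CLEAR_ALL: // Variant 3: drop everything queued, then enqueue.
                while (!queue.offer(evt))
                    queue.clear();
                return true;
            default:
                throw new AssertionError(policy);
        }
    }

    /** Merges all queued events into the history; returns how many were merged. */
    int drain() {
        List<Event> batch = new ArrayList<>();
        queue.drainTo(batch);

        for (Event e : batch) {
            long[] s = history.computeIfAbsent(e.qry,
                k -> new long[] {0, Long.MAX_VALUE, Long.MIN_VALUE});
            s[0]++;
            s[1] = Math.min(s[1], e.durationMs);
            s[2] = Math.max(s[2], e.durationMs);
        }

        return batch.size();
    }

    /** Execution count for a query; call from the draining thread only. */
    long executions(String qry) {
        long[] s = history.get(qry);
        return s == null ? 0 : s[0];
    }

    /** Number of events waiting to be merged. */
    int queued() {
        return queue.size();
    }
}
```

In a real node a dedicated thread would call drain() in a loop (or block on
take()); here it is a plain method so that the overflow behaviour is easy to
observe.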


The second group of variants uses the current approach for queryMetrics: we
have some additional capacity in the CHM with history, plus periodic cleanup
of the map. If even the additional space is not enough, we can:
1) Discard statistics for the finished query
2) Do a full cleanup of the CHM and discard all gathered information.
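
A minimal sketch of this second group, assuming a ConcurrentHashMap with a
soft limit, some extra headroom, and a periodic cleanup that evicts the
entries with the oldest start time. It does not reproduce the existing
queryMetrics code; all names and parameters are hypothetical.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;
import java.util.stream.Collectors;

// Hypothetical sketch: none of these names exist in Ignite.
class QueryHistoryMapSketch {
    /** Merged statistics for one query text (assumed shape). */
    static final class Stats {
        final AtomicLong execs = new AtomicLong();
        volatile long lastStartTime;
    }

    private final ConcurrentHashMap<String, Stats> hist = new ConcurrentHashMap<>();
    private final int maxSize;             // Size restored by periodic cleanup.
    private final int extra;               // Headroom allowed between cleanups.
    private final boolean clearOnOverflow; // false: variant 1, true: variant 2.

    QueryHistoryMapSketch(int maxSize, int extra, boolean clearOnOverflow) {
        this.maxSize = maxSize;
        this.extra = extra;
        this.clearOnOverflow = clearOnOverflow;
    }

    /** Called when a query ends. Returns false if its statistics were dropped. */
    boolean record(String qry, long startTime) {
        Stats s = hist.get(qry);

        if (s == null) {
            // The size check is not atomic with the insert, so under
            // concurrency the hard limit is only approximate.
            if (hist.size() >= maxSize + extra) {
                if (!clearOnOverflow)
                    return false; // Variant 1: discard this query's stats.

                hist.clear();     // Variant 2: discard everything gathered.
            }

            s = hist.computeIfAbsent(qry, k -> new Stats());
        }

        s.execs.incrementAndGet();
        // Not an atomic max; good enough for a sketch.
        s.lastStartTime = Math.max(s.lastStartTime, startTime);

        return true;
    }

    /** Periodic task: evicts entries with the oldest start time; returns evicted count. */
    int cleanup() {
        int excess = hist.size() - maxSize;

        if (excess <= 0)
            return 0;

        List<Map.Entry<String, Stats>> oldest = hist.entrySet().stream()
            .sorted(Comparator.comparingLong(en -> en.getValue().lastStartTime))
            .limit(excess)
            .collect(Collectors.toList());

        for (Map.Entry<String, Stats> en : oldest)
            hist.remove(en.getKey());

        return oldest.size();
    }

    int size() {
        return hist.size();
    }

    boolean contains(String qry) {
        return hist.containsKey(qry);
    }
}
```

Unlike the queue-based sketch, every query thread updates the shared map
directly, which is where the contention discussed below comes from.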

The first group of variants should potentially work faster, because we can
update the history map in a single thread without contention, and putting to
a queue should be faster.


What do you think? Which variant is preferable, or can you suggest another
way to handle a potentially huge workload?

Also, there is one initial question which is still not clear to me: what is
the right place for the new API?


Fri, 21 Dec 2018 at 13:05, Vladimir Ozerov <vozerov@gridgain.com>:

> Hi,
>
> I'd propose the following approach:
> 1) Enable history by default. Because otherwise users will have to restart
> the node to enable it, or we will have to implement dynamic history enable,
> which is a complex thing. The default value should be relatively small yet
> allow accommodating typical workloads, e.g. 1000 entries. This should
> not put any serious pressure on GC.
> 2) Split queries by: schema, query, local flag
> 3) Track only growing values: execution count, error count, minimum
> duration, maximum duration
> 4) Implement the ability to clear history - JMX, SQL command, whatever (maybe
> this is a different ticket)
> 5) History cleanup might be implemented similarly to the current approach:
> store everything in CHM. Periodically check its size. If it is too big -
> evict the oldest entries. But this should be done with care - under some
> workloads new queries will be generated very quickly. In this case we
> should either fall back to synchronous evicts, or not log history at all.
>
> Thoughts?
>
> Vladimir.
> -
>
> On Fri, Dec 21, 2018 at 11:22 AM Юрий <jury.gerzhedowich@gmail.com> wrote:
>
> > Alexey,
> >
> > Yes, such a property to configure history size will be added. I think
> > the default value should be 0 and history shouldn't be gathered by default
> > at all, and it can be switched on by the property when it is required.
> >
> > Currently I plan to evict old data the same way as for
> > queryMetrics - a scheduled task will evict old data by the oldest start
> > time of the query.
> >
> > Statistics will be gathered only for initial client queries, so internal
> > queries will not be included. For identical queries we will have one record
> > in history with merged statistics.
> >
> > All the above points are just my proposal. Please get back to me if you
> > think anything should be implemented another way.
> >
> >
> >
> >
> >
> > Thu, 20 Dec 2018 at 18:23, Alexey Kuznetsov <akuznetsov@apache.org>:
> >
> > > Yuriy,
> > >
> > > I have several questions:
> > >
> > > Are we going to add some properties to cluster configuration for
> > > history size?
> > >
> > > And what will be default history size?
> > >
> > > Will identical queries count as the same item of historical data?
> > >
> > > How will we evict old data that does not fit into history?
> > >
> > > Will we somehow count "reduce" queries? Or only final "map" ones?
> > >
> > > --
> > > Alexey Kuznetsov
> > >
> >
> >
> > --
> > Live with a smile! :D
> >
>


-- 
Live with a smile! :D
