ignite-dev mailing list archives

From Andrey Mashenkov <andrey.mashen...@gmail.com>
Subject Re: SQL query CPU utilization too low.
Date Wed, 30 Nov 2016 14:23:53 GMT
It looks like we can't simply split an SQL query across several threads because of
H2 limitations.
We can bind each query thread to a certain set of partitions, but H2 will still
read the whole index and then filter entries by partition. So we won't get a
significant speed-up that way.
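To make the cost concrete, here is a minimal Java sketch. It is a hypothetical cost model, not H2 or Ignite code: threads are simulated one after another and the sketch counts index reads rather than measuring time. With one shared index, every thread still reads every entry and only then filters by partition, so total reads grow with the number of threads. With one index part per thread, each entry is read once.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical cost model, not H2 or Ignite code. Keys are 0..entries-1 and
// key % threads decides which thread's partition set a key belongs to.
// Threads are simulated sequentially: the sketch counts reads, it does not time them.
class ScanCostSketch {

    // Shared index: every thread reads every entry, then the partition filter
    // discards the entries that belong to other threads.
    static long visitsWithSharedIndex(int entries, int threads) {
        long visited = 0;
        for (int t = 0; t < threads; t++) {
            for (int key = 0; key < entries; key++) {
                visited++;                // the entry is read...
                if (key % threads != t) {
                    continue;             // ...and then discarded by the partition filter
                }
                // a matching entry would be processed here
            }
        }
        return visited;
    }

    // Split index: each thread owns one index part and reads only that part.
    static long visitsWithSplitIndex(int entries, int threads) {
        List<List<Integer>> parts = new ArrayList<>();
        for (int t = 0; t < threads; t++) {
            parts.add(new ArrayList<>());
        }
        for (int key = 0; key < entries; key++) {
            parts.get(key % threads).add(key); // each entry is stored in exactly one part
        }
        long visited = 0;
        for (List<Integer> part : parts) {
            visited += part.size();            // a thread scans only its own part
        }
        return visited;
    }

    public static void main(String[] args) {
        int entries = 1_000_000;
        int threads = 4;
        System.out.println("shared index, total reads: " + visitsWithSharedIndex(entries, threads)); // 4000000
        System.out.println("split index, total reads:  " + visitsWithSplitIndex(entries, threads));  // 1000000
    }
}
```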

Unfortunately, H2 does not support sharding, so we need a
workaround. We can try to split indices so that each query thread is
bound to its own index part.
I've implemented such a prototype and got a significant speed-up on a
single-node grid, as if it were a multi-node grid.
Since H2 knows nothing about split indices, we must make sure that every
query is run as a TwoStepQuery and uses all parts of the table's index.
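The split-index idea can be sketched in Java roughly as follows. This is not the prototype's code: the class, the sorted per-part maps and the range query are all hypothetical. Each partition is owned by exactly one index part, each part is scanned by its own task (the "map" step), and the partial results are merged on the calling thread (the "reduce" step), similar in spirit to how a TwoStepQuery combines results from several nodes.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.TreeMap;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical sketch of a split index; none of these names are Ignite or H2 API.
class SplitIndexSketch {
    private final List<TreeMap<Integer, String>> parts = new ArrayList<>();
    private final int partitions;

    // The number of index parts is fixed when the index is created.
    SplitIndexSketch(int segments, int partitions) {
        this.partitions = partitions;
        for (int i = 0; i < segments; i++) {
            parts.add(new TreeMap<>());
        }
    }

    // key -> partition -> index part: every partition is owned by exactly one part.
    void put(int key, String value) {
        int partition = Math.floorMod(key, partitions);
        parts.get(partition % parts.size()).put(key, value);
    }

    // "Map": each index part is scanned by its own task on the query pool.
    // "Reduce": partial results are collected and re-sorted on the calling thread.
    List<Integer> rangeQuery(int from, int to, ExecutorService pool) throws Exception {
        List<Future<List<Integer>>> partials = new ArrayList<>();
        for (TreeMap<Integer, String> part : parts) {
            Callable<List<Integer>> scan =
                    () -> new ArrayList<>(part.subMap(from, true, to, true).keySet());
            partials.add(pool.submit(scan));
        }
        List<Integer> merged = new ArrayList<>();
        for (Future<List<Integer>> partial : partials) {
            merged.addAll(partial.get());
        }
        Collections.sort(merged); // simple reduce; a k-way merge would avoid the full sort
        return merged;
    }

    public static void main(String[] args) throws Exception {
        SplitIndexSketch index = new SplitIndexSketch(4, 1024);
        for (int key = 0; key < 10_000; key++) {
            index.put(key, "value-" + key);
        }
        ExecutorService queryPool = Executors.newFixedThreadPool(4);
        try {
            System.out.println(index.rangeQuery(100, 104, queryPool)); // [100, 101, 102, 103, 104]
        } finally {
            queryPool.shutdown();
        }
    }
}
```

Because every query has to touch all parts, the reduce step is mandatory even for a query that would run in a single thread today.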

Because creating an index on demand is a very heavy operation, an index should
be split when it is created. So we can set the parallelism level per cache,
but not per query.
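A small Java sketch suggests why the number of parts has to be chosen at index creation. It assumes a hypothetical mapping where partition p lives in index part (p % parts); with 1024 partitions (an arbitrary example), switching from 4 to 8 parts for a single query would relocate half of the partitions, which amounts to rebuilding the index.

```java
// Hypothetical mapping, not Ignite code: partition p lives in index part (p % parts).
class SegmentRemapSketch {

    // Counts partitions whose index part changes when the number of parts changes.
    static int movedPartitions(int partitions, int oldParts, int newParts) {
        int moved = 0;
        for (int p = 0; p < partitions; p++) {
            if (p % oldParts != p % newParts) {
                moved++;
            }
        }
        return moved;
    }

    public static void main(String[] args) {
        System.out.println(movedPartitions(1024, 4, 8)); // 512
        System.out.println(movedPartitions(1024, 4, 4)); // 0
    }
}
```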

Another issue I've faced is that our implementation of the prepared statement
cache is useless with split indices. Prepared statements are cached in a
thread-local variable, and it seems that a statement is bound to a certain
index part. So if we reuse the same statement for different index parts,
we will get unexpected results.
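One possible workaround, sketched in Java under stated assumptions, is to key the thread-local cache by both the SQL text and the index part, so a statement prepared for one part is never reused for another. BoundStatement is a hypothetical stand-in for a prepared statement that remembers the part it was prepared against; it is not the H2 or Ignite class.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

// Hypothetical sketch, not Ignite code: a per-thread statement cache keyed by (sql, index part).
class StatementCacheSketch {

    // Cache key includes the index part, so part 0 and part 1 get different statements.
    private static final class Key {
        final String sql;
        final int segment;

        Key(String sql, int segment) {
            this.sql = sql;
            this.segment = segment;
        }

        @Override
        public boolean equals(Object o) {
            if (!(o instanceof Key)) {
                return false;
            }
            Key k = (Key) o;
            return segment == k.segment && sql.equals(k.sql);
        }

        @Override
        public int hashCode() {
            return Objects.hash(sql, segment);
        }
    }

    // One cache per thread, like the thread-local cache described above.
    private static final ThreadLocal<Map<Key, BoundStatement>> CACHE =
            ThreadLocal.withInitial(HashMap::new);

    static BoundStatement prepare(String sql, int segment) {
        return CACHE.get().computeIfAbsent(new Key(sql, segment),
                k -> new BoundStatement(k.sql, k.segment));
    }

    public static void main(String[] args) {
        String sql = "SELECT * FROM Person WHERE age > ?";
        BoundStatement s0 = prepare(sql, 0);
        System.out.println(s0 == prepare(sql, 0)); // true: reused for the same part
        System.out.println(s0 == prepare(sql, 1)); // false: each part gets its own statement
    }
}

// Hypothetical stand-in for a prepared statement bound to one index part.
final class BoundStatement {
    final String sql;
    final int segment;

    BoundStatement(String sql, int segment) {
        this.sql = sql;
        this.segment = segment;
    }
}
```

The cost is one cached statement per (query, part) pair per thread, so the cache grows with the parallelism level.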

On Sun, Oct 30, 2016 at 8:46 PM, Dmitriy Setrakyan <dsetrakyan@apache.org>
wrote:

> Completely agree, great point!
>
> On Sun, Oct 30, 2016 at 9:17 AM, Sergi Vladykin <sergi.vladykin@gmail.com>
> wrote:
>
> > I think it must be a maximum local parallelism level, not just an `on`
> > and `off` setting (the default is obviously 1). This, along with a separately
> > configurable query thread pool, will give finer-grained control over
> > resources.
> >
> > Sergi
> >
> > 2016-10-30 18:22 GMT+03:00 Dmitriy Setrakyan <dsetrakyan@apache.org>:
> >
> > > I already mentioned this in another email, but we should be able to turn
> > > this property on and off at the per-query and per-cache levels.
> > >
> > > On Sat, Oct 29, 2016 at 11:45 AM, Sergi Vladykin <sergi.vladykin@gmail.com>
> > > wrote:
> > >
> > > > Agreed, let's implement such parallelization.
> > > >
> > > > I think we will need an explicit setting for SqlQuery and SqlFieldsQuery;
> > > > the default behavior should not change.
> > > >
> > > > Sergi
> > > >
> > > > 2016-10-28 22:39 GMT+03:00 Andrey Mashenkov <amashenkov@gridgain.com>:
> > > >
> > > > > Right now every SQL query runs on each node in a single thread. This can
> > > > > be an issue for heavy queries or queries over big data sets, e.g.
> > > > > analytical queries.
> > > > >
> > > > > For now, the only way to speed up such queries is to add more nodes to
> > > > > the grid, running on the same server. In this case, data will be partitioned
> > > > > over all these nodes, and the query will be split and run on all of them.
> > > > >
> > > > > It seems we could benefit from splitting SQL queries locally, as we do
> > > > > across nodes with TwoStepQuery.
> > > > >
> > > > >
> > > > > Thoughts?
> > > > >
> > > >
> > >
> >
>
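The settings proposed in the quoted messages (a maximum local parallelism level with a default of 1, an optional per-query value, and a separately sized query thread pool) could fit together roughly as in this Java sketch. All names and numbers are hypothetical and the clamping rule is an assumption; given the per-cache index split described at the top of this message, the per-cache maximum is what bounds any per-query choice.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch of the proposed settings; none of these names are Ignite API.
class ParallelismSketch {

    // Per-cache maximum; 1 keeps today's single-threaded behavior.
    static final int DEFAULT_MAX_PARALLELISM = 1;

    // A query may ask for up to the per-cache maximum, never more and never less than 1.
    static int effectiveParallelism(int cacheMaxParallelism, int requestedByQuery) {
        return Math.max(1, Math.min(cacheMaxParallelism, requestedByQuery));
    }

    public static void main(String[] args) throws InterruptedException {
        // With the default maximum, a query asking for 16 threads still gets 1.
        System.out.println(effectiveParallelism(DEFAULT_MAX_PARALLELISM, 16)); // 1

        int cacheMax = 4;      // hypothetical per-cache maximum
        int queryPoolSize = 8; // hypothetical size of a dedicated query thread pool
        ExecutorService queryPool = Executors.newFixedThreadPool(queryPoolSize);
        int threads = effectiveParallelism(cacheMax, 16); // capped at 4
        for (int i = 0; i < threads; i++) {
            int part = i;
            queryPool.submit(() -> System.out.println("query task for index part " + part));
        }
        queryPool.shutdown();
        queryPool.awaitTermination(10, TimeUnit.SECONDS);
    }
}
```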



-- 
Best regards,
Andrey V. Mashenkov
Cell: +7-921-932-61-82
