ignite-dev mailing list archives

From Andrey Mashenkov <andrey.mashen...@gmail.com>
Subject Re: SQL query CPU utilization too low.
Date Mon, 05 Dec 2016 11:36:12 GMT
Copied from the review comment:
> Sergi: Another thing is how will we handle the case where different caches
> in a join have different parallelism levels?
Good question, Sergi. It seems we can't handle it.

I have a crazy idea, and I'm not sure it is workable.
What if we split indices into a power-of-2 number of segments (configurable
per cache)?
Queries would then be split across a power-of-2 number of threads, where the
number of query threads must be less than or equal to the number of segments.

If a query involves indices with different numbers of segments, we need some
way to map threads to index segments.
This looks easy if we can wrap pairs of index segments into a single object,
so that the segment counts line up.

E.g. say Table1 has a parallelism level of 8 and Table2 has a parallelism
level of 4. Then we could run 4 threads, where each thread runs on 1 segment
of the Table2 index and on a wrapped pair of segments of the Table1 index.
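
To make the Table1/Table2 example concrete, here is a minimal sketch of one
possible thread-to-segment mapping. Everything in it is an assumption made for
illustration: the class SegmentMapping and its methods are hypothetical and
exist in neither Ignite nor H2, and the rule that a partition lands in segment
(partition & (segments - 1)) is not something decided in this thread. Under
that rule, a thread that scans Table2 segment t and the Table1 segments whose
low bits equal t sees the same partitions on both sides of the join.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper for illustration only; not part of Ignite or H2. */
final class SegmentMapping {
    private SegmentMapping() {}

    /** Returns true if n is a positive power of two. */
    static boolean isPowerOfTwo(int n) {
        return n > 0 && (n & (n - 1)) == 0;
    }

    /** Assumed rule: a partition belongs to segment (partition mod segments). */
    static int segmentOf(int partition, int segments) {
        if (partition < 0 || !isPowerOfTwo(segments))
            throw new IllegalArgumentException("bad partition or segment count");
        return partition & (segments - 1);
    }

    /** Query threads for a join: the smallest segment count among its indices. */
    static int threadCount(int... segmentCounts) {
        if (segmentCounts.length == 0)
            throw new IllegalArgumentException("no indices given");
        int min = Integer.MAX_VALUE;
        for (int c : segmentCounts) {
            if (!isPowerOfTwo(c))
                throw new IllegalArgumentException("not a power of two: " + c);
            min = Math.min(min, c);
        }
        return min;
    }

    /**
     * Segments of an index with the given segment count that one query thread
     * scans. With 8 segments and 4 threads each thread gets a "wrapped pair".
     */
    static List<Integer> segmentsFor(int thread, int segments, int threads) {
        if (!isPowerOfTwo(segments) || !isPowerOfTwo(threads) || threads > segments)
            throw new IllegalArgumentException("threads must divide segments");
        if (thread < 0 || thread >= threads)
            throw new IllegalArgumentException("no such thread: " + thread);
        List<Integer> res = new ArrayList<>();
        // Every segment s with (s & (threads - 1)) == thread goes to this thread.
        for (int s = thread; s < segments; s += threads)
            res.add(s);
        return res;
    }

    public static void main(String[] args) {
        int threads = threadCount(8, 4); // Table1 has 8 segments, Table2 has 4.
        for (int t = 0; t < threads; t++)
            System.out.println("thread " + t
                + ": Table1 segments " + segmentsFor(t, 8, threads)
                + ", Table2 segments " + segmentsFor(t, 4, threads));
    }
}
```

With these assumptions the number of threads is simply the smallest segment
count in the join, and an index with more segments hands each thread a group
of segments instead of a single one.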

Thoughts?

On Wed, Nov 30, 2016 at 6:31 PM, Sergi Vladykin <sergi.vladykin@gmail.com>
wrote:

> Cool! I'll take a look today.
>
> Sergi
>
> 2016-11-30 18:23 GMT+03:00 Andrey Mashenkov <andrey.mashenkov@gmail.com>:
>
> > Serj, you can see the PR attached to the JIRA issue [1]; it can be opened
> > with Upsource [2].
> >
> > Thanks, I remember about distributed queries and will rework them right
> > after we come to an agreement that the solution for simple queries is OK.
> >
> > [1] https://issues.apache.org/jira/browse/IGNITE-4106
> > [2] http://reviews.ignite.apache.org/ignite/review/IGNT-CR-15
> >
> >
> >
> > On Wed, Nov 30, 2016 at 5:34 PM, Sergi Vladykin <sergi.vladykin@gmail.com>
> > wrote:
> >
> > > A per-cache SQL parallelism level looks reasonable to me here.
> > >
> > > I'm not sure what you mean by "prepared statement cache is useless
> > > with splitted indices"; if this is true, most probably you are
> > > parallelizing queries in some wrong way.
> > >
> > > Also, do not forget about distributed joins: with parallel queries on
> > > the same node we will need to make index range requests not only to
> > > remote nodes, but also to query contexts in parallel threads on the same
> > > local node.
> > >
> > > Sergi
> > >
> > > 2016-11-30 17:23 GMT+03:00 Andrey Mashenkov <andrey.mashenkov@gmail.com>:
> > >
> > > > It looks like we can't just split an SQL query across several threads
> > > > due to H2 limitations.
> > > > We can bind a query thread to a certain set of partitions, but H2 will
> > > > actually read the whole index and then filter entries by partition.
> > > > So, we can't get a significant speed-up that way.
> > > >
> > > > Unfortunately, H2 does not support sharding, so we need a workaround.
> > > > We can try to split indices, so that each query thread is bound to its
> > > > own index part.
> > > > I've implemented such a prototype and got a significant speed-up on a
> > > > single-node grid, as if it were a grid of several nodes.
> > > > Since H2 knows nothing about split indices, we must make sure that
> > > > every query is run as a TwoStepQuery and uses all parts of the table
> > > > index.
> > > >
> > > > As creating an index on demand is a very heavy operation, an index
> > > > should be split when it is created. So we can set the parallelism level
> > > > on a per-cache basis, but not per query.
> > > >
> > > > Another issue I've faced is that our implementation of the prepared
> > > > statement cache is useless with split indices. A prepared statement is
> > > > cached in a thread-local variable, and it seems that the statement is
> > > > bound to a certain index part. So if we reuse the same statement for
> > > > different index parts, we will get unexpected results.
> > > >
> > > > On Sun, Oct 30, 2016 at 8:46 PM, Dmitriy Setrakyan <dsetrakyan@apache.org>
> > > > wrote:
> > > >
> > > > > Completely agree, great point!
> > > > >
> > > > > On Sun, Oct 30, 2016 at 9:17 AM, Sergi Vladykin <sergi.vladykin@gmail.com>
> > > > > wrote:
> > > > >
> > > > > > I think it must be a maximum local parallelism level rather than
> > > > > > just an `on`/`off` setting (the default is obviously 1). This, along
> > > > > > with a separately configurable query thread pool, will give
> > > > > > finer-grained control over resources.
> > > > > >
> > > > > > Sergi
> > > > > >
> > > > > > 2016-10-30 18:22 GMT+03:00 Dmitriy Setrakyan <dsetrakyan@apache.org>:
> > > > > >
> > > > > > > I already mentioned this in another email, but we should be able
> > > > > > > to turn this property on and off at the per-query and per-cache
> > > > > > > levels.
> > > > > > >
> > > > > > > On Sat, Oct 29, 2016 at 11:45 AM, Sergi Vladykin <sergi.vladykin@gmail.com>
> > > > > > > wrote:
> > > > > > >
> > > > > > > > Agree, let's implement such a parallelization.
> > > > > > > >
> > > > > > > > I think we will need an explicit setting for SqlQuery and
> > > > > > > > SqlFieldsQuery; the default behavior should not change.
> > > > > > > >
> > > > > > > > Sergi
> > > > > > > >
> > > > > > > > 2016-10-28 22:39 GMT+03:00 Andrey Mashenkov <amashenkov@gridgain.com>:
> > > > > > > >
> > > > > > > > > So, right now every SQL query runs on each node in a single
> > > > > > > > > thread. This can be an issue for heavy queries or queries
> > > > > > > > > running on big data sets, e.g. analytical queries.
> > > > > > > > >
> > > > > > > > > For now, the only way to speed up such queries is to add more
> > > > > > > > > nodes to the grid on the same server. In this case, data will
> > > > > > > > > be partitioned across all these nodes, and the query will be
> > > > > > > > > split and run on all nodes.
> > > > > > > > >
> > > > > > > > > It seems we could benefit from splitting SQL queries locally,
> > > > > > > > > as we do across nodes with TwoStepQuery.
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > Thoughts?



-- 
Best regards,
Andrey V. Mashenkov
Cell: +7-921-932-61-82
