ignite-dev mailing list archives

From Dmitriy Pavlov <dpav...@apache.org>
Subject Re: Calcite based SQL query engine. Local queries
Date Fri, 08 Nov 2019 12:27:07 GMT
Hi Ivan, Igniters,

Imagine you need to scan all entries in the cluster.

Ideally, you don't want to deserialize all of the entries, so you can use
withKeepBinary(). For example, you need only a couple of fields and want to
compute some cumulative metric over this data. You can send a compute task to
all cluster nodes and run SQL scan queries there with local mode on. In that
manner you can implement Map-Reduce.
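
A minimal sketch of that pattern, for illustration only. It uses plain Java
collections rather than the Ignite API: each inner list stands in for the
entries held by one node, a Map stands in for a binary object read field by
field, and localSum plays the role of the local-mode query run inside the
broadcast task. All class and method names are hypothetical, and a stable
topology is assumed.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Plain-Java model of "broadcast a task, aggregate locally, reduce on the caller". */
class LocalMapReduceSketch {
    /** Builds one entry as a field map, standing in for a non-deserialized binary object. */
    static Map<String, Object> person(String name, long salary) {
        Map<String, Object> fields = new HashMap<>();
        fields.put("name", name);
        fields.put("salary", salary);
        return fields;
    }

    /** Map step: runs "on a node" and reads only the one field it needs from each local entry. */
    static long localSum(List<Map<String, Object>> localEntries, String field) {
        long sum = 0;
        for (Map<String, Object> entry : localEntries) {
            Object value = entry.get(field);
            if (value instanceof Number)
                sum += ((Number) value).longValue();
        }
        return sum;
    }

    /** Reduce step: runs on the caller and combines the per-node partial results. */
    static long reduce(List<Long> partials) {
        long total = 0;
        for (long partial : partials)
            total += partial;
        return total;
    }

    /** Simulates the broadcast: one map call per node, then a single reduce. */
    static long clusterSum(List<List<Map<String, Object>>> nodes, String field) {
        List<Long> partials = new ArrayList<>();
        for (List<Map<String, Object>> node : nodes)
            partials.add(localSum(node, field));
        return reduce(partials);
    }

    public static void main(String[] args) {
        List<Map<String, Object>> node1 = List.of(person("a", 100L), person("b", 250L));
        List<Map<String, Object>> node2 = List.of(person("c", 50L));
        System.out.println(clusterSum(List.of(node1, node2), "salary")); // prints 400
    }
}
```

Only one number per node travels back to the caller, which is the point of
running the query in local mode inside the task.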

There may be another way of doing that, so I encourage you to share it. I
could then update the workshops/training I am preparing in the background.

Sincerely,
Dmitriy Pavlov

On Fri, Nov 8, 2019 at 08:57, Ivan Pavlukhin <vololo100@gmail.com> wrote:

> Denis,
>
> To make things really clear we need to provide a concrete example
> of Compute + LocalSQL and reason about it to figure out whether a
> "smart" SQL engine can deliver the same (or better) results or not.
>
> On Fri, Nov 8, 2019 at 01:48, Denis Magda <dmagda@apache.org> wrote:
> >
> > Folks,
> >
> > See our compute tasks as an advanced version of stored procedures that let
> > the users code the logic of various complexity with Java, .NET or C++ (and
> > not with PL/SQL). The logic can use a combination of APIs (key-value, SQL,
> > etc.) to access data both locally and remotely while being executed on
> > server nodes. The logic can make N key-value requests or run M SQL queries.
> >
> > We kept supporting local SQL queries exactly for such scenarios (for our
> > version of stored procedures) to ensure the distributed map-reduce phase is
> > canceled if all the data is local. And affinityCalls were improved one day
> > to pin the partitions.
> >
> > If the new engine is smart enough to understand that all the partitions are
> > available locally during the affinityRun execution then it's totally fine
> > to remove the 'local' flag. Otherwise, we need to instruct the engine
> > manually that a distributed phase is redundant via 'local' flag or by other
> > means.
> >
> > Does it make things clearer?
> >
> >
> > -
> > Denis
> >
> >
> > On Thu, Nov 7, 2019 at 3:53 AM Ivan Pavlukhin <vololo100@gmail.com> wrote:
> >
> > > Stephen,
> > >
> > > In my understanding we need to do a better job of understanding the
> > > use cases of Compute + LocalSQL ourselves.
> > >
> > > Ideally, a smart optimizer should do the best job of query deployment.
> > >
> > > On Thu, Nov 7, 2019 at 13:04, Stephen Darlington
> > > <stephen.darlington@gridgain.com> wrote:
> > > >
> > > > I made a (bad) assumption that this would also affect queries against
> > > > partitions. If “setLocal()” goes away but “setPartitions()” remains I’m
> > > > happy.
> > > >
> > > > What I would say is that the “broadcast / local” method is one I see
> > > > fairly often. Do we need to do a better job educating people of the
> > > > “correct” way?
> > > >
> > > > Regards,
> > > > Stephen
> > > >
> > > > > On 7 Nov 2019, at 08:30, Alexey Goncharuk <alexey.goncharuk@gmail.com>
> > > > > wrote:
> > > > >
> > > > > Denis, Stephen,
> > > > >
> > > > > Running a local query in a broadcast closure won't work on changing
> > > > > topology. We specifically added an affinityCall method to the compute API
> > > > > in order to pin a partition to prevent its moving and eviction throughout
> > > > > the task execution. Therefore, the query inside an affinityCall is always
> > > > > executed against some partitions (otherwise the query may give incorrect
> > > > > results when topology is changed).
> > > > >
> > > > > I support Igor's question and think that the 'local' flag for the query
> > > > > should be deprecated and eventually removed. A 'local' query can always be
> > > > > expressed as a query against a set of partitions. If those partitions are
> > > > > located on the same node - good, we get fast and correct results. If not -
> > > > > we may either raise an exception and ask the user to remap the query, or
> > > > > fall back to a distributed query execution.
> > > > >
> > > > > Given that the Calcite prototype is in its early stages, it's likely its
> > > > > first version will be available in 3.x, and it's a good chance to get rid
> > > > > of wrong API pieces.
> > > > >
> > > > > --AG
> > > > >
> > > > > On Mon, Nov 4, 2019 at 14:02, Stephen Darlington
> > > > > <stephen.darlington@gridgain.com> wrote:
> > > > >
> > > > >> A common use case is where you want to work on many rows of data across
> > > > >> the grid. You’d broadcast a closure, running the same code on every node
> > > > >> with just the local data. SQL doesn’t work in isolation - it’s often used
> > > > >> as a filter for future computations.
> > > > >>
> > > > >> Regards,
> > > > >> Stephen
> > > > >>
> > > > >>> On 1 Nov 2019, at 17:53, Ivan Pavlukhin <vololo100@gmail.com> wrote:
> > > > >>>
> > > > >>> Denis,
> > > > >>>
> > > > >>> I am mostly concerned about gathering use cases. It would be great to
> > > > >>> critically assess such cases to identify why they cannot be solved by
> > > > >>> using distributed SQL. Also it sounds similar to some kind of "hints",
> > > > >>> but very limited and with all the drawbacks of hints (impossibility to
> > > > >>> use the full strength of CBO). We can provide better "hints" support
> > > > >>> with the new engine as well.
> > > > >>>
> > > > >>> On Fri, Nov 1, 2019 at 20:14, Denis Magda <dmagda@apache.org> wrote:
> > > > >>>>
> > > > >>>> Ivan,
> > > > >>>>
> > > > >>>> I was involved in a couple of such use cases personally, so, that's not my
> > > > >>>> imagination ;) Even more, as far as I remember, the primary reason why we
> > > > >>>> improved our affinityRuns ensuring no partition is purged from a node until
> > > > >>>> a task is completed is because many users were running local SQL from
> > > > >>>> compute tasks and needed a guarantee that SQL will always return a correct
> > > > >>>> result set.
> > > > >>>>
> > > > >>>> -
> > > > >>>> Denis
> > > > >>>>
> > > > >>>>
> > > > >>>> On Fri, Nov 1, 2019 at 10:01 AM Ivan Pavlukhin <vololo100@gmail.com>
> > > > >>>> wrote:
> > > > >>>>
> > > > >>>>> Denis,
> > > > >>>>>
> > > > >>>>> Would be nice to see real use-cases of affinity call + local SQL
> > > > >>>>> combination. Generally, new engine will be able to infer collocation
> > > > >>>>> resulting in the same collocated execution automatically.
> > > > >>>>>
> > > > >>>>> On Fri, Nov 1, 2019 at 19:11, Denis Magda <dmagda@apache.org> wrote:
> > > > >>>>>>
> > > > >>>>>> Hi Igor,
> > > > >>>>>>
> > > > >>>>>> Local queries feature is broadly used together with affinity-based compute
> > > > >>>>>> tasks:
> > > > >>>>>>
> > > > >>>>>> https://apacheignite.readme.io/docs/collocate-compute-and-data#section-affinity-call-and-run-methods
> > > > >>>>>>
> > > > >>>>>> The use case is as follows. The user knows that all required data needed
> > > > >>>>>> for computation is collocated, and SQL is used as an advanced API for data
> > > > >>>>>> retrieval from the computation code. The affinity task ensures that
> > > > >>>>>> partitions won't be discarded from the node(s) if the topology changes
> > > > >>>>>> during the task execution and, thus, it's safe to run SQL locally skipping
> > > > >>>>>> distributed phases.
> > > > >>>>>>
> > > > >>>>>> The combination of affinity compute tasks with local SQL is a real and
> > > > >>>>>> valuable use case, and this is what we need to support with Calcite. Do you
> > > > >>>>>> see any challenges?
> > > > >>>>>>
> > > > >>>>>> -
> > > > >>>>>> Denis
> > > > >>>>>>
> > > > >>>>>>
> > > > >>>>>> On Fri, Nov 1, 2019 at 8:46 AM Roman Kondakov <kondakov87@mail.ru.invalid>
> > > > >>>>>> wrote:
> > > > >>>>>>
> > > > >>>>>>> Hi Igor!
> > > > >>>>>>>
> > > > >>>>>>> IMO we need to maintain the backward compatibility between old and new
> > > > >>>>>>> query engines as much as possible. And therefore we shouldn't change the
> > > > >>>>>>> behavior of local queries.
> > > > >>>>>>>
> > > > >>>>>>> So, for local queries Calcite's planner shouldn't consider the
> > > > >>>>>>> distribution trait at all.
> > > > >>>>>>>
> > > > >>>>>>>
> > > > >>>>>>> --
> > > > >>>>>>> Kind Regards
> > > > >>>>>>> Roman Kondakov
> > > > >>>>>>>
> > > > >>>>>>> On 01.11.2019 17:07, Seliverstov Igor wrote:
> > > > >>>>>>>> Hi Igniters,
> > > > >>>>>>>>
> > > > >>>>>>>> Working on new generation of Ignite SQL I faced a question: «Do we need
> > > > >>>>>>>> local queries at all and, if so, what semantics should they have?».
> > > > >>>>>>>>
> > > > >>>>>>>> Current planning flow consists of the following steps:
> > > > >>>>>>>>
> > > > >>>>>>>> 1) Parsing SQL to AST
> > > > >>>>>>>> 2) Validating AST (against Schema)
> > > > >>>>>>>> 3) Optimizing (Building execution graph)
> > > > >>>>>>>> 4) Splitting (into query fragments which execute on target nodes)
> > > > >>>>>>>> 5) Mapping (query fragments to nodes/partitions)
> > > > >>>>>>>>
> > > > >>>>>>>> At the last step we check that all Fragment sources (a table or result)
> > > > >>>>>>>> have the same distribution (in other words all sources have to be
> > > > >>>>>>>> co-located).
> > > > >>>>>>>>
> > > > >>>>>>>> Planner and Splitter guarantee that all caches in a Fragment are
> > > > >>>>>>>> co-located, an Exchange is produced otherwise. But if we force local
> > > > >>>>>>>> execution we cannot produce Exchanges, which means we may face two
> > > > >>>>>>>> non-co-located caches inside a single query fragment (the result of local
> > > > >>>>>>>> query planning is a single query fragment). So, we cannot pass the check.
> > > > >>>>>>>>
> > > > >>>>>>>> Should we throw an exception or omit the check for local query planning
> > > > >>>>>>>> or prohibit local queries at all?
> > > > >>>>>>>>
> > > > >>>>>>>> Your thoughts?
> > > > >>>>>>>>
> > > > >>>>>>>> Regards,
> > > > >>>>>>>> Igor
> > > > >>>>>>>
> > > > >>>>>
> > > > >>>>>
> > > > >>>>>
> > > > >>>>> --
> > > > >>>>> Best regards,
> > > > >>>>> Ivan Pavlukhin
> > > > >>>>>
> > > > >>>
> > > > >>>
> > > > >>>
> > > > >>> --
> > > > >>> Best regards,
> > > > >>> Ivan Pavlukhin
> > > > >>
> > > > >>
> > > > >>
> > > >
> > > >
> > >
> > >
> > > --
> > > Best regards,
> > > Ivan Pavlukhin
> > >
>
>
>
> --
> Best regards,
> Ivan Pavlukhin
>
