ignite-dev mailing list archives

From Ivan Pavlukhin <vololo...@gmail.com>
Subject Re: Calcite based SQL query engine. Local queries
Date Wed, 13 Nov 2019 07:32:57 GMT
Dmitriy,

It would be great if you could describe your use case in more detail;
sharing some code might be the best option here.

Denis,

Yep, the idea of combining the Compute, SQL and KV APIs into a super
weapon sounds like a killer feature. But I strongly suspect that such a
tool is too complex to use properly in practice. Partition reservation
is not obvious with Ignite compute, the KV API can be transactional
while SQL is not, and so on. Too many pitfalls.

Fri, 8 Nov 2019 at 16:50, Denis Magda <dmagda@apache.org>:
>
> Take cashback calculation or payment authorization as examples of
> compute tasks with local SQL. In the first case, all transactions are
> collocated per account, and a bank needs to calculate the cashback
> monthly by broadcasting a task that executes special logic across all
> accounts; the logic uses SQL to access the data with various filters.
> The same is done for an individual account with an affinity call. In the
> second case, a man swipes a card at a shop register, the system sends a
> compute task to the node that collocates a lot of data for the man's
> account, and the task begins calculating hundreds or thousands of
> variables, retrieving data with both key-value and SQL.
>
> Also, take drug discovery and other pharmaceutical examples. Those are
> compute-heavy, and users from that space have shared stories of how the
> scan, SQL and key-value APIs are used together with compute.
>
> Overall, each industry has compute-heavy use cases that need to retrieve
> local data with local SQL, and there are real Ignite users who do this in
> prod. Again, we also need to think of our compute as advanced and complex
> stored procedures that can retrieve local/collocated data not only with
> key-value and scans but also with SQL, which supports conditions, joins,
> etc.
>
> Denis
>
> On Thursday, November 7, 2019, Ivan Pavlukhin <vololo100@gmail.com> wrote:
>
> > Denis,
> >
> > To make things really clear we need to provide a concrete example
> > of Compute + LocalSQL and reason about it to figure out whether a
> > "smart" SQL engine can deliver the same (or better) results or not.
> >
> > Fri, 8 Nov 2019 at 01:48, Denis Magda <dmagda@apache.org>:
> > >
> > > Folks,
> > >
> > > See our compute tasks as an advanced version of stored procedures that let
> > > the users code the logic of various complexity with Java, .NET or C++ (and
> > > not with PL/SQL). The logic can use a combination of APIs (key-value, SQL,
> > > etc.) to access data both locally and remotely while being executed on
> > > server nodes. The logic can make N key-value requests or run M SQL queries.
> > >
> > > We kept supporting local SQL queries exactly for such scenarios (for our
> > > version of stored procedures) to ensure the distributed map-reduce phase is
> > > canceled if all the data is local. And affinityCalls were improved one day
> > > to pin the partitions.
> > >
> > > If the new engine is smart enough to understand that all the partitions are
> > > available locally during the affinityRun execution then it's totally fine
> > > to remove the 'local' flag. Otherwise, we need to instruct the engine
> > > manually that a distributed phase is redundant via 'local' flag or by other
> > > means.
> > >
> > > Does it make things clearer?
> > >
> > >
> > > -
> > > Denis
> > >
> > >
> > > On Thu, Nov 7, 2019 at 3:53 AM Ivan Pavlukhin <vololo100@gmail.com> wrote:
> > >
> > > > Stephen,
> > > >
> > > > As I understand it, we first need to do a better job of understanding
> > > > the Compute + LocalSQL use cases ourselves.
> > > >
> > > > Ideally, a smart optimizer should do the best job of query deployment.
> > > >
> > > > Thu, 7 Nov 2019 at 13:04, Stephen Darlington <stephen.darlington@gridgain.com>:
> > > > >
> > > > > I made a (bad) assumption that this would also affect queries against
> > > > > partitions. If “setLocal()” goes away but “setPartitions()” remains I’m
> > > > > happy.
> > > > >
> > > > > What I would say is that the “broadcast / local” method is one I see
> > > > > fairly often. Do we need to do a better job educating people of the
> > > > > “correct” way?
> > > > >
> > > > > Regards,
> > > > > Stephen
> > > > >
> > > > > > On 7 Nov 2019, at 08:30, Alexey Goncharuk <alexey.goncharuk@gmail.com> wrote:
> > > > > >
> > > > > > Denis, Stephen,
> > > > > >
> > > > > > Running a local query in a broadcast closure won't work on changing
> > > > > > topology. We specifically added an affinityCall method to the compute API
> > > > > > in order to pin a partition to prevent its moving and eviction throughout
> > > > > > the task execution. Therefore, the query inside an affinityCall is always
> > > > > > executed against some partitions (otherwise the query may give incorrect
> > > > > > results when topology is changed).
> > > > > >
> > > > > > I support Igor's question and think that the 'local' flag for the query
> > > > > > should be deprecated and eventually removed. A 'local' query can always be
> > > > > > expressed as a query against a set of partitions. If those partitions are
> > > > > > located on the same node - good, we get fast and correct results. If not -
> > > > > > we may either raise an exception and ask the user to remap the query, or
> > > > > > fall back to a distributed query execution.
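The "query against a set of partitions" rule described above can be sketched as a small decision function. This is only a toy model in Python, not Ignite code: `plan_partition_query`, `ExecutionPlan`, `RemapRequired` and the `owners` map (partition id to owning node) are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, Iterable


class RemapRequired(Exception):
    """Raised when the requested partitions are not all owned by the local node."""


@dataclass(frozen=True)
class ExecutionPlan:
    mode: str              # "local" or "distributed"
    nodes: FrozenSet[str]  # nodes that own the requested partitions


def plan_partition_query(partitions: Iterable[int],
                         owners: Dict[int, str],
                         local_node: str,
                         allow_fallback: bool) -> ExecutionPlan:
    """Decide how to run a query that is restricted to a set of partitions."""
    parts = set(partitions)
    if not parts:
        raise ValueError("at least one partition is required")

    # Every requested partition must have a known owner.
    unknown = parts - owners.keys()
    if unknown:
        raise KeyError(f"no owner known for partitions {sorted(unknown)}")

    target_nodes = frozenset(owners[p] for p in parts)

    # All partitions live on this node: run locally, no distributed phase.
    if target_nodes == {local_node}:
        return ExecutionPlan("local", target_nodes)

    # Otherwise either fall back to distributed execution or ask for a remap.
    if allow_fallback:
        return ExecutionPlan("distributed", target_nodes)
    raise RemapRequired(
        f"partitions {sorted(parts)} are spread over {sorted(target_nodes)}")
```

With `owners = {0: "n1", 1: "n1", 2: "n2"}`, a query over partitions 0 and 1 issued on `n1` is planned as local, while a query over partitions 0 and 2 is either planned as distributed or rejected with `RemapRequired`, depending on `allow_fallback`.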
> > > > > >
> > > > > > Given that the Calcite prototype is in its early stages, it's likely its
> > > > > > first version will be available in 3.x, and it's a good chance to get rid
> > > > > > of wrong API pieces.
> > > > > >
> > > > > > --AG
> > > > > >
> > > > > > Mon, 4 Nov 2019 at 14:02, Stephen Darlington <stephen.darlington@gridgain.com>:
> > > > > >
> > > > > > >> A common use case is where you want to work on many rows of data across
> > > > > > >> the grid. You’d broadcast a closure, running the same code on every node
> > > > > > >> with just the local data. SQL doesn’t work in isolation - it’s often used
> > > > > > >> as a filter for future computations.
> > > > > >>
> > > > > >> Regards,
> > > > > >> Stephen
> > > > > >>
> > > > > > >>> On 1 Nov 2019, at 17:53, Ivan Pavlukhin <vololo100@gmail.com> wrote:
> > > > > >>>
> > > > > >>> Denis,
> > > > > >>>
> > > > > > >>> I am mostly concerned about gathering use cases. It would be great to
> > > > > > >>> critically assess such cases to identify why they cannot be solved by
> > > > > > >>> using distributed SQL. Also, it sounds similar to some kind of "hints",
> > > > > > >>> but very limited and with all the drawbacks of hints (it is impossible
> > > > > > >>> to use the full strength of the CBO). We can provide better "hints"
> > > > > > >>> support with the new engine as well.
> > > > > > >>>
> > > > > > >>> Fri, 1 Nov 2019 at 20:14, Denis Magda <dmagda@apache.org>:
> > > > > >>>>
> > > > > >>>> Ivan,
> > > > > >>>>
> > > > > > >>>> I was involved in a couple of such use cases personally, so, that's not my
> > > > > > >>>> imagination ;) Even more, as far as I remember, the primary reason why we
> > > > > > >>>> improved our affinityRuns ensuring no partition is purged from a node until
> > > > > > >>>> a task is completed is because many users were running local SQL from
> > > > > > >>>> compute tasks and needed a guarantee that SQL will always return a correct
> > > > > > >>>> result set.
> > > > > >>>>
> > > > > >>>> -
> > > > > >>>> Denis
> > > > > >>>>
> > > > > >>>>
> > > > > > >>>> On Fri, Nov 1, 2019 at 10:01 AM Ivan Pavlukhin <vololo100@gmail.com> wrote:
> > > > > >>>>
> > > > > >>>>> Denis,
> > > > > >>>>>
> > > > > > >>>>> It would be nice to see real use cases of the affinity call + local SQL
> > > > > > >>>>> combination. Generally, the new engine will be able to infer collocation,
> > > > > > >>>>> resulting in the same collocated execution automatically.
> > > > > > >>>>>
> > > > > > >>>>> Fri, 1 Nov 2019 at 19:11, Denis Magda <dmagda@apache.org>:
> > > > > >>>>>>
> > > > > >>>>>> Hi Igor,
> > > > > >>>>>>
> > > > > > >>>>>> Local queries feature is broadly used together with affinity-based compute
> > > > > > >>>>>> tasks:
> > > > > > >>>>>>
> > > > > > >>>>>> https://apacheignite.readme.io/docs/collocate-compute-and-data#section-affinity-call-and-run-methods
> > > > > >>>>>>
> > > > > > >>>>>> The use case is as follows. The user knows that all required data needed
> > > > > > >>>>>> for computation is collocated, and SQL is used as an advanced API for data
> > > > > > >>>>>> retrieval from the computation code. The affinity task ensures that
> > > > > > >>>>>> partitions won't be discarded from the node(s) if the topology changes
> > > > > > >>>>>> during the task execution and, thus, it's safe to run SQL locally skipping
> > > > > > >>>>>> distributed phases.
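The guarantee described above (a partition is not discarded while a task is using it) can be illustrated with a reservation counter. This is a toy Python model under stated assumptions, not how Ignite implements partition reservation: the `Partition` class and its methods are hypothetical names, and eviction is reduced to setting a flag.

```python
from contextlib import contextmanager


class Partition:
    """Toy partition whose eviction is deferred while it is reserved."""

    def __init__(self, pid: int):
        self.pid = pid
        self.reservations = 0         # number of tasks currently pinning it
        self.evict_requested = False  # set when rebalancing wants it gone
        self.evicted = False

    def request_eviction(self) -> None:
        """Called on a topology change; evicts now only if nobody holds it."""
        self.evict_requested = True
        self._try_evict()

    def _try_evict(self) -> None:
        if self.evict_requested and self.reservations == 0:
            self.evicted = True

    @contextmanager
    def reserved(self):
        """Pin the partition for the duration of a task."""
        if self.evicted:
            raise RuntimeError(f"partition {self.pid} is already evicted")
        self.reservations += 1
        try:
            yield self
        finally:
            self.reservations -= 1
            # A deferred eviction proceeds once the last task releases it.
            self._try_evict()
```

A task that wraps its local SQL in `with partition.reserved():` keeps seeing the data even if `request_eviction()` arrives in the middle; the eviction only takes effect after the block exits.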
> > > > > >>>>>>
> > > > > > >>>>>> The combination of affinity compute tasks with local SQL is a real and
> > > > > > >>>>>> valuable use case, and this is what we need to support with Calcite. Do you
> > > > > > >>>>>> see any challenges?
> > > > > >>>>>>
> > > > > >>>>>> -
> > > > > >>>>>> Denis
> > > > > >>>>>>
> > > > > >>>>>>
> > > > > > >>>>>> On Fri, Nov 1, 2019 at 8:46 AM Roman Kondakov <kondakov87@mail.ru.invalid> wrote:
> > > > > >>>>>>
> > > > > >>>>>>> Hi Igor!
> > > > > >>>>>>>
> > > > > > >>>>>>> IMO we need to maintain the backward compatibility between old and new
> > > > > > >>>>>>> query engines as much as possible. And therefore we shouldn't change the
> > > > > > >>>>>>> behavior of local queries.
> > > > > > >>>>>>>
> > > > > > >>>>>>> So, for local queries Calcite's planner shouldn't consider the
> > > > > > >>>>>>> distribution trait at all.
> > > > > >>>>>>>
> > > > > >>>>>>>
> > > > > >>>>>>> --
> > > > > >>>>>>> Kind Regards
> > > > > >>>>>>> Roman Kondakov
> > > > > >>>>>>>
> > > > > > >>>>>>> On 01.11.2019 17:07, Seliverstov Igor wrote:
> > > > > >>>>>>>> Hi Igniters,
> > > > > >>>>>>>>
> > > > > > >>>>>>>> Working on the new generation of Ignite SQL I faced a question: «Do we need
> > > > > > >>>>>>>> local queries at all and, if so, what semantics should they have?».
> > > > > >>>>>>>>
> > > > > > >>>>>>>> The current planning flow consists of the following steps:
> > > > > > >>>>>>>>
> > > > > > >>>>>>>> 1) Parsing SQL to AST
> > > > > > >>>>>>>> 2) Validating AST (against Schema)
> > > > > > >>>>>>>> 3) Optimizing (Building execution graph)
> > > > > > >>>>>>>> 4) Splitting (into query fragments which execute on target nodes)
> > > > > > >>>>>>>> 5) Mapping (query fragments to nodes/partitions)
> > > > > > >>>>>>>>
> > > > > > >>>>>>>> At the last step we check that all Fragment sources (a table or result) have
> > > > > > >>>>>>>> the same distribution (in other words, all sources have to be co-located).
> > > > > >>>>>>>>
> > > > > > >>>>>>>> Planner and Splitter guarantee that all caches in a Fragment are
> > > > > > >>>>>>>> co-located, an Exchange is produced otherwise. But if we force local
> > > > > > >>>>>>>> execution we cannot produce Exchanges, that means we may face two
> > > > > > >>>>>>>> non-co-located caches inside a single query fragment (result of local query
> > > > > > >>>>>>>> planning is a single query fragment). So, we cannot pass the check.
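The conflict between forced local execution and the co-location check can be reproduced with a very small model of the splitting and mapping steps. This is a Python sketch under loud assumptions: `Source`, `Fragment`, `split` and `map_fragments` are hypothetical names, a distribution is reduced to a plain string label, and nothing here mirrors the actual classes of the Calcite-based prototype.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Source:
    name: str
    distribution: str  # hypothetical label, e.g. the affinity key of a cache


@dataclass
class Fragment:
    sources: List[Source]


def is_colocated(fragment: Fragment) -> bool:
    """All sources of a fragment must share one distribution."""
    return len({s.distribution for s in fragment.sources}) <= 1


def split(sources: List[Source], local: bool) -> List[Fragment]:
    """Distributed planning puts an exchange between differently distributed
    sources, modelled here as one fragment per distribution. Local planning
    cannot produce exchanges, so everything ends up in a single fragment."""
    if local:
        return [Fragment(list(sources))]
    groups: Dict[str, List[Source]] = {}
    for s in sources:
        groups.setdefault(s.distribution, []).append(s)
    return [Fragment(group) for group in groups.values()]


def map_fragments(fragments: List[Fragment]) -> List[Fragment]:
    """Mapping step: reject any fragment whose sources are not co-located."""
    for f in fragments:
        if not is_colocated(f):
            names = [s.name for s in f.sources]
            raise ValueError(f"sources {names} are not co-located")
    return fragments
```

For two sources with different distributions, `map_fragments(split(sources, local=False))` succeeds with two fragments, whereas `map_fragments(split(sources, local=True))` raises, which is the situation the question below is about.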
> > > > > >>>>>>>>
> > > > > > >>>>>>>> Should we throw an exception or omit the check for local query planning
> > > > > > >>>>>>>> or prohibit local queries at all?
> > > > > >>>>>>>>
> > > > > >>>>>>>> Your thoughts?
> > > > > >>>>>>>>
> > > > > >>>>>>>> Regards,
> > > > > >>>>>>>> Igor
> > > > > >>>>>>>
> > > > > >>>>>
> > > > > >>>>>
> > > > > >>>>>
> > > > > >>>>> --
> > > > > >>>>> Best regards,
> > > > > >>>>> Ivan Pavlukhin
> > > > > >>>>>
> > > > > >>>
> > > > > >>>
> > > > > >>>
> > > > > >>> --
> > > > > >>> Best regards,
> > > > > >>> Ivan Pavlukhin
> > > > > >>
> > > > > >>
> > > > > >>
> > > > >
> > > > >
> > > >
> > > >
> > > > --
> > > > Best regards,
> > > > Ivan Pavlukhin
> > > >
> >
> >
> >
> > --
> > Best regards,
> > Ivan Pavlukhin
> >
>
>
> --
> -
> Denis



-- 
Best regards,
Ivan Pavlukhin
