ignite-dev mailing list archives

From Ivan Pavlukhin <vololo...@gmail.com>
Subject Re: Calcite based SQL query engine. Local queries
Date Thu, 07 Nov 2019 11:53:04 GMT
Stephen,

As I understand it, we need to do a better job of understanding the
Compute + local SQL use cases ourselves.

Ideally, a smart optimizer should do the best job of query deployment on its own.

On Thu, 7 Nov 2019 at 13:04, Stephen Darlington
<stephen.darlington@gridgain.com> wrote:
>
> I made a (bad) assumption that this would also affect queries against
> partitions. If “setLocal()” goes away but “setPartitions()” remains I’m happy.
>
> What I would say is that the “broadcast / local” method is one I see fairly
> often. Do we need to do a better job educating people about the “correct” way?
>
> Regards,
> Stephen
>
> > On 7 Nov 2019, at 08:30, Alexey Goncharuk <alexey.goncharuk@gmail.com> wrote:
> >
> > Denis, Stephen,
> >
> > Running a local query in a broadcast closure won't work correctly when the
> > topology changes. We specifically added an affinityCall method to the compute
> > API in order to pin a partition, preventing it from being moved or evicted
> > throughout the task execution. Therefore, the query inside an affinityCall is
> > always executed against some partitions (otherwise the query may give
> > incorrect results when the topology changes).
> >
> > I support Igor's question and think that the 'local' flag for the query
> > should be deprecated and eventually removed. A 'local' query can always be
> > expressed as a query against a set of partitions. If those partitions are
> > located on the same node, good: we get fast and correct results. If not,
> > we may either raise an exception and ask the user to remap the query, or
> > fall back to distributed query execution.
> >
> > Given that the Calcite prototype is in its early stages, it's likely its
> > first version will be available in 3.x, and it's a good chance to get rid
> > of wrong API pieces.
> >
> > --AG
> >
> > On Mon, 4 Nov 2019 at 14:02, Stephen Darlington <
> > stephen.darlington@gridgain.com> wrote:
> >
> >> A common use case is where you want to work on many rows of data across
> >> the grid. You’d broadcast a closure, running the same code on every node
> >> with just the local data. SQL doesn’t work in isolation — it’s often used
> >> as a filter for future computations.
> >>
> >> Regards,
> >> Stephen
> >>
> >>> On 1 Nov 2019, at 17:53, Ivan Pavlukhin <vololo100@gmail.com> wrote:
> >>>
> >>> Denis,
> >>>
> >>> I am mostly concerned about gathering use cases. It would be great to
> >>> critically assess such cases to identify why they cannot be solved by
> >>> using distributed SQL. Also, it sounds similar to some kind of "hints",
> >>> but very limited and with all the drawbacks of hints (impossibility to
> >>> use the full strength of the CBO). We can provide better "hints" support
> >>> with the new engine as well.
> >>>
> >>> On Fri, 1 Nov 2019 at 20:14, Denis Magda <dmagda@apache.org> wrote:
> >>>>
> >>>> Ivan,
> >>>>
> >>>> I was involved in a couple of such use cases personally, so, that's not my
> >>>> imagination ;) Even more, as far as I remember, the primary reason why we
> >>>> improved our affinityRuns ensuring no partition is purged from a node until
> >>>> a task is completed is because many users were running local SQL from
> >>>> compute tasks and needed a guarantee that SQL will always return a correct
> >>>> result set.
> >>>>
> >>>> -
> >>>> Denis
> >>>>
> >>>>
> >>>> On Fri, Nov 1, 2019 at 10:01 AM Ivan Pavlukhin <vololo100@gmail.com> wrote:
> >>>>
> >>>>> Denis,
> >>>>>
> >>>>> It would be nice to see real use cases of the affinity call + local SQL
> >>>>> combination. Generally, the new engine will be able to infer collocation,
> >>>>> resulting in the same collocated execution automatically.
> >>>>>
> >>>>> On Fri, 1 Nov 2019 at 19:11, Denis Magda <dmagda@apache.org> wrote:
> >>>>>>
> >>>>>> Hi Igor,
> >>>>>>
> >>>>>> The local queries feature is broadly used together with affinity-based
> >>>>>> compute tasks:
> >>>>>>
> >>>>>> https://apacheignite.readme.io/docs/collocate-compute-and-data#section-affinity-call-and-run-methods
> >>>>>>
> >>>>>> The use case is as follows. The user knows that all the data needed
> >>>>>> for the computation is collocated, and SQL is used as an advanced API
> >>>>>> for data retrieval from the computation code. The affinity task ensures
> >>>>>> that partitions won't be discarded from the node(s) if the topology
> >>>>>> changes during the task execution and, thus, it's safe to run SQL
> >>>>>> locally, skipping the distributed phases.
> >>>>>>
> >>>>>> The combination of affinity compute tasks with local SQL is a real and
> >>>>>> valuable use case, and this is what we need to support with Calcite. Do
> >>>>>> you see any challenges?
> >>>>>>
> >>>>>> -
> >>>>>> Denis
> >>>>>>
> >>>>>>
> >>>>>> On Fri, Nov 1, 2019 at 8:46 AM Roman Kondakov <kondakov87@mail.ru.invalid>
> >>>>>> wrote:
> >>>>>>
> >>>>>>> Hi Igor!
> >>>>>>>
> >>>>>>> IMO we need to maintain the backward compatibility between old and new
> >>>>>>> query engines as much as possible. And therefore we shouldn't change the
> >>>>>>> behavior of local queries.
> >>>>>>>
> >>>>>>> So, for local queries Calcite's planner shouldn't consider the
> >>>>>>> distribution trait at all.
> >>>>>>>
> >>>>>>>
> >>>>>>> --
> >>>>>>> Kind Regards
> >>>>>>> Roman Kondakov
> >>>>>>>
> >>>>>>> On 01.11.2019 17:07, Seliverstov Igor wrote:
> >>>>>>>> Hi Igniters,
> >>>>>>>>
> >>>>>>>> While working on the new generation of Ignite SQL, I faced a question:
> >>>>>>>> «Do we need local queries at all and, if so, what semantics should they
> >>>>>>>> have?».
> >>>>>>>>
> >>>>>>>> The current planning flow consists of the following steps:
> >>>>>>>>
> >>>>>>>> 1) Parsing SQL to AST
> >>>>>>>> 2) Validating AST (against Schema)
> >>>>>>>> 3) Optimizing (Building execution graph)
> >>>>>>>> 4) Splitting (into query fragments which execute on target nodes)
> >>>>>>>> 5) Mapping (query fragments to nodes/partitions)
> >>>>>>>>
> >>>>>>>> At the last step we check that all Fragment sources (a table or result)
> >>>>>>>> have the same distribution (in other words, all sources have to be
> >>>>>>>> co-located).
> >>>>>>>>
> >>>>>>>> The Planner and Splitter guarantee that all caches in a Fragment are
> >>>>>>>> co-located; an Exchange is produced otherwise. But if we force local
> >>>>>>>> execution, we cannot produce Exchanges, which means we may face two
> >>>>>>>> non-co-located caches inside a single query fragment (the result of
> >>>>>>>> local query planning is a single query fragment). So, we cannot pass
> >>>>>>>> the check.
> >>>>>>>>
> >>>>>>>> Should we throw an exception, omit the check for local query planning,
> >>>>>>>> or prohibit local queries altogether?
> >>>>>>>>
> >>>>>>>> Your thoughts?
> >>>>>>>>
> >>>>>>>> Regards,
> >>>>>>>> Igor
> >>>>>>>
> >>>>>
> >>>>>
> >>>>>
> >>>>> --
> >>>>> Best regards,
> >>>>> Ivan Pavlukhin
> >>>>>
> >>>
> >>>
> >>>
> >>> --
> >>> Best regards,
> >>> Ivan Pavlukhin
> >>
> >>
> >>
>
>


-- 
Best regards,
Ivan Pavlukhin
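
Illustration of the proposal quoted above, where a 'local' query is expressed as a query against a set of partitions. This is only a toy sketch and not Ignite or Calcite code. The class PartitionScopedQuery, the Plan enum, the plan method and the partition-to-owner map are all hypothetical names made up for this example. It assumes the planner can ask which node owns each partition, and it shows the two options mentioned in the thread for the non-collocated case: raise an exception so the user remaps the query, or fall back to distributed execution.

```java
import java.util.Map;
import java.util.Set;

// Hypothetical toy model of a partition-scoped query; not Ignite API.
public class PartitionScopedQuery {
    // How the query would be executed.
    enum Plan { LOCAL, DISTRIBUTED }

    /**
     * Decides how to run a query restricted to the given partitions.
     *
     * partitionOwner maps a partition id to the node that owns it (assumed to be
     * pinned for the duration of the task, as affinityCall does in the thread).
     */
    static Plan plan(Map<Integer, String> partitionOwner,
                     Set<Integer> partitions,
                     String localNode,
                     boolean allowFallback) {
        for (int p : partitions) {
            String owner = partitionOwner.get(p);
            if (!localNode.equals(owner)) {
                // Partition is not on this node: either fall back or ask for a remap.
                if (allowFallback)
                    return Plan.DISTRIBUTED;
                throw new IllegalStateException(
                    "Partition " + p + " is owned by " + owner + ", not " + localNode
                        + "; remap the query");
            }
        }
        // Every requested partition is on the local node: no exchange is needed.
        return Plan.LOCAL;
    }

    public static void main(String[] args) {
        Map<Integer, String> owners = Map.of(0, "nodeA", 1, "nodeA", 2, "nodeB");
        System.out.println(plan(owners, Set.of(0, 1), "nodeA", false)); // LOCAL
        System.out.println(plan(owners, Set.of(0, 2), "nodeA", true));  // DISTRIBUTED
    }
}
```

Whether the exception or the fallback is the better default is exactly the open question in the thread; the allowFallback flag is there only to show both branches.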
