ignite-dev mailing list archives

From Stephen Darlington <stephen.darling...@gridgain.com>
Subject Re: Calcite based SQL query engine. Local queries
Date Thu, 07 Nov 2019 10:04:29 GMT
I made a (bad) assumption that this would also affect queries against partitions. If “setLocal()”
goes away but “setPartitions()” remains I’m happy.
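
To make the distinction concrete, here is a minimal sketch of the semantics Alexey describes below: a "local" query restated as a query against a set of partitions, which either runs locally, fails and asks for a remap, or falls back to distributed execution. It is plain Java and does not use the Ignite API; the class, the Plan enum, the plan() method and the owners map are all hypothetical names chosen for illustration.

```java
import java.util.Map;
import java.util.Set;

public class PartitionQuerySketch {
    /** Hypothetical outcome of planning a query restricted to a set of partitions. */
    enum Plan { LOCAL, DISTRIBUTED }

    /**
     * Decides how to run a query limited to the given partitions.
     * The owners map (partition id -> owning node id) is a toy stand-in for affinity.
     */
    static Plan plan(Map<Integer, String> owners, String localNode,
                     Set<Integer> partitions, boolean allowFallback) {
        for (int p : partitions) {
            if (!localNode.equals(owners.get(p))) {
                // At least one requested partition lives elsewhere.
                if (allowFallback)
                    return Plan.DISTRIBUTED;
                throw new IllegalStateException(
                    "Partition " + p + " is not on node " + localNode + "; remap the query");
            }
        }
        // Every requested partition is on this node: safe to run locally.
        return Plan.LOCAL;
    }

    public static void main(String[] args) {
        Map<Integer, String> owners = Map.of(0, "A", 1, "A", 2, "B");
        System.out.println(plan(owners, "A", Set.of(0, 1), false)); // LOCAL
        System.out.println(plan(owners, "A", Set.of(0, 2), true));  // DISTRIBUTED
    }
}
```

Whether a real engine should throw or fall back is exactly the open question in this thread; the allowFallback flag only exists here to show both options.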

What I would say is that the “broadcast / local” method is one I see fairly often. Do
we need to do a better job of educating people about the “correct” way?
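
As a teaching aid, here is a toy model of why the broadcast / local pattern can miss data when the topology changes mid-job, and why pinning partitions avoids it. It is plain Java, not the Ignite API, and it is not how Ignite implements partition reservation; Cluster, move(), scanLocal() and run() are hypothetical names, and each partition is assumed to hold exactly one row.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

public class BroadcastLocalSketch {
    /** Toy cluster: partition id -> owning node. */
    static class Cluster {
        final Map<Integer, String> owners = new HashMap<>();
        final Set<Integer> pinned = new HashSet<>();

        /** Moves a partition to another node unless a running task has pinned it. */
        void move(int part, String toNode) {
            if (!pinned.contains(part))
                owners.put(part, toNode);
        }

        /** "Local query": returns the partitions currently owned by the node. */
        List<Integer> scanLocal(String node) {
            List<Integer> res = new ArrayList<>();
            for (Map.Entry<Integer, String> e : owners.entrySet())
                if (e.getValue().equals(node))
                    res.add(e.getKey());
            return res;
        }
    }

    /** Scans node B, moves partition 1 from A to B, then scans node A. */
    static Set<Integer> run(boolean pin) {
        Cluster c = new Cluster();
        c.owners.put(0, "A");
        c.owners.put(1, "A");
        c.owners.put(2, "B");
        if (pin)
            c.pinned.addAll(c.owners.keySet());

        Set<Integer> seen = new TreeSet<>(c.scanLocal("B"));
        c.move(1, "B"); // topology change in the middle of the job
        seen.addAll(c.scanLocal("A"));
        return seen;
    }

    public static void main(String[] args) {
        System.out.println("unpinned: " + run(false)); // unpinned: [0, 2]
        System.out.println("pinned:   " + run(true));  // pinned:   [0, 1, 2]
    }
}
```

In the unpinned run partition 1 is never read: it had not yet arrived on B when B was scanned, and it had already left A by the time A was scanned.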

Regards,
Stephen

> On 7 Nov 2019, at 08:30, Alexey Goncharuk <alexey.goncharuk@gmail.com> wrote:
> 
> Denis, Stephen,
> 
> Running a local query in a broadcast closure won't work on a changing
> topology. We specifically added an affinityCall method to the compute API
> in order to pin a partition to prevent its moving and eviction throughout
> the task execution. Therefore, the query inside an affinityCall is always
> executed against some partitions (otherwise the query may give incorrect
> results when topology is changed).
> 
> I support Igor's question and think that the 'local' flag for the query
> should be deprecated and eventually removed. A 'local' query can always be
> expressed as a query against a set of partitions. If those partitions are
> located on the same node, good: we get fast and correct results. If not,
> we may either raise an exception and ask the user to remap the query, or
> fall back to distributed query execution.
> 
> Given that the Calcite prototype is in its early stages, it's likely its
> first version will be available in 3.x, and it's a good chance to get rid
> of wrong API pieces.
> 
> --AG
> 
> On Mon, 4 Nov 2019 at 14:02, Stephen Darlington <
> stephen.darlington@gridgain.com> wrote:
> 
>> A common use case is where you want to work on many rows of data across
>> the grid. You’d broadcast a closure, running the same code on every node
>> with just the local data. SQL doesn’t work in isolation — it’s often used
>> as a filter for future computations.
>> 
>> Regards,
>> Stephen
>> 
>>> On 1 Nov 2019, at 17:53, Ivan Pavlukhin <vololo100@gmail.com> wrote:
>>> 
>>> Denis,
>>> 
>>> I am mostly concerned about gathering use cases. It would be great to
>>> critically assess such cases to identify why they cannot be solved by
>>> using distributed SQL. Also, it sounds similar to some kind of "hints",
>>> but very limited and with all the drawbacks of hints (impossibility of
>>> using the full strength of the CBO). We can provide better "hints"
>>> support with the new engine as well.
>>> 
>>> On Fri, 1 Nov 2019 at 20:14, Denis Magda <dmagda@apache.org> wrote:
>>>> 
>>>> Ivan,
>>>> 
>>>> I was involved in a couple of such use cases personally, so, that's not my
>>>> imagination ;) Even more, as far as I remember, the primary reason why we
>>>> improved our affinityRuns ensuring no partition is purged from a node until
>>>> a task is completed is because many users were running local SQL from
>>>> compute tasks and needed a guarantee that SQL will always return a correct
>>>> result set.
>>>> 
>>>> -
>>>> Denis
>>>> 
>>>> 
>>>> On Fri, Nov 1, 2019 at 10:01 AM Ivan Pavlukhin <vololo100@gmail.com> wrote:
>>>> 
>>>>> Denis,
>>>>> 
>>>>> It would be nice to see real use cases of the affinity call + local SQL
>>>>> combination. Generally, the new engine will be able to infer collocation,
>>>>> resulting in the same collocated execution automatically.
>>>>> 
>>>>> On Fri, 1 Nov 2019 at 19:11, Denis Magda <dmagda@apache.org> wrote:
>>>>>> 
>>>>>> Hi Igor,
>>>>>> 
>>>>>> Local queries feature is broadly used together with affinity-based compute
>>>>>> tasks:
>>>>>> 
>>>>>> https://apacheignite.readme.io/docs/collocate-compute-and-data#section-affinity-call-and-run-methods
>>>>>> 
>>>>>> The use case is as follows. The user knows that all required data needed
>>>>>> for computation is collocated, and SQL is used as an advanced API for data
>>>>>> retrieval from the computation code. The affinity task ensures that
>>>>>> partitions won't be discarded from the node(s) if the topology changes
>>>>>> during the task execution and, thus, it's safe to run SQL locally skipping
>>>>>> distributed phases.
>>>>>> 
>>>>>> The combination of affinity compute tasks with local SQL is a real and
>>>>>> valuable use case, and this is what we need to support with Calcite. Do you
>>>>>> see any challenges?
>>>>>> 
>>>>>> -
>>>>>> Denis
>>>>>> 
>>>>>> 
>>>>>> On Fri, Nov 1, 2019 at 8:46 AM Roman Kondakov <kondakov87@mail.ru.invalid>
>>>>>> wrote:
>>>>>> 
>>>>>>> Hi Igor!
>>>>>>> 
>>>>>>> IMO we need to maintain the backward compatibility between old and new
>>>>>>> query engines as much as possible. And therefore we shouldn't change the
>>>>>>> behavior of local queries.
>>>>>>> 
>>>>>>> So, for local queries Calcite's planner shouldn't consider the
>>>>>>> distribution trait at all.
>>>>>>> 
>>>>>>> 
>>>>>>> --
>>>>>>> Kind Regards
>>>>>>> Roman Kondakov
>>>>>>> 
>>>>>>> On 01.11.2019 17:07, Seliverstov Igor wrote:
>>>>>>>> Hi Igniters,
>>>>>>>> 
>>>>>>>> Working on the new generation of Ignite SQL I faced a question: «Do we need
>>>>>>>> local queries at all and, if so, what semantics should they have?».
>>>>>>>> 
>>>>>>>> The current planning flow consists of the following steps:
>>>>>>>> 
>>>>>>>> 1) Parsing SQL to AST
>>>>>>>> 2) Validating AST (against Schema)
>>>>>>>> 3) Optimizing (Building execution graph)
>>>>>>>> 4) Splitting (into query fragments which execute on target nodes)
>>>>>>>> 5) Mapping (query fragments to nodes/partitions)
>>>>>>>> 
>>>>>>>> At the last step we check that all Fragment sources (a table or result)
>>>>>>>> have the same distribution (in other words, all sources have to be
>>>>>>>> co-located).
>>>>>>>> 
>>>>>>>> Planner and Splitter guarantee that all caches in a Fragment are
>>>>>>>> co-located; an Exchange is produced otherwise. But if we force local
>>>>>>>> execution we cannot produce Exchanges, which means we may face two
>>>>>>>> non-co-located caches inside a single query fragment (the result of local
>>>>>>>> query planning is a single query fragment). So, we cannot pass the check.
>>>>>>>> 
>>>>>>>> Should we throw an exception, omit the check for local query planning,
>>>>>>>> or prohibit local queries altogether?
>>>>>>>> 
>>>>>>>> Your thoughts?
>>>>>>>> 
>>>>>>>> Regards,
>>>>>>>> Igor
>>>>>>> 
>>>>> 
>>>>> 
>>>>> 
>>>>> --
>>>>> Best regards,
>>>>> Ivan Pavlukhin
>>>>> 
>>> 
>>> 
>>> 
>>> --
>>> Best regards,
>>> Ivan Pavlukhin
>> 
>> 
>> 


