ignite-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Dmitriy Setrakyan <dsetrak...@apache.org>
Subject Re: Distributive SQL Joins
Date Sat, 08 Aug 2015 00:46:30 GMT
Sergi,

I personally don't like that for certain types of queries we will be
throwing an exception.

After analyzing the approaches you suggested, I can think of cases where A
performs better than B, as well as when B performs better than A.

However, if you prefer B, I don't mind us taking that approach. As you have
mentioned yourself, in case of non-collocated non-affinity-ID queries, you
would require a broadcast which is a performance hit. I still vote that we
take this performance hit and do the broadcast (optimized with batching, of
course), and execute the query instead of throwing an exception.

D.

On Fri, Aug 7, 2015 at 3:40 AM, Sergi Vladykin <sergi.vladykin@gmail.com>
wrote:

> Alexey,
>
> 1. Yes, in my plan it should work exactly like that: if both keys in join
> are affinity keys, then we are fully collocated, if only one then we can
> run join remotely as described, if none of them we will fail to run the
> query.
>
> 2. I mean we don't have values for these affinity keys in our local query
> result to map requests to remote nodes.
> Example:
> Lest say we have 4 partitioned tables:
> - Organization(id) with affinity key `id`.
> - Person(id, orgId, name) with affinity key `orgId` (it means that it will
> be collocated with `Organization`)
> - Manufacturer(id) with affinity key `id`.
> - Purchase(id, personId, manufId) with affinity key `manufId` (collocated
> with `Manufacturer`)
>
> As you can see `Purchase` has a reference to a `Person` and we may want to
> join them by this reference in a query like this:
>
> SELECT pe.name FROM Person pe JOIN Purchase pu ON pe.id = pu.personId
> WHERE
> pu.id = ?
>
> as you can see neither `pe.id` nor `pu.personId` is an affinity key here.
> But if the `Person` has affinity key `id` and thus is not collocated with
> `Organization`
> we can run query on `Purchase`, take value of `personId` and find the
> affinity node to get the needed `Person`.
>
> Of course it is a restriction but there are multiple ways to workaround it,
> so I don't think it is really a problem:
>
> 1. Use primary key as affinity key if table is used in such joins. This way
> `Person` still can be joined to `Organization`
> (less effective though) and `Pusrchase` can be joined to `Person` as well.
> 2. Use denormalization: instread of having `Purchase.personId` store
> `Person` object itself there.
> 3. Introduce another entity which can duplicate data from `Person` but have
> collocation needed for this failing query:
> For our example it can be an entity PersonForPurchace(id, manufId, name)
> with the same affinity key `manufId` as `Purchase`.
> This way our query can be rewritten in fully collocated style:
>
> SELECT pe.name FROM PersonForPurchace pe JOIN Purchase pu ON pe.id =
> pu.personId AND pu.manufId = pe.manufId WHERE pu.id = ?
>
> Sergi
>
>
>
>
>
> 2015-08-07 11:42 GMT+03:00 Alexey Kuznetsov <akuznetsov@gridgain.com>:
>
> > Sergi,
> >
> > Questions about plan "B" :)
> > 1) It is possible to throw exception on query prepare state (fail fast)
> > when
> > we don't know remote affinity key?
> > 2) Could you provide an example when we don't know remote affinity key? I
> > think we always have some default affinity (no?)?
> >
> > --
> > Alexey Kuznetsov
> > GridGain Systems
> > www.gridgain.com
> >
>

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message