asterixdb-dev mailing list archives

From abdullah alamoudi <bamou...@gmail.com>
Subject Re: Question about open indexes
Date Tue, 22 Sep 2015 10:43:03 GMT
Never mind,
I figured it out.

The cast in red actually changes the record in the primary index into the
cast record. The cast before the insert operator into the primary index
casts from the input type to the open type, since the two are compatible.
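
As a rough illustration of why the second cast helps, here is a toy Python
model. It is not AsterixDB code: the helper names, the dict/tuple
representation, and the field names o_orderkey and o_custkey are all
assumptions made for the sketch. The idea is that the cast lays the record out
as the closed fields plus one nullable typed slot, so the indexed field can
then be read by position, as the field-access-by-index call in the plan does.

```python
# Toy model only; nothing here is an AsterixDB API.
# closed_fields is an ordered list of (name, type) pairs.

def cast_record(record, closed_fields, indexed_field, indexed_type):
    """Lay out `record` as closed fields followed by one nullable slot."""
    values = []
    for name, expected in closed_fields:
        if not isinstance(record.get(name), expected):
            raise TypeError(f"field {name!r} must be {expected.__name__}")
        values.append(record[name])
    value = record.get(indexed_field)
    # The indexed field may be absent (None), but if present it must match.
    if value is not None and not isinstance(value, indexed_type):
        raise TypeError(f"field {indexed_field!r} must be {indexed_type.__name__}")
    values.append(value)
    return tuple(values)


def field_access_by_index(cast, position):
    """Read a field by its position in the cast layout."""
    return cast[position]


closed = [("o_orderkey", int)]  # hypothetical closed part of the type
with_key = cast_record({"o_orderkey": 1, "o_custkey": 42}, closed, "o_custkey", int)
without_key = cast_record({"o_orderkey": 2}, closed, "o_custkey", int)
```

In this model the secondary key is field_access_by_index(with_key, 1), and a
record that lacks the open field simply yields None in that slot.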

Regards,
Abdullah.


Amoudi, Abdullah.

On Tue, Sep 22, 2015 at 11:19 AM, abdullah alamoudi <bamousaa@gmail.com>
wrote:

> @Ildar,
> If that is the case, then why do we also cast after the primary index
> insert operator? If all the records have already been cast, then why is the
> second cast needed?
>
> For example, look at the following plan:
> Statement:
> insert into dataset OrdersOpen (
> for $x in dataset Orders
> return $x
> );
> Plan:
> commit
> -- COMMIT  |PARTITIONED|
>   project ([$$3])
>   -- STREAM_PROJECT  |PARTITIONED|
>     exchange
>     -- ONE_TO_ONE_EXCHANGE  |PARTITIONED|
>       insert into idx_Orders_Custkey on tpch:OrdersOpen from [%0->$$7]
>       -- INDEX_INSERT_DELETE  |PARTITIONED|
>         exchange
>         -- ONE_TO_ONE_EXCHANGE  |PARTITIONED|
>           project ([$$3, $$7])
>           -- STREAM_PROJECT  |PARTITIONED|
>             assign [$$7] <- [function-call: asterix:field-access-by-index,
> Args:[function-call: asterix:cast-record, Args:[%0->$$4], AInt32: {8}]]
>             -- ASSIGN  |PARTITIONED|
>               exchange
>               -- ONE_TO_ONE_EXCHANGE  |PARTITIONED|
>                 insert into tpch:OrdersOpen from %0->$$4 partitioned by
> [%0->$$3]
>                 -- INSERT_DELETE  |PARTITIONED|
>                   exchange
>                   -- HASH_PARTITION_EXCHANGE [$$3]  |PARTITIONED|
>                     assign [$$3] <- [function-call:
> asterix:field-access-by-index, Args:[%0->$$4, AInt32: {0}]]
>                     -- ASSIGN  |PARTITIONED|
>                       project ([$$4])
>                       -- STREAM_PROJECT  |PARTITIONED|
>                         assign [$$4] <- [function-call:
> asterix:cast-record, Args:[%0->$$0]]
>                         -- ASSIGN  |PARTITIONED|
>                           project ([$$0])
>                           -- STREAM_PROJECT  |PARTITIONED|
>                             exchange
>                             -- ONE_TO_ONE_EXCHANGE  |PARTITIONED|
>                               data-scan []<-[$$5, $$0] <- tpch:Orders
>                               -- DATASOURCE_SCAN  |PARTITIONED|
>                                 exchange
>                                 -- ONE_TO_ONE_EXCHANGE  |PARTITIONED|
>                                   empty-tuple-source
>                                   -- EMPTY_TUPLE_SOURCE  |PARTITIONED|
>
> What is the point of the cast in red?
>
>
> Amoudi, Abdullah.
>
> On Tue, Sep 22, 2015 at 10:18 AM, abdullah alamoudi <bamousaa@gmail.com>
> wrote:
>
>> I see.
>> Thanks Ildar,
>>
>> Abdullah.
>>
>> Amoudi, Abdullah.
>>
>> On Tue, Sep 22, 2015 at 10:04 AM, Ildar Absalyamov <
>> ildar.absalyamov@gmail.com> wrote:
>>
>>> Abdullah,
>>>
>>> If I remember correctly, whenever a secondary open index is created, all
>>> existing records are cast to the proper type to ensure that the index
>>> creation is valid.
>>> As for the overall correctness of the casting operation: semantically,
>>> creating an open index is the same thing as altering the dataset type. The
>>> current implementation allows only one open index of a particular type
>>> to be created on a single field. If we had “alter datatype”
>>> functionality, open indexing would not be required at all.
>>>
>>> > On Sep 21, 2015, at 23:25, abdullah alamoudi <amoudi@apache.org>
>>> wrote:
>>> >
>>> > More thoughts:
>>> > I assume the intention of the cast was just to make sure that, if the
>>> > open field exists, it is of the specified type. Moreover, the uncast
>>> > record should be inserted into the index.
>>> > If my assumptions are not correct, please let me know ASAP.
>>> >
>>> > I have two thoughts on this:
>>> > 1. Insert plans show that the record being inserted into the primary
>>> > index is actually the cast record, which creates the issue described
>>> > above.
>>> >
>>> > 2. I don't believe this is the right way to ensure that the open field,
>>> > if it exists, is of the right type. Why not extract the field using the
>>> > field-access-by-name function and then verify the type using the
>>> > field tag?
>>> >
>>> >
>>> >
>>> > On Tue, Sep 22, 2015 at 9:11 AM, abdullah alamoudi <amoudi@apache.org>
>>> > wrote:
>>> >
>>> >> Hi Dev, @Ildar,
>>> >>
>>> >> In the insert pipeline for datasets with open indexes, we introduce a
>>> >> cast function before the insert, so one would expect the records to
>>> >> look like the cast record type, which I assume has {{the closed fields
>>> >> + a nullable field}}.
>>> >>
>>> >> The question is: what happens to the previously existing records? The
>>> >> index now has both records of the original type and records of the
>>> >> cast type.
>>> >>
>>> >> Thanks,
>>> >> Abdullah.
>>> >>
>>>
>>> Best regards,
>>> Ildar
>>>
>>>
>>
>
