asterixdb-dev mailing list archives

From "Till Westmann" <ti...@apache.org>
Subject Re: Question about open indexes
Date Mon, 28 Sep 2015 20:28:34 GMT
Sounds good. My main intent was to find out if I had missed it :)

Cheers,
Till

On 28 Sep 2015, at 13:14, Ildar Absalyamov wrote:

> Not yet. Was planning to have one in a few days though.
>
>> On Sep 28, 2015, at 13:00, Till Westmann <tillw@apache.org> wrote:
>>
>> Ok, makes sense. We also don’t have a fix for it, right?
>>
>> Cheers,
>> Till
>>
>> On 28 Sep 2015, at 12:15, Ian Maxon wrote:
>>
>>> I think the general thought is that, unfortunately, the fix for this
>>> can't go into 0.8.7 as it is right now, because it needs changes to
>>> Hyracks first. We should probably do a bugfix release right after.
>>>
>>> -Ian
>>>
>>> On Mon, Sep 28, 2015 at 11:51 AM, Till Westmann <tillw@apache.org> 
>>> wrote:
>>>> I’ve added the “0.8.7-blocker” to
>>>> https://issues.apache.org/jira/browse/ASTERIXDB-1109 (which I believe
>>>> covers this issue).
>>>> Is this what we agree on right now?
>>>>
>>>> Also, do we already have a review for this?
>>>>
>>>> Thanks,
>>>> Till
>>>>
>>>>
>>>> On 26 Sep 2015, at 1:37, abdullah alamoudi wrote:
>>>>
>>>>> I agree with Chen, especially with the system not yet production-ready.
>>>>> It seems that going through with the release is more important.
>>>>>
>>>>> Cheers,
>>>>>
>>>>>
>>>>> Amoudi, Abdullah.
>>>>>
>>>>> On Sat, Sep 26, 2015 at 3:33 AM, Chen Li <chenli@gmail.com> wrote:
>>>>>
>>>>>> I vote for including this fix in the next Asterix/Hyracks release,
>>>>>> not this one.
>>>>>>
>>>>>> Chen
>>>>>>
>>>>>> On Fri, Sep 25, 2015 at 4:23 PM, Ildar Absalyamov <
>>>>>> ildar.absalyamov@gmail.com> wrote:
>>>>>>
>>>>>>> It did not really occur to me during the meeting today, but Preston
>>>>>>> pointed out that the secondary index delete fix that I proposed spans
>>>>>>> both the Hyracks & Asterix codebases. Thus we will either have to
>>>>>>> release Hyracks once again, or bite the bullet, sign the RC without
>>>>>>> fixing this issue, and create bug-fix releases for both Hyracks &
>>>>>>> Asterix right after.
>>>>>>>
>>>>>>>
>>>>>>>> On Sep 22, 2015, at 22:27, Mike Carey <dtabass@gmail.com> wrote:
>>>>>>>>
>>>>>>>> Ah - that makes sense now.  Thx.  (And welcome back. :-))
>>>>>>>>
>>>>>>>> On 9/22/15 10:02 PM, Ildar Absalyamov wrote:
>>>>>>>>>
>>>>>>>>> Sorry for the confusion, my initial answer was not correct enough;
>>>>>>>>> I probably should have waited some time after I drove 1500 miles
>>>>>>>>> from Seattle :)
>>>>>>>>>
>>>>>>>>> The casting in the insert pipeline, which Abdullah mentioned, is
>>>>>>>>> needed only for secondary index insert. The reasoning behind this
>>>>>>>>> casting is to ensure that the record is equivalent, thus it is safe
>>>>>>>>> to create an open index. It is true that we can get <Pk, Sk> pairs
>>>>>>>>> out of the original record using get-field-by-name\index, but the
>>>>>>>>> cast operator is introduced merely to kill the pipeline if the
>>>>>>>>> dataset input is not correct.
>>>>>>>>>
>>>>>>>>> Thus the records in the primary index are never touched or modified,
>>>>>>>>> no matter what indexes were created.
>>>>>>>>>
>>>>>>>>> I am not sure, however, what the second cast in Abdullah’s plan is,
>>>>>>>>> and where it comes from.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> @Taewoo, so the scan-delete-btree-secondary-index-open test does not
>>>>>>>>> actually delete data from the secondary index? I have checked the
>>>>>>>>> plan and it has the delete operator. Maybe it is initialized with
>>>>>>>>> wrong parameters, I’ll have a close look.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>> On Sep 22, 2015, at 18:33, Mike Carey <dtabass@gmail.com> wrote:
>>>>>>>>>>
>>>>>>>>>> Sounds kinda bad!  Also, I wonder what happens when the compiler
>>>>>>>>>> encounters records in the dataset - whose type in the catalog doesn't
>>>>>>>>>> claim to have a given (but now indexed) open field - e.g., during a
>>>>>>>>>> data scan or an access via some other path?  Can Bad Things Happen
>>>>>>>>>> due to the compiler not properly anticipating the casted form of the
>>>>>>>>>> records?  (Maybe I am misunderstanding something, but we should
>>>>>>>>>> probably take a careful look at the test cases - and make sure we do
>>>>>>>>>> things like add a bunch of records, then add such an index, then add
>>>>>>>>>> some more records, then stress-test type-related things that come at
>>>>>>>>>> the dataset (i) thru the index, (ii) thru a primary dataset scan, and
>>>>>>>>>> (iii) thru some other index.)
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On 9/22/15 4:06 PM, Taewoo Kim wrote:
>>>>>>>>>>>
>>>>>>>>>>> I think this issue:
>>>>>>>>>>> https://issues.apache.org/jira/browse/ASTERIXDB-1109 is related.
>>>>>>>>>>> Currently, index entries (SK, PK) are not deleted on an open-type
>>>>>>>>>>> secondary index during a deletion. This issue was not surfaced due
>>>>>>>>>>> to the fact that every search after a secondary index search had to
>>>>>>>>>>> go through the primary index lookup.
>>>>>>>>>>>
>>>>>>>>>>> Best,
>>>>>>>>>>> Taewoo
>>>>>>>>>>>
>>>>>>>>>>> On Tue, Sep 22, 2015 at 12:04 AM, Ildar Absalyamov <ildar.absalyamov@gmail.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Abdullah,
>>>>>>>>>>>>
>>>>>>>>>>>> If I remember correctly, whenever a secondary open index is created
>>>>>>>>>>>> all existing records would be casted to a proper type to ensure that
>>>>>>>>>>>> the index creation is valid.
>>>>>>>>>>>> As for the overall correctness of the casting operation: semantically,
>>>>>>>>>>>> creating an open index is the same thing as altering the dataset
>>>>>>>>>>>> type. The current implementation allows only one open index of a
>>>>>>>>>>>> particular type to be created on a single field. If we had “alter
>>>>>>>>>>>> datatype” functionality, open indexing would not be required at
>>>>>>>>>>>> all.
>>>>>>>>>>>>
>>>>>>>>>>>>> On Sep 21, 2015, at 23:25, abdullah alamoudi <amoudi@apache.org> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> More thoughts:
>>>>>>>>>>>>> I assume the intention of the cast was just to make sure that, if the
>>>>>>>>>>>>> open field exists, it is of the specified type. Moreover, the
>>>>>>>>>>>>> un-casted record should be inserted into the index.
>>>>>>>>>>>>> If my assumptions are not correct, please let me know ASAP.
>>>>>>>>>>>>>
>>>>>>>>>>>>> I have two thoughts on this:
>>>>>>>>>>>>> 1. Actually, insert plans show that the record being inserted into
>>>>>>>>>>>>> the primary index is the casted record, creating the issue
>>>>>>>>>>>>> described above.
>>>>>>>>>>>>>
>>>>>>>>>>>>> 2. I don't believe this is the right way to ensure that the open
>>>>>>>>>>>>> field, if it exists, is of the right type. Why not extract the field
>>>>>>>>>>>>> using the field-access-by-name function and then verify the type
>>>>>>>>>>>>> using the field tag?
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Tue, Sep 22, 2015 at 9:11 AM, abdullah alamoudi <amoudi@apache.org> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> Hi Dev, @Ildar,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> In the insert pipeline for datasets with open indexes, we introduce a
>>>>>>>>>>>>>> cast function before the insert and so one would expect the records
>>>>>>>>>>>>>> to look like the casted record type which I assume has {{the closed
>>>>>>>>>>>>>> fields + a nullable field}}.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> The question is, what happens to the previously existing records,
>>>>>>>>>>>>>> since now the index has both records of the original type and
>>>>>>>>>>>>>> records of the casted type?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>> Abdullah.
>>>>>>>>>>>>>>
>>>>>>>>>>>> Best regards,
>>>>>>>>>>>> Ildar
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>> Best regards,
>>>>>>>>> Ildar
>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>> Best regards,
>>>>>>> Ildar
>>>>>>>
>>>>>>>
>>>>>>
>>>>
>
> Best regards,
> Ildar
