asterixdb-dev mailing list archives

From Mike Carey <>
Subject Re: Metadata changes
Date Tue, 15 Dec 2015 23:39:41 GMT
Agreed.  If you eliminate them, will the metadata code work but w/scans?
(Or is there recoding to do in order to get a post-drop world working?)

On 12/15/15 2:40 PM, Steven Jacobs wrote:
> Some new light to add to this discussion:
> These metadata secondary "indexes" currently boil down to one single
> use-case with three lookups:
> When deleting a datatype:
> 1) confirm that it isn't used by any dataset
> 2) confirm that it isn't used by any other datatype
> If both are true, then delete this type and
> 3) find and delete its subtypes
> I discovered today that there isn't a single test in the testsuite that
> actually covers any of the three events.
> Even worse, if you actually try to hit one of these checks, the code breaks
> on master as follows:
> A) Drop a datatype used by a dataset = throw confusing exception to the user
> B) Drop a datatype used by another datatype = complete successfully, break
> the metadata for future queries
> These have been broken for at least two years without anyone coming across
> them.
> It is possible that the "indexes" could be helpful in querying the Metadata
> in general (although they are being ignored now),
> but my question is whether there is a large return on investment or not,
> since:
> I) Metadata is typically small
> II) Metadata queries are atypical
> Steven
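For illustration only, the three lookups listed above can be written as plain scans over small in-memory metadata, which is roughly what dropping the secondary "indexes" would amount to. The record shapes and names below (`Datatype`, `Dataset`, `drop_datatype`, and the `derived` flag marking a nested subtype owned by its parent) are assumptions made for this sketch, not AsterixDB's actual metadata schema or API.

```python
from dataclasses import dataclass, field


@dataclass
class Datatype:
    name: str
    uses: list = field(default_factory=list)  # names of types this type references
    derived: bool = False  # hypothetical flag: nested subtype owned by a parent type


@dataclass
class Dataset:
    name: str
    datatype: str  # name of the dataset's record type


class MetadataError(Exception):
    """Raised when a drop would leave dangling references."""


def _delete_with_subtypes(name, datatypes):
    # Remove the type first, then recursively remove the derived subtypes it owns.
    target = datatypes.pop(name)
    for sub in target.uses:
        child = datatypes.get(sub)
        if child is not None and child.derived:
            _delete_with_subtypes(sub, datatypes)


def drop_datatype(name, datatypes, datasets):
    """Drop `name` from `datatypes` (dict: name -> Datatype) using scans only."""
    if name not in datatypes:
        raise MetadataError(f"unknown type {name}")
    # 1) Scan datasets: refuse the drop if any dataset uses the type.
    users = [d.name for d in datasets if d.datatype == name]
    if users:
        raise MetadataError(f"cannot drop type {name}: used by dataset(s) {users}")
    # 2) Scan datatypes: refuse the drop if any other type references it.
    parents = [t.name for t in datatypes.values()
               if t.name != name and name in t.uses]
    if parents:
        raise MetadataError(f"cannot drop type {name}: used by type(s) {parents}")
    # 3) Delete the type and its derived subtypes.
    _delete_with_subtypes(name, datatypes)
```

Both checks run before anything is deleted, so a refused drop leaves the metadata untouched. With metadata as small as described above, the two linear scans are unlikely to matter for performance.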
> On Tue, Dec 15, 2015 at 11:05 AM, Mike Carey <> wrote:
>> Good point.  I wonder what the perf implications would be - probably
>> minimal if these indexes aren't used during the query compilation path.
>> On 12/14/15 6:04 PM, Ildar Absalyamov wrote:
>>> I guess the main argument for 2 would be eliminating broken metadata
>>> records prior to backwards compatibility cutoff.
>>> The last thing we want to do is to be stuck with a wrong
>>> implementation for compatibility reasons. Once the functionality needed for
>>> 3 is there, we can again introduce those indexes without building a
>>> sophisticated migration subsystem.
>>> On Dec 14, 2015, at 17:55, Mike Carey <> wrote:
>>>> SO - it seems like 3 is the right long-term answer, but not doable now?
>>>> (If it was doable now, it would obviously be the ideal choice of the
>>>> three.)
>>>> What would be the argument for doing 2 as opposed to 1 for now?
>>>> As for the question of backwards compatibility, I actually didn't sense
>>>> a consensus yet.
>>>> I would tentatively lean towards "right" over "backwards compatible" for
>>>> this change.
>>>> What are others' thoughts on that?
>>>> (Soon we won't have that luxury, but right now maybe we do?)
>>>> On 12/14/15 3:43 PM, Steven Jacobs wrote:
>>>>> We just had a UCR discussion on this topic. The issue is really with the
>>>>> third "index" here. The code now is using one "index" to go in two
>>>>> directions:
>>>>> 1) To find datatypes that use datatype A
>>>>> 2) To find datatypes that are used by datatype A.
>>>>> The way that it works now is hacked together, but designed for
>>>>> performance.
>>>>> So we have three choices here:
>>>>> 1) Stick to the status quo, and leave the "indexes" as they are
>>>>> 2) Remove the Metadata secondary indexes, which will eliminate the hack but
>>>>> cost some performance on Metadata
>>>>> 3) Implement the Metadata secondary indexes correctly as Asterix
>>>>> indexes.
>>>>> For this solution to work with our dataset designs, we will need to have
>>>>> the ability to index homogeneous lists. In addition, we will have backward
>>>>> compatibility issues unless we plan things out for the transition.
>>>>> What are the thoughts?
>>>>> Orthogonally, it seems that the consensus for storing the datatype
>>>>> dataverse in the dataset Metadata is to just add it as an open field, at
>>>>> least for now. Is that correct?
>>>>> Steven
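As a rough sketch of the two lookup directions described above (types that use datatype A, and types used by datatype A), the single two-way "index" can be thought of as two separate maps built from each type's list of referenced types. The function name `build_type_indexes` and the input shape are hypothetical and only illustrate the idea of indexing every element of a list-valued field; they are not taken from the AsterixDB code.

```python
from collections import defaultdict


def build_type_indexes(type_uses):
    """Build forward and reverse lookups from a dict mapping each datatype
    name to the list of datatype names it references."""
    # Forward direction: which types are used BY a given type.
    uses = {name: sorted(set(refs)) for name, refs in type_uses.items()}
    # Reverse direction: which types USE a given type (one entry per list element).
    used_by = defaultdict(set)
    for name, refs in type_uses.items():
        for ref in refs:
            used_by[ref].add(name)
    return uses, {ref: sorted(names) for ref, names in used_by.items()}
```

Keeping the two directions as separate structures avoids overloading one key format with two meanings, at the cost of maintaining both on every type change.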
>>>>> On Mon, Dec 14, 2015 at 1:23 PM, Mike Carey <> wrote:
>>>>> Thoughts inlined:
>>>>>> On 12/14/15 11:12 AM, Steven Jacobs wrote:
>>>>>>> Here are the conclusions that Ildar and I have drawn from looking at the
>>>>>>> secondary indexes:
>>>>>>> First of all, it seems that datasets are local to node groups, but
>>>>>>> dataverses can span node groups, which seems a little odd to me.
>>>>>> Node groups are an undocumented but to-be-exploited-someday feature that
>>>>>> allows datasets to be stored on fewer than all nodes in a given cluster.  As
>>>>>> we face bigger clusters, we'll want to open up that possibility.  We will
>>>>>> hopefully use them inside w/o having to make users manage them manually
>>>>>> like parallel DB2 did/does.  Dataverses are really just a namespace thing,
>>>>>> not a storage thing at all, so they are orthogonal to (and unrelated to)
>>>>>> node groups.
>>>>>>> There are three Metadata secondary indexes:
>>>>>>> The first is used in only one case:
>>>>>>> When dropping a node group, check if there are any datasets using this node
>>>>>>> group. If so, don't allow the drop.
>>>>>>> BUT, this index has a field called "dataverse" which is not used at all.
>>>>>> This one seems like a waste of space since we do this almost
>>>>>> (Not much space, but unnecessary.)  If we keep it, it should become a proper
>>>>>> index.
>>>>>>> The second is used when dropping a datatype. If there is a dataset using
>>>>>>> this datatype, don't allow the drop.
>>>>>>> Similarly, this index has a "dataverse" which is never used.
>>>>>> You're about to use the dataverse part, right?  :-)  This index seems like
>>>>>> it will be useful but should be a proper index.
>>>>>>> The third index is used in two cases, using two different ideas of "keys".
>>>>>>> It seems like this should actually be two different indexes.
>>>>>> I don't think I understood this comment....
>>>>>>> This is my understanding so far. It would be good to discuss what the
>>>>>>> "correct" version should be.
>>>>>>> Steven
>>>>>>> On Mon, Dec 14, 2015 at 10:12 AM, Steven Jacobs <> wrote:
>>>>>>>> Hi all,
>>>>>>>> I'm implementing a change so that datasets can use datatypes from
>>>>>>>> alternate dataverses (previously the type and set had to be from the same
>>>>>>>> dataverse). Unfortunately this means another change for Dataset Metadata
>>>>>>>> (which will now store the dataverse for its type).
>>>>>>>> As such, I had a couple of questions:
>>>>>>>> 1) Should this change be thrown into the release branch, as it is another
>>>>>>>> Metadata change?
>>>>>>>> 2) In implementing this change, I've been looking at the
>>>>>>>> secondary indexes. I had a discussion with Ildar, and it seems the thread
>>>>>>>> on Metadata secondary indexes being "hacked" has been lost. Is this also
>>>>>>>> something that should get into the release? Is there anyone
>>>>>>>> looking at it?
>>>>>>>> Steven
>>>>>>>> Best regards,
>>> Ildar
