asterixdb-dev mailing list archives

From Mike Carey <dtab...@gmail.com>
Subject Re: Metadata changes
Date Tue, 15 Dec 2015 19:05:55 GMT
Good point.  I wonder what the perf implications would be - probably 
minimal if these indexes aren't used during the query compilation path.

On 12/14/15 6:04 PM, Ildar Absalyamov wrote:
> I guess the main argument for 2 would be eliminating broken metadata records prior to the
> backwards compatibility cutoff.
> The last thing we want to do is to be stuck with the wrong implementation for compatibility
> reasons. Once the functionality needed for 3 is there we can again introduce those indexes
> without building a sophisticated migration subsystem.
>
>> On Dec 14, 2015, at 17:55, Mike Carey <dtabass@gmail.com> wrote:
>>
>> SO - it seems like 3 is the right long-term answer, but not doable now?
>> (If it was doable now, it would obviously be the ideal choice of the three.)
>> What would be the argument for doing 2 as opposed to 1 for now?
>> As for the question of backwards compatibility, I actually didn't sense a consensus yet.
>> I would tentatively lean towards "right" over "backwards compatible" for this change.
>> What are others' thoughts on that?
>> (Soon we won't have that luxury, but right now maybe we do?)
>>
>> On 12/14/15 3:43 PM, Steven Jacobs wrote:
>>> We just had a UCR discussion on this topic. The issue is really with the
>>> third "index" here. The code now is using one "index" to go in two
>>> directions:
>>> 1) To find datatypes that use datatype A
>>> 2) To find datatypes that are used by datatype A.
>>>
>>> The way that it works now is hacked together, but designed for performance.
>>> So we have three choices here:
>>>
>>> 1) Stick to the status quo, and leave the "indexes" as they are
>>> 2) Remove the Metadata secondary indexes, which will eliminate the hack but
>>> cost some performance on Metadata
>>> 3) Implement the Metadata secondary indexes correctly as Asterix indexes.
>>> For this solution to work with our dataset designs, we will need to have
>>> the ability to index homogeneous lists. In addition, we will have backward
>>> compatibility issues unless we plan things out for the transition.
>>>
>>> What are the thoughts?
>>>
>>>
>>> Orthogonally, it seems that the consensus for storing the datatype
>>> dataverse in the dataset Metadata is to just add it as an open field at
>>> least for now. Is that correct?
>>>
>>> Steven
>>>
>>>
>>> On Mon, Dec 14, 2015 at 1:23 PM, Mike Carey <dtabass@gmail.com> wrote:
>>>
>>>> Thoughts inlined:
>>>>
>>>> On 12/14/15 11:12 AM, Steven Jacobs wrote:
>>>>
>>>>> Here are the conclusions that Ildar and I have drawn from looking at the
>>>>> secondary indexes:
>>>>>
>>>>> First of all it seems that datasets are local to node groups, but
>>>>> dataverses can span node groups, which seems a little odd to me.
>>>>>
>>>> Node groups are an undocumented but to-be-exploited-someday feature that
>>>> allows datasets to be stored on fewer than all nodes in a given cluster.  As
>>>> we face bigger clusters, we'll want to open up that possibility.  We will
>>>> hopefully use them inside w/o having to make users manage them manually
>>>> like parallel DB2 did/does.  Dataverses are really just a namespace thing,
>>>> not a storage thing at all, so they are orthogonal to (and unrelated to)
>>>> node groups.
>>>>
>>>>> There are three Metadata secondary indexes:  GROUPNAME_ON_DATASET_INDEX,
>>>>> DATATYPENAME_ON_DATASET_INDEX, DATATYPENAME_ON_DATATYPE_INDEX
>>>>>
>>>>> The first is used in only one case:
>>>>> When dropping a node group, check if there are any datasets using this node
>>>>> group. If so, don't allow the drop.
>>>>> BUT, this index has a field called "dataverse" which is not used at all.
>>>>>
>>>> This one seems like a waste of space since we do this almost never. (Not
>>>> much space, but unnecessary.)  If we keep it, it should become a proper
>>>> index.
>>>>
>>>>> The second is used when dropping a datatype. If there is a dataset using
>>>>> this datatype, don't allow the drop.
>>>>> Similarly, this index has a "dataverse" which is never used.
>>>>>
>>>> You're about to use the dataverse part, right?  :-)  This index seems like
>>>> it will be useful but should be a proper index.
>>>>
>>>>> The third index is used in two cases, using two different ideas of
>>>>> "keys".
>>>>> It seems like this should actually be two different indexes.
>>>>>
>>>> I don't think I understood this comment....
>>>>
>>>>
>>>>> This is my understanding so far. It would be good to discuss what the
>>>>> "correct" version should be.
>>>>> Steven
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> On Mon, Dec 14, 2015 at 10:12 AM, Steven Jacobs <sjaco002@ucr.edu> wrote:
>>>>>
>>>>>> Hi all,
>>>>>> I'm implementing a change so that datasets can use datatypes from
>>>>>> alternate dataverses (previously the type and set had to be from the
>>>>>> same dataverse). Unfortunately this means another change for Dataset Metadata
>>>>>> (which will now store the dataverse for its type).
>>>>>>
>>>>>> As such, I had a couple of questions:
>>>>>>
>>>>>> 1) Should this change be thrown into the release branch, as it is another
>>>>>> Metadata change?
>>>>>>
>>>>>> 2) In implementing this change, I've been looking at the Metadata
>>>>>> secondary indexes. I had a discussion with Ildar, and it seems the thread
>>>>>> on Metadata secondary indexes being "hacked" has been lost. Is this also
>>>>>> something that should get into the release? Is there anyone currently
>>>>>> looking at it?
>>>>>>
>>>>>> Steven
>>>>>>
>>>>>>
> Best regards,
> Ildar
>
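
To illustrate the two lookup directions described in the thread for DATATYPENAME_ON_DATATYPE_INDEX (datatypes that use datatype A, and datatypes used by datatype A), here is a minimal Python sketch that keeps them as two separate mappings, in the spirit of the "should actually be two different indexes" remark. It is not AsterixDB code: the class DatatypeDependencyIndex, its method names, and the example type names are all hypothetical, and plain in-memory dictionaries stand in for real secondary indexes.

```python
from collections import defaultdict


class DatatypeDependencyIndex:
    """Hypothetical stand-in for two separate metadata secondary indexes."""

    def __init__(self):
        # Forward direction: datatype -> datatypes it is built from.
        self._uses = defaultdict(set)
        # Reverse direction: datatype -> datatypes that reference it.
        self._used_by = defaultdict(set)

    def add_datatype(self, name, nested_types):
        """Record a datatype and the datatypes it references."""
        for nested in nested_types:
            self._uses[name].add(nested)
            self._used_by[nested].add(name)

    def types_used_by(self, name):
        """Datatypes that `name` is built from (forward lookup)."""
        return sorted(self._uses.get(name, ()))

    def types_using(self, name):
        """Datatypes that reference `name` (reverse lookup)."""
        return sorted(self._used_by.get(name, ()))

    def can_drop(self, name):
        """A datatype may be dropped only if nothing else references it."""
        return not self._used_by.get(name)


# Example with made-up type names.
index = DatatypeDependencyIndex()
index.add_datatype("AddressType", [])
index.add_datatype("UserType", ["AddressType"])
index.add_datatype("OrderType", ["UserType", "AddressType"])

print(index.types_using("AddressType"))
print(index.types_used_by("OrderType"))
print(index.can_drop("AddressType"), index.can_drop("OrderType"))
```

Keeping one mapping per direction means each lookup has a single, unambiguous key, at the cost of maintaining both on every insert; whether that trade-off suits the real Metadata storage is exactly the open question in the thread.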

