asterixdb-dev mailing list archives

From Ildar Absalyamov <ildar.absalya...@gmail.com>
Subject Re: Metadata names generation
Date Thu, 25 Jun 2015 01:43:21 GMT
OK, so when one is dealing with such names, putting them in quotes will resolve the ambiguity.
However, auto-generated type names are just strings, and they must be unique.
Consider the example:
FooType as open {
   "a": string,
   "b": { "c": { "d": string } },
   "b.c": { "d": string }
}

The new naming scheme will generate these type names: FooType.b, FooType.b.c and a second, identical FooType.b.c!
The old naming, by contrast, produces Field_b_in_FooType, Field_c_in_Field_b_in_FooType and
Field_b.c_in_FooType, which avoids the name conflict.

So it seems type name verbosity was there for a reason?
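
The collision can be reproduced with a small sketch. This is not AsterixDB code: the dict model of FooType and the helper names (old_name, new_name, generated_names, duplicates) are assumptions made for illustration. Only the two naming patterns are taken from this thread, and the result says nothing about whether the old scheme is collision-free in general.

```python
from typing import Callable, List

# Hypothetical model of a record type: a field maps either to "string"
# (a primitive) or to a nested dict (an anonymous record that needs a
# generated type name).
FOO_TYPE = {
    "a": "string",
    "b": {"c": {"d": "string"}},
    "b.c": {"d": "string"},
}


def old_name(field: str, outer: str) -> str:
    """Old pattern from the thread: "Field_" + fieldName + "_in_" + outerTypeName."""
    return "Field_" + field + "_in_" + outer


def new_name(field: str, outer: str) -> str:
    """Proposed pattern from the thread: outerTypeName + "." + fieldName."""
    return outer + "." + field


def generated_names(fields: dict, outer: str,
                    scheme: Callable[[str, str], str]) -> List[str]:
    """Collect the generated name of every anonymous nested record, depth first."""
    names: List[str] = []
    for field, field_type in fields.items():
        if isinstance(field_type, dict):
            name = scheme(field, outer)
            names.append(name)
            names.extend(generated_names(field_type, name, scheme))
    return names


def duplicates(names: List[str]) -> List[str]:
    """Return the names that occur more than once, sorted."""
    return sorted({n for n in names if names.count(n) > 1})


old = generated_names(FOO_TYPE, "FooType", old_name)
new = generated_names(FOO_TYPE, "FooType", new_name)

print("old scheme:", old, "duplicates:", duplicates(old))
print("new scheme:", new, "duplicates:", duplicates(new))
```

For this particular type the old pattern yields three distinct names, while the "." pattern yields FooType.b.c twice.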

> On Jun 24, 2015, at 18:27, Till Westmann <tillw@apache.org> wrote:
> 
> 
>> On Jun 24, 2015, at 3:16 PM, Steven Jacobs <sjaco002@ucr.edu> wrote:
>> 
>> a clear case is where there is a data type with a field named "a.b" and
>> another field named "a" which has a nested field named "b".
>> 
>> This is allowed right now. You would have to access the first as "a.b" and
>> the second as a.b. The quotes basically tell the parser "this is a single
>> name with whatever characters I want in it."
> 
> a.b is mainly a convenient shortcut for "a"."b"
> 
>> To me it seems fine to
>> disallow some characters, but back when I had discussions about this with
>> Vinayak, Mike, and Till, Till was arguing against disallowing characters. I
>> can't really remember his reasons now though.
>> 
>> @Till, what are your thoughts on this?
> 
> All characters are allowed for field names in JSON (http://json.org).
> So if we disallow some characters, we will need to map names that contain them to something
> else (or not allow such JSON documents).
> It seems that that will get messy and/or painful pretty quickly.
> 
> Cheers,
> Till
> 
>> On Wed, Jun 24, 2015 at 11:56 AM, abdullah alamoudi <bamousaa@gmail.com>
>> wrote:
>> 
>>> If that's the case, then I think we need to disallow using the "." since it
>>> is used to access nested fields and can definitely cause ambiguity.
>>> 
>>> a clear case is where there is a data type with a field named "a.b" and
>>> another field named "a" which has a nested field named "b".
>>> 
>>> Thoughts?
>>> 
>>> 
>>> On Wed, Jun 24, 2015 at 9:51 PM, Steven Jacobs <sjaco002@ucr.edu> wrote:
>>> 
>>>> I think there is no completely user-friendly way around this. Basically our
>>>> names allow ALL characters if they are encapsulated in quotes, so there
>>>> isn't a character we can use that doesn't have the potential for ambiguity
>>>> from the user's perspective. This is why I had to change the nested stuff
>>>> in indexing to be a list of strings rather than a single string.
>>>> Steven
>>>> 
>>>> On Wed, Jun 24, 2015 at 11:43 AM, Chen Li <chenli@gmail.com> wrote:
>>>> 
>>>>> In this case, there could be ambiguity in the names.  Does it matter?
>>>>> 
>>>>> Chen
>>>>> 
>>>>> On Wed, Jun 24, 2015 at 11:17 AM, Steven Jacobs <sjaco002@ucr.edu> wrote:
>>>>> 
>>>>>> Fieldnames do allow these characters (both of them).
>>>>>> Steven
>>>>>> 
>>>>>> On Wed, Jun 24, 2015 at 11:15 AM, Chen Li <chenli@gmail.com> wrote:
>>>>>> 
>>>>>>> I also prefer "." to "_".  Also want to confirm that field names don't
>>>>>>> allow these two characters.
>>>>>>> 
>>>>>>> Chen
>>>>>>> 
>>>>>>> On Wed, Jun 24, 2015 at 10:52 AM, Steven Jacobs <sjaco002@ucr.edu> wrote:
>>>>>>> 
>>>>>>>> I second Young-Seok (especially since this is the syntax that users will
>>>>>>>> use themselves for nested information in queries).
>>>>>>>> 
>>>>>>>> Steven
>>>>>>>> 
>>>>>>>> On Wed, Jun 24, 2015 at 10:40 AM, Young-Seok Kim <kisskys@gmail.com> wrote:
>>>>>>>> 
>>>>>>>>> It seems better to use "." instead of "_" since "." is more intuitive (at
>>>>>>>>> least to me) than "_".
>>>>>>>>> For example, the FacebookUserType_address will be FacebookUserType.address.
>>>>>>>>> 
>>>>>>>>> Best,
>>>>>>>>> Young-Seok
>>>>>>>>> 
>>>>>>>>> On Wed, Jun 24, 2015 at 6:31 AM, Mike Carey <dtabass@gmail.com> wrote:
>>>>>>>>> 
>>>>>>>>>> Much cleaner!  Others should weigh in here to help finalize the
>>>>>>>>>> conventions....  Thoughts?
>>>>>>>>>> On Jun 23, 2015 5:31 PM, "Ildar Absalyamov" <iabsa001@cs.ucr.edu> wrote:
>>>>>>>>>> 
>>>>>>>>>>> So the general solution is that the generated names should become less
>>>>>>>>>>> verbose (consider previous examples):
>>>>>>>>>>> 1) Anonymous fields naming scheme will change to outerTypeName + "_" +
>>>>>>>>>>> fieldName, i.e. "Field_address_in_FacebookUserType" is changed to
>>>>>>>>>>> "FacebookUserType_address"
>>>>>>>>>>> 2) Anonymous collection item naming scheme stays the same, i.e.
>>>>>>>>>>> "Field_employment_in_FacebookUserType_ItemType" is changed to
>>>>>>>>>>> "FacebookUserType_employment_ItemType" (name is changed because the
>>>>>>>>>>> anonymous field employment naming was changed as described earlier)
>>>>>>>>>>> 3) Union type completely ceases to exist in metadata (it stays in the
>>>>>>>>>>> object model though), i.e.
>>>>>>>>>>> "Type_#1_UnionType_Field_end-date_in_Field_employment_in_FacebookUserType_ItemType"
>>>>>>>>>>> is changed to "FacebookUserType_employment_ItemType_end-date", where the
>>>>>>>>>>> type metadata will have an additional field "Optional" with value "true".
>>>>>>>>>>> 
>>>>>>>>>>>> On Jun 19, 2015, at 18:11, Ildar Absalyamov <iabsa001@cs.ucr.edu> wrote:
>>>>>>>>>>>> 
>>>>>>>>>>>> So I have done half of the fix, which is moving the name generation logic
>>>>>>>>>>>> out of the Metadata node to the client.
>>>>>>>>>>>> Up to that point nothing in the Metadata format was changed, which makes me
>>>>>>>>>>>> wonder whether I should proceed with the following changes.
>>>>>>>>>>>> 
>>>>>>>>>>>> As can be seen from the previous email, getting rid of
>>>>>>>>>>>> union-inferred name generation would make auto-generated type names less
>>>>>>>>>>>> scary, but not entirely.
>>>>>>>>>>>> Having in mind what Mike mentioned earlier today, should we do something
>>>>>>>>>>>> about other auto-generated type name cases?
>>>>>>>>>>>> 
>>>>>>>>>>>>> On Jun 19, 2015, at 13:01, Ildar Absalyamov <iabsa001@cs.ucr.edu> wrote:
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Currently we are generating the names for inner\anonymous types in the
>>>>>>>>>>>>> following cases:
>>>>>>>>>>>>> 1) Anonymous field in the record.
>>>>>>>>>>>>> AQL Example:
>>>>>>>>>>>>> create type FacebookUserType as closed {
>>>>>>>>>>>>>       id: int32,
>>>>>>>>>>>>>       name: string,
>>>>>>>>>>>>>       address: {
>>>>>>>>>>>>>            address_line: string,
>>>>>>>>>>>>>            city: string,
>>>>>>>>>>>>>            state: string
>>>>>>>>>>>>>       }
>>>>>>>>>>>>> }
>>>>>>>>>>>>> The pattern for generating an anonymous field name is "Field_" +
>>>>>>>>>>>>> fieldName + "_in_" + outerTypeName, which translates to
>>>>>>>>>>>>> "Field_address_in_FacebookUserType" in the given example.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> 2) Anonymous collection (ordered\unordered list) item
>>>>>>>>>>>>> create type FacebookUserType as closed {
>>>>>>>>>>>>>       id: int32,
>>>>>>>>>>>>>       name: string,
>>>>>>>>>>>>>       employment: [{
>>>>>>>>>>>>>            organization-name: string,
>>>>>>>>>>>>>            start-date: date,
>>>>>>>>>>>>>            end-date: date?
>>>>>>>>>>>>>       }]
>>>>>>>>>>>>> }
>>>>>>>>>>>>> The pattern for generating an anonymous collection item name is
>>>>>>>>>>>>> collectionFieldName + "_ItemType", which translates to
>>>>>>>>>>>>> "Field_employment_in_FacebookUserType_ItemType" in the given example.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> 3) Nullable fields
>>>>>>>>>>>>> Same example as above could be used (end-date field): the pattern for
>>>>>>>>>>>>> generating a nullable field name is "Type_#" + fieldsNumberInUnoinList +
>>>>>>>>>>>>> "_UnionType_" + outerTypeName, which translates to
>>>>>>>>>>>>> "Type_#1_UnionType_Field_end-date_in_Field_employment_in_FacebookUserType_ItemType"
>>>>>>>>>>>>> in the given example.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> So you can see these auto generated names could stack up pretty fast
>>>>>>>>>>>>> and be completely incomprehensible. Just to give you a small flavor of
>>>>>>>>>>>>> that, here is one of the metadata datasets type definitions:
>>>>>>>>>>>>> 
>>>>>>>>>>>>> open {
>>>>>>>>>>>>> DataverseName: STRING,
>>>>>>>>>>>>> DatatypeName: STRING,
>>>>>>>>>>>>> Derived: UNION(NULL, open {
>>>>>>>>>>>>>     Tag: STRING,
>>>>>>>>>>>>>     IsAnonymous: BOOLEAN,
>>>>>>>>>>>>>     EnumValues: UNION(NULL, [ STRING ]),
>>>>>>>>>>>>>     Record: UNION(NULL, open {
>>>>>>>>>>>>>         IsOpen: BOOLEAN,
>>>>>>>>>>>>>         Fields: [ open {
>>>>>>>>>>>>>             FieldName: STRING,
>>>>>>>>>>>>>             FieldType: STRING
>>>>>>>>>>>>>           }
>>>>>>>>>>>>>         ]
>>>>>>>>>>>>>       }
>>>>>>>>>>>>>     ),
>>>>>>>>>>>>>     Union: UNION(NULL, [ STRING ]),
>>>>>>>>>>>>>     UnorderedList: UNION(NULL, STRING),
>>>>>>>>>>>>>     OrderedList: UNION(NULL, STRING)
>>>>>>>>>>>>>   }
>>>>>>>>>>>>> ),
>>>>>>>>>>>>> Timestamp: STRING
>>>>>>>>>>>>> }
>>>>>>>>>>>>> 
>>>>>>>>>>>>> And here are a couple of field names generated for it :)
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Type_#1_UnionType_Field_Record_in_Type_#1_UnionType_Field_Derived_in_DatatypeRecordType
>>>>>>>>>>>>> Field_UnorderedList_in_Type_#1_UnionType_Field_Derived_in_DatatypeRecordType
>>>>>>>>>>>>> Field_Fields_in_Type_#1_UnionType_Field_Record_in_Type_#1_UnionType_Field_Derived_in_DatatypeRecordType_ItemType
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Best regards,
>>>>>>>>>>>>> Ildar
>>>>>>>>>>>>> 
>>>>>>>>>>>> 
>>>>>>>>>>>> Best regards,
>>>>>>>>>>>> Ildar
>>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> Best regards,
>>>>>>>>>>> Ildar
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>> 
>>>>>>>> 
>>>>>>> 
>>>>>> 
>>>>> 
>>>> 
>>> 
>>> 
>>> 
>>> --
>>> Amoudi, Abdullah.
>>> 
> 

Best regards,
Ildar

