asterixdb-dev mailing list archives

From Ildar Absalyamov <ildar.absalya...@gmail.com>
Subject Re: Metadata names generation
Date Fri, 10 Jul 2015 06:58:12 GMT
Sorry, I dropped the ball on this thread because of the trip to Seattle and my first weeks
at MSR.

Now that I am mostly done with the type system changes required for this issue, I finally
took a look at whether the ambiguity is resolved in current master, and the answer is: it is not :)
The following AQL will fail due to a collision of generated type names:

use dataverse test;
create type FooType as open {
  "b": { "c" : { "d": string }},
  "c_in_Field_b": { "d": int }
}
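
To make the collision concrete, here is a minimal Python sketch of the current naming
pattern ("Field_" + fieldName + "_in_" + outerTypeName, as quoted further down in this
thread). The function name is hypothetical and this is not the actual AsterixDB code; it
only mimics the string concatenation:

```python
def anonymous_field_type_name(field_name: str, outer_type_name: str) -> str:
    """Hypothetical stand-in for the current pattern:
    "Field_" + fieldName + "_in_" + outerTypeName."""
    return "Field_" + field_name + "_in_" + outer_type_name


# Generated type for the anonymous record in field "b" of FooType.
b_type = anonymous_field_type_name("b", "FooType")

# Generated type for field "c" nested inside "b"; its outer type is b's generated type.
nested_c_type = anonymous_field_type_name("c", b_type)

# Generated type for the top-level field literally named "c_in_Field_b".
top_level_type = anonymous_field_type_name("c_in_Field_b", "FooType")

print(nested_c_type)
print(top_level_type)
print(nested_c_type == top_level_type)
```

Both calls produce the same string, which is why the two anonymous types in FooType
cannot be told apart by name.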

If I understood Till’s comments correctly, nothing prevents JSON identifiers from containing
double quote characters as long as they are escaped, i.e. a field named “foo\”bar” is perfectly
legal, but by the time it gets through the parser it becomes “foo”bar”, right?
Can we carry the escaped field name as is and use it for type name generation?
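
A rough Python sketch of how escaped names could keep generated type names unambiguous.
This is only an illustration under stated assumptions, not a proposal for the actual
implementation: the "_" separator follows the outerTypeName + "_" + fieldName scheme
proposed earlier in the thread, while the backslash escape character, the function names,
and the choice to escape the separator itself (rather than only quotes) are assumptions:

```python
from typing import Sequence

SEPARATOR = "_"   # separator from the proposed outerTypeName + "_" + fieldName scheme
ESCAPE = "\\"     # assumed escape character, chosen for illustration only


def escape_component(name: str) -> str:
    """Escape the escape character first, then the separator, so that an
    unescaped separator can only come from joining path components."""
    return name.replace(ESCAPE, ESCAPE + ESCAPE).replace(SEPARATOR, ESCAPE + SEPARATOR)


def generated_type_name(path: Sequence[str]) -> str:
    """Join an outer type name and a field path into one generated name
    (hypothetical helper, not an AsterixDB function)."""
    return SEPARATOR.join(escape_component(part) for part in path)


# Field "c" nested inside field "b" of FooType.
nested = generated_type_name(["FooType", "b", "c"])

# A single top-level field literally named "b_c".
flat = generated_type_name(["FooType", "b_c"])

print(nested)
print(flat)
print(nested != flat)
```

Without the escaping step both paths would collapse to "FooType_b_c"; with it, the
underscore that belongs to the field name stays distinguishable from the separator.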

> On Jun 25, 2015, at 23:42, Mike Carey <dtabass@gmail.com> wrote:
> 
> I don't see any technical reason to disallow characters in the escaped case.
> That being said, we don't have to pick (for our internal names) things that we'd
> prefer not to see being done.  :-)  I have mixed feelings on the .'s for generated
> type names - as it's not like users will need to use those names for anything (as
> they are internal)....
> 
> On 6/24/15 6:27 PM, Till Westmann wrote:
>>> On Jun 24, 2015, at 3:16 PM, Steven Jacobs <sjaco002@ucr.edu> wrote:
>>> 
>>> a clear case is where there is a data type with a field named "a.b" and
>>> another field named "a" which has a nested field named "b".
>>> 
>>> This is allowed right now. You would have to access the first as "a.b" and
>>> the second as a.b. The quotes basically tell the parser "this is a single
>>> name with whatever characters I want in it."
>> a.b is mainly a convenient shortcut for "a"."b"
>> 
>>> To me it seems fine to
>>> disallow some characters, but back when I had discussions about this with
>>> Vinayak, Mike, and Till, Till was arguing against disallowing characters. I
>>> can't really remember his reasons now though.
>>> 
>>> @Till, what are your thoughts on this?
>> All characters are allowed for field names in JSON (http://json.org).
>> So if we disallow some characters, we will need to map names that contain them to
>> something else (or not allow such JSON documents).
>> It seems that that will get messy and/or painful pretty quickly.
>> 
>> Cheers,
>> Till
>> 
>>> On Wed, Jun 24, 2015 at 11:56 AM, abdullah alamoudi <bamousaa@gmail.com>
>>> wrote:
>>> 
>>>> If that's the case, then I think we need to disallow using the "." since it
>>>> is used to access nested fields and can definitely cause ambiguity.
>>>> 
>>>> a clear case is where there is a data type with a field named "a.b" and
>>>> another field named "a" which has a nested field named "b".
>>>> 
>>>> Thoughts?
>>>> 
>>>> 
>>>> On Wed, Jun 24, 2015 at 9:51 PM, Steven Jacobs <sjaco002@ucr.edu> wrote:
>>>> 
>>>>> I think there is no completely user-friendly way around this. Basically our
>>>>> names allow ALL characters if they are encapsulated in quotes, so there
>>>>> isn't a character we can use that doesn't have the potential for ambiguity
>>>>> from the user's perspective. This is why I had to change the nested stuff
>>>>> in indexing to be a list of strings rather than a single string.
>>>>> Steven
>>>>> 
>>>>> On Wed, Jun 24, 2015 at 11:43 AM, Chen Li <chenli@gmail.com> wrote:
>>>>> 
>>>>>> In this case, there could be ambiguity in the names.  Does it matter?
>>>>>> 
>>>>>> Chen
>>>>>> 
>>>>>> On Wed, Jun 24, 2015 at 11:17 AM, Steven Jacobs <sjaco002@ucr.edu> wrote:
>>>>>>> Fieldnames do allow these characters (both of them).
>>>>>>> Steven
>>>>>>> 
>>>>>>> On Wed, Jun 24, 2015 at 11:15 AM, Chen Li <chenli@gmail.com> wrote:
>>>>>>>
>>>>>>>> I also prefer "." to "_".  Also want to confirm that field names don't
>>>>>>>> allow these two characters.
>>>>>>>> 
>>>>>>>> Chen
>>>>>>>> 
>>>>>>>> On Wed, Jun 24, 2015 at 10:52 AM, Steven Jacobs <sjaco002@ucr.edu> wrote:
>>>>>>>>> I second Young-Seok (especially since this is the syntax that users will
>>>>>>>>> use themselves for nested information in queries).
>>>>>>>>> 
>>>>>>>>> Steven
>>>>>>>>> 
>>>>>>>>> On Wed, Jun 24, 2015 at 10:40 AM, Young-Seok Kim <kisskys@gmail.com> wrote:
>>>>>>>>>
>>>>>>>>>> It seems better to use "." instead of "_" since "." is more intuitive (at
>>>>>>>>>> least to me) than "_".
>>>>>>>>>> For example, the FacebookUserType_address will be
>>>>>>>>>> FacebookUserType.address.
>>>>>>>>>> Best,
>>>>>>>>>> Young-Seok
>>>>>>>>>> 
>>>>>>>>>> On Wed, Jun 24, 2015 at 6:31 AM, Mike Carey <dtabass@gmail.com> wrote:
>>>>>>>>>>> Much cleaner!  Others should weigh in here to help finalize the
>>>>>>>>>>> conventions....  Thoughts?
>>>>>>>>>>> On Jun 23, 2015 5:31 PM, "Ildar Absalyamov" <iabsa001@cs.ucr.edu> wrote:
>>>>>>>>>>>> So the general solution is that the generated names should become less
>>>>>>>>>>>> verbose (consider previous examples):
>>>>>>>>>>>> 1) Anonymous fields naming scheme will change to outerTypeName + “_” +
>>>>>>>>>>>> fieldName, i.e. “Field_address_in_FacebookUserType” is changed to
>>>>>>>>>>>> “FacebookUserType_address”
>>>>>>>>>>>> 2) Anonymous collection item naming scheme stays the same, i.e.
>>>>>>>>>>>> “Field_employment_in_FacebookUserType_ItemType” is changed to
>>>>>>>>>>>> “FacebookUserType_employment_ItemType” (name is changed because the
>>>>>>>>>>>> anonymous field employment naming was changed as described earlier)
>>>>>>>>>>>> 3) Union type completely ceases to exist in metadata (it stays in the
>>>>>>>>>>>> object model though), i.e.
>>>>>>>>>>>> “Type_#1_UnionType_Field_end-date_in_Field_employment_in_FacebookUserType_ItemType”
>>>>>>>>>>>> is changed to “FacebookUserType_employment_ItemType_end-date”, where the
>>>>>>>>>>>> type metadata will have an additional field “Optional” with value “true”.
>>>>>>>>>>>>> On Jun 19, 2015, at 18:11, Ildar Absalyamov <iabsa001@cs.ucr.edu> wrote:
>>>>>>>>>>>>> So I have done half of the fix, which moved the name generation logic
>>>>>>>>>>>>> out of the Metadata node to the client.
>>>>>>>>>>>>> Up to that point nothing in the Metadata format was changed, which makes
>>>>>>>>>>>>> me wonder whether I should proceed with the following changes.
>>>>>>>>>>>>> As can be seen from the previous email, getting rid of union-inferred
>>>>>>>>>>>>> name generation would make auto generated type names less scary, but
>>>>>>>>>>>>> not entirely.
>>>>>>>>>>>>> Having in mind what Mike mentioned earlier today, should we do something
>>>>>>>>>>>>> about other auto generated type name cases?
>>>>>>>>>>>>>> On Jun 19, 2015, at 13:01, Ildar Absalyamov <iabsa001@cs.ucr.edu> wrote:
>>>>>>>>>>>>>> Currently we are generating the names for inner\anonymous types in the
>>>>>>>>>>>>>> following cases:
>>>>>>>>>>>>>> 1) Anonymous field in the record.
>>>>>>>>>>>>>> AQL Example:
>>>>>>>>>>>>>> create type FacebookUserType as closed {
>>>>>>>>>>>>>>     id: int32,
>>>>>>>>>>>>>>     name: string,
>>>>>>>>>>>>>>     address: {
>>>>>>>>>>>>>>         address_line: string,
>>>>>>>>>>>>>>         city: string,
>>>>>>>>>>>>>>         state: string
>>>>>>>>>>>>>>     }
>>>>>>>>>>>>>> }
>>>>>>>>>>>>>> The pattern for generating an anonymous field name is "Field_" +
>>>>>>>>>>>>>> fieldName + "_in_" + outerTypeName, which translates to
>>>>>>>>>>>>>> "Field_address_in_FacebookUserType" in the given example
>>>>>>>>>>>>>> 2) Anonymous collection (ordered\unordered list) item
>>>>>>>>>>>>>> create type FacebookUserType as closed {
>>>>>>>>>>>>>>     id: int32,
>>>>>>>>>>>>>>     name: string,
>>>>>>>>>>>>>>     employment: [{
>>>>>>>>>>>>>>         organization-name: string,
>>>>>>>>>>>>>>         start-date: date,
>>>>>>>>>>>>>>         end-date: date?
>>>>>>>>>>>>>>     }]
>>>>>>>>>>>>>> }
>>>>>>>>>>>>>> The pattern for generating an anonymous collection item name is
>>>>>>>>>>>>>> collectionFieldName + "_ItemType", which translates to
>>>>>>>>>>>>>> "Field_employment_in_FacebookUserType_ItemType" in the given example
>>>>>>>>>>>>>> 3) Nullable fields
>>>>>>>>>>>>>> Same example as above could be used (end-date field): the pattern for
>>>>>>>>>>>>>> generating a nullable field name is "Type_#" + fieldsNumberInUnoinList +
>>>>>>>>>>>>>> "_UnionType_" + outerTypeName, which translates to
>>>>>>>>>>>>>> "Type_#1_UnionType_Field_end-date_in_Field_employment_in_FacebookUserType_ItemType"
>>>>>>>>>>>>>> in the given example.
>>>>>>>>>>>>>> So you can see these auto generated names could stack up pretty fast
>>>>>>>>>>>>>> and be completely incomprehensible. Just to give you a small flavor of
>>>>>>>>>>>>>> that, here is one of the metadata datasets type definitions:
>>>>>>>>>>>>>> open {
>>>>>>>>>>>>>>  DataverseName: STRING,
>>>>>>>>>>>>>>  DatatypeName: STRING,
>>>>>>>>>>>>>>  Derived: UNION(NULL, open {
>>>>>>>>>>>>>>      Tag: STRING,
>>>>>>>>>>>>>>      IsAnonymous: BOOLEAN,
>>>>>>>>>>>>>>      EnumValues: UNION(NULL, [ STRING ]),
>>>>>>>>>>>>>>      Record: UNION(NULL, open {
>>>>>>>>>>>>>>          IsOpen: BOOLEAN,
>>>>>>>>>>>>>>          Fields: [ open {
>>>>>>>>>>>>>>              FieldName: STRING,
>>>>>>>>>>>>>>              FieldType: STRING
>>>>>>>>>>>>>>            }
>>>>>>>>>>>>>>          ]
>>>>>>>>>>>>>>        }
>>>>>>>>>>>>>>      ),
>>>>>>>>>>>>>>      Union: UNION(NULL, [ STRING ]),
>>>>>>>>>>>>>>      UnorderedList: UNION(NULL, STRING),
>>>>>>>>>>>>>>      OrderedList: UNION(NULL, STRING)
>>>>>>>>>>>>>>    }
>>>>>>>>>>>>>>  ),
>>>>>>>>>>>>>>  Timestamp: STRING
>>>>>>>>>>>>>> }
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> And here are a couple of field names generated for it :)
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Type_#1_UnionType_Field_Record_in_Type_#1_UnionType_Field_Derived_in_DatatypeRecordType
>>>>>>>>>>>>>> Field_UnorderedList_in_Type_#1_UnionType_Field_Derived_in_DatatypeRecordType
>>>>>>>>>>>>>> Field_Fields_in_Type_#1_UnionType_Field_Record_in_Type_#1_UnionType_Field_Derived_in_DatatypeRecordType_ItemType
>>>>>>>>>>>>>> Best regards,
>>>>>>>>>>>>>> Ildar
>>>>>>>>>>>>>> 
>>>>>>>>>>>>> Best regards,
>>>>>>>>>>>>> Ildar
>>>>>>>>>>>>> 
>>>>>>>>>>>> Best regards,
>>>>>>>>>>>> Ildar
>>>>>>>>>>>> 
>>>>>>>>>>>> 
>>>> 
>>>> 
>>>> --
>>>> Amoudi, Abdullah.
>>>> 
>> 
> 

Best regards,
Ildar

