asterixdb-dev mailing list archives

From Ildar Absalyamov <ildar.absalya...@gmail.com>
Subject Re: Metadata names generation
Date Mon, 13 Jul 2015 04:06:24 GMT
Well, right now it returns the following error message: “A datatype with name 'Field_c_in_Field_b_in_FooType' already exists”.
Is that OK, or does it expose unnecessary details of the name generation to the user?
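A minimal Java sketch of the kind of message Mike suggests below: remember which field path produced each generated name, and on a collision report the two paths the user actually wrote instead of the internal name. It assumes the old `"Field_" + fieldName + "_in_" + outerTypeName` pattern quoted further down; `TypeNameRegistry` and its methods are hypothetical and are not AsterixDB's metadata API.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical helper: maps each generated type name back to the field path that produced it. */
final class TypeNameRegistry {
    private final Map<String, List<String>> pathByName = new HashMap<>();

    /** Old pattern for an anonymous field type: "Field_" + fieldName + "_in_" + outerTypeName. */
    static String anonymousFieldTypeName(String outerTypeName, String fieldName) {
        return "Field_" + fieldName + "_in_" + outerTypeName;
    }

    /**
     * Generates the name for the anonymous type at {@code path} (e.g. ["b", "c"]) inside
     * {@code rootType}; throws a user-facing error if another path already produced that name.
     */
    String register(String rootType, List<String> path) {
        String name = rootType;
        for (String field : path) {
            name = anonymousFieldTypeName(name, field); // nest one level per path segment
        }
        List<String> previous = pathByName.putIfAbsent(name, new ArrayList<>(path));
        if (previous != null && !previous.equals(path)) {
            throw new IllegalArgumentException("Fields " + quote(previous) + " and " + quote(path)
                    + " of type " + rootType + " would get the same generated type name");
        }
        return name;
    }

    /** Renders a path AQL-style with each segment in double quotes, e.g. "b"."c". */
    private static String quote(List<String> path) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < path.size(); i++) {
            if (i > 0) {
                sb.append('.');
            }
            sb.append('"').append(path.get(i)).append('"');
        }
        return sb.toString();
    }
}
```

With the `FooType` definition quoted below, registering the paths `["b"]` and `["b", "c"]` succeeds, and registering `["c_in_Field_b"]` afterwards fails with a message naming `"b"."c"` and `"c_in_Field_b"` rather than `Field_c_in_Field_b_in_FooType`.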

> On Jul 10, 2015, at 16:47, Mike Carey <dtabass@gmail.com> wrote:
> 
> PS: meaningful as in saying the issue is a duplicate field name and
> echoing the offending name?
> On Jul 10, 2015 4:46 PM, "Mike Carey" <dtabass@gmail.com> wrote:
> 
>> Maybe it'd be okay to get a meaningful error message in that unlikely but
>> indeed possible case?
>> On Jul 9, 2015 11:59 PM, "Ildar Absalyamov" <ildar.absalyamov@gmail.com>
>> wrote:
>> 
>>> Sorry, I dropped the ball on this thread due to my trip to Seattle and my
>>> first weeks at MSR.
>>> 
>>> Now that I am mostly done with the type system changes required for this
>>> issue, I finally checked whether the ambiguity is resolved in the current
>>> master, and the answer is that it is not :)
>>> The following AQL fails because of a collision between generated type names:
>>> 
>>> use dataverse test;
>>> create type FooType as open {
>>>  "b": { "c" : { "d": string }},
>>>  "c_in_Field_b": { "d": int }
>>> }
>>> 
>>> If I understood Till’s comments correctly, nothing prevents JSON identifiers
>>> from containing double quote characters as long as they are escaped, i.e. a
>>> field named "foo\"bar" is perfectly legal, but by the time it gets through
>>> the parser it becomes "foo"bar", right?
>>> Can we carry the escaped field name as is and use it for type name
>>> generation?
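One way to make generated names unambiguous, along the lines asked about above, is to escape each name before concatenating it. The Java sketch below is hypothetical: it uses `.` as the separator (the convention Young-Seok proposes further down) and `\` as the escape character instead of reusing JSON's escape sequences verbatim, and `EscapedTypeNames` is not an AsterixDB class. Because the escaping is reversible, distinct (outer type, field path) pairs always produce distinct names.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical escaping scheme for generated type names (not AsterixDB's implementation). */
final class EscapedTypeNames {
    private static final char SEP = '.';
    private static final char ESC = '\\';

    /** Escapes the escape character and the separator so one segment never looks like two. */
    static String escape(String segment) {
        StringBuilder sb = new StringBuilder();
        for (char ch : segment.toCharArray()) {
            if (ch == ESC || ch == SEP) {
                sb.append(ESC);
            }
            sb.append(ch);
        }
        return sb.toString();
    }

    /** Builds a generated name from the outer type name and the path of nested field names. */
    static String typeName(String outerTypeName, List<String> fieldPath) {
        StringBuilder sb = new StringBuilder(escape(outerTypeName));
        for (String field : fieldPath) {
            sb.append(SEP).append(escape(field));
        }
        return sb.toString();
    }

    /** Inverse of typeName: splits on unescaped separators and drops the escapes. */
    static List<String> decode(String typeName) {
        List<String> parts = new ArrayList<>();
        StringBuilder cur = new StringBuilder();
        for (int i = 0; i < typeName.length(); i++) {
            char ch = typeName.charAt(i);
            if (ch == ESC && i + 1 < typeName.length()) {
                cur.append(typeName.charAt(++i)); // escaped character is taken literally
            } else if (ch == SEP) {
                parts.add(cur.toString());        // unescaped separator ends a segment
                cur.setLength(0);
            } else {
                cur.append(ch);
            }
        }
        parts.add(cur.toString());
        return parts;
    }
}
```

For example, `typeName("FooType", List.of("b", "c"))` yields `FooType.b.c`, a single field literally named `b.c` yields `FooType.b\.c`, and `decode` recovers the original segments from either.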
>>> 
>>>> On Jun 25, 2015, at 23:42, Mike Carey <dtabass@gmail.com> wrote:
>>>> 
>>>> I don't see any technical reason to disallow characters in the escaped case.
>>>> That being said, we don't have to pick (for our internal names) things that we'd
>>>> prefer not to see being done.  :-)  I have mixed feelings about the .'s for generated
>>>> type names - as it's not like users will need to use those names for anything (as
>>>> they are internal)....
>>>> 
>>>> On 6/24/15 6:27 PM, Till Westmann wrote:
>>>>>> On Jun 24, 2015, at 3:16 PM, Steven Jacobs <sjaco002@ucr.edu> wrote:
>>>>>> 
>>>>>> A clear case is where there is a data type with a field named "a.b" and
>>>>>> another field named "a" which has a nested field named "b".
>>>>>> 
>>>>>> This is allowed right now. You would have to access the first as "a.b" and
>>>>>> the second as a.b. The quotes basically tell the parser "this is a single
>>>>>> name with whatever characters I want in it."
>>>>> a.b is mainly a convenient shortcut for "a"."b"
>>>>> 
>>>>>> To me it seems fine to disallow some characters, but back when I had
>>>>>> discussions about this with Vinayak, Mike, and Till, Till was arguing
>>>>>> against disallowing characters. I can't really remember his reasons now though.
>>>>>> 
>>>>>> @Till, what are your thoughts on this?
>>>>> All characters are allowed in field names in JSON (http://json.org).
>>>>> So if we disallow some characters, we will need to map names that contain
>>>>> them to something else (or not allow such JSON documents).
>>>>> It seems that this will get messy and/or painful pretty quickly.
>>>>> 
>>>>> Cheers,
>>>>> Till
>>>>> 
>>>>>> On Wed, Jun 24, 2015 at 11:56 AM, abdullah alamoudi <bamousaa@gmail.com> wrote:
>>>>>> 
>>>>>>> If that's the case, then I think we need to disallow using the "." since it
>>>>>>> is used to access nested fields and can definitely cause ambiguity.
>>>>>>> 
>>>>>>> A clear case is where there is a data type with a field named "a.b" and
>>>>>>> another field named "a" which has a nested field named "b".
>>>>>>> 
>>>>>>> Thoughts?
>>>>>>> 
>>>>>>> 
>>>>>>> On Wed, Jun 24, 2015 at 9:51 PM, Steven Jacobs <sjaco002@ucr.edu> wrote:
>>>>>>> 
>>>>>>>> I think there is no completely user-friendly way around this. Basically
>>>>>>>> our names allow ALL characters if they are encapsulated in quotes, so there
>>>>>>>> isn't a character we can use that doesn't have the potential for ambiguity
>>>>>>>> from the user's perspective. This is why I had to change the nested stuff
>>>>>>>> in indexing to be a list of strings rather than a single string.
>>>>>>>> Steven
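To illustrate the ambiguity raised here, below is a simplified, hypothetical Java parser for field paths in which double quotes delimit a single name and an unquoted `.` means nested access (so `a.b` is shorthand for `"a"."b"`, as Till notes above). It is not AsterixDB's grammar, and `FieldPaths` is a made-up name. Keeping the result as a list of names preserves exactly the distinction that joining with `.` throws away.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical parser for field paths such as a.b or "a.b" (simplified, not AsterixDB's grammar). */
final class FieldPaths {
    /** Splits on unquoted dots; text inside double quotes is one name, with \x unescaped to x. */
    static List<String> parse(String expr) {
        List<String> path = new ArrayList<>();
        StringBuilder cur = new StringBuilder();
        boolean quoted = false;
        for (int i = 0; i < expr.length(); i++) {
            char ch = expr.charAt(i);
            if (quoted && ch == '\\' && i + 1 < expr.length()) {
                cur.append(expr.charAt(++i));   // escaped character inside quotes, e.g. \"
            } else if (ch == '"') {
                quoted = !quoted;                // opening or closing quote
            } else if (ch == '.' && !quoted) {
                path.add(cur.toString());        // unquoted dot: nested field access
                cur.setLength(0);
            } else {
                cur.append(ch);
            }
        }
        if (quoted) {
            throw new IllegalArgumentException("unterminated quote in " + expr);
        }
        path.add(cur.toString());
        return path;
    }
}
```

`parse("\"a.b\"")` returns `[a.b]` while `parse("a.b")` and `parse("\"a\".\"b\"")` both return `[a, b]`; joined back with `.`, the first two become the same string `a.b`, which is the ambiguity a single-string representation would have.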
>>>>>>>> 
>>>>>>>> On Wed, Jun 24, 2015 at 11:43 AM, Chen Li <chenli@gmail.com> wrote:
>>>>>>>> 
>>>>>>>>> In this case, there could be ambiguity in the names. Does it matter?
>>>>>>>>> 
>>>>>>>>> Chen
>>>>>>>>> 
>>>>>>>>> On Wed, Jun 24, 2015 at 11:17 AM, Steven Jacobs <sjaco002@ucr.edu> wrote:
>>>>>>>>>> Fieldnames do allow these characters (both of them).
>>>>>>>>>> Steven
>>>>>>>>>> 
>>>>>>>>>> On Wed, Jun 24, 2015 at 11:15 AM, Chen Li <chenli@gmail.com> wrote:
>>>>>>>>>> 
>>>>>>>>>>> I also prefer "." over "_".  I also want to confirm that field names
>>>>>>>>>>> don't allow these two characters.
>>>>>>>>>>> 
>>>>>>>>>>> Chen
>>>>>>>>>>> 
>>>>>>>>>>> On Wed, Jun 24, 2015 at 10:52 AM, Steven Jacobs <sjaco002@ucr.edu> wrote:
>>>>>>>>>>>> I second Young-Seok (especially since this is the syntax that users will
>>>>>>>>>>>> use themselves for nested information in queries).
>>>>>>>>>>>> 
>>>>>>>>>>>> Steven
>>>>>>>>>>>> 
>>>>>>>>>>>> On Wed, Jun 24, 2015 at 10:40 AM, Young-Seok Kim <kisskys@gmail.com> wrote:
>>>>>>>>>>>> 
>>>>>>>>>>>>> It seems better to use "." instead of "_" since "." is more intuitive (at
>>>>>>>>>>>>> least to me) than "_".
>>>>>>>>>>>>> For example, FacebookUserType_address would become FacebookUserType.address.
>>>>>>>>>>>>> Best,
>>>>>>>>>>>>> Young-Seok
>>>>>>>>>>>>> 
>>>>>>>>>>>>> On Wed, Jun 24, 2015 at 6:31 AM, Mike Carey <dtabass@gmail.com> wrote:
>>>>>>>>>>>>>> Much cleaner!  Others should weigh in here to help finalize the
>>>>>>>>>>>>>> conventions....  Thoughts?
>>>>>>>>>>>>>> On Jun 23, 2015 5:31 PM, "Ildar Absalyamov" <iabsa001@cs.ucr.edu> wrote:
>>>>>>>>>>>>>>> So the general solution is that the generated names should become less
>>>>>>>>>>>>>>> verbose (consider the previous examples):
>>>>>>>>>>>>>>> 1) The anonymous field naming scheme will change to outerTypeName + “_” +
>>>>>>>>>>>>>>> fieldName, i.e. “Field_address_in_FacebookUserType” is changed to
>>>>>>>>>>>>>>> “FacebookUserType_address”.
>>>>>>>>>>>>>>> 2) The anonymous collection item naming scheme stays the same, i.e.
>>>>>>>>>>>>>>> “Field_employment_in_FacebookUserType_ItemType” becomes
>>>>>>>>>>>>>>> “FacebookUserType_employment_ItemType” (the name changes only because the
>>>>>>>>>>>>>>> naming of the anonymous field employment changed as described above).
>>>>>>>>>>>>>>> 3) The union type completely ceases to exist in metadata (it stays in the
>>>>>>>>>>>>>>> object model though), i.e.
>>>>>>>>>>>>>>> “Type_#1_UnionType_Field_end-date_in_Field_employment_in_FacebookUserType_ItemType”
>>>>>>>>>>>>>>> is changed to “FacebookUserType_employment_ItemType_end-date”, where the
>>>>>>>>>>>>>>> type metadata will have an additional field “Optional” with value “true”.
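The three rules of the proposed scheme can be sketched in Java as follows. This is a hypothetical illustration of the proposal in this message, not the code that was eventually merged; `ProposedTypeNames` and `TypeRef` are made-up names, and the `optional` flag simply stands in for the “Optional” metadata field.

```java
/** Hypothetical sketch of the shorter naming rules proposed in this message. */
final class ProposedTypeNames {
    /** A type name plus the "Optional" flag that replaces a separate union type in metadata. */
    static final class TypeRef {
        final String name;
        final boolean optional;

        TypeRef(String name, boolean optional) {
            this.name = name;
            this.optional = optional;
        }
    }

    /** Rule 1: anonymous field type = outerTypeName + "_" + fieldName. */
    static String field(String outerTypeName, String fieldName) {
        return outerTypeName + "_" + fieldName;
    }

    /** Rule 2: anonymous collection item type = the collection field's type name + "_ItemType". */
    static String item(String collectionTypeName) {
        return collectionTypeName + "_ItemType";
    }

    /** Rule 3: a nullable field keeps its plain field-based name and is flagged as optional. */
    static TypeRef optionalField(String outerTypeName, String fieldName) {
        return new TypeRef(field(outerTypeName, fieldName), true);
    }
}
```

`optionalField(item(field("FacebookUserType", "employment")), "end-date")` reproduces “FacebookUserType_employment_ItemType_end-date” with `optional` set. Note that in this sketch a field `c` nested in `b` and a top-level field named `b_c` both map to `FooType_b_c`, so shorter names alone do not rule out collisions like the one reported at the top of the thread.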
>>>>>>>>>>>>>>>> On Jun 19, 2015, at 18:11, Ildar Absalyamov <iabsa001@cs.ucr.edu> wrote:
>>>>>>>>>>>>>>>> So I have done half of the fix, which is moving the name generation logic
>>>>>>>>>>>>>>>> out of the Metadata node to the client.
>>>>>>>>>>>>>>>> Up to that point nothing in the Metadata format was changed, which makes me
>>>>>>>>>>>>>>>> wonder whether I should proceed with the following changes.
>>>>>>>>>>>>>>>> As can be seen from the previous email, getting rid of union-inferred name
>>>>>>>>>>>>>>>> generation would make auto-generated type names less scary, but not entirely.
>>>>>>>>>>>>>>>> Having in mind what Mike mentioned earlier today, should we do something
>>>>>>>>>>>>>>>> about the other auto-generated type name cases?
>>>>>>>>>>>>>>>>> On Jun 19, 2015, at 13:01, Ildar Absalyamov <iabsa001@cs.ucr.edu> wrote:
>>>>>>>>>>>>>>>>> Currently we are generating the names for inner/anonymous types in the
>>>>>>>>>>>>>>>>> following cases:
>>>>>>>>>>>>>>>>> 1) Anonymous field in the record.
>>>>>>>>>>>>>>>>> AQL Example:
>>>>>>>>>>>>>>>>> create type FacebookUserType as closed {
>>>>>>>>>>>>>>>>>       id: int32,
>>>>>>>>>>>>>>>>>       name: string,
>>>>>>>>>>>>>>>>>       address: {
>>>>>>>>>>>>>>>>>            address_line: string,
>>>>>>>>>>>>>>>>>            city: string,
>>>>>>>>>>>>>>>>>            state: string
>>>>>>>>>>>>>>>>>       }
>>>>>>>>>>>>>>>>> }
>>>>>>>>>>>>>>>>> The pattern for generating an anonymous field name is "Field_" +
>>>>>>>>>>>>>>>>> fieldName + "_in_" + outerTypeName, which translates to
>>>>>>>>>>>>>>>>> "Field_address_in_FacebookUserType" in the given example.
>>>>>>>>>>>>>>>>> 2) Anonymous collection (ordered/unordered list) item
>>>>>>>>>>>>>>>>> create type FacebookUserType as closed {
>>>>>>>>>>>>>>>>>       id: int32,
>>>>>>>>>>>>>>>>>       name: string,
>>>>>>>>>>>>>>>>>       employment: [{
>>>>>>>>>>>>>>>>>            organization-name: string,
>>>>>>>>>>>>>>>>>            start-date: date,
>>>>>>>>>>>>>>>>>            end-date: date?
>>>>>>>>>>>>>>>>>       }]
>>>>>>>>>>>>>>>>> }
>>>>>>>>>>>>>>>>> The pattern for generating an anonymous collection item name is
>>>>>>>>>>>>>>>>> collectionFieldName + "_ItemType", which translates to
>>>>>>>>>>>>>>>>> "Field_employment_in_FacebookUserType_ItemType" in the given example.
>>>>>>>>>>>>>>>>> 3) Nullable fields
>>>>>>>>>>>>>>>>> The same example as above can be used (the end-date field): the pattern for
>>>>>>>>>>>>>>>>> generating a nullable field name is "Type_#" + fieldsNumberInUnionList +
>>>>>>>>>>>>>>>>> "_UnionType_" + outerTypeName, which translates to
>>>>>>>>>>>>>>>>> "Type_#1_UnionType_Field_end-date_in_Field_employment_in_FacebookUserType_ItemType"
>>>>>>>>>>>>>>>>> in the given example.
>>>>>>>>>>>>>>>>> So you can see that these auto-generated names stack up pretty fast and
>>>>>>>>>>>>>>>>> become completely incomprehensible. Just to give you a small flavor of that,
>>>>>>>>>>>>>>>>> here is one of the metadata datasets' type definitions:
>>>>>>>>>>>>>>>>> open {
>>>>>>>>>>>>>>>>> DataverseName: STRING,
>>>>>>>>>>>>>>>>> DatatypeName: STRING,
>>>>>>>>>>>>>>>>> Derived: UNION(NULL, open {
>>>>>>>>>>>>>>>>>     Tag: STRING,
>>>>>>>>>>>>>>>>>     IsAnonymous: BOOLEAN,
>>>>>>>>>>>>>>>>>     EnumValues: UNION(NULL, [ STRING ]),
>>>>>>>>>>>>>>>>>     Record: UNION(NULL, open {
>>>>>>>>>>>>>>>>>         IsOpen: BOOLEAN,
>>>>>>>>>>>>>>>>>         Fields: [ open {
>>>>>>>>>>>>>>>>>             FieldName: STRING,
>>>>>>>>>>>>>>>>>             FieldType: STRING
>>>>>>>>>>>>>>>>>           }
>>>>>>>>>>>>>>>>>         ]
>>>>>>>>>>>>>>>>>       }
>>>>>>>>>>>>>>>>>     ),
>>>>>>>>>>>>>>>>>     Union: UNION(NULL, [ STRING ]),
>>>>>>>>>>>>>>>>>     UnorderedList: UNION(NULL, STRING),
>>>>>>>>>>>>>>>>>     OrderedList: UNION(NULL, STRING)
>>>>>>>>>>>>>>>>>   }
>>>>>>>>>>>>>>>>> ),
>>>>>>>>>>>>>>>>> Timestamp: STRING
>>>>>>>>>>>>>>>>> }
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> And here are a couple of the type names generated for it :)
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> Type_#1_UnionType_Field_Record_in_Type_#1_UnionType_Field_Derived_in_DatatypeRecordType
>>>>>>>>>>>>>>>>> Field_UnorderedList_in_Type_#1_UnionType_Field_Derived_in_DatatypeRecordType
>>>>>>>>>>>>>>>>> Field_Fields_in_Type_#1_UnionType_Field_Record_in_Type_#1_UnionType_Field_Derived_in_DatatypeRecordType_ItemType
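The three old patterns compose mechanically, which is why these names grow so long. The hypothetical Java helpers below (the class and method names are made up) apply the patterns as described in this message; judging from the examples, the union pattern is applied to the name the field's own type would otherwise get.

```java
/** Hypothetical helpers reproducing the three old name-generation patterns. */
final class OldTypeNames {
    /** Pattern 1, anonymous field: "Field_" + fieldName + "_in_" + outerTypeName. */
    static String field(String outerTypeName, String fieldName) {
        return "Field_" + fieldName + "_in_" + outerTypeName;
    }

    /** Pattern 2, anonymous collection item: the collection field's type name + "_ItemType". */
    static String item(String collectionTypeName) {
        return collectionTypeName + "_ItemType";
    }

    /**
     * Pattern 3, nullable field: "Type_#" + position in the union list + "_UnionType_" + the
     * name the field's type would otherwise get (as the examples in this message suggest).
     */
    static String union(int positionInUnionList, String fieldTypeName) {
        return "Type_#" + positionInUnionList + "_UnionType_" + fieldTypeName;
    }
}
```

For instance, `union(1, field(item(field("FacebookUserType", "employment")), "end-date"))` evaluates to `Type_#1_UnionType_Field_end-date_in_Field_employment_in_FacebookUserType_ItemType`, and the three metadata names listed above come out of the same helpers starting from `union(1, field("DatatypeRecordType", "Derived"))`.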
>>>>>>>>>>>>>>>>> Best regards,
>>>>>>>>>>>>>>>>> Ildar
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> Best regards,
>>>>>>>>>>>>>>>> Ildar
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> Best regards,
>>>>>>>>>>>>>>> Ildar
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> --
>>>>>>> Amoudi, Abdullah.
>>>>>>> 
>>>>> 
>>>> 
>>> 
>>> Best regards,
>>> Ildar
>>> 
>>> 

Best regards,
Ildar

