asterixdb-dev mailing list archives

From Mike Carey <dtab...@gmail.com>
Subject Re: Metadata names generation
Date Fri, 10 Jul 2015 23:47:58 GMT
Ps - meaningful as in saying the issue is a duplicate field name and
echoing the offending name?
On Jul 10, 2015 4:46 PM, "Mike Carey" <dtabass@gmail.com> wrote:

> Maybe it'd be okay to get a meaningful error message in that unlikely but
> indeed possible case?
> On Jul 9, 2015 11:59 PM, "Ildar Absalyamov" <ildar.absalyamov@gmail.com>
> wrote:
>
>> Sorry, I dropped the ball regarding this thread due to the trip to
>> Seattle and first weeks at MSR.
>>
>> Now that I am mostly done with the type system changes required for this
>> issue, I finally took a look at whether the ambiguity is resolved in current
>> master, and the answer is that it is not :)
>> The following AQL will fail due to a collision between generated type names:
>>
>> use dataverse test;
>> create type FooType as open {
>>   "b": { "c" : { "d": string }},
>>   "c_in_Field_b": { "d": int }
>> }
>>
>> If I understood Till's comments correctly, nothing prevents JSON identifiers
>> from having double quote characters in them if they are escaped, i.e. a field
>> named "foo\"bar" is perfectly legal, but by the time it gets through the
>> parser it will become "foo"bar", right?
>> Can we carry the escaped field name as is and use it for type name
>> generation?
>>
>> > On Jun 25, 2015, at 23:42, Mike Carey <dtabass@gmail.com> wrote:
>> >
>> > I don't see any technical reason to disallow characters in the escaped case.
>> > That being said, we don't have to pick (for our internal names) things that we'd
>> > prefer not to see being done.  :-)  I have mixed feelings on the .'s for generated
>> > type names - as it's not like users will need to use those names for anything (as
>> > they are internal)....
>> >
>> > On 6/24/15 6:27 PM, Till Westmann wrote:
>> >>> On Jun 24, 2015, at 3:16 PM, Steven Jacobs <sjaco002@ucr.edu> wrote:
>> >>>
>> >>> A clear case is where there is a data type with a field named "a.b" and
>> >>> another field named "a" which has a nested field named "b".
>> >>>
>> >>> This is allowed right now. You would have to access the first as "a.b" and
>> >>> the second as a.b. The quotes basically tell the parser "this is a single
>> >>> name with whatever characters I want in it."
>> >> a.b is mainly a convenient shortcut for "a"."b"
>> >>
>> >>> To me it seems fine to disallow some characters, but back when I had
>> >>> discussions about this with Vinayak, Mike, and Till, Till was arguing
>> >>> against disallowing characters. I can't really remember his reasons now
>> >>> though.
>> >>>
>> >>> @Till, what are your thoughts on this?
>> >> All characters are allowed for field names in JSON (http://json.org).
>> >> So if we disallow some characters, we will need to map names that contain
>> >> them to something else (or not allow such JSON documents).
>> >> It seems that that will get messy and/or painful pretty quickly.
>> >>
>> >> Cheers,
>> >> Till
>> >>
>> >>> On Wed, Jun 24, 2015 at 11:56 AM, abdullah alamoudi <bamousaa@gmail.com> wrote:
>> >>>
>> >>>> If that's the case, then I think we need to disallow using the "." since it
>> >>>> is used to access nested fields and can definitely cause ambiguity.
>> >>>>
>> >>>> A clear case is where there is a data type with a field named "a.b" and
>> >>>> another field named "a" which has a nested field named "b".
>> >>>>
>> >>>> Thoughts?
>> >>>>
>> >>>>
>> >>>> On Wed, Jun 24, 2015 at 9:51 PM, Steven Jacobs <sjaco002@ucr.edu> wrote:
>> >>>>
>> >>>>> I think there is no completely user-friendly way around this. Basically our
>> >>>>> names allow ALL characters if they are encapsulated in quotes, so there
>> >>>>> isn't a character we can use that doesn't have the potential for ambiguity
>> >>>>> from the user's perspective. This is why I had to change the nested stuff
>> >>>>> in indexing to be a list of strings rather than a single string.
>> >>>>> Steven
>> >>>>>
>> >>>>> On Wed, Jun 24, 2015 at 11:43 AM, Chen Li <chenli@gmail.com> wrote:
>> >>>>>
>> >>>>>> In this case, there could be ambiguity in the names.  Does it matter?
>> >>>>>>
>> >>>>>> Chen
>> >>>>>>
>> >>>>>> On Wed, Jun 24, 2015 at 11:17 AM, Steven Jacobs <sjaco002@ucr.edu> wrote:
>> >>>>>>> Fieldnames do allow these characters (both of them).
>> >>>>>>> Steven
>> >>>>>>>
>> >>>>>>> On Wed, Jun 24, 2015 at 11:15 AM, Chen Li <chenli@gmail.com> wrote:
>> >>>>>>>
>> >>>>>>>> I also prefer "." to "_".  Also want to confirm that field names don't
>> >>>>>>>> allow these two characters.
>> >>>>>>>>
>> >>>>>>>> Chen
>> >>>>>>>>
>> >>>>>>>> On Wed, Jun 24, 2015 at 10:52 AM, Steven Jacobs <sjaco002@ucr.edu> wrote:
>> >>>>>>>>> I second Young-Seok (especially since this is the syntax that users will
>> >>>>>>>>> use themselves for nested information in queries).
>> >>>>>>>>>
>> >>>>>>>>> Steven
>> >>>>>>>>>
>> >>>>>>>>> On Wed, Jun 24, 2015 at 10:40 AM, Young-Seok Kim <kisskys@gmail.com> wrote:
>> >>>>>>>>>
>> >>>>>>>>>> It seems better to use "." instead of "_" since "." is more intuitive (at
>> >>>>>>>>>> least to me) than "_".
>> >>>>>>>>>> For example, the FacebookUserType_address will be FacebookUserType.address.
>> >>>>>>>>>> Best,
>> >>>>>>>>>> Young-Seok
>> >>>>>>>>>>
>> >>>>>>>>>> On Wed, Jun 24, 2015 at 6:31 AM, Mike Carey <dtabass@gmail.com> wrote:
>> >>>>>>>>>>> Much cleaner!  Others should weigh in here to help finalize the
>> >>>>>>>>>>> conventions....  Thoughts?
>> >>>>>>>>>>> On Jun 23, 2015 5:31 PM, "Ildar Absalyamov" <iabsa001@cs.ucr.edu> wrote:
>> >>>>>>>>>>>> So the general solution is that the generated names should become less
>> >>>>>>>>>>>> verbose (consider previous examples):
>> >>>>>>>>>>>> 1) Anonymous fields naming scheme will change to outerTypeName + "_" +
>> >>>>>>>>>>>> fieldName, i.e. "Field_address_in_FacebookUserType" is changed to
>> >>>>>>>>>>>> "FacebookUserType_address"
>> >>>>>>>>>>>> 2) Anonymous collection item naming scheme stays the same, i.e.
>> >>>>>>>>>>>> "Field_employment_in_FacebookUserType_ItemType" is changed to
>> >>>>>>>>>>>> "FacebookUserType_employment_ItemType" (name is changed because the
>> >>>>>>>>>>>> anonymous field employment naming was changed as described earlier)
>> >>>>>>>>>>>> 3) Union type completely ceases to exist in metadata (it stays in the
>> >>>>>>>>>>>> object model though), i.e.
>> >>>>>>>>>>>> "Type_#1_UnionType_Field_end-date_in_Field_employment_in_FacebookUserType_ItemType"
>> >>>>>>>>>>>> is changed to "FacebookUserType_employment_ItemType_end-date", where the
>> >>>>>>>>>>>> type metadata will have an additional field "Optional" with value "true".
>> >>>>>>>>>>>>> On Jun 19, 2015, at 18:11, Ildar Absalyamov <iabsa001@cs.ucr.edu> wrote:
>> >>>>>>>>>>>>> So I have done half of the fix, which is moving the name generation logic
>> >>>>>>>>>>>>> out of the Metadata node to the client.
>> >>>>>>>>>>>>> Up to that point nothing in the Metadata format was changed, which makes
>> >>>>>>>>>>>>> me wonder whether I should proceed with the following changes.
>> >>>>>>>>>>>>> As can be seen from the previous email, getting rid of union-inferred
>> >>>>>>>>>>>>> name generation would make auto generated type names less scary, but
>> >>>>>>>>>>>>> not entirely.
>> >>>>>>>>>>>>> Having in mind what Mike mentioned earlier today, should we do something
>> >>>>>>>>>>>>> about other auto generated type name cases?
>> >>>>>>>>>>>>>> On Jun 19, 2015, at 13:01, Ildar Absalyamov <iabsa001@cs.ucr.edu> wrote:
>> >>>>>>>>>>>>>> Currently we are generating the names for inner\anonymous types in the
>> >>>>>>>>>>>>>> following cases:
>> >>>>>>>>>>>>>> 1) Anonymous field in the record.
>> >>>>>>>>>>>>>> AQL Example:
>> >>>>>>>>>>>>>> create type FacebookUserType as closed {
>> >>>>>>>>>>>>>>        id: int32,
>> >>>>>>>>>>>>>>        name: string,
>> >>>>>>>>>>>>>>        address: {
>> >>>>>>>>>>>>>>             address_line: string,
>> >>>>>>>>>>>>>>             city: string,
>> >>>>>>>>>>>>>>             state: string
>> >>>>>>>>>>>>>>        }
>> >>>>>>>>>>>>>> }
>> >>>>>>>>>>>>>> The pattern for generating an anonymous field name is "Field_" +
>> >>>>>>>>>>>>>> fieldName + "_in_" + outerTypeName, which translates to
>> >>>>>>>>>>>>>> "Field_address_in_FacebookUserType" in the given example
>> >>>>>>>>>>>>>> 2) Anonymous collection (ordered\unordered list) item
>> >>>>>>>>>>>>>> create type FacebookUserType as closed {
>> >>>>>>>>>>>>>>        id: int32,
>> >>>>>>>>>>>>>>        name: string,
>> >>>>>>>>>>>>>>        employment: [{
>> >>>>>>>>>>>>>>             organization-name: string,
>> >>>>>>>>>>>>>>             start-date: date,
>> >>>>>>>>>>>>>>             end-date: date?
>> >>>>>>>>>>>>>>        }]
>> >>>>>>>>>>>>>> }
>> >>>>>>>>>>>>>> The pattern for generating an anonymous collection item name is
>> >>>>>>>>>>>>>> collectionFieldName + "_ItemType", which translates to
>> >>>>>>>>>>>>>> "Field_employment_in_FacebookUserType_ItemType" in the given example
>> >>>>>>>>>>>>>> 3) Nullable fields
>> >>>>>>>>>>>>>> Same example as above could be used (end-date field): the pattern for
>> >>>>>>>>>>>>>> generating a nullable field name is "Type_#" + fieldsNumberInUnionList +
>> >>>>>>>>>>>>>> "_UnionType_" + outerTypeName, which translates to
>> >>>>>>>>>>>>>> "Type_#1_UnionType_Field_end-date_in_Field_employment_in_FacebookUserType_ItemType"
>> >>>>>>>>>>>>>> in the given example.
>> >>>>>>>>>>>>>> So you can see these auto generated names could stack up pretty fast
>> >>>>>>>>>>>>>> and be completely incomprehensible. Just to give you a small flavor of
>> >>>>>>>>>>>>>> that, here is one of the metadata datasets type definitions:
>> >>>>>>>>>>>>>> open {
>> >>>>>>>>>>>>>>  DataverseName: STRING,
>> >>>>>>>>>>>>>>  DatatypeName: STRING,
>> >>>>>>>>>>>>>>  Derived: UNION(NULL, open {
>> >>>>>>>>>>>>>>      Tag: STRING,
>> >>>>>>>>>>>>>>      IsAnonymous: BOOLEAN,
>> >>>>>>>>>>>>>>      EnumValues: UNION(NULL, [ STRING ]),
>> >>>>>>>>>>>>>>      Record: UNION(NULL, open {
>> >>>>>>>>>>>>>>          IsOpen: BOOLEAN,
>> >>>>>>>>>>>>>>          Fields: [ open {
>> >>>>>>>>>>>>>>              FieldName: STRING,
>> >>>>>>>>>>>>>>              FieldType: STRING
>> >>>>>>>>>>>>>>            }
>> >>>>>>>>>>>>>>          ]
>> >>>>>>>>>>>>>>        }
>> >>>>>>>>>>>>>>      ),
>> >>>>>>>>>>>>>>      Union: UNION(NULL, [ STRING ]),
>> >>>>>>>>>>>>>>      UnorderedList: UNION(NULL, STRING),
>> >>>>>>>>>>>>>>      OrderedList: UNION(NULL, STRING)
>> >>>>>>>>>>>>>>    }
>> >>>>>>>>>>>>>>  ),
>> >>>>>>>>>>>>>>  Timestamp: STRING
>> >>>>>>>>>>>>>> }
>> >>>>>>>>>>>>>>
>> >>>>>>>>>>>>>> And here are a couple of the type names generated for it :)
>> >>>>>>>>>>>>>>
>> >>>>>>>>>>>>>> Type_#1_UnionType_Field_Record_in_Type_#1_UnionType_Field_Derived_in_DatatypeRecordType
>> >>>>>>>>>>>>>> Field_UnorderedList_in_Type_#1_UnionType_Field_Derived_in_DatatypeRecordType
>> >>>>>>>>>>>>>> Field_Fields_in_Type_#1_UnionType_Field_Record_in_Type_#1_UnionType_Field_Derived_in_DatatypeRecordType_ItemType
>> >>>>>>>>>>>>>> Best regards,
>> >>>>>>>>>>>>>> Ildar
>> >>>>>>>>>>>>>>
>> >>>>>>>>>>>>> Best regards,
>> >>>>>>>>>>>>> Ildar
>> >>>>>>>>>>>>>
>> >>>>>>>>>>>> Best regards,
>> >>>>>>>>>>>> Ildar
>> >>>>>>>>>>>>
>> >>>>>>>>>>>>
>> >>>>
>> >>>>
>> >>>> --
>> >>>> Amoudi, Abdullah.
>> >>>>
>> >>
>> >
>>
>> Best regards,
>> Ildar
>>
>>
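
The name collision described in this thread can be reproduced with a small standalone sketch. It is not AsterixDB code: it only assumes that the pattern quoted in the thread ("Field_" + fieldName + "_in_" + outerTypeName) is applied recursively to nested anonymous records. The function names and the dict-based stand-in for a record type are hypothetical, chosen for illustration. The check raises an error that names the duplicate and echoes the offending field paths, along the lines of what Mike suggests.

```python
# Illustrative sketch only; not the AsterixDB implementation.
# A record type is modelled as a dict: field name -> type name (str)
# or nested anonymous record (dict).

def generated_type_names(outer_type_name, fields, path=()):
    """Yield (generated_name, field_path) for each nested anonymous record."""
    for field_name, field_type in fields.items():
        if isinstance(field_type, dict):
            # Pattern quoted in the thread for anonymous fields.
            name = "Field_" + field_name + "_in_" + outer_type_name
            field_path = path + (field_name,)
            yield name, field_path
            # The generated name becomes the outer type name one level down.
            yield from generated_type_names(name, field_type, field_path)


def check_generated_names(type_name, fields):
    """Return {generated_name: field_path}; raise ValueError on a duplicate."""
    seen = {}
    for name, field_path in generated_type_names(type_name, fields):
        if name in seen:
            raise ValueError(
                "generated type name %r for field %r collides with the name "
                "generated for field %r"
                % (name, list(field_path), list(seen[name]))
            )
        seen[name] = field_path
    return seen


# The FooType example from the thread.
foo_type = {
    "b": {"c": {"d": "string"}},
    "c_in_Field_b": {"d": "int"},
}

try:
    check_generated_names("FooType", foo_type)
except ValueError as err:
    print(err)
```

Field paths are kept as tuples of names rather than one dotted string, for the same reason Steven gives for the indexing change: a quoted name may itself contain "." or "_", so a joined string cannot be split back unambiguously.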
