asterixdb-dev mailing list archives

From Mike Carey <dtab...@gmail.com>
Subject Re: Metadata names generation
Date Fri, 10 Jul 2015 23:46:30 GMT
Maybe it'd be okay to get a meaningful error message in that unlikely but
indeed possible case?
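A minimal sketch of the "meaningful error message" idea, written in Python for illustration only. The function and exception names are hypothetical, not AsterixDB APIs. The idea is to keep a map from each generated type name to the field path that produced it. When a second, different path maps to the same name, the code fails with a message that names both paths.

```python
from typing import Dict, Tuple


class TypeNameCollisionError(Exception):
    """Raised when two different field paths map to the same generated name."""


def register_generated_name(registry: Dict[str, Tuple[str, ...]],
                            name: str, field_path: Tuple[str, ...]) -> None:
    """Record name -> field_path; raise if the name is already taken by another path."""
    existing = registry.get(name)
    if existing is not None and existing != field_path:
        raise TypeNameCollisionError(
            f"generated type name {name!r} is used by both field path "
            f"{list(existing)} and field path {list(field_path)}")
    registry[name] = field_path


registry: Dict[str, Tuple[str, ...]] = {}
register_generated_name(registry, "Field_c_in_Field_b_in_FooType", ("b", "c"))
try:
    register_generated_name(registry, "Field_c_in_Field_b_in_FooType", ("c_in_Field_b",))
except TypeNameCollisionError as err:
    print(err)  # the message names both conflicting field paths
```

Registering the same path twice is harmless. Only a genuinely different path triggers the error, so a DDL statement could be rejected with a message that points at both fields.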

On Jul 9, 2015 11:59 PM, "Ildar Absalyamov" <ildar.absalyamov@gmail.com> wrote:

> Sorry, I dropped the ball on this thread because of the trip to Seattle
> and my first weeks at MSR.
>
> Now that I am mostly done with the type system changes required for this
> issue, I finally checked whether the ambiguity is resolved in the current
> master, and the answer is no :)
> The following AQL will fail because two generated type names collide:
>
> use dataverse test;
> create type FooType as open {
>   "b": { "c" : { "d": string }},
>   "c_in_Field_b": { "d": int }
> }
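Here is the collision in concrete form. This assumes master still builds these names with the "Field_" + fieldName + "_in_" + outerTypeName pattern described further down this thread, which the field name "c_in_Field_b" suggests. Under that assumption, the nested field b.c and the top-level field "c_in_Field_b" both end up with the same generated name. The Python sketch below uses a made-up helper name for illustration.

```python
def anonymous_field_type_name(field_name: str, outer_type_name: str) -> str:
    # Assumed pattern: "Field_" + fieldName + "_in_" + outerTypeName
    return "Field_" + field_name + "_in_" + outer_type_name


# Nested path: field "b" of FooType, then field "c" inside it.
b_type = anonymous_field_type_name("b", "FooType")
nested_c = anonymous_field_type_name("c", b_type)

# Top-level field literally named "c_in_Field_b" inside FooType.
flat = anonymous_field_type_name("c_in_Field_b", "FooType")

print(nested_c)        # Field_c_in_Field_b_in_FooType
print(flat)            # Field_c_in_Field_b_in_FooType
print(nested_c == flat)  # True
```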
>
> If I understood Till's comments correctly, nothing prevents JSON identifiers
> from containing double-quote characters as long as they are escaped, i.e. a
> field named "foo\"bar" is perfectly legal, but by the time it gets through
> the parser it becomes "foo"bar", right?
> Can we carry the escaped field name as-is and use it for type name
> generation?
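Below is one possible reading of the "carry the escaped field name as-is" suggestion, sketched in Python. Every component is kept in its quoted, escaped form, and the components are joined with ".". Each component starts and ends with an unescaped double quote, so the separator can never be confused with field content, and a name can be split back into its components unambiguously. Python's json module stands in for whatever escaping the AQL parser actually applies. The function names are hypothetical.

```python
import json
from typing import List


def generated_type_name(outer_type_name: str, field_path: List[str]) -> str:
    """Hypothetical scheme: quote and escape every component, then join with '.'."""
    return ".".join(json.dumps(part) for part in [outer_type_name] + field_path)


def parse_generated_type_name(name: str) -> List[str]:
    """Split a name produced by generated_type_name back into its components."""
    decoder = json.JSONDecoder()
    parts: List[str] = []
    idx = 0
    while True:
        part, idx = decoder.raw_decode(name, idx)
        parts.append(part)
        if idx == len(name):
            return parts
        if name[idx] != ".":
            raise ValueError(f"expected '.' at position {idx} in {name!r}")
        idx += 1


print(generated_type_name("FooType", ["b", "c"]))        # "FooType"."b"."c"
print(generated_type_name("FooType", ["c_in_Field_b"]))  # "FooType"."c_in_Field_b"
print(generated_type_name("FooType", ["b.c"]))           # "FooType"."b.c"
print(generated_type_name("FooType", ['foo"bar']))       # "FooType"."foo\"bar"
```

The trade-off is readability: the generated names now carry quotes and backslashes, although they stay collision-free even for field names such as "a.b" or "foo\"bar".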
>
> > On Jun 25, 2015, at 23:42, Mike Carey <dtabass@gmail.com> wrote:
> >
> > I don't see any technical reason to disallow characters in the escaped case.
> > That being said, we don't have to pick (for our internal names) things that
> > we'd prefer not to see being done.  :-)  I have mixed feelings on the .'s for
> > generated type names - as it's not like users will need to use those names
> > for anything (as they are internal)....
> >
> > On 6/24/15 6:27 PM, Till Westmann wrote:
> >>> On Jun 24, 2015, at 3:16 PM, Steven Jacobs <sjaco002@ucr.edu> wrote:
> >>>
> >>> a clear case is where there is a data type with a field named "a.b" and
> >>> another field named "a" which has a nested field named "b".
> >>>
> >>> This is allowed right now. You would have to access the first as "a.b" and
> >>> the second as a.b. The quotes basically tell the parser "this is a single
> >>> name with whatever characters I want in it."
> >> a.b is mainly a convenient shortcut for "a"."b".
> >>
> >>> To me it seems fine to disallow some characters, but back when I had
> >>> discussions about this with Vinayak, Mike, and Till, Till was arguing
> >>> against disallowing characters. I can't really remember his reasons now
> >>> though.
> >>>
> >>> @Till, what are your thoughts on this?
> >> All characters are allowed in field names in JSON (http://json.org/).
> >> So if we disallow some characters, we will need to map names that contain
> >> them to something else (or not allow such JSON documents).
> >> It seems that that will get messy and/or painful pretty quickly.
> >>
> >> Cheers,
> >> Till
> >>
> >>> On Wed, Jun 24, 2015 at 11:56 AM, abdullah alamoudi <bamousaa@gmail.com> wrote:
> >>>
> >>>> If that's the case, then I think we need to disallow using the "." since it
> >>>> is used to access nested fields and can definitely cause ambiguity.
> >>>>
> >>>> A clear case is where there is a data type with a field named "a.b" and
> >>>> another field named "a" which has a nested field named "b".
> >>>>
> >>>> Thoughts?
> >>>>
> >>>>
> >>>> On Wed, Jun 24, 2015 at 9:51 PM, Steven Jacobs <sjaco002@ucr.edu> wrote:
> >>>>
> >>>>> I think there is no completely user-friendly way around this. Basically
> >>>>> our names allow ALL characters if they are enclosed in quotes, so there
> >>>>> isn't a character we can use that doesn't have the potential for
> >>>>> ambiguity from the user's perspective. This is why I had to change the
> >>>>> nested stuff in indexing to be a list of strings rather than a single
> >>>>> string.
> >>>>> Steven
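Steven's point about switching nested index fields to a list of strings can be illustrated with Abdullah's example from this thread. Once a field path is flattened into one dotted string, a top-level field literally named "a.b" cannot be told apart from a nested field b inside a. A list of names keeps the two distinct. Python tuples stand in here for whatever list type the indexing code actually uses.

```python
from typing import Tuple

# A top-level field literally named "a.b" ...
literal_field: Tuple[str, ...] = ("a.b",)
# ... versus a field "a" that contains a nested field "b".
nested_field: Tuple[str, ...] = ("a", "b")

# Flattening each path into one dotted string loses the distinction:
print(".".join(literal_field) == ".".join(nested_field))  # True

# Keeping the path as a sequence of names preserves it:
print(literal_field == nested_field)  # False
```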
> >>>>>
> >>>>> On Wed, Jun 24, 2015 at 11:43 AM, Chen Li <chenli@gmail.com> wrote:
> >>>>>
> >>>>>> In this case, there could be ambiguity in the names.  Does it matter?
> >>>>>>
> >>>>>> Chen
> >>>>>>
> >>>>>> On Wed, Jun 24, 2015 at 11:17 AM, Steven Jacobs <sjaco002@ucr.edu> wrote:
> >>>>>>> Field names do allow these characters (both of them).
> >>>>>>> Steven
> >>>>>>>
> >>>>>>> On Wed, Jun 24, 2015 at 11:15 AM, Chen Li <chenli@gmail.com> wrote:
> >>>>>>>
> >>>>>>>> I also prefer "." to "_".  I also want to confirm that field names
> >>>>>>>> don't allow these two characters.
> >>>>>>>>
> >>>>>>>> Chen
> >>>>>>>>
> >>>>>>>> On Wed, Jun 24, 2015 at 10:52 AM, Steven Jacobs <sjaco002@ucr.edu> wrote:
> >>>>>>>>> I second Young-Seok (especially since this is the syntax that users
> >>>>>>>>> will use themselves for nested information in queries).
> >>>>>>>>>
> >>>>>>>>> Steven
> >>>>>>>>>
> >>>>>>>>> On Wed, Jun 24, 2015 at 10:40 AM, Young-Seok Kim <kisskys@gmail.com> wrote:
> >>>>>>>>>
> >>>>>>>>>> It seems better to use "." instead of "_" since "." is more intuitive
> >>>>>>>>>> (at least to me) than "_".
> >>>>>>>>>> For example, FacebookUserType_address would become
> >>>>>>>>>> FacebookUserType.address.
> >>>>>>>>>> Best,
> >>>>>>>>>> Young-Seok
> >>>>>>>>>>
> >>>>>>>>>> On Wed, Jun 24, 2015 at 6:31 AM, Mike Carey <dtabass@gmail.com> wrote:
> >>>>>>>>>>> Much cleaner!  Others should weigh in here to help finalize the
> >>>>>>>>>>> conventions....  Thoughts?
> >>>>>>>>>>> On Jun 23, 2015 5:31 PM, "Ildar Absalyamov" <iabsa001@cs.ucr.edu> wrote:
> >>>>>>>>>>>> So the general solution is that the generated names should become less
> >>>>>>>>>>>> verbose (consider the previous examples):
> >>>>>>>>>>>> 1) The anonymous field naming scheme will change to outerTypeName + "_" +
> >>>>>>>>>>>> fieldName, i.e. "Field_address_in_FacebookUserType" is changed to
> >>>>>>>>>>>> "FacebookUserType_address".
> >>>>>>>>>>>> 2) The anonymous collection item naming scheme stays the same, i.e.
> >>>>>>>>>>>> "Field_employment_in_FacebookUserType_ItemType" is changed to
> >>>>>>>>>>>> "FacebookUserType_employment_ItemType" (the name changes only because
> >>>>>>>>>>>> the naming of the anonymous field employment was changed as described
> >>>>>>>>>>>> above).
> >>>>>>>>>>>> 3) The union type ceases to exist in the metadata entirely (it stays in
> >>>>>>>>>>>> the object model though), i.e.
> >>>>>>>>>>>> "Type_#1_UnionType_Field_end-date_in_Field_employment_in_FacebookUserType_ItemType"
> >>>>>>>>>>>> is changed to "FacebookUserType_employment_ItemType_end-date", where
> >>>>>>>>>>>> the type metadata will have an additional field "Optional" with value
> >>>>>>>>>>>> "true".
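The three rules proposed in the Jun 23 message can be sketched in Python as follows. This is an illustration only: GeneratedType and its optional flag are hypothetical stand-ins for the metadata record and its proposed "Optional" field.

```python
from dataclasses import dataclass


@dataclass
class GeneratedType:
    name: str
    optional: bool = False  # stands in for the proposed "Optional" metadata field


def field_type(outer_type_name: str, field_name: str, nullable: bool = False) -> GeneratedType:
    # Rule 1: outerTypeName + "_" + fieldName.
    # Rule 3: a nullable field gets no separate union type, only the optional flag.
    return GeneratedType(outer_type_name + "_" + field_name, optional=nullable)


def item_type(collection_type_name: str) -> GeneratedType:
    # Rule 2: the collection field's generated name + "_ItemType".
    return GeneratedType(collection_type_name + "_ItemType")


address = field_type("FacebookUserType", "address")
employment_item = item_type(field_type("FacebookUserType", "employment").name)
end_date = field_type(employment_item.name, "end-date", nullable=True)

print(address.name)          # FacebookUserType_address
print(employment_item.name)  # FacebookUserType_employment_ItemType
print(end_date.name)         # FacebookUserType_employment_ItemType_end-date
print(end_date.optional)     # True
```

With "_" as the separator, a field c nested in b and a top-level field named b_c both map to FooType_b_c under these rules. The shorter names alone therefore do not remove the ambiguity discussed elsewhere in this thread.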
> >>>>>>>>>>>>> On Jun 19, 2015, at 18:11, Ildar Absalyamov <iabsa001@cs.ucr.edu> wrote:
> >>>>>>>>>>>>> So I have done half of the fix, which moves the name generation logic
> >>>>>>>>>>>>> out of the Metadata node and into the client.
> >>>>>>>>>>>>> Up to this point nothing in the Metadata format has changed, which makes
> >>>>>>>>>>>>> me wonder whether I should proceed with the following changes.
> >>>>>>>>>>>>> As can be seen from the previous email, getting rid of union-inferred
> >>>>>>>>>>>>> name generation would make auto-generated type names less scary, but
> >>>>>>>>>>>>> not entirely.
> >>>>>>>>>>>>> Keeping in mind what Mike mentioned earlier today, should we do
> >>>>>>>>>>>>> something about the other auto-generated type name cases?
> >>>>>>>>>>>>>> On Jun 19, 2015, at 13:01, Ildar Absalyamov <iabsa001@cs.ucr.edu> wrote:
> >>>>>>>>>>>>>> Currently we generate names for inner/anonymous types in the
> >>>>>>>>>>>>>> following cases:
> >>>>>>>>>>>>>> 1) Anonymous field in the record.
> >>>>>>>>>>>>>> AQL Example:
> >>>>>>>>>>>>>> create type FacebookUserType as closed {
> >>>>>>>>>>>>>>        id: int32,
> >>>>>>>>>>>>>>        name: string,
> >>>>>>>>>>>>>>        address: {
> >>>>>>>>>>>>>>             address_line: string,
> >>>>>>>>>>>>>>             city: string,
> >>>>>>>>>>>>>>             state: string
> >>>>>>>>>>>>>>        }
> >>>>>>>>>>>>>>    }
> >>>>>>>>>>>>>> The pattern for generating an anonymous field name is "Field_" +
> >>>>>>>>>>>>>> fieldName + "_in_" + outerTypeName, which translates to
> >>>>>>>>>>>>>> "Field_address_in_FacebookUserType" in the given example.
> >>>>>>>>>>>>>> 2) Anonymous collection (ordered/unordered list) item
> >>>>>>>>>>>>>> create type FacebookUserType as closed {
> >>>>>>>>>>>>>>        id: int32,
> >>>>>>>>>>>>>>        name: string,
> >>>>>>>>>>>>>>        employment: [{
> >>>>>>>>>>>>>>             organization-name: string,
> >>>>>>>>>>>>>>             start-date: date,
> >>>>>>>>>>>>>>             end-date: date?
> >>>>>>>>>>>>>>        }]
> >>>>>>>>>>>>>>    }
> >>>>>>>>>>>>>> The pattern for generating an anonymous collection item name is
> >>>>>>>>>>>>>> collectionFieldName + "_ItemType", which translates to
> >>>>>>>>>>>>>> "Field_employment_in_FacebookUserType_ItemType" in the given example.
> >>>>>>>>>>>>>> 3) Nullable fields
> >>>>>>>>>>>>>> The same example as above can be used (end-date field): the pattern
> >>>>>>>>>>>>>> for generating a nullable field name is "Type_#" +
> >>>>>>>>>>>>>> fieldsNumberInUnoinList + "_UnionType_" + outerTypeName, which
> >>>>>>>>>>>>>> translates to
> >>>>>>>>>>>>>> "Type_#1_UnionType_Field_end-date_in_Field_employment_in_FacebookUserType_ItemType"
> >>>>>>>>>>>>>> in the given example.
> >>>>>>>>>>>>>> So you can see these auto-generated names can stack up pretty fast
> >>>>>>>>>>>>>> and become completely incomprehensible. Just to give you a small
> >>>>>>>>>>>>>> flavor of that, here is the type definition of one of the metadata
> >>>>>>>>>>>>>> datasets:
> >>>>>>>>>>>>>> open {
> >>>>>>>>>>>>>>  DataverseName: STRING,
> >>>>>>>>>>>>>>  DatatypeName: STRING,
> >>>>>>>>>>>>>>  Derived: UNION(NULL, open {
> >>>>>>>>>>>>>>      Tag: STRING,
> >>>>>>>>>>>>>>      IsAnonymous: BOOLEAN,
> >>>>>>>>>>>>>>      EnumValues: UNION(NULL, [ STRING ]),
> >>>>>>>>>>>>>>      Record: UNION(NULL, open {
> >>>>>>>>>>>>>>          IsOpen: BOOLEAN,
> >>>>>>>>>>>>>>          Fields: [ open {
> >>>>>>>>>>>>>>              FieldName: STRING,
> >>>>>>>>>>>>>>              FieldType: STRING
> >>>>>>>>>>>>>>            }
> >>>>>>>>>>>>>>          ]
> >>>>>>>>>>>>>>        }
> >>>>>>>>>>>>>>      ),
> >>>>>>>>>>>>>>      Union: UNION(NULL, [ STRING ]),
> >>>>>>>>>>>>>>      UnorderedList: UNION(NULL, STRING),
> >>>>>>>>>>>>>>      OrderedList: UNION(NULL, STRING)
> >>>>>>>>>>>>>>    }
> >>>>>>>>>>>>>>  ),
> >>>>>>>>>>>>>>  Timestamp: STRING
> >>>>>>>>>>>>>> }
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> And here are a couple of the type names generated for its fields :)
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> Type_#1_UnionType_Field_Record_in_Type_#1_UnionType_Field_Derived_in_DatatypeRecordType
> >>>>>>>>>>>>>> Field_UnorderedList_in_Type_#1_UnionType_Field_Derived_in_DatatypeRecordType
> >>>>>>>>>>>>>> Field_Fields_in_Type_#1_UnionType_Field_Record_in_Type_#1_UnionType_Field_Derived_in_DatatypeRecordType_ItemType
> >>>>>>>>>>>>>> Best regards,
> >>>>>>>>>>>>>> Ildar
> >>>>>>>>>>>>>>
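The three patterns in the Jun 13:01 message can be checked against its own examples with a short Python sketch. One assumption is read off the examples rather than the stated rule: the union pattern appears to wrap the generated name of the nullable field itself ("Field_end-date_in_..."), not the bare outer type name. The "#1" position is taken as given. The helper names are hypothetical.

```python
def field_type_name(field_name: str, outer_type_name: str) -> str:
    # Pattern 1: "Field_" + fieldName + "_in_" + outerTypeName
    return "Field_" + field_name + "_in_" + outer_type_name


def item_type_name(collection_type_name: str) -> str:
    # Pattern 2: the collection field's generated name + "_ItemType"
    return collection_type_name + "_ItemType"


def union_type_name(position: int, wrapped_type_name: str) -> str:
    # Pattern 3: "Type_#" + position + "_UnionType_" + name of the wrapped type
    return "Type_#" + str(position) + "_UnionType_" + wrapped_type_name


def nullable_field_type_name(field_name: str, outer_type_name: str) -> str:
    # Assumption: a nullable field's union wraps that field's own generated name.
    return union_type_name(1, field_type_name(field_name, outer_type_name))


# FacebookUserType examples
employment_item = item_type_name(field_type_name("employment", "FacebookUserType"))
end_date = nullable_field_type_name("end-date", employment_item)

# Metadata DatatypeRecordType examples
derived = nullable_field_type_name("Derived", "DatatypeRecordType")
record = nullable_field_type_name("Record", derived)
unordered_list = field_type_name("UnorderedList", derived)
fields_item = item_type_name(field_type_name("Fields", record))

for name in (field_type_name("address", "FacebookUserType"), employment_item,
             end_date, record, unordered_list, fields_item):
    print(name)
```

Each printed name matches the corresponding example in the thread. This shows how every level of nesting or nullability prepends another prefix to the name of the level above it.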
> >>>>>>>>>>>>> Best regards,
> >>>>>>>>>>>>> Ildar
> >>>>>>>>>>>>>
> >>>>>>>>>>>> Best regards,
> >>>>>>>>>>>> Ildar
> >>>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>
> >>>>
> >>>> --
> >>>> Amoudi, Abdullah.
> >>>>
> >>
> >
>
> Best regards,
> Ildar
>
>
