asterixdb-dev mailing list archives

From Till Westmann <ti...@apache.org>
Subject Re: Metadata names generation
Date Thu, 25 Jun 2015 04:10:59 GMT
And what happens if I call my type “Field_b_in_FooType”?
It seems that I still can create conflicts (albeit not that easily) …
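
To make both collisions concrete, here is a minimal sketch of the two naming schemes as they are described in the quoted messages below. The helper names (old_name, new_name, nested_names, PATHS, user_declared) are made up for illustration and are not taken from the AsterixDB code base.

```python
def old_name(outer_type, field):
    # Old scheme from the thread: "Field_" + fieldName + "_in_" + outerTypeName
    return "Field_" + field + "_in_" + outer_type


def new_name(outer_type, field):
    # Proposed scheme: outerTypeName + "." + fieldName
    return outer_type + "." + field


def nested_names(namer, root, paths):
    # Each path lists the field names leading from the root type
    # down to one anonymous nested record type.
    names = []
    for path in paths:
        name = root
        for field in path:
            name = namer(name, field)
        names.append(name)
    return names


# Anonymous record types in the FooType example: "b", "b"."c" and "b.c"
PATHS = [["b"], ["b", "c"], ["b.c"]]

new_names = nested_names(new_name, "FooType", PATHS)
old_names = nested_names(old_name, "FooType", PATHS)

# Hypothetical set of type names that a user declared explicitly
user_declared = {"FooType", "Field_b_in_FooType"}
clashes_with_user = [n for n in old_names if n in user_declared]
```

Under these assumptions new_names contains FooType.b.c twice, while old_names has three distinct entries, one of which still clashes with the user-declared Field_b_in_FooType.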

BTW, I wasn’t completely precise. There are characters that are not allowed unescaped in JSON strings:
a double quote, a single backslash and any Unicode control character.
So we could use those, but we would have to find a way to serialize them (reversibly) when
writing those names.
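
One possible shape for such a reversible serialization, as a rough sketch only: instead of a control character it keeps "." as the separator and backslash-escapes any "." or backslash that occurs inside a name component. This is a swapped-in technique, not something that exists in the code today, and encode_path / decode_path are hypothetical names.

```python
SEP = "."
ESC = "\\"


def encode_path(fields):
    # Escape the escape character first, then the separator,
    # so that decoding stays unambiguous.
    escaped = [f.replace(ESC, ESC + ESC).replace(SEP, ESC + SEP) for f in fields]
    return SEP.join(escaped)


def decode_path(name):
    fields, current, i = [], [], 0
    while i < len(name):
        ch = name[i]
        if ch == ESC and i + 1 < len(name):
            # The character after an escape is taken literally
            current.append(name[i + 1])
            i += 2
        elif ch == SEP:
            # An unescaped separator ends the current component
            fields.append("".join(current))
            current = []
            i += 1
        else:
            current.append(ch)
            i += 1
    fields.append("".join(current))
    return fields


nested = encode_path(["FooType", "b", "c"])  # field "b" with nested field "c"
dotted = encode_path(["FooType", "b.c"])     # single field named "b.c"
```

With this sketch the two FooType cases serialize to different strings and both decode back to their original components. A user-declared type name containing "." would have to go through the same encoding for the guarantee to hold.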

Cheers,
Till

> On Jun 24, 2015, at 6:43 PM, Ildar Absalyamov <ildar.absalyamov@gmail.com> wrote:
> 
> OK, so when one is dealing with such names putting them in quotes will resolve ambiguity.
> However auto generated type names are just strings, and they should be unique.
> Consider the example:
> FooType as open {
>   “a”: string,
>   “b”: { “c” : { “d”: string }},
>   “b.c”: { “d”: string }
> }
> 
> The new naming scheme will generate these types: FooType.b, FooType.b.c and an identical
> FooType.b.c!
> Whereas the old naming will produce Field_b_in_FooType, Field_c_in_Field_b_in_FooType
> and Field_b.c_in_FooType, thus resolving the name conflict.
> 
> So it seems type name verbosity was there for a reason?
> 
>> On Jun 24, 2015, at 18:27, Till Westmann <tillw@apache.org> wrote:
>> 
>> 
>>> On Jun 24, 2015, at 3:16 PM, Steven Jacobs <sjaco002@ucr.edu> wrote:
>>> 
>>> a clear case is where there is a data type with a field named "a.b" and
>>> another field named "a" which has a nested field named "b".
>>> 
>>> This is allowed right now. You would have to access the first as "a.b" and
>>> the second as a.b. The quotes basically tell the parser "this is a single
>>> name with whatever characters I want in it."
>> 
>> a.b is mainly a convenient shortcut for "a"."b"
>> 
>>> To me it seems fine to
>>> disallow some characters, but back when I had discussions about this with
>>> Vinayak, Mike, and Till, Till was arguing against disallowing characters. I
>>> can't really remember his reasons now though.
>>> 
>>> @Till, what are your thoughts on this?
>> 
>> All characters are allowed for field names in JSON (http://json.org/).
>> So if we disallow some characters, we will need to map names that contain them to something
>> else (or not allow such JSON documents).
>> It seems that that will get messy and/or painful pretty quickly.
>> 
>> Cheers,
>> Till
>> 
>>> On Wed, Jun 24, 2015 at 11:56 AM, abdullah alamoudi <bamousaa@gmail.com>
>>> wrote:
>>> 
>>>> If that's the case, then I think we need to disallow using the "." since it
>>>> is used to access nested fields and can definitely cause ambiguity.
>>>> 
>>>> a clear case is where there is a data type with a field named "a.b" and
>>>> another field named "a" which has a nested field named "b".
>>>> 
>>>> Thoughts?
>>>> 
>>>> 
>>>> On Wed, Jun 24, 2015 at 9:51 PM, Steven Jacobs <sjaco002@ucr.edu> wrote:
>>>> 
>>>>> I think there is no completely user-friendly way around this. Basically our
>>>>> names allow ALL characters if they are encapsulated in quotes, so there
>>>>> isn't a character we can use that doesn't have the potential for ambiguity
>>>>> from the user's perspective. This is why I had to change the nested stuff
>>>>> in indexing to be a list of strings rather than a single string.
>>>>> Steven
>>>>> 
>>>>> On Wed, Jun 24, 2015 at 11:43 AM, Chen Li <chenli@gmail.com> wrote:
>>>>> 
>>>>>> In this case, there could be ambiguity in the names.  Does it matter?
>>>>>> 
>>>>>> Chen
>>>>>> 
>>>>>> On Wed, Jun 24, 2015 at 11:17 AM, Steven Jacobs <sjaco002@ucr.edu> wrote:
>>>>>> 
>>>>>>> Fieldnames do allow these characters (both of them).
>>>>>>> Steven
>>>>>>> 
>>>>>>> On Wed, Jun 24, 2015 at 11:15 AM, Chen Li <chenli@gmail.com> wrote:
>>>>>>> 
>>>>>>>> I also prefer "." to "_".  Also want to confirm that field names don't
>>>>>>>> allow these two characters.
>>>>>>>> 
>>>>>>>> Chen
>>>>>>>> 
>>>>>>>> On Wed, Jun 24, 2015 at 10:52 AM, Steven Jacobs <sjaco002@ucr.edu> wrote:
>>>>>>>> 
>>>>>>>>> I second Young-Seok (especially since this is the syntax that users will
>>>>>>>>> use themselves for nested information in queries).
>>>>>>>>> 
>>>>>>>>> Steven
>>>>>>>>> 
>>>>>>>>> On Wed, Jun 24, 2015 at 10:40 AM, Young-Seok Kim <kisskys@gmail.com> wrote:
>>>>>>>>> 
>>>>>>>>>> It seems better to use "." instead of "_" since "." is more intuitive (at
>>>>>>>>>> least to me) than "_".
>>>>>>>>>> For example, the FacebookUserType_address will be FacebookUserType.address.
>>>>>>>>>> 
>>>>>>>>>> Best,
>>>>>>>>>> Young-Seok
>>>>>>>>>> 
>>>>>>>>>> On Wed, Jun 24, 2015 at 6:31 AM, Mike Carey <dtabass@gmail.com> wrote:
>>>>>>>>>> 
>>>>>>>>>>> Much cleaner!  Others should weigh in here to help finalize the
>>>>>>>>>>> conventions....  Thoughts?
>>>>>>>>>>> On Jun 23, 2015 5:31 PM, "Ildar Absalyamov" <iabsa001@cs.ucr.edu> wrote:
>>>>>>>>>>> 
>>>>>>>>>>>> So the general solution is that the generated names should become less
>>>>>>>>>>>> verbose (consider previous examples):
>>>>>>>>>>>> 1) Anonymous fields naming scheme will change to outerTypeName + “_” +
>>>>>>>>>>>> fieldName, i.e. “Field_address_in_FacebookUserType” is changed to
>>>>>>>>>>>> “FacebookUserType_address”
>>>>>>>>>>>> 2) Anonymous collection item naming scheme stays the same, i.e.
>>>>>>>>>>>> “Field_employment_in_FacebookUserType_ItemType” is changed to
>>>>>>>>>>>> “FacebookUserType_employment_ItemType” (name is changed because the
>>>>>>>>>>>> anonymous field employment naming was changed as described earlier)
>>>>>>>>>>>> 3) Union type completely ceases to exist in metadata (it stays in the
>>>>>>>>>>>> object model though), i.e.
>>>>>>>>>>>> “Type_#1_UnionType_Field_end-date_in_Field_employment_in_FacebookUserType_ItemType”
>>>>>>>>>>>> is changed to “FacebookUserType_employment_ItemType_end-date”, where the
>>>>>>>>>>>> type metadata will have an additional field “Optional” with value “true”.
>>>>>>>>>>>> 
>>>>>>>>>>>>> On Jun 19, 2015, at 18:11, Ildar Absalyamov <iabsa001@cs.ucr.edu> wrote:
>>>>>>>>>>>>> 
>>>>>>>>>>>>> So I have done half of the fix, which moved the name generation logic out
>>>>>>>>>>>>> of the Metadata node to the client.
>>>>>>>>>>>>> Up to that point nothing in the Metadata format was changed, which makes me
>>>>>>>>>>>>> wonder whether I should proceed with the following changes.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> As could be seen from the previous email, getting rid of
>>>>>>>>>>>>> union-inferred name generation would make auto generated type names less
>>>>>>>>>>>>> scary, but not entirely.
>>>>>>>>>>>>> Having in mind what Mike mentioned earlier today, should we do something
>>>>>>>>>>>>> about other auto generated type name cases?
>>>>>>>>>>>>> 
>>>>>>>>>>>>>> On Jun 19, 2015, at 13:01, Ildar Absalyamov <iabsa001@cs.ucr.edu> wrote:
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> Currently we are generating the names for inner\anonymous types in the
>>>>>>>>>>>>>> following cases:
>>>>>>>>>>>>>> 1) Anonymous field in the record.
>>>>>>>>>>>>>> AQL Example:
>>>>>>>>>>>>>> create type FacebookUserType as closed {
>>>>>>>>>>>>>>      id: int32,
>>>>>>>>>>>>>>      name: string,
>>>>>>>>>>>>>>      address: {
>>>>>>>>>>>>>>           address_line: string,
>>>>>>>>>>>>>>           city: string,
>>>>>>>>>>>>>>           state: string
>>>>>>>>>>>>>>   }
>>>>>>>>>>>>>>  }
>>>>>>>>>>>>>> The pattern for generating an anonymous field name is "Field_" +
>>>>>>>>>>>>>> fieldName + "_in_" + outerTypeName, which translates to
>>>>>>>>>>>>>> "Field_address_in_FacebookUserType" in the given example
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> 2) Anonymous collection (ordered\unordered list) item
>>>>>>>>>>>>>> create type FacebookUserType as closed {
>>>>>>>>>>>>>>      id: int32,
>>>>>>>>>>>>>>      name: string,
>>>>>>>>>>>>>>      employment: [{
>>>>>>>>>>>>>>           organization-name: string,
>>>>>>>>>>>>>>           start-date: date,
>>>>>>>>>>>>>>           end-date: date?
>>>>>>>>>>>>>>   }]
>>>>>>>>>>>>>>  }
>>>>>>>>>>>>>> The pattern for generating an anonymous collection item name is
>>>>>>>>>>>>>> collectionFieldName + "_ItemType", which translates to
>>>>>>>>>>>>>> "Field_employment_in_FacebookUserType_ItemType" in the given example
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> 3) Nullable fields
>>>>>>>>>>>>>> Same example as above could be used (end-date field): the pattern for
>>>>>>>>>>>>>> generating a nullable field name is "Type_#" + fieldsNumberInUnoinList +
>>>>>>>>>>>>>> "_UnionType_" + outerTypeName, which translates to
>>>>>>>>>>>>>> "Type_#1_UnionType_Field_end-date_in_Field_employment_in_FacebookUserType_ItemType"
>>>>>>>>>>>>>> in the given example.
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> So you can see these auto generated names could stack up pretty fast
>>>>>>>>>>>>>> and be completely incomprehensible. Just to give you a small flavor of
>>>>>>>>>>>>>> that, here is one of the metadata datasets type definitions:
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> open {
>>>>>>>>>>>>>> DataverseName: STRING,
>>>>>>>>>>>>>> DatatypeName: STRING,
>>>>>>>>>>>>>> Derived: UNION(NULL, open {
>>>>>>>>>>>>>>    Tag: STRING,
>>>>>>>>>>>>>>    IsAnonymous: BOOLEAN,
>>>>>>>>>>>>>>    EnumValues: UNION(NULL, [ STRING ]),
>>>>>>>>>>>>>>    Record: UNION(NULL, open {
>>>>>>>>>>>>>>        IsOpen: BOOLEAN,
>>>>>>>>>>>>>>        Fields: [ open {
>>>>>>>>>>>>>>            FieldName: STRING,
>>>>>>>>>>>>>>            FieldType: STRING
>>>>>>>>>>>>>>          }
>>>>>>>>>>>>>>        ]
>>>>>>>>>>>>>>      }
>>>>>>>>>>>>>>    ),
>>>>>>>>>>>>>>    Union: UNION(NULL, [ STRING ]),
>>>>>>>>>>>>>>    UnorderedList: UNION(NULL, STRING),
>>>>>>>>>>>>>>    OrderedList: UNION(NULL, STRING)
>>>>>>>>>>>>>>  }
>>>>>>>>>>>>>> ),
>>>>>>>>>>>>>> Timestamp: STRING
>>>>>>>>>>>>>> }
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> And here are a couple of field names generated for it :)
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> Type_#1_UnionType_Field_Record_in_Type_#1_UnionType_Field_Derived_in_DatatypeRecordType
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> Field_UnorderedList_in_Type_#1_UnionType_Field_Derived_in_DatatypeRecordType
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> Field_Fields_in_Type_#1_UnionType_Field_Record_in_Type_#1_UnionType_Field_Derived_in_DatatypeRecordType_ItemType
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> Best regards,
>>>>>>>>>>>>>> Ildar
>>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Best regards,
>>>>>>>>>>>>> Ildar
>>>>>>>>>>>>> 
>>>>>>>>>>>> 
>>>>>>>>>>>> Best regards,
>>>>>>>>>>>> Ildar
>>>>>>>>>>>> 
>>>> 
>>>> --
>>>> Amoudi, Abdullah.
>>>> 
>> 
> 
> Best regards,
> Ildar

