asterixdb-dev mailing list archives

From Till Westmann <ti...@apache.org>
Subject Re: Metadata names generation
Date Thu, 25 Jun 2015 15:00:41 GMT
I’m sorry, I was considering the wrong domain when I wrote that e-mail. Indeed these characters
are not allowed in the JSON string serialization, but they can be part of the string value
as they can be escaped in the serialization ...
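
A minimal sketch of that distinction, assuming Python's standard json module as a stand-in for whatever serializer would actually write the names (the field name below is made up): the raw characters cannot appear inside the serialized string, but the value survives a round trip once they are escaped.

```python
import json

# Hypothetical field name containing a double quote, a backslash
# and a control character (a tab).
name = 'b"c\\d\te'

serialized = json.dumps(name)      # escaped form, legal inside a JSON document
restored = json.loads(serialized)  # the original value comes back unchanged

def raw_control_char_rejected():
    """Return True if the parser refuses an unescaped tab inside a string."""
    try:
        json.loads('"a\tb"')  # this Python literal contains a real tab character
    except json.JSONDecodeError:
        return True
    return False

print(serialized)
print(restored == name)
print(raw_control_char_rejected())
```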

> On Jun 25, 2015, at 12:54 AM, Ildar Absalyamov <ildar.absalyamov@gmail.com> wrote:
> 
> I believe that the name of the field would be Field_b_in_Field_b_in_FooType, which is still parsable if you start from the beginning.
> 
> However, if JSON does not allow double quotes, field names within a generated type could be wrapped in double quotes, which would resolve the aforementioned ambiguity:
> FooType."b"."c" vs. FooType."b.c"
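
One way to picture the quoted form, as a sketch only: keep the nested field path as a list of strings and quote each segment with JSON string escaping, so that a segment containing a dot or a quote stays distinguishable. The helper names quoted_type_name and parse_quoted_type_name are hypothetical, and the sketch assumes the outer type name itself contains no dot.

```python
import json

def quoted_type_name(type_name, path):
    """Join the outer type name and the JSON-quoted field names with dots."""
    return ".".join([type_name] + [json.dumps(field) for field in path])

def parse_quoted_type_name(name):
    """Recover (type_name, path) from a name built by quoted_type_name."""
    type_name, _, rest = name.partition(".")
    decoder = json.JSONDecoder()
    path, idx = [], 0
    while idx < len(rest):
        field, end = decoder.raw_decode(rest, idx)  # decode one quoted segment
        path.append(field)
        idx = end + 1                               # skip the "." separator
    return type_name, path

nested = quoted_type_name("FooType", ["b", "c"])  # field c inside field b
dotted = quoted_type_name("FooType", ["b.c"])     # single field named "b.c"
print(nested)
print(dotted)
```

Both names parse back to their original paths, including paths whose segments contain a double quote, because the quoting is the reversible JSON escaping.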
> On 24 Jun 2015, at 21:10, Till Westmann <tillw@apache.org> wrote:
> 
>> And what happens if I call my type “Field_b_in_FooType”?
>> It seems that I still can create conflicts (albeit not that easily) …
>> 
>> BTW, I wasn’t completely precise. There are characters that are not allowed in JSON strings: a double quote, a single backslash and any Unicode control character.
>> So we could use those, but we would have to find a way to serialize them (reversibly) when writing those names.
>> 
>> Cheers,
>> Till
>> 
>>> On Jun 24, 2015, at 6:43 PM, Ildar Absalyamov <ildar.absalyamov@gmail.com> wrote:
>>> 
>>> OK, so when one is dealing with such names putting them in quotes will resolve the ambiguity.
>>> However auto generated type names are just strings, and they should be unique.
>>> Consider the example:
>>> FooType as open {
>>> “a”: string,
>>> “b”: { “c” : { “d”: string }},
>>> “b.c”: { “d”: string }
>>> }
>>> 
>>> The new naming scheme will generate these types: FooType.b, FooType.b.c and an identical FooType.b.c!
>>> Whereas the old naming will produce Field_b_in_FooType, Field_c_in_Field_b_in_FooType and Field_b.c_in_FooType, thus resolving the name conflict.
>>> 
>>> So it seems type name verbosity was there for a reason?
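
The FooType example can be replayed with a small sketch. It models only the record-typed fields of FooType as nested dicts and applies the two patterns as they are described in this thread; the function names are hypothetical and this is not the actual metadata code.

```python
def dotted_names(type_name, fields):
    """New-style names: outerTypeName + "." + fieldName, applied recursively."""
    names = []
    for field, nested in fields.items():
        name = type_name + "." + field
        names.append(name)
        names.extend(dotted_names(name, nested))
    return names

def verbose_names(type_name, fields):
    """Old-style names: "Field_" + fieldName + "_in_" + outerTypeName."""
    names = []
    for field, nested in fields.items():
        name = "Field_" + field + "_in_" + type_name
        names.append(name)
        names.extend(verbose_names(name, nested))
    return names

# Anonymous record-typed fields of FooType: "b" (containing "c") and "b.c".
foo_fields = {"b": {"c": {}}, "b.c": {}}

dotted = dotted_names("FooType", foo_fields)
verbose = verbose_names("FooType", foo_fields)
print(dotted)   # FooType.b.c appears twice
print(verbose)  # all three names are distinct
```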
>>> 
>>>> On Jun 24, 2015, at 18:27, Till Westmann <tillw@apache.org> wrote:
>>>> 
>>>> 
>>>>> On Jun 24, 2015, at 3:16 PM, Steven Jacobs <sjaco002@ucr.edu> wrote:
>>>>> 
>>>>> a clear case is where there is a data type with a field named "a.b" and
>>>>> another field named "a" which has a nested field named "b".
>>>>> 
>>>>> This is allowed right now. You would have to access the first as "a.b" and
>>>>> the second as a.b. The quotes basically tell the parser "this is a single
>>>>> name with whatever characters I want in it."
>>>> 
>>>> a.b is mainly a convenient shortcut for "a"."b"
>>>> 
>>>>> To me it seems fine to
>>>>> disallow some characters, but back when I had discussions about this with
>>>>> Vinayak, Mike, and Till, Till was arguing against disallowing characters. I
>>>>> can't really remember his reasons now though.
>>>>> 
>>>>> @Till, what are your thoughts on this?
>>>> 
>>>> All characters are allowed for field names in JSON (http://json.org).
>>>> So if we disallow some characters, we will need to map names that contain them to something else (or not allow such JSON documents).
>>>> It seems that that will get messy and/or painful pretty quickly.
>>>> 
>>>> Cheers,
>>>> Till
>>>> 
>>>>> On Wed, Jun 24, 2015 at 11:56 AM, abdullah alamoudi <bamousaa@gmail.com>
>>>>> wrote:
>>>>> 
>>>>>> If that's the case, then I think we need to disallow using the "." since it
>>>>>> is used to access nested fields and can definitely cause ambiguity.
>>>>>> 
>>>>>> A clear case is where there is a data type with a field named "a.b" and
>>>>>> another field named "a" which has a nested field named "b".
>>>>>> 
>>>>>> Thoughts?
>>>>>> 
>>>>>> 
>>>>>> On Wed, Jun 24, 2015 at 9:51 PM, Steven Jacobs <sjaco002@ucr.edu> wrote:
>>>>>> 
>>>>>>> I think there is no completely user-friendly way around this. Basically our
>>>>>>> names allow ALL characters if they are encapsulated in quotes, so there
>>>>>>> isn't a character we can use that doesn't have the potential for ambiguity
>>>>>>> from the user's perspective. This is why I had to change the nested stuff
>>>>>>> in indexing to be a list of strings rather than a single string.
>>>>>>> Steven
>>>>>>> 
>>>>>>> On Wed, Jun 24, 2015 at 11:43 AM, Chen Li <chenli@gmail.com> wrote:
>>>>>>> 
>>>>>>>> In this case, there could be ambiguity in the names.  Does it matter?
>>>>>>>> 
>>>>>>>> Chen
>>>>>>>> 
>>>>>>>> On Wed, Jun 24, 2015 at 11:17 AM, Steven Jacobs <sjaco002@ucr.edu>
>>>>>>> wrote:
>>>>>>>> 
>>>>>>>>> Fieldnames do allow these characters (both of them).
>>>>>>>>> Steven
>>>>>>>>> 
>>>>>>>>> On Wed, Jun 24, 2015 at 11:15 AM, Chen Li <chenli@gmail.com> wrote:
>>>>>>>>> 
>>>>>>>>>> I also prefer "." to "_".  Also want to confirm that field names don't
>>>>>>>>>> allow these two characters.
>>>>>>>>>> 
>>>>>>>>>> Chen
>>>>>>>>>> 
>>>>>>>>>> On Wed, Jun 24, 2015 at 10:52 AM, Steven Jacobs <sjaco002@ucr.edu>
>>>>>>>>> wrote:
>>>>>>>>>> 
>>>>>>>>>>> I second Young-Seok (especially since this is the syntax that users will
>>>>>>>>>>> use themselves for nested information in queries).
>>>>>>>>>>> 
>>>>>>>>>>> Steven
>>>>>>>>>>> 
>>>>>>>>>>> On Wed, Jun 24, 2015 at 10:40 AM, Young-Seok Kim <kisskys@gmail.com> wrote:
>>>>>>>>>>> 
>>>>>>>>>>>> It seems better to use "." instead of "_" since "." is more intuitive (at
>>>>>>>>>>>> least to me) than "_".
>>>>>>>>>>>> For example, the FacebookUserType_address will be FacebookUserType.address.
>>>>>>>>>>>> 
>>>>>>>>>>>> Best,
>>>>>>>>>>>> Young-Seok
>>>>>>>>>>>> 
>>>>>>>>>>>> On Wed, Jun 24, 2015 at 6:31 AM, Mike Carey <dtabass@gmail.com> wrote:
>>>>>>>>>>>> 
>>>>>>>>>>>>> Much cleaner!  Others should weigh in here to help finalize the
>>>>>>>>>>>>> conventions....  Thoughts?
>>>>>>>>>>>>> On Jun 23, 2015 5:31 PM, "Ildar Absalyamov" <iabsa001@cs.ucr.edu> wrote:
>>>>>>>>>>>>> 
>>>>>>>>>>>>>> So the general solution is that the generated names should become less verbose (consider the previous examples):
>>>>>>>>>>>>>> 1) The anonymous field naming scheme will change to outerTypeName + “_” + fieldName, i.e. “Field_address_in_FacebookUserType” is changed to “FacebookUserType_address”
>>>>>>>>>>>>>> 2) The anonymous collection item naming scheme stays the same, i.e. “Field_employment_in_FacebookUserType_ItemType” is changed to “FacebookUserType_employment_ItemType” (the name is changed because the anonymous field employment naming was changed as described earlier)
>>>>>>>>>>>>>> 3) The union type completely ceases to exist in metadata (it stays in the object model though), i.e. “Type_#1_UnionType_Field_end-date_in_Field_employment_in_FacebookUserType_ItemType” is changed to “FacebookUserType_employment_ItemType_end-date”, where the type metadata will have an additional field “Optional” with value “true”.
>>>>>>>>>>>>>> 
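
The three proposed rules can be sketched as follows. The function names and the dictionary used for the "Optional" flag are hypothetical; only the string patterns and the example names come from the message above.

```python
def field_type_name(outer_type_name, field_name):
    """Rule 1: outerTypeName + "_" + fieldName."""
    return outer_type_name + "_" + field_name

def item_type_name(collection_type_name):
    """Rule 2: the collection item suffix is unchanged."""
    return collection_type_name + "_ItemType"

def nullable_field_type(outer_type_name, field_name):
    """Rule 3: no union type name; nullability becomes a flag on the type."""
    return {"name": field_type_name(outer_type_name, field_name), "Optional": True}

address = field_type_name("FacebookUserType", "address")
employment_item = item_type_name(field_type_name("FacebookUserType", "employment"))
end_date = nullable_field_type(employment_item, "end-date")

print(address)
print(employment_item)
print(end_date)
```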
>>>>>>>>>>>>>>> On Jun 19, 2015, at 18:11, Ildar Absalyamov <iabsa001@cs.ucr.edu> wrote:
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> So I have done half of the fix, which was moving the name generation logic out of the Metadata node to the client.
>>>>>>>>>>>>>>> Up to that point nothing in the Metadata format was changed, which makes me wonder whether I should proceed with the following changes.
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> As can be seen from the previous email, getting rid of union-inferred name generation would make auto generated type names less scary, but not entirely.
>>>>>>>>>>>>>>> Having in mind what Mike mentioned earlier today, should we do something about the other auto generated type name cases?
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> On Jun 19, 2015, at 13:01, Ildar Absalyamov <iabsa001@cs.ucr.edu> wrote:
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> Currently we are generating the names for inner\anonymous types in the following cases:
>>>>>>>>>>>>>>>> 1) Anonymous field in the record.
>>>>>>>>>>>>>>>> AQL Example:
>>>>>>>>>>>>>>>> create type FacebookUserType as closed {
>>>>>>>>>>>>>>>>    id: int32,
>>>>>>>>>>>>>>>>    name: string,
>>>>>>>>>>>>>>>>    address: {
>>>>>>>>>>>>>>>>         address_line: string,
>>>>>>>>>>>>>>>>         city: string,
>>>>>>>>>>>>>>>>         state: string
>>>>>>>>>>>>>>>>    }
>>>>>>>>>>>>>>>> }
>>>>>>>>>>>>>>>> The pattern for generating an anonymous field name is "Field_" + fieldName + "_in_" + outerTypeName, which translates to "Field_address_in_FacebookUserType" in the given example
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> 2) Anonymous collection (ordered\unordered list) item
>>>>>>>>>>>>>>>> create type FacebookUserType as closed {
>>>>>>>>>>>>>>>>    id: int32,
>>>>>>>>>>>>>>>>    name: string,
>>>>>>>>>>>>>>>>    employment: [{
>>>>>>>>>>>>>>>>         organization-name: string,
>>>>>>>>>>>>>>>>         start-date: date,
>>>>>>>>>>>>>>>>         end-date: date?
>>>>>>>>>>>>>>>>    }]
>>>>>>>>>>>>>>>> }
>>>>>>>>>>>>>>>> The pattern for generating an anonymous collection item name is collectionFieldName + "_ItemType", which translates to "Field_employment_in_FacebookUserType_ItemType" in the given example
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> 3) Nullable fields
>>>>>>>>>>>>>>>> Same example as above could be used (end-date field): the pattern for generating a nullable field name is "Type_#" + fieldsNumberInUnoinList + "_UnionType_" + outerTypeName, which translates to "Type_#1_UnionType_Field_end-date_in_Field_employment_in_FacebookUserType_ItemType" in the given example.
>>>>>>>>>>>>>>>> 
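
For comparison, the three existing patterns listed above can be written as small helpers. The function names are hypothetical, and the sketch assumes (as the end-date example suggests) that the union pattern is applied to the already generated field name.

```python
def old_field_name(outer_type_name, field_name):
    """Case 1: "Field_" + fieldName + "_in_" + outerTypeName."""
    return "Field_" + field_name + "_in_" + outer_type_name

def old_item_name(collection_field_name):
    """Case 2: collectionFieldName + "_ItemType"."""
    return collection_field_name + "_ItemType"

def old_union_name(index, inner_name):
    """Case 3: "Type_#" + index + "_UnionType_" + the wrapped name."""
    return "Type_#" + str(index) + "_UnionType_" + inner_name

# FacebookUserType.employment is a list of anonymous records with a nullable end-date.
employment = old_field_name("FacebookUserType", "employment")
item = old_item_name(employment)
end_date = old_union_name(1, old_field_name(item, "end-date"))

# The Derived.Record field of the metadata type DatatypeRecordType.
derived = old_union_name(1, old_field_name("DatatypeRecordType", "Derived"))
record = old_union_name(1, old_field_name(derived, "Record"))

print(end_date)
print(record)
```

Each level of nesting embeds the complete name of the enclosing type, which is why the names grow so quickly.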
>>>>>>>>>>>>>>>> So you can see these auto generated names could stack up pretty fast and be completely incomprehensible. Just to give you a small flavor of that, here is one of the metadata datasets type definitions:
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> open {
>>>>>>>>>>>>>>>> DataverseName: STRING,
>>>>>>>>>>>>>>>> DatatypeName: STRING,
>>>>>>>>>>>>>>>> Derived: UNION(NULL, open {
>>>>>>>>>>>>>>>>  Tag: STRING,
>>>>>>>>>>>>>>>>  IsAnonymous: BOOLEAN,
>>>>>>>>>>>>>>>>  EnumValues: UNION(NULL, [ STRING ]),
>>>>>>>>>>>>>>>>  Record: UNION(NULL, open {
>>>>>>>>>>>>>>>>      IsOpen: BOOLEAN,
>>>>>>>>>>>>>>>>      Fields: [ open {
>>>>>>>>>>>>>>>>          FieldName: STRING,
>>>>>>>>>>>>>>>>          FieldType: STRING
>>>>>>>>>>>>>>>>        }
>>>>>>>>>>>>>>>>      ]
>>>>>>>>>>>>>>>>    }
>>>>>>>>>>>>>>>>  ),
>>>>>>>>>>>>>>>>  Union: UNION(NULL, [ STRING ]),
>>>>>>>>>>>>>>>>  UnorderedList: UNION(NULL, STRING),
>>>>>>>>>>>>>>>>  OrderedList: UNION(NULL, STRING)
>>>>>>>>>>>>>>>> }
>>>>>>>>>>>>>>>> ),
>>>>>>>>>>>>>>>> Timestamp: STRING
>>>>>>>>>>>>>>>> }
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> And here are a couple of the field names generated for it :)
>>>>>>>>>>>>>>>> Type_#1_UnionType_Field_Record_in_Type_#1_UnionType_Field_Derived_in_DatatypeRecordType
>>>>>>>>>>>>>>>> Field_UnorderedList_in_Type_#1_UnionType_Field_Derived_in_DatatypeRecordType
>>>>>>>>>>>>>>>> Field_Fields_in_Type_#1_UnionType_Field_Record_in_Type_#1_UnionType_Field_Derived_in_DatatypeRecordType_ItemType
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> Best regards,
>>>>>>>>>>>>>>>> Ildar
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> Best regards,
>>>>>>>>>>>>>>> Ildar
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> Best regards,
>>>>>>>>>>>>>> Ildar
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> --
>>>>>> Amoudi, Abdullah.
>>> 
>>> Best regards,
>>> Ildar

