asterixdb-dev mailing list archives

From Ian Maxon <ima...@uci.edu>
Subject Re: The "real" ADM format
Date Wed, 15 Jun 2016 22:52:24 GMT
I've been looking at this a bit more, and it turns out that adm.grammar in
asterix-external-data is the "real" ADM format. It is supposed to
always accept suffixes of i8/16/32/etc. after a digit sequence, but
something must be wrong with how the grammar is being translated. It
also appears that in some circumstances the parser can be coaxed into
accepting the output. It therefore seems to me, at this point, that the real
deficiency is in lexer-generator-maven-plugin and not elsewhere.
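
To make the intended behaviour concrete, here is a minimal sketch of "digit sequence plus optional suffix" matching. It is not the adm.grammar rule or the generated lexer: the suffix set (i8, i16, i32, i64), the int64 default for an unsuffixed integer, and the names INT_LITERAL and parse_int_literal are all assumptions made for illustration.

```python
import re

# Assumed suffix set; the message above only names "i8/16/32/etc".
SUFFIX_TO_TYPE = {"i8": "int8", "i16": "int16", "i32": "int32", "i64": "int64"}

# Optional sign, a digit sequence, then an optional type suffix.
INT_LITERAL = re.compile(r"([+-]?\d+)(i8|i16|i32|i64)?")


def parse_int_literal(text, default_type="int64"):
    """Return (value, type_name) for an integer literal, or raise ValueError.

    default_type is an assumption: elsewhere in the thread a naked integer
    is believed (not verified) to be parsed as int64.
    """
    match = INT_LITERAL.fullmatch(text.strip())
    if match is None:
        raise ValueError(f"not an integer literal: {text!r}")
    digits, suffix = match.groups()
    type_name = SUFFIX_TO_TYPE[suffix] if suffix else default_type
    return int(digits), type_name
```

With this sketch, parse_int_literal("42i32") yields (42, "int32") and parse_int_literal("7") falls back to the assumed default type, so the suffixed and unsuffixed forms are both accepted.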

On 6/8/16, Ian Maxon <imaxon@uci.edu> wrote:
> I guess I don't view round-trippability in the same way, then. All it
> means to me is that I can scan/output the data, load it, and end up with
> the same thing, not necessarily that I can load it without specifying the
> types and get them anyway because they're inlined in the data. If we want
> that, I think the better approach would be something like mysqldump
> (e.g. it dumps the metadata/types as an equivalent query, basically). Also,
> if we changed the format to conflict with the existing output of SocialGen,
> we'd have issues with current experiments and reproducing old results.
>
> On Wed, Jun 8, 2016 at 1:17 PM, Chris Hillery <chillery@hillery.land>
> wrote:
>
>> I think the answer there is "round-trippability", right? ADM is meant to
>> exactly describe the data so that it can be reloaded in the same way it
>> was. Someone correct me if that isn't a requirement of the format...
>>
>> Ceej
>> On Jun 8, 2016 9:14 AM, "Ian Maxon" <imaxon@uci.edu> wrote:
>>
>> > Why should the type be intermingled with the data though when it isn't
>> > strictly necessary? For example why do I care if someone used an int64
>> > to wrap something I know is actually a short integer, and so on. It also
>> > kind of gets rid of the idea of ADM being a superset of JSON.
>> >
>> > On Tue, Jun 7, 2016 at 10:49 PM, Preston Carman <prestonc@apache.org>
>> > wrote:
>> >
>> > > The interval type format has been finalized and is the same for AQL
>> > > and ADM. Below is an example of the format:
>> > >
>> > > interval(date("01-01-2011"), date("02-02-2012"))
>> > >
>> > > The interval constructor now uses other data type constructors to
>> > > recreate an interval. The type of interval is defined by the two
>> > > matching arguments.
>> > >
>> > >
>> > > On Tue, Jun 7, 2016 at 9:36 PM, Chris Hillery <chillery@hillery.land>
>> > > wrote:
>> > > > Ah, the other thing I forgot to mention is that I didn't include
>> > > > interval types, because I'm not sure about their current status. There
>> > > > was some discussion on the list in January (subject "Round Tripping ADM
>> > > > Interval Data") but I'm not sure where it ended up as far as the form of
>> > > > the constructors, and whether that was AQL or ADM or both.
>> > > >
>> > > > Ceej
>> > > > aka Chris Hillery
>> > > >
>> > > > On Tue, Jun 7, 2016 at 9:34 PM, Chris Hillery <chillery@hillery.land>
>> > > > wrote:
>> > > >
>> > > >> I started to create the current inventory of types, with the forms
>> > > >> accepted / produced by the ADM parser, AQL parser, and ADM
>> > > >> serialization. (I think we all agree that ADM parser and ADM serializer
>> > > >> should be 100% compatible.) Here it is:
>> > > >>
>> > > >> https://docs.google.com/spreadsheets/d/1-11a9ETV1Bdh_bUm9_CszY4hEGJGbEBaVKUWrzeS-As/edit?usp=sharing
>> > > >>
>> > > >> I know this is not comprehensive (for instance, I'm pretty sure that a
>> > > >> naked integer will be parsed by both ADM and AQL as an int64, so that
>> > > >> form should be listed as an alternative) and I haven't verified that
>> > > >> the AQL parser forms in particular are accurate, but I think it's close.
>> > > >> I've set it so anyone can edit that document, so please fill in the gaps
>> > > >> if you know of any.
>> > > >>
>> > > >> We should also fill in the exact accepted forms for the various derived
>> > > >> types like the datetime, spatial, hex, and UUID types - e.g., the valid
>> > > >> forms of the double-quoted string in the duration() constructor are as
>> > > >> specified by XML Schema, and so on.
>> > > >>
>> > > >> Ceej
>> > > >> aka Chris Hillery
>> > > >>
>> > > >> On Tue, Jun 7, 2016 at 8:53 PM, Chris Hillery <chillery@hillery.land>
>> > > >> wrote:
>> > > >>
>> > > >>> If it's possible, I think it would be least confusing if the serialized
>> > > >>> ADM format was identical to the corresponding data constructors in AQL.
>> > > >>> It should be a goal IMHO that you can cut-and-paste an ADM file into the
>> > > >>> query box in the web UI and the result would be the same as loading the
>> > > >>> .adm.
>> > > >>>
>> > > >>> For more specifics, I think we need to write out for each data type what
>> > > >>> the current ADM and AQL formats are, and then pick a final answer for
>> > > >>> the type (which may possibly be different from either of the current
>> > > >>> forms, although I suspect not). That will be the spec, and we can update
>> > > >>> the two parsers (and all the test cases) accordingly.
>> > > >>>
>> > > >>> I started an email thread sometime last year about something similar; I
>> > > >>> think it was about JSON serialization, but it at least had the AQL side
>> > > >>> of this story for all simple types, I believe.
>> > > >>>
>> > > >>> Ceej
>> > > >>> aka Chris Hillery
>> > > >>> On Jun 7, 2016 8:17 PM, "Ian Maxon" <imaxon@uci.edu> wrote:
>> > > >>>
>> > > >>>> Hi all,
>> > > >>>> After my experience with having to fix a rather large ADM file dump from
>> > > >>>> a query to make it load back into the system I was compelled to try my
>> > > >>>> hand at making that not happen again. The first thing I tried my hand at
>> > > >>>> was basically what I did to make the file loadable but inside the type
>> > > >>>> printers; just remove all of the 'i32' and so on suffixes, as well as
>> > > >>>> making decimals not formatted in scientific notation. This is pretty
>> > > >>>> easy to do as well, not a huge change code-wise (but obviously I'll have
>> > > >>>> to fix all of the tests).
>> > > >>>>
>> > > >>>> This got me to think though, which is the format that we actually want?
>> > > >>>> The current format that is output, or the format that we accept in the
>> > > >>>> loader? Since this is actually perhaps a language level change either
>> > > >>>> way I figured I should find consensus before spending more time on it.
>> > > >>>>
>> > > >>>> Thoughts/comments are appreciated.
>> > > >>>>
>> > > >>>> Thanks,
>> > > >>>> - Ian
>> > > >>>>
>> > > >>>
>> > > >>
>> > >
>> >
>>
>
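
The manual cleanup described in the first message of the thread (dropping the 'i32'-style suffixes and writing numbers without scientific notation so that a dump loads again) can be sketched as a text filter. This is only an illustration of that workaround, not the change to the type printers: the suffix set, the TOKEN pattern, and the function names are assumptions, and double-quoted strings are deliberately left untouched.

```python
import re
from decimal import Decimal

# Match either a double-quoted string (kept as-is) or a numeric literal
# with an optional, assumed integer suffix (i8/i16/i32/i64).
TOKEN = re.compile(
    r'"(?:\\.|[^"\\])*"'
    r'|(?<![\w.])([+-]?\d+(?:\.\d+)?(?:[eE][+-]?\d+)?)(i8|i16|i32|i64)?(?![\w.])'
)


def normalize_number(match):
    """Drop the suffix and expand scientific notation; leave strings alone."""
    number = match.group(1)
    if number is None:
        # The quoted-string branch matched, so return it unchanged.
        return match.group(0)
    if "e" in number.lower():
        # Decimal keeps the exact digits when expanding the exponent.
        return format(Decimal(number), "f")
    return number


def strip_adm_suffixes(text):
    """Rewrite numeric literals in a dump of ADM text (hypothetical helper)."""
    return TOKEN.sub(normalize_number, text)
```

Matching the quoted strings first is the main design choice here: a value such as "a 7i32 b" contains something that looks like a suffixed integer, and a plain search-and-replace would silently corrupt it.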
