Mailing-List: contact dev-help@asterixdb.apache.org; run by ezmlm
Reply-To: dev@asterixdb.apache.org
MIME-Version: 1.0
From: Ian Maxon
Date: Wed, 15 Jun 2016 20:37:03 -0700
Subject: Re: The "real" ADM format
To: dev@asterixdb.apache.org
Content-Type: multipart/alternative; boundary=94eb2c122f92ba0b9e05355cf4d5
archived-at: Thu, 16 Jun 2016 03:37:32 -0000

--94eb2c122f92ba0b9e05355cf4d5
Content-Type: text/plain; charset=UTF-8

I think the int suffixes can be made to work; however, there is an issue
with the suffixes for floats and doubles. First, the existing grammar
doesn't handle suffixes for doubles at all, only for floats. Second, "NaN"
and "Infinity" are valid values for a double, but making those work with
the suffix doesn't seem trivial to me.

On Wed, Jun 15, 2016 at 3:52 PM, Ian Maxon wrote:

> I've been looking at this a bit more, and it turns out adm.grammar in
> asterix-external-data is the "real" ADM format. It is supposed to
> always accept suffixes of i8/16/32/etc after a digit sequence, but
> something must be wrong with how the grammar is being translated. It
> also appears that in some circumstances the parser can be coaxed into
> taking the output. Therefore it seems to me at this time that the real
> deficiency is in lexer-generator-maven-plugin and not elsewhere.
>
> On 6/8/16, Ian Maxon wrote:
> > I guess I don't view the round-trippability in the same way then; all it
> > means to me is that I can scan/output the data, load it, and end up with
> > the same thing, not necessarily that I can load it without specifying the
> > types and get them anyway because they're inlined in the data. I think if
> > we want that, the better thing to do would be something like mysqldump
> > (e.g. it dumps the metadata/types as an equivalent query, basically).
> > Also, if we changed the format to conflict with the existing output of
> > SocialGen, we'd have issues with current experiments and reproducing old
> > results.
> >
> > On Wed, Jun 8, 2016 at 1:17 PM, Chris Hillery wrote:
> >
> >> I think the answer there is "round-tripability", right? ADM is meant to
> >> exactly describe the data so that it can be reloaded in the same way it
> >> was.
> >> Someone correct me if that isn't a requirement of the format...
> >>
> >> Ceej
> >> On Jun 8, 2016 9:14 AM, "Ian Maxon" wrote:
> >>
> >> > Why should the type be intermingled with the data though when it isn't
> >> > strictly necessary? For example why do I care if someone used an int64
> >> > to wrap something I know is actually a short integer, and so on. It also
> >> > kind of gets rid of the idea of ADM being a superset of JSON.
> >> >
> >> > On Tue, Jun 7, 2016 at 10:49 PM, Preston Carman wrote:
> >> >
> >> > > The interval type format has been finalized and is the same for AQL
> >> > > and ADM. Below is an example of the format:
> >> > >
> >> > > interval(date("01-01-2011"), date("02-02-2012"))
> >> > >
> >> > > The interval constructor now uses other data type constructors to
> >> > > recreate an interval. The type of interval is defined by the two
> >> > > matching arguments.
> >> > >
> >> > > On Tue, Jun 7, 2016 at 9:36 PM, Chris Hillery wrote:
> >> > > > Ah, the other thing I forgot to mention is that I didn't include
> >> > > > interval types, because I'm not sure about their current status.
> >> > > > There was some discussion on the list in January (subject "Round
> >> > > > Tripping ADM Interval Data") but I'm not sure where it ended up as
> >> > > > far as the form of the constructors, and whether that was AQL or
> >> > > > ADM or both.
> >> > > >
> >> > > > Ceej
> >> > > > aka Chris Hillery
> >> > > >
> >> > > > On Tue, Jun 7, 2016 at 9:34 PM, Chris Hillery wrote:
> >> > > >
> >> > > >> I started to create the current inventory of types, with the forms
> >> > > >> accepted / produced by the ADM parser, AQL parser, and ADM
> >> > > >> serialization. (I think we all agree that ADM parser and ADM
> >> > > >> serializer should be 100% compatible.)
> >> > > >> Here it is:
> >> > > >>
> >> > > >> https://docs.google.com/spreadsheets/d/1-11a9ETV1Bdh_bUm9_CszY4hEGJGbEBaVKUWrzeS-As/edit?usp=sharing
> >> > > >>
> >> > > >> I know this is not comprehensive (for instance, I'm pretty sure
> >> > > >> that a naked integer will be parsed by both ADM and AQL as an
> >> > > >> int64, so that form should be listed as an alternative) and I
> >> > > >> haven't verified that the AQL parser forms in particular are
> >> > > >> accurate, but I think it's close. I've set it so anyone can edit
> >> > > >> that document, so please fill in the gaps if you know of any.
> >> > > >>
> >> > > >> We should also fill in the exact accepted forms for the various
> >> > > >> derived types like the datetime, spatial, hex, and UUID types -
> >> > > >> e.g., the valid forms of the double-quoted string in the
> >> > > >> duration() constructor are as specified by XML Schema, and so on.
> >> > > >>
> >> > > >> Ceej
> >> > > >> aka Chris Hillery
> >> > > >>
> >> > > >> On Tue, Jun 7, 2016 at 8:53 PM, Chris Hillery wrote:
> >> > > >>
> >> > > >>> If it's possible, I think it would be least confusing if the
> >> > > >>> serialized ADM format was identical to the corresponding data
> >> > > >>> constructors in AQL. It should be a goal IMHO that you can
> >> > > >>> cut-and-paste an ADM file into the query box in the web UI and
> >> > > >>> the result would be the same as loading the .adm.
> >> > > >>>
> >> > > >>> For more specifics, I think we need to write out for each data
> >> > > >>> type what the current ADM and AQL formats are, and then pick a
> >> > > >>> final answer for the type (which may possibly be different from
> >> > > >>> either of the current forms, although I suspect not).
> >> > > >>> That will be the spec, and we can update the two parsers (and
> >> > > >>> all the test cases) accordingly.
> >> > > >>>
> >> > > >>> I started an email thread sometime last year about something
> >> > > >>> similar; I think it was about JSON serialization, but it at
> >> > > >>> least had the AQL side of this story for all simple types, I
> >> > > >>> believe.
> >> > > >>>
> >> > > >>> Ceej
> >> > > >>> aka Chris Hillery
> >> > > >>> On Jun 7, 2016 8:17 PM, "Ian Maxon" wrote:
> >> > > >>>
> >> > > >>>> Hi all,
> >> > > >>>> After my experience with having to fix a rather large ADM file
> >> > > >>>> dump from a query to make it load back into the system, I was
> >> > > >>>> compelled to try my hand at making that not happen again. The
> >> > > >>>> first thing I tried was basically what I did to make the file
> >> > > >>>> loadable, but inside the type printers: just remove all of the
> >> > > >>>> 'i32' and so on suffixes, as well as making decimals not
> >> > > >>>> formatted in scientific notation. This is pretty easy to do as
> >> > > >>>> well, not a huge change code-wise (but obviously I'll have to
> >> > > >>>> fix all of the tests).
> >> > > >>>>
> >> > > >>>> This got me thinking, though: which is the format that we
> >> > > >>>> actually want? The current format that is output, or the format
> >> > > >>>> that we accept in the loader? Since this is actually perhaps a
> >> > > >>>> language-level change either way, I figured I should find
> >> > > >>>> consensus before spending more time on it.
> >> > > >>>>
> >> > > >>>> Thoughts/comments are appreciated.
> >> > > >>>>
> >> > > >>>> Thanks,
> >> > > >>>> - Ian

--94eb2c122f92ba0b9e05355cf4d5--
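
To make the suffix question from the top of this message concrete, here is a minimal sketch in Python of a numeric-literal parser that accepts an optional type suffix. It is not adm.grammar and not what lexer-generator-maven-plugin generates. The names parse_number, INT_RE, FLOAT_RE and INT_BITS are hypothetical, the "f" and "d" suffixes are assumed for illustration (the thread only names the i8/16/32 family), and treating a bare integer as int64 follows the guess made earlier in the thread. The point it illustrates is that "NaN" and "Infinity" are not digit sequences, so they need their own alternative before a suffix can attach to them.

```python
import math
import re

# Hypothetical suffix table; only the i8/i16/i32 family is named in the thread.
INT_BITS = {"i8": 8, "i16": 16, "i32": 32, "i64": 64}

# A digit sequence with an optional integer suffix.
INT_RE = re.compile(r"([+-]?\d+)(i8|i16|i32|i64)?")

# A decimal literal, or NaN/Infinity as a separate alternative, followed by an
# optional "f" or "d" suffix (both suffixes are assumptions for this sketch).
FLOAT_RE = re.compile(
    r"([+-]?(?:\d+\.\d*|\.\d+|\d+)(?:[eE][+-]?\d+)?|[+-]?(?:NaN|Infinity))(f|d)?"
)


def parse_number(token):
    """Return a (type_name, value) pair for one numeric token."""
    m = INT_RE.fullmatch(token)
    if m:
        digits, suffix = m.groups()
        # Assumption: an unsuffixed integer is read as int64.
        bits = INT_BITS[suffix or "i64"]
        value = int(digits)
        lo, hi = -(1 << (bits - 1)), (1 << (bits - 1)) - 1
        if not lo <= value <= hi:
            raise ValueError(f"{token!r} does not fit in int{bits}")
        return (f"int{bits}", value)
    m = FLOAT_RE.fullmatch(token)
    if m:
        text, suffix = m.groups()
        # Assumption: an unsuffixed decimal is read as a double.
        kind = "float" if suffix == "f" else "double"
        # Python's float() accepts "NaN" and "Infinity" spellings directly.
        return (kind, float(text))
    raise ValueError(f"not a numeric literal: {token!r}")
```

With this sketch, parse_number("42i32") yields an int32 and parse_number("NaNd") yields a double NaN, while "300i8" is rejected as out of range. Whether the real lexer can express the NaN/Infinity alternative this way is exactly the open question above.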