asterixdb-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Till Westmann" <ti...@apache.org>
Subject Re: Round Tripping ADM Interval Data
Date Mon, 01 Feb 2016 08:10:53 GMT
+1

On 31 Jan 2016, at 16:41, Mike Carey wrote:

> Removal seems fine/cleaner to me...!
>
> On 1/31/16 10:19 AM, Eldon Carman wrote:
>> I submitted a code review for the external (phase 1) changes. The 
>> change
>> includes a interval constructor that takes two arguments of the same 
>> type.
>> For example:
>>
>> let $v1 := interval(date("2013-01-01"), date("20130505"))
>> let $v2 := interval(time("00:01:01"), time("213901049+0800"))
>> let $v3 := interval(datetime("2013-01-01T00:01:01"),
>> datetime("20130505T213901049+0800"))
>> return { "v1": $v1, "v2": $v2, "v3": $v3 }
>>
>>
>> Do we want to keep the previous functions for declaring an interval? 
>> The
>> these functions are unique for each interval type and support string
>> arguments in addition to the specific type (date, time, datetime).
>>
>> let $v1 := interval-from-date(date("2013-01-01"), date("20130505"))
>> let $v2 := interval-from-time(time("00:01:01"), 
>> time("213901049+0800"))
>> let $v3 := interval-from-datetime(datetime("2013-01-01T00:01:01"),
>> datetime("20130505T213901049+0800"))
>> return { "v1": $v1, "v2": $v2, "v3": $v3 }
>>
>> They could be helpful, but I am don't think they are necessary. 
>> Removing
>> them would consolidate the code path for intervals. Thoughts?
>>
>>
>> On Fri, Jan 29, 2016 at 12:15 AM, Mike Carey <dtabass@gmail.com> 
>> wrote:
>>> +1 on all points below!  :-)
>>>
>>>
>>> On 1/28/16 11:00 PM, Till Westmann wrote:
>>>> Sounds good to me. It will change the binary representation of
>> intervals, so it’s not backwards compatible.
>>>> But it seems that the tag-first representation is the way it should 
>>>> have
>> been anyway and it’s a better way to go forward to support more 
>> generic
>> intervals. (And I prefer layout changes rather now than later ..)
>>>> Cheers,
>>>> Till
>>>>
>>>> On 28 Jan 2016, at 21:46, Eldon Carman wrote:
>>>>
>>>>> Based on the discussion, I have suggestion for a two phase 
>>>>> approach to
>>>>> support generic intervals.
>>>>>
>>>>>
>>>>> Phase 1: External
>>>>> The first phase will focus on areas seen by a user, either though 
>>>>> ADM or
>>>>> AQL. The new format will also be open to supporting other interval
>> types,
>>>>> although currently will only support date, time and datetime. Here 
>>>>> are a
>>>>> few of the action items:
>>>>>
>>>>> - ADM printer and parser changed to support the new generic style 
>>>>> format
>>>>> - interval(date("2012-01-01”), date(”2013-04-01”))
>>>>> - Add an interval AQL constructor to support the above format
>>>>> - Alter interval byte structure to support any interval type
>>>>> - byte tag, T start, T end
>>>>> - where T is currently only date, time and datetime
>>>>>
>>>>> I created a ticket to track status for Phase 1:
>>>>> https://issues.apache.org/jira/browse/ASTERIXDB-1281
>>>>>
>>>>>
>>>>> Phase 2: Internal
>>>>> Phase two will focus on items under the hood. These items are only 
>>>>> seen
>> by
>>>>> a developer and need to be update to support new interval types. 
>>>>> The
>>>>> complete task list will require some investigation. In addition to
>> picking
>>>>> what types of intervals should be supported.
>>>>>
>>>>> Phase 1 can start immediately while Phase 2 can wait until needed.
>>>>>
>>>>> Thoughts?
>>>>>
>>>>>
>>>>>
>>>>> On Tue, Jan 26, 2016 at 2:14 PM, Eldon Carman <ecarm002@ucr.edu>

>>>>> wrote:
>>>>>
>>>>>> While its a little more work to implement up front, the following
>> format
>>>>>> would be generic and support alternate interval types in the 
>>>>>> future. It
>>>>>> would be nice to have consistency between AQL and ADM, although 
>>>>>> not
>>>>>> required.
>>>>>>
>>>>>> interval(date("2012-01-01”), date(”2013-04-01”))
>>>>>>
>>>>>> As we move to supporting generic intervals, the byte storage 
>>>>>> format
>> will
>>>>>> need to be updated. Currently an interval is represented by: 
>>>>>> start
>> (long),
>>>>>> end (long), type tag (byte). To support other types, the type tag
>> should be
>>>>>> at the beginning of the byte sequence. This way the tag can be 
>>>>>> used to
>>>>>> determine the data length of each item in the interval.
>>>>>>
>>>>>> Should the changes to AQL and ADM include this interval storage 
>>>>>> change
>>>>>> (moving the type tag to the first byte of the interval storage 
>>>>>> format)?
>>>>>>
>>>>>> On Tue, Jan 26, 2016 at 12:42 PM, Till Westmann 
>>>>>> <tillw@apache.org>
>> wrote:
>>>>>>> That’s actually a nice and generic serialization.
>>>>>>> I think that we should do this similarly in ADM and AQL.
>>>>>>> I.e. instead of using
>>>>>>>
>>>>>>> interval-from-date("2012-01-01”, ”2013-04-01”)
>>>>>>>
>>>>>>> (note the two parameters) in AQL and
>>>>>>>
>>>>>>> interval-date("2012-01-01, 2013-04-01")
>>>>>>>
>>>>>>> (not the single parameter) in ADM we should use
>>>>>>>
>>>>>>> interval(date("2012-01-01”), date(”2013-04-01”))
>>>>>>>
>>>>>>> for both. That would have a number of advantages:
>>>>>>>
>>>>>>> 1) It is consistent between AQL and ADM.
>>>>>>> 2) It is consistent with the JSON serialization.
>>>>>>> 3) It reduces the number of magic parsers.
>>>>>>> 4) It keeps the interval orthogonal to the type used in the 
>>>>>>> interval.
>>>>>>>
>>>>>>> On 4): While we don’t support intervals of other types than

>>>>>>> date,
>> time,
>>>>>>> and datetime so far, I think that we should change that and so

>>>>>>> this
>> would
>>>>>>> be a good step in that direction as well.
>>>>>>>
>>>>>>> The disadvantages are
>>>>>>>
>>>>>>> 1) Incompatible AQL change
>>>>>>> 2) Incompatible ADM change
>>>>>>>
>>>>>>> Thoughts?
>>>>>>>
>>>>>>> Cheers,
>>>>>>> Till
>>>>>>>
>>>>>>>
>>>>>>> On 26 Jan 2016, at 11:46, Eldon Carman wrote:
>>>>>>>
>>>>>>> I found that the lossless-JSON and clean-JSON printers were not

>>>>>>> being
>>>>>>>> used.
>>>>>>>> After connecting them to the respective JSON printer, I ran
the 
>>>>>>>> query
>>>>>>>> again.
>>>>>>>>
>>>>>>>> lossless-JSON result:
>>>>>>>> { "orderedlist": [ { "date-interval": { "interval": { "start":

>>>>>>>> {
>> "date":
>>>>>>>> "2012-01-01" }, "end": { "date": "2013-04-01" }}} }, {
>> "time-interval": {
>>>>>>>> "interval": { "start": { "time": "12:23:34.456Z" }, "end":
{ 
>>>>>>>> "time":
>>>>>>>> "15:34:45.567Z" }}} }, { "datetime-interval": { "interval":
{
>> "start": {
>>>>>>>> "datetime": "2012-01-01T04:23:34.456Z" }, "end": { "datetime":
>>>>>>>> "2013-04-01T15:34:45.567Z" }}} } ] }
>>>>>>>>
>>>>>>>>
>>>>>>>> clean-JSON result:
>>>>>>>> [ { "date-interval": { "interval": { "start": "2012-01-01",

>>>>>>>> "end":
>>>>>>>> "2013-04-01"}} }, { "time-interval": { "interval": { "start":
>>>>>>>> "12:23:34.456Z", "end": "15:34:45.567Z"}} }, { 
>>>>>>>> "datetime-interval": {
>>>>>>>> "interval": { "start": "2012-01-01T04:23:34.456Z", "end":
>>>>>>>> "2013-04-01T15:34:45.567Z"}} } ]
>>>>>>>>
>>>>>>>> Is this what you would have expected?
>>>>>>>>
>>>>>>>> On Mon, Jan 25, 2016 at 7:07 PM, Eldon Carman 
>>>>>>>> <ecarm002@ucr.edu>
>> wrote:
>>>>>>>>> Thanks Chris for adding a fourth option. This option
would 
>>>>>>>>> focus our
>>>>>>>>>
>>>>>>>> updates to only the ADM output.
>>>>>>>>
>>>>>>>>> Yes, both lossless-JSON and clean-JSON outputs would
need to 
>>>>>>>>> be
>> check
>>>>>>>> also.
>>>>>>>>
>>>>>>>>> On Mon, Jan 25, 2016 at 5:58 PM, Chris Hillery
>> <chillery@hillery.land>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>>> I would vote for:
>>>>>>>>>>
>>>>>>>>>> d. Update the serialized format to output 
>>>>>>>>>> "interval-from-date" and
>> put
>>>>>>>>> both
>>>>>>>>
>>>>>>>>> dates in quotes.
>>>>>>>>>>
>>>>>>>>>> I like the function name interval-from-date() better,
and I 
>>>>>>>>>> don't
>> think
>>>>>>>>>> there's any need to maintain backwards compatibility
with the 
>>>>>>>>>> old
>> name
>>>>>>>>>> which clearly never worked.
>>>>>>>>>>
>>>>>>>>>> Couple thoughts, though: The serialized format really
should 
>>>>>>>>>> be
>> "ADM",
>>>>>>>>> not
>>>>>>>>
>>>>>>>>> "AQL". As such I don't think it should reference functions
at 
>>>>>>>>> all.
>> We
>>>>>>>>>> already do this for many datatypes, such as uuid("...")
and
>>>>>>>>>> datetime("..."). Are those truly "Functions"? Are
they
>> "constructors",
>>>>>>>>> and
>>>>>>>>
>>>>>>>>> is that different? In any case, the answer for interval
types
>> should be
>>>>>>>>>> consistent with that.
>>>>>>>>>>
>>>>>>>>>> Final note: quite possibly the lossless-JSON and
clean-JSON
>> outputs for
>>>>>>>>>> intervals are broken as well, and should be fixed.
>>>>>>>>>>
>>>>>>>>>> Ceej
>>>>>>>>>> aka Chris Hillery
>>>>>>>>>>
>>>>>>>>>> On Mon, Jan 25, 2016 at 5:36 PM, Till Westmann 
>>>>>>>>>> <tillw@apache.org>
>>>>>>>>>> wrote:
>>>>>>>>>>
>>>>>>>>>> Voting for a. Seems to be the least redundant option.
>>>>>>>>>>>
>>>>>>>>>>> Cheers,
>>>>>>>>>>> Till
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On 25 Jan 2016, at 16:47, Eldon Carman wrote:
>>>>>>>>>>>
>>>>>>>>>>> The interval field value printed in the ADM results
can not 
>>>>>>>>>>> be
>> used to
>>>>>>>>>>>> create an interval.
>>>>>>>>>>>>
>>>>>>>>>>>> Intervals have several functions that are
used to construct 
>>>>>>>>>>>> an
>>>>>>>>>>>>
>>>>>>>>>>> interval:
>>>>>>>>
>>>>>>>>> interval-from-date/time/datetime
>>>>>>>>>>>> and interval-start-from-date/time/datetime.
It appears that 
>>>>>>>>>>>> this
>> is
>>>>>>>>>>> the
>>>>>>>>
>>>>>>>>> only way to create an interval. Thus, a user must use
one of 
>>>>>>>>> these
>>>>>>>>>>>> function
>>>>>>>>>>>> to create an interval.
>>>>>>>>>>>>
>>>>>>>>>>>> The following query shows how to create three
intervals.
>>>>>>>>>>>>
>>>>>>>>>>>> Query:
>>>>>>>>>>>> let $di := {"date-interval": 
>>>>>>>>>>>> interval-from-date("2012-01-01",
>>>>>>>>>>>> "2013-04-01")}
>>>>>>>>>>>> let $ti := {"time-interval": 
>>>>>>>>>>>> interval-from-time("12:23:34.456Z",
>>>>>>>>>>>> "233445567+0800")}
>>>>>>>>>>>> let $dti := {"datetime-interval":
>>>>>>>>>>>> interval-from-datetime("2012-01-01T12:23:34.456+08:00",
>>>>>>>>>>>> "20130401T153445567Z")}
>>>>>>>>>>>> return [$di, $ti, $dti];
>>>>>>>>>>>>
>>>>>>>>>>>> Result:
>>>>>>>>>>>> { "date-interval": interval-date("2012-01-01,
2013-04-01") 
>>>>>>>>>>>> }, {
>>>>>>>>>>>> "time-interval": interval-time("12:23:34.456Z,

>>>>>>>>>>>> 15:34:45.567Z")
>> }, {
>>>>>>>>>>>> "datetime-interval": 
>>>>>>>>>>>> interval-datetime("2012-01-01T04:23:34.456Z,
>>>>>>>>>>>> 2013-04-01T15:34:45.567Z") } ]
>>>>>>>>>>>>
>>>>>>>>>>>> Notice the results show interval-date("date,
date") which 
>>>>>>>>>>>> is
>>>>>>>>>>>> different
>>>>>>>>>>>> than
>>>>>>>>>>>> the functions that are used to create a date
interval. 
>>>>>>>>>>>> Notice
>> that
>>>>>>>>>>>> interval-date does not exists in AsterixDB
and that the 
>>>>>>>>>>>> input is
>> a
>>>>>>>>>>> single
>>>>>>>>
>>>>>>>>> string of dates separated by a comma. Below are some
ideas on 
>>>>>>>>> how to
>>>>>>>>>>>> create
>>>>>>>>>>>> a round-trip for intervals.
>>>>>>>>>>>>
>>>>>>>>>>>> Options for round tripping:
>>>>>>>>>>>> a: Rename "interval-from-date" to "interval-date"
and 
>>>>>>>>>>>> update the
>>>>>>>>>>>>
>>>>>>>>>>> output to
>>>>>>>>
>>>>>>>>> put both dates in quotes.
>>>>>>>>>>>> b: Add alias for "interval-from-date" to
"interval-date" 
>>>>>>>>>>>> and
>> update
>>>>>>>>>>> the
>>>>>>>>
>>>>>>>>> output to put both dates in quotes.
>>>>>>>>>>>> c: Create an interval date constructor (called

>>>>>>>>>>>> interval-date)
>> that
>>>>>>>>>>>> can
>>>>>>>>>>>> parse the string "date, date".
>>>>>>>>>>>>
>>>>>>>>>>>> The same process should be used for intervals
with time and
>> datetime.
>>>>>>>>>>>> Thoughts?
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>

Mime
View raw message