asterixdb-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Till Westmann" <ti...@apache.org>
Subject Re: Round Tripping ADM Interval Data
Date Fri, 29 Jan 2016 07:00:18 GMT
Sounds good to me. It will change the binary representation of 
intervals, so it’s not backwards compatible.
But it seems that the tag-first representation is the way it should have 
been anyway and it’s a better way to go forward to support more 
generic intervals. (And I prefer layout changes rather now than later 
..)

Cheers,
Till

On 28 Jan 2016, at 21:46, Eldon Carman wrote:

> Based on the discussion, I have suggestion for a two phase approach to
> support generic intervals.
>
>
> Phase 1: External
> The first phase will focus on areas seen by a user, either though ADM 
> or
> AQL. The new format will also be open to supporting other interval 
> types,
> although currently will only support date, time and datetime. Here are 
> a
> few of the action items:
>
> - ADM printer and parser changed to support the new generic style 
> format
>  - interval(date("2012-01-01”), date(”2013-04-01”))
> - Add an interval AQL constructor to support the above format
> - Alter interval byte structure to support any interval type
> - byte tag, T start, T end
> - where T is currently only date, time and datetime
>
> I created a ticket to track status for Phase 1:
> https://issues.apache.org/jira/browse/ASTERIXDB-1281
>
>
> Phase 2: Internal
> Phase two will focus on items under the hood. These items are only 
> seen by
> a developer and need to be update to support new interval types. The
> complete task list will require some investigation. In addition to 
> picking
> what types of intervals should be supported.
>
> Phase 1 can start immediately while Phase 2 can wait until needed.
>
> Thoughts?
>
>
>
> On Tue, Jan 26, 2016 at 2:14 PM, Eldon Carman <ecarm002@ucr.edu> 
> wrote:
>
>> While its a little more work to implement up front, the following 
>> format
>> would be generic and support alternate interval types in the future. 
>> It
>> would be nice to have consistency between AQL and ADM, although not
>> required.
>>
>> interval(date("2012-01-01”), date(”2013-04-01”))
>>
>> As we move to supporting generic intervals, the byte storage format 
>> will
>> need to be updated. Currently an interval is represented by: start 
>> (long),
>> end (long), type tag (byte). To support other types, the type tag 
>> should be
>> at the beginning of the byte sequence. This way the tag can be used 
>> to
>> determine the data length of each item in the interval.
>>
>> Should the changes to AQL and ADM include this interval storage 
>> change
>> (moving the type tag to the first byte of the interval storage 
>> format)?
>>
>> On Tue, Jan 26, 2016 at 12:42 PM, Till Westmann <tillw@apache.org> 
>> wrote:
>>
>>> That’s actually a nice and generic serialization.
>>> I think that we should do this similarly in ADM and AQL.
>>> I.e. instead of using
>>>
>>> interval-from-date("2012-01-01”, ”2013-04-01”)
>>>
>>> (note the two parameters) in AQL and
>>>
>>> interval-date("2012-01-01, 2013-04-01")
>>>
>>> (not the single parameter) in ADM we should use
>>>
>>> interval(date("2012-01-01”), date(”2013-04-01”))
>>>
>>> for both. That would have a number of advantages:
>>>
>>> 1) It is consistent between AQL and ADM.
>>> 2) It is consistent with the JSON serialization.
>>> 3) It reduces the number of magic parsers.
>>> 4) It keeps the interval orthogonal to the type used in the 
>>> interval.
>>>
>>> On 4): While we don’t support intervals of other types than date, 
>>> time,
>>> and datetime so far, I think that we should change that and so this 
>>> would
>>> be a good step in that direction as well.
>>>
>>> The disadvantages are
>>>
>>> 1) Incompatible AQL change
>>> 2) Incompatible ADM change
>>>
>>> Thoughts?
>>>
>>> Cheers,
>>> Till
>>>
>>>
>>> On 26 Jan 2016, at 11:46, Eldon Carman wrote:
>>>
>>> I found that the lossless-JSON and clean-JSON printers were not 
>>> being
>>>> used.
>>>> After connecting them to the respective JSON printer, I ran the 
>>>> query
>>>> again.
>>>>
>>>> lossless-JSON result:
>>>> { "orderedlist": [ { "date-interval": { "interval": { "start": { 
>>>> "date":
>>>> "2012-01-01" }, "end": { "date": "2013-04-01" }}} }, { 
>>>> "time-interval": {
>>>> "interval": { "start": { "time": "12:23:34.456Z" }, "end": { 
>>>> "time":
>>>> "15:34:45.567Z" }}} }, { "datetime-interval": { "interval": { 
>>>> "start": {
>>>> "datetime": "2012-01-01T04:23:34.456Z" }, "end": { "datetime":
>>>> "2013-04-01T15:34:45.567Z" }}} } ] }
>>>>
>>>>
>>>> clean-JSON result:
>>>> [ { "date-interval": { "interval": { "start": "2012-01-01", "end":
>>>> "2013-04-01"}} }, { "time-interval": { "interval": { "start":
>>>> "12:23:34.456Z", "end": "15:34:45.567Z"}} }, { "datetime-interval": 
>>>> {
>>>> "interval": { "start": "2012-01-01T04:23:34.456Z", "end":
>>>> "2013-04-01T15:34:45.567Z"}} } ]
>>>>
>>>> Is this what you would have expected?
>>>>
>>>> On Mon, Jan 25, 2016 at 7:07 PM, Eldon Carman <ecarm002@ucr.edu> 
>>>> wrote:
>>>>
>>>>>
>>>>> Thanks Chris for adding a fourth option. This option would focus 
>>>>> our
>>>>>
>>>> updates to only the ADM output.
>>>>
>>>>>
>>>>> Yes, both lossless-JSON and clean-JSON outputs would need to be 
>>>>> check
>>>>>
>>>> also.
>>>>
>>>>>
>>>>> On Mon, Jan 25, 2016 at 5:58 PM, Chris Hillery 
>>>>> <chillery@hillery.land>
>>>>>
>>>> wrote:
>>>>
>>>>>
>>>>>> I would vote for:
>>>>>>
>>>>>> d. Update the serialized format to output "interval-from-date" 
>>>>>> and put
>>>>>>
>>>>> both
>>>>
>>>>> dates in quotes.
>>>>>>
>>>>>> I like the function name interval-from-date() better, and I don't

>>>>>> think
>>>>>> there's any need to maintain backwards compatibility with the old

>>>>>> name
>>>>>> which clearly never worked.
>>>>>>
>>>>>> Couple thoughts, though: The serialized format really should be 
>>>>>> "ADM",
>>>>>>
>>>>> not
>>>>
>>>>> "AQL". As such I don't think it should reference functions at all. 
>>>>> We
>>>>>> already do this for many datatypes, such as uuid("...") and
>>>>>> datetime("..."). Are those truly "Functions"? Are they 
>>>>>> "constructors",
>>>>>>
>>>>> and
>>>>
>>>>> is that different? In any case, the answer for interval types 
>>>>> should be
>>>>>> consistent with that.
>>>>>>
>>>>>> Final note: quite possibly the lossless-JSON and clean-JSON 
>>>>>> outputs for
>>>>>> intervals are broken as well, and should be fixed.
>>>>>>
>>>>>> Ceej
>>>>>> aka Chris Hillery
>>>>>>
>>>>>> On Mon, Jan 25, 2016 at 5:36 PM, Till Westmann <tillw@apache.org>
>>>>>> wrote:
>>>>>>
>>>>>> Voting for a. Seems to be the least redundant option.
>>>>>>>
>>>>>>> Cheers,
>>>>>>> Till
>>>>>>>
>>>>>>>
>>>>>>> On 25 Jan 2016, at 16:47, Eldon Carman wrote:
>>>>>>>
>>>>>>> The interval field value printed in the ADM results can not be

>>>>>>> used to
>>>>>>>
>>>>>>>> create an interval.
>>>>>>>>
>>>>>>>> Intervals have several functions that are used to construct
an
>>>>>>>>
>>>>>>> interval:
>>>>
>>>>> interval-from-date/time/datetime
>>>>>>>> and interval-start-from-date/time/datetime. It appears that

>>>>>>>> this is
>>>>>>>>
>>>>>>> the
>>>>
>>>>> only way to create an interval. Thus, a user must use one of these
>>>>>>>> function
>>>>>>>> to create an interval.
>>>>>>>>
>>>>>>>> The following query shows how to create three intervals.
>>>>>>>>
>>>>>>>> Query:
>>>>>>>> let $di := {"date-interval": interval-from-date("2012-01-01",
>>>>>>>> "2013-04-01")}
>>>>>>>> let $ti := {"time-interval": 
>>>>>>>> interval-from-time("12:23:34.456Z",
>>>>>>>> "233445567+0800")}
>>>>>>>> let $dti := {"datetime-interval":
>>>>>>>> interval-from-datetime("2012-01-01T12:23:34.456+08:00",
>>>>>>>> "20130401T153445567Z")}
>>>>>>>> return [$di, $ti, $dti];
>>>>>>>>
>>>>>>>> Result:
>>>>>>>> { "date-interval": interval-date("2012-01-01, 2013-04-01")
}, {
>>>>>>>> "time-interval": interval-time("12:23:34.456Z, 15:34:45.567Z")

>>>>>>>> }, {
>>>>>>>> "datetime-interval": 
>>>>>>>> interval-datetime("2012-01-01T04:23:34.456Z,
>>>>>>>> 2013-04-01T15:34:45.567Z") } ]
>>>>>>>>
>>>>>>>> Notice the results show interval-date("date, date") which
is
>>>>>>>> different
>>>>>>>> than
>>>>>>>> the functions that are used to create a date interval. Notice

>>>>>>>> that
>>>>>>>> interval-date does not exists in AsterixDB and that the input

>>>>>>>> is a
>>>>>>>>
>>>>>>> single
>>>>
>>>>> string of dates separated by a comma. Below are some ideas on how 
>>>>> to
>>>>>>>> create
>>>>>>>> a round-trip for intervals.
>>>>>>>>
>>>>>>>> Options for round tripping:
>>>>>>>> a: Rename "interval-from-date" to "interval-date" and update

>>>>>>>> the
>>>>>>>>
>>>>>>> output to
>>>>
>>>>> put both dates in quotes.
>>>>>>>> b: Add alias for "interval-from-date" to "interval-date"
and 
>>>>>>>> update
>>>>>>>>
>>>>>>> the
>>>>
>>>>> output to put both dates in quotes.
>>>>>>>> c: Create an interval date constructor (called interval-date)

>>>>>>>> that
>>>>>>>> can
>>>>>>>> parse the string "date, date".
>>>>>>>>
>>>>>>>> The same process should be used for intervals with time and

>>>>>>>> datetime.
>>>>>>>>
>>>>>>>> Thoughts?
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>
>>>>>
>>

Mime
View raw message