asterixdb-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Mike Carey <dtab...@gmail.com>
Subject Re: Round Tripping ADM Interval Data
Date Fri, 29 Jan 2016 08:15:16 GMT
+1 on all points below!  :-)

On 1/28/16 11:00 PM, Till Westmann wrote:
> Sounds good to me. It will change the binary representation of 
> intervals, so it’s not backwards compatible.
> But it seems that the tag-first representation is the way it should 
> have been anyway and it’s a better way to go forward to support more 
> generic intervals. (And I prefer layout changes rather now than later ..)
>
> Cheers,
> Till
>
> On 28 Jan 2016, at 21:46, Eldon Carman wrote:
>
>> Based on the discussion, I have suggestion for a two phase approach to
>> support generic intervals.
>>
>>
>> Phase 1: External
>> The first phase will focus on areas seen by a user, either though ADM or
>> AQL. The new format will also be open to supporting other interval 
>> types,
>> although currently will only support date, time and datetime. Here are a
>> few of the action items:
>>
>> - ADM printer and parser changed to support the new generic style format
>>  - interval(date("2012-01-01”), date(”2013-04-01”))
>> - Add an interval AQL constructor to support the above format
>> - Alter interval byte structure to support any interval type
>> - byte tag, T start, T end
>> - where T is currently only date, time and datetime
>>
>> I created a ticket to track status for Phase 1:
>> https://issues.apache.org/jira/browse/ASTERIXDB-1281
>>
>>
>> Phase 2: Internal
>> Phase two will focus on items under the hood. These items are only 
>> seen by
>> a developer and need to be update to support new interval types. The
>> complete task list will require some investigation. In addition to 
>> picking
>> what types of intervals should be supported.
>>
>> Phase 1 can start immediately while Phase 2 can wait until needed.
>>
>> Thoughts?
>>
>>
>>
>> On Tue, Jan 26, 2016 at 2:14 PM, Eldon Carman <ecarm002@ucr.edu> wrote:
>>
>>> While its a little more work to implement up front, the following 
>>> format
>>> would be generic and support alternate interval types in the future. It
>>> would be nice to have consistency between AQL and ADM, although not
>>> required.
>>>
>>> interval(date("2012-01-01”), date(”2013-04-01”))
>>>
>>> As we move to supporting generic intervals, the byte storage format 
>>> will
>>> need to be updated. Currently an interval is represented by: start 
>>> (long),
>>> end (long), type tag (byte). To support other types, the type tag 
>>> should be
>>> at the beginning of the byte sequence. This way the tag can be used to
>>> determine the data length of each item in the interval.
>>>
>>> Should the changes to AQL and ADM include this interval storage change
>>> (moving the type tag to the first byte of the interval storage format)?
>>>
>>> On Tue, Jan 26, 2016 at 12:42 PM, Till Westmann <tillw@apache.org> 
>>> wrote:
>>>
>>>> That’s actually a nice and generic serialization.
>>>> I think that we should do this similarly in ADM and AQL.
>>>> I.e. instead of using
>>>>
>>>> interval-from-date("2012-01-01”, ”2013-04-01”)
>>>>
>>>> (note the two parameters) in AQL and
>>>>
>>>> interval-date("2012-01-01, 2013-04-01")
>>>>
>>>> (not the single parameter) in ADM we should use
>>>>
>>>> interval(date("2012-01-01”), date(”2013-04-01”))
>>>>
>>>> for both. That would have a number of advantages:
>>>>
>>>> 1) It is consistent between AQL and ADM.
>>>> 2) It is consistent with the JSON serialization.
>>>> 3) It reduces the number of magic parsers.
>>>> 4) It keeps the interval orthogonal to the type used in the interval.
>>>>
>>>> On 4): While we don’t support intervals of other types than date, 
>>>> time,
>>>> and datetime so far, I think that we should change that and so this 
>>>> would
>>>> be a good step in that direction as well.
>>>>
>>>> The disadvantages are
>>>>
>>>> 1) Incompatible AQL change
>>>> 2) Incompatible ADM change
>>>>
>>>> Thoughts?
>>>>
>>>> Cheers,
>>>> Till
>>>>
>>>>
>>>> On 26 Jan 2016, at 11:46, Eldon Carman wrote:
>>>>
>>>> I found that the lossless-JSON and clean-JSON printers were not being
>>>>> used.
>>>>> After connecting them to the respective JSON printer, I ran the query
>>>>> again.
>>>>>
>>>>> lossless-JSON result:
>>>>> { "orderedlist": [ { "date-interval": { "interval": { "start": { 
>>>>> "date":
>>>>> "2012-01-01" }, "end": { "date": "2013-04-01" }}} }, { 
>>>>> "time-interval": {
>>>>> "interval": { "start": { "time": "12:23:34.456Z" }, "end": { "time":
>>>>> "15:34:45.567Z" }}} }, { "datetime-interval": { "interval": { 
>>>>> "start": {
>>>>> "datetime": "2012-01-01T04:23:34.456Z" }, "end": { "datetime":
>>>>> "2013-04-01T15:34:45.567Z" }}} } ] }
>>>>>
>>>>>
>>>>> clean-JSON result:
>>>>> [ { "date-interval": { "interval": { "start": "2012-01-01", "end":
>>>>> "2013-04-01"}} }, { "time-interval": { "interval": { "start":
>>>>> "12:23:34.456Z", "end": "15:34:45.567Z"}} }, { "datetime-interval": {
>>>>> "interval": { "start": "2012-01-01T04:23:34.456Z", "end":
>>>>> "2013-04-01T15:34:45.567Z"}} } ]
>>>>>
>>>>> Is this what you would have expected?
>>>>>
>>>>> On Mon, Jan 25, 2016 at 7:07 PM, Eldon Carman <ecarm002@ucr.edu>

>>>>> wrote:
>>>>>
>>>>>>
>>>>>> Thanks Chris for adding a fourth option. This option would focus
our
>>>>>>
>>>>> updates to only the ADM output.
>>>>>
>>>>>>
>>>>>> Yes, both lossless-JSON and clean-JSON outputs would need to be 
>>>>>> check
>>>>>>
>>>>> also.
>>>>>
>>>>>>
>>>>>> On Mon, Jan 25, 2016 at 5:58 PM, Chris Hillery 
>>>>>> <chillery@hillery.land>
>>>>>>
>>>>> wrote:
>>>>>
>>>>>>
>>>>>>> I would vote for:
>>>>>>>
>>>>>>> d. Update the serialized format to output "interval-from-date"

>>>>>>> and put
>>>>>>>
>>>>>> both
>>>>>
>>>>>> dates in quotes.
>>>>>>>
>>>>>>> I like the function name interval-from-date() better, and I 
>>>>>>> don't think
>>>>>>> there's any need to maintain backwards compatibility with the

>>>>>>> old name
>>>>>>> which clearly never worked.
>>>>>>>
>>>>>>> Couple thoughts, though: The serialized format really should
be 
>>>>>>> "ADM",
>>>>>>>
>>>>>> not
>>>>>
>>>>>> "AQL". As such I don't think it should reference functions at 
>>>>>> all. We
>>>>>>> already do this for many datatypes, such as uuid("...") and
>>>>>>> datetime("..."). Are those truly "Functions"? Are they 
>>>>>>> "constructors",
>>>>>>>
>>>>>> and
>>>>>
>>>>>> is that different? In any case, the answer for interval types 
>>>>>> should be
>>>>>>> consistent with that.
>>>>>>>
>>>>>>> Final note: quite possibly the lossless-JSON and clean-JSON 
>>>>>>> outputs for
>>>>>>> intervals are broken as well, and should be fixed.
>>>>>>>
>>>>>>> Ceej
>>>>>>> aka Chris Hillery
>>>>>>>
>>>>>>> On Mon, Jan 25, 2016 at 5:36 PM, Till Westmann <tillw@apache.org>
>>>>>>> wrote:
>>>>>>>
>>>>>>> Voting for a. Seems to be the least redundant option.
>>>>>>>>
>>>>>>>> Cheers,
>>>>>>>> Till
>>>>>>>>
>>>>>>>>
>>>>>>>> On 25 Jan 2016, at 16:47, Eldon Carman wrote:
>>>>>>>>
>>>>>>>> The interval field value printed in the ADM results can not
be 
>>>>>>>> used to
>>>>>>>>
>>>>>>>>> create an interval.
>>>>>>>>>
>>>>>>>>> Intervals have several functions that are used to construct
an
>>>>>>>>>
>>>>>>>> interval:
>>>>>
>>>>>> interval-from-date/time/datetime
>>>>>>>>> and interval-start-from-date/time/datetime. It appears
that 
>>>>>>>>> this is
>>>>>>>>>
>>>>>>>> the
>>>>>
>>>>>> only way to create an interval. Thus, a user must use one of these
>>>>>>>>> function
>>>>>>>>> to create an interval.
>>>>>>>>>
>>>>>>>>> The following query shows how to create three intervals.
>>>>>>>>>
>>>>>>>>> Query:
>>>>>>>>> let $di := {"date-interval": interval-from-date("2012-01-01",
>>>>>>>>> "2013-04-01")}
>>>>>>>>> let $ti := {"time-interval": interval-from-time("12:23:34.456Z",
>>>>>>>>> "233445567+0800")}
>>>>>>>>> let $dti := {"datetime-interval":
>>>>>>>>> interval-from-datetime("2012-01-01T12:23:34.456+08:00",
>>>>>>>>> "20130401T153445567Z")}
>>>>>>>>> return [$di, $ti, $dti];
>>>>>>>>>
>>>>>>>>> Result:
>>>>>>>>> { "date-interval": interval-date("2012-01-01, 2013-04-01")
}, {
>>>>>>>>> "time-interval": interval-time("12:23:34.456Z, 15:34:45.567Z")

>>>>>>>>> }, {
>>>>>>>>> "datetime-interval": interval-datetime("2012-01-01T04:23:34.456Z,
>>>>>>>>> 2013-04-01T15:34:45.567Z") } ]
>>>>>>>>>
>>>>>>>>> Notice the results show interval-date("date, date") which
is
>>>>>>>>> different
>>>>>>>>> than
>>>>>>>>> the functions that are used to create a date interval.
Notice 
>>>>>>>>> that
>>>>>>>>> interval-date does not exists in AsterixDB and that the
input 
>>>>>>>>> is a
>>>>>>>>>
>>>>>>>> single
>>>>>
>>>>>> string of dates separated by a comma. Below are some ideas on how
to
>>>>>>>>> create
>>>>>>>>> a round-trip for intervals.
>>>>>>>>>
>>>>>>>>> Options for round tripping:
>>>>>>>>> a: Rename "interval-from-date" to "interval-date" and
update the
>>>>>>>>>
>>>>>>>> output to
>>>>>
>>>>>> put both dates in quotes.
>>>>>>>>> b: Add alias for "interval-from-date" to "interval-date"
and 
>>>>>>>>> update
>>>>>>>>>
>>>>>>>> the
>>>>>
>>>>>> output to put both dates in quotes.
>>>>>>>>> c: Create an interval date constructor (called interval-date)

>>>>>>>>> that
>>>>>>>>> can
>>>>>>>>> parse the string "date, date".
>>>>>>>>>
>>>>>>>>> The same process should be used for intervals with time
and 
>>>>>>>>> datetime.
>>>>>>>>>
>>>>>>>>> Thoughts?
>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>
>>>>>>
>>>


Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message