arrow-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Micah Kornfield <emkornfi...@gmail.com>
Subject Re: Closing Old PRs: Time Interval Type
Date Thu, 07 Mar 2019 03:45:26 GMT
Based on the sync today, I'm going to summarize the change that the PR [1]
proposes in case any one wants to discuss it further before it gets
reviewed and hopefully merged.

It keeps the existing Interval type and unit enums untouched.  The only
change is it declares DAY_TIME, which is an Array/Vector of Struct { int32
days, int32 hours}, as not required to be supported by Arrow
implementations.  YEAR_MONTH is still required.  Based on this we might
want to add some details around the Guidelines for  compatibility [2].
 The Interval type can also be extended in the future to support other
"Calendar" like objects if the need arises (e.g. weeks or types conforming
more closely to SQL interval data structures).

The PR also introduces a new type DurationInterval.  DurationIntervals that
measure the difference between two Unix Times (seconds, milliseconds,
microseconds or nanoseconds since the unix epoch not including
leap-seconds).  There is no way to convert 100% correctly between
DurationInterval and Interval types without a Timestamp as a point of
reference.  This is intentional as it provides a clean divide between
"Calendar" like objects and absolute time differences.

Heuristics such as assuming a day consists of 24 hours, can be used by
consumers of Arrow Arrays/Vectors and this shouldn't affect the format.

Please chime in if you have any concerns.

Thanks,
Micah

[1] https://github.com/apache/arrow/pull/3644
[2]
https://github.com/apache/arrow/blob/master/docs/source/format/Guidelines.rst

On Sun, Feb 17, 2019 at 3:00 PM Micah Kornfield <emkornfield@gmail.com>
wrote:

> Thanks Wes,
> I've opened a PR [1] with one possible way forward on this issue.  The PR
> leaves the existing Java interval implementations untouched and introduces
> a new type to meet the other requirements previously discussed on the
> mailing list.  The naming of the new types could probably be improved but
> hopefully this at least serves as a good basis for discussion.
>
> Thanks,
> Micah
>
> [1] https://github.com/apache/arrow/pull/3644
>
> On Sunday, February 17, 2019, Wes McKinney <wesmckinn@gmail.com> wrote:
>
>> hi Micah,
>>
>> I've been trying to make progress on the Interval stuff for more than
>> 18 months now -- this is IMHO a barrier to getting project to a 1.0.0
>> version so we should try to do it in 2019. I'm OK with implementing a
>> change that minimizes disruption to users of the current Java code
>> (e.g. Dremio). We need to get some feedback from the Java developers
>> about how to proceed.
>>
>> - Wes
>>
>> On Sat, Jan 26, 2019 at 2:43 AM Micah Kornfield <emkornfield@gmail.com>
>> wrote:
>> >
>> > I'm going through the process of triaging old PRs, and one of the oldest
>> > ones [1] deals deals with defining the time interval type.  The PR
>> seems to
>> > reflect previously agreed upon discussion [2][3] on how the type should
>> be
>> > modeled.
>> >
>> > It seems like the next steps are:
>> > 1.  Get the PR reviewed and merged (unless there is more discussion
>> > needed).  I think a committer that works on the Java implementation
>> would
>> > be ideal for this, given it has the largest potential impact on that
>> part
>> > of the project.
>> > 2.  Update the java interval types to support this change (or create new
>> > ones if the existing ones are necessary for legacy support?):
>> >     - Change Year-Month [4] to be  8 bytes (64 bits)
>> >     - Change Day-Time [5] to not be a packed type and add support for
>> > resolution
>> > 3.  Update the IntervalType [6] in C++ (and maybe other languages to
>> > include DAY_TIME resolution)
>> > 4.  Add integration tests to confirm compatibility.
>> >
>> > Do these steps sound right?  Is there anything missing?  Are there any
>> more
>> > concerns?
>> >
>> > Can a committer/PMC member take on item 1?
>> >
>> > If there are no objections I can create missing JIRAs for 2., 3. and 4.
>> >
>> > Thanks,
>> > Micah
>> >
>> > [1] https://github.com/apache/arrow/pull/920
>> > [2]
>> >
>> https://cwiki.apache.org/confluence/display/ARROW/Columnar+Format+1.0+Milestone#ColumnarFormat1.0Milestone-Interval/Timedelta
>> > [3]
>> >
>> https://lists.apache.org/thread.html/849aca6235757b7ff7cb94c7b459a23539ba942ad900ebaa48519896@%3Cdev.arrow.apache.org%3E
>> > [4]
>> >
>> https://github.com/apache/arrow/blob/87feee3d941ee41fb39b25411e108bef40a55995/java/vector/src/main/java/org/apache/arrow/vector/IntervalYearVector.java
>> > [5]
>> >
>> https://github.com/apache/arrow/blob/87feee3d941ee41fb39b25411e108bef40a55995/java/vector/src/main/java/org/apache/arrow/vector/IntervalDayVector.java
>> > [6]
>> https://github.com/apache/arrow/blob/master/cpp/src/arrow/type.h#L733
>>
>

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message