arrow-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Wes McKinney <wesmck...@gmail.com>
Subject Re: Date/Time fields values in Java
Date Thu, 16 Mar 2017 19:00:43 GMT
It looks like between

- ARROW-316 https://issues.apache.org/jira/browse/ARROW-316
- ARROW-617 https://issues.apache.org/jira/browse/ARROW-617
- ARROW-637 https://issues.apache.org/jira/browse/ARROW-637

We should be in pretty good shape with the date and time type
specification. Once we reach consensus and merge patches amending the
format we should work as quickly as possible to complete the Java and
C++ implementations so we can get integration tests running.

On Mon, Mar 13, 2017 at 9:53 PM, Wes McKinney <wesmckinn@gmail.com> wrote:
> @JulienLD or @Jacques could you add your perspective on the JIRA from
> the Java side? Would be good to reach a consensus on this so we can
> make appropriate changes. We will want to push towards integration
> tests for the various date and time types in the 0.3 release
>
> If JulienH could also offer some SQL perspective that would be useful
>
> On Fri, Mar 10, 2017 at 8:00 PM, Julien Le Dem <julien@dremio.com> wrote:
>> The first option makes it easier for compatibility with existing code
>> I opened a JIRA to discuss:
>> https://issues.apache.org/jira/browse/ARROW-617
>>
>> On Fri, Mar 10, 2017 at 4:42 PM, Wes McKinney <wesmckinn@gmail.com> wrote:
>>
>>> There's two routes, I guess:
>>>
>>> - Use 64 bits for microseconds and nanoseconds, 32 bits for other units
>>> - Use 64 bits for everything
>>>
>>> The latter is simpler to implement, the former saves space. I am not
>>> sure which is the better solution. Another situation where this will
>>> occur is with decimals, where the storage type may be a function of
>>> the precision and scale. Thoughts?
>>>
>>> On Fri, Mar 10, 2017 at 6:18 PM, Julien Le Dem <julien@dremio.com> wrote:
>>> > It sounds like we need to specify a different bit width depending on the
>>> > unit?
>>> > millisecond time fits in 32 bits but neither do micros nor nanos.
>>> > the java TimeVector uses 32 bit for now (and supports millis only):
>>> > https://github.com/apache/arrow/blob/6b3ae2aecc8cd31425035a021fa04b
>>> 9ed3385a8d/java/vector/src/main/codegen/data/ValueVectorTypes.tdd#L60
>>> >
>>> >
>>> > On Fri, Mar 10, 2017 at 2:12 PM, Wes McKinney <wesmckinn@gmail.com>
>>> wrote:
>>> >
>>> >> Sorry to be a little slow to respond on this.
>>> >>
>>> >> Since we support nanosecond time unit, we need to use 64 bits. So it
>>> >> sounds like the bug is on the Java side
>>> >>
>>> >> On Fri, Mar 10, 2017 at 4:47 PM, Bryan Cutler <cutlerb@gmail.com>
>>> wrote:
>>> >> > Thanks for the info Julien.  I'll open a JIRA for fixing the type
>>> layout
>>> >> > for TIME, and I'll give the documentation a shot.
>>> >> >
>>> >> > Regards,
>>> >> > Bryan
>>> >> >
>>> >> > On Thu, Mar 9, 2017 at 9:01 PM, Julien Le Dem <julien@dremio.com>
>>> wrote:
>>> >> >
>>> >> >> Hi Bryan,
>>> >> >> In the JSON representation we should use the integer representation
>>> of
>>> >> the
>>> >> >> Timestamp. We should not depend on joda for this.
>>> >> >>
>>> >> >> DATE is on 8 bytes => 64bits:
>>> >> >> https://github.com/apache/arrow/blob/6b3ae2aecc8cd31425035a021fa04b
>>> >> >> 9ed3385a8d/format/Message.fbs#L79
>>> >> >> https://github.com/apache/arrow/blob/6b3ae2aecc8cd31425035a021fa04b
>>> >> >> 9ed3385a8d/java/vector/src/main/codegen/data/
>>> ValueVectorTypes.tdd#L73
>>> >> >>
>>> >> >> Time in on 4 bytes => 32 bits and has a unit:
>>> >> >> https://github.com/apache/arrow/blob/6b3ae2aecc8cd31425035a021fa04b
>>> >> >> 9ed3385a8d/format/Message.fbs#L85
>>> >> >> https://github.com/apache/arrow/blob/6b3ae2aecc8cd31425035a021fa04b
>>> >> >> 9ed3385a8d/java/vector/src/main/codegen/data/
>>> ValueVectorTypes.tdd#L60
>>> >> >> It should the time in {unit} since midnight stored in a 32
bit
>>> integer.
>>> >> >> It should not have a default unit IMO
>>> >> >>
>>> >> >> So as you pointed out it looks like a bug both on the C++ and
java
>>> side
>>> >> for
>>> >> >> Time
>>> >> >> https://github.com/apache/arrow/blob/6b3ae2aecc8cd31425035a021fa04b
>>> >> >> 9ed3385a8d/java/vector/src/main/java/org/apache/arrow/
>>> >> >> vector/schema/TypeLayout.java#L163
>>> >> >> tests TODO here:
>>> >> >> https://issues.apache.org/jira/browse/ARROW-510
>>> >> >>
>>> >> >> We need to add the Date, Time, Timestamp description in the
doc:
>>> >> >> https://github.com/apache/arrow/blob/master/format/Metadata.md
>>> >> >> You are welcome to take a stab at it and send a Pull request
if you
>>> feel
>>> >> >> like it.
>>> >> >> Otherwise I'll update it.
>>> >> >>
>>> >> >> On Thu, Mar 9, 2017 at 3:37 PM, Bryan Cutler <cutlerb@gmail.com>
>>> wrote:
>>> >> >>
>>> >> >> > I guess it would make sense to just store the time of
day value in
>>> >> >> > milliseconds to go along with the DATE type that contains
days
>>> since
>>> >> >> epoch,
>>> >> >> > which would fit into a 4 byte value.  Only I see conflicting
code
>>> in
>>> >> >> > TypeLayout.java that defines the schema as 64 bit width
>>> >> >> >
>>> >> >> > public TypeLayout visit(Time type) {
>>> >> >> >     return newFixedWidthTypeLayout(dataVector(64));
>>> >> >> > }
>>> >> >> >
>>> >> >> > And in C++ there is this comment
>>> >> >> >   // Exact time encoded with int64, default unit millisecond
>>> >> >> >   TIME,
>>> >> >> >
>>> >> >> > Does the TIME type still need to go through some discussion
to get
>>> >> pinned
>>> >> >> > down?
>>> >> >> >
>>> >> >> > Thanks,
>>> >> >> > Bryan
>>> >> >> >
>>> >> >> > On Thu, Mar 9, 2017 at 10:53 AM, Bryan Cutler <cutlerb@gmail.com>
>>> >> wrote:
>>> >> >> >
>>> >> >> > > Hello All,
>>> >> >> > >
>>> >> >> > > I've started work on ARROW-582 to add Date/Time support
for Java
>>> >> JSON
>>> >> >> > > files and would just like to clear up a few things.
 I believe
>>> the
>>> >> Java
>>> >> >> > > Time type is supposed to represent milliseconds since
epoch, it
>>> is
>>> >> >> stored
>>> >> >> > > as a FixedValueVector with a width of 4 bytes (equivalent
to Java
>>> >> >> 'int')
>>> >> >> > > and it retrieved by constructing a org.joda.time.DateTime
with
>>> that
>>> >> >> > value.
>>> >> >> > > Shouldn't this be an 8 byte width, equivalent to
Java 'long'?
>>> >> >> > >
>>> >> >> > >     <#elseif minor.class == "Time">
>>> >> >> > >     @Override
>>> >> >> > >     public DateTime getObject(int index) {
>>> >> >> > >
>>> >> >> > >         org.joda.time.DateTime time = new
>>> >> org.joda.time.DateTime(get(
>>> >> >> > index),
>>> >> >> > > org.joda.time.DateTimeZone.UTC);
>>> >> >> > >         time = time.withZoneRetainFields(org.
>>> >> joda.time.DateTimeZone.
>>> >> >> > > getDefault());
>>> >> >> > >         return time;
>>> >> >> > >     }
>>> >> >> > >
>>> >> >> > > Thanks,
>>> >> >> > > Bryan
>>> >> >> > >
>>> >> >> >
>>> >> >>
>>> >> >>
>>> >> >>
>>> >> >> --
>>> >> >> Julien
>>> >> >>
>>> >>
>>> >
>>> >
>>> >
>>> > --
>>> > Julien
>>>
>>
>>
>>
>> --
>> Julien

Mime
View raw message