tajo-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Jaewoong Jung <jun...@gmail.com>
Subject Re: Timezone issues
Date Thu, 20 Nov 2014 15:50:41 GMT
There's another issue that hopefully Hyunsik would be able to clarify,
and it's very crucial to handling timezones in these data types.

Q: So, let's say (and I agree) TimeDatum represents an instant, so can
be timezoned. Then, is it a UTC time or a local time?

Let me explain why this question is important.

DateDatum represents an instant. And, it is implicitly timezoned to
the user local time. Which means, if I use '11/20/2014', it's
'11/20/2014 0:00:00 PST' and is equivalent to a TimestampDatum for
'11/20/2014 8:00:00 UTC'. (BTW, the compareTo implementation for
DateDatum is broken in this regard. I'll file a separate issue for
that.)

What about TimeDatum? It is currently timezoned to UTC, (surprise,
anyone?) if I understood the code correctly. When we add a TimeDatum
to a DateDatum, we convert the TimeDatum to the user local time, which
implies TimeDatum is UTC-based. (Also, the comment next to the line
explicitly mentions it.)

So, here's the problem.

Why do they have different timezones? They're incomplete as an instant
when used alone and are complementary to each other. This is an
important concept. To understand it, you have to think about why
adding DateDatum and TimeDatum is allowed in the code. Originally,
instants can't be added. (And, that's why I thought TimeDatum is not
an instant.) You can't add (say) 11/20/2014 to 11/23/2019. Subtracting
an instant from an instant makes sense and results in a period, but
they can't be added.

However, in the case of DateDatum and TimeDatum, additions are allowed
because they're complementary to each other, and what the code does
**conceptually** is concatenate the two.

Therefore, because they're intended to be used together, I'd argue
they shouldn't have different timezones. Also, if they have different
timezones, additions can't have a simple correct answer. What's the
correct answer of 11/20/2014 (PST) + 8:00:00 (UTC)? There's no clear
answer because they can't be simply concatenated.

(FWIW, the current Tajo code thinks the answer is 11/20/2014 8:00:00
in UTC. How many of you got it right?)

This can cause a lot of confusion to users. When they use a date
alone, it is interpreted as a local time. But, as soon as they add a
time to it, it is silently converted to UTC in a way which is very
unexpected to many users.

Why am I emphasizing it is unexpected? Look at the comparison below.

What's the answer to 11/20/2014 (DateDatum) + 8 hours (IntervalDatum)?
It's 11/20/2014 8:00 in a local time (PST on my machine). How about
11/20/2014 (DateDatum) + 8:00:00 (TimeDatum)? It's 11/20/2014 8:00 in
UTC as I wrote above. How many users would be able to expect this?

So, coming back to my original question. What timezone should a
TimeDatum have? UTC or local time? It's currently UTC. But, I believe
it should be changed to the local time zone.

(Sorry for the long email. But, I think it's critical to get this
right and build consensus over it so that we can provide a consistent
behavior going forward. The actual fix will be really simple, though.)

On Thu, Nov 20, 2014 at 2:44 AM, Hyunsik Choi <hyunsik.choi@gmail.com> wrote:
> Those parts have poor documentation.
>
> I agree with your investigation. I also could find many misuse of timezone
> in many parts. We should make them clear and fix them in this chance.
>
> I just got off the plane, and I'm still on the road. So, I'll give more
> comments tomorrow.
>
> - hyunsik
>
> On Thu, Nov 20, 2014 at 5:24 PM, Jaewoong Jung <jungjw@gmail.com> wrote:
>
>> Wow, this seemingly trivial issue has surprisingly many problems involved.
>>
>> The most critical one, though, is TimeMeta class. Presumably because
>> it is poorly documented, it is being used for two different purposes
>> in Tajo code base. Some code treats it as a date time representation,
>> which I believe is the original intention, but some treat it as the
>> human-readable equivalent of TimeDatum by completely ignoring
>> date-related fields.
>>
>> For example, DateTimeUtil.date2j(long julianDate, TimeMeta tm), which
>> converts a julian timestamp to a TimeMeta value, doesn't touch
>> dayOfMonth, monthOfYear, or years values and just puts all values for
>> hour and above units in hours field.
>>
>> There are other minor problems like incorrect comments and absent
>> default values, but the most critical one is misuse of TimeMeta.
>>
>> I'll try to break up my patches so that each has a clear and
>> easy-to-understand goal. Sending this heads-up to let you know
>> there'll be more issues filed and patches sent than you might expect.
>>
>> On Wed, Nov 19, 2014 at 10:56 PM, Jaewoong Jung <jungjw@gmail.com> wrote:
>> > Yeah, after some more research, I found that TImeDatum is a somewhat
>> > ambiguous data type.
>> >
>> > Its original purpose is to represent a time of a day, i.e. hh:mm:ss
>> > part of an instant. So, it could be viewed as instant data though some
>> > may argue it's incomplete to fully represent an instant. Anyway, given
>> > that TimestampDatum has limitation in terms of the time range it can
>> > represent, and given that Tajo doesn't have DateTime data type,
>> > clients should be allowed to use it with a timezone.
>> >
>> > I'll change the direction and try to address the issue by fixing the
>> > underflow error.
>> >
>> > Thanks for the input. :)
>> >
>> > On Wed, Nov 19, 2014 at 10:24 PM, Jong-young Park <eminency@gmail.com>
>> wrote:
>> >> Hi, Jaewoong.
>> >>
>> >> To express time period value, IntervalDatum is existing as I know.
>> >>
>> >> So I think it is right that TimeDatum is for some time point.
>> >>
>> >> And TimestampDatum seems it is doing both roles of DateDatum and
>> TimeDatum.
>> >>
>> >> Regards,
>> >> Jongyoung.
>> >>
>> >> On Wed Nov 19 2014 at 오후 5:23:40 Jaewoong Jung <jungjw@gmail.com>
>> wrote:
>> >>
>> >>> It turns out, TAJO-1191 is slightly more complicated than I thought.
>> >>> (https://issues.apache.org/jira/browse/TAJO-1191)
>> >>>
>> >>> Basically, it's about whether TimeDatum may have a timezone tied with
>> >>> it. I **believe** TimeDatum is originally designed to hold a time
>> >>> period value, not an instant. (TimestampDatum seems to be the
>> >>> canonical container for instants.) So, it doesn't make sense to apply
>> >>> any timezones to TimeDatum values, but it's being done in a few
>> >>> places. And, that's why the test is failing on my machines.
>> >>>
>> >>> I'm going to try to fix it by removing all timezone-related references
>> >>> around the class, but I want to check my assumption with you before
I
>> >>> proceed.
>> >>>
>> >>> What do you think about it?
>> >>>
>>

Mime
View raw message