Return-Path: X-Original-To: apmail-tajo-dev-archive@minotaur.apache.org Delivered-To: apmail-tajo-dev-archive@minotaur.apache.org Received: from mail.apache.org (hermes.apache.org [140.211.11.3]) by minotaur.apache.org (Postfix) with SMTP id 132991004B for ; Mon, 8 Dec 2014 01:52:39 +0000 (UTC) Received: (qmail 68673 invoked by uid 500); 8 Dec 2014 01:52:39 -0000 Delivered-To: apmail-tajo-dev-archive@tajo.apache.org Received: (qmail 68635 invoked by uid 500); 8 Dec 2014 01:52:39 -0000 Mailing-List: contact dev-help@tajo.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Reply-To: dev@tajo.apache.org Delivered-To: mailing list dev@tajo.apache.org Received: (qmail 68625 invoked by uid 99); 8 Dec 2014 01:52:38 -0000 Received: from mail-relay.apache.org (HELO mail-relay.apache.org) (140.211.11.15) by apache.org (qpsmtpd/0.29) with ESMTP; Mon, 08 Dec 2014 01:52:38 +0000 Received: from mail-qa0-f43.google.com (mail-qa0-f43.google.com [209.85.216.43]) by mail-relay.apache.org (ASF Mail Server at mail-relay.apache.org) with ESMTPSA id D633E1A0115 for ; Mon, 8 Dec 2014 01:52:36 +0000 (UTC) Received: by mail-qa0-f43.google.com with SMTP id bm13so2770966qab.2 for ; Sun, 07 Dec 2014 17:52:33 -0800 (PST) MIME-Version: 1.0 X-Received: by 10.224.121.1 with SMTP id f1mr48745621qar.76.1418003553059; Sun, 07 Dec 2014 17:52:33 -0800 (PST) Received: by 10.96.196.166 with HTTP; Sun, 7 Dec 2014 17:52:32 -0800 (PST) In-Reply-To: References: Date: Mon, 8 Dec 2014 10:52:32 +0900 Message-ID: Subject: Re: Timezone issues From: Hyunsik Choi To: "dev@tajo.apache.org" Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: quoted-printable Hi Jaewoong and CharSyam, I submitted the patch that fixes timezone-related problems discussed here. I rearranged all time zone usages throughout Tajo. https://issues.apache.org/jira/browse/TAJO-1234 Thanks to the discussion of this mailing list with you guys, I was clear enough to find or fix the problems. Warm regards, Hyunsik On Tue, Nov 25, 2014 at 5:54 PM, Hyunsik Choi wrote: > I'm happy to work with you and fix a major nuisance :) Later, I'll > share the timezone related problem with you when I found additional > bugs. > > Warm regards, > Hyunsik > > On Tue, Nov 25, 2014 at 4:17 PM, Jaewoong Jung wrote: >> Everything is very clear now thanks to your explanation. :) >> >> Okay, then I'll fix the issue by making DateDatum timezone-neutral and >> TimeDatum UTC-based. Also, I'll play with PostgreSQL to understand its >> timezone model better. >> >> Meanwhile, please feel free to assign timezone-related bugs to me as >> you see fit. >> >> On Mon, Nov 24, 2014 at 5:52 PM, Hyunsik Choi wrote= : >>> Thank you all guys for your comments. >>> >>> Jaewoong, >>> >>> I leave inline comments. If my answers are not enough for your >>> question or I misunderstood your question, please feel free to ask >>> additional questions. >>> >>> Best regards, >>> Hyunsik >>> >>> On Fri, Nov 21, 2014 at 12:50 AM, Jaewoong Jung wrot= e: >>>> There's another issue that hopefully Hyunsik would be able to clarify, >>>> and it's very crucial to handling timezones in these data types. >>>> >>>> Q: So, let's say (and I agree) TimeDatum represents an instant, so can >>>> be timezoned. Then, is it a UTC time or a local time? >>>> >>> >>> TimeDatum is UTC or should be UTC if some parts are not. >>> >>>> Let me explain why this question is important. >>>> >>>> DateDatum represents an instant. And, it is implicitly timezoned to >>>> the user local time. Which means, if I use '11/20/2014', it's >>>> '11/20/2014 0:00:00 PST' and is equivalent to a TimestampDatum for >>>> '11/20/2014 8:00:00 UTC'. (BTW, the compareTo implementation for >>>> DateDatum is broken in this regard. I'll file a separate issue for >>>> that.) >>> >>> I agree with your proposal. I also think that DateDatum does not need >>> to be timezoned. We should keep it as DateDatum instead of >>> TimestampDatum. Thank you for nice finding! >>> >>>> What about TimeDatum? It is currently timezoned to UTC, (surprise, >>>> anyone?) if I understood the code correctly. When we add a TimeDatum >>>> to a DateDatum, we convert the TimeDatum to the user local time, which >>>> implies TimeDatum is UTC-based. (Also, the comment next to the line >>>> explicitly mentions it.) >>> >>> You are right. TimeDatum represents UTC time. >>> >>> FYI, I'd like to describe additional background. There are only two >>> entry points to take time or timestamp values. One is records in input >>> tables, and another is SQL statements. Currently, Input table uses the >>> system global timezone specified in TajoConf (tajo-site.xml file). >>> Later, we will add one table property to allow users to specify >>> timezone for each table. For SQL statement (e.g., SELECT time >>> '03:00:00'), we will use client timezone. Also, we will provide some >>> expression to allow users to specify timezone for time or timestamp in >>> SQL statements. >>> >>> Consequently, only two entry points have to deal with timezone for >>> Timestamp and Time. Other parts in Tajo should deal with all values in >>> UTC. >>> >>>> >>>> So, here's the problem. >>>> >>>> Why do they have different timezones? They're incomplete as an instant >>>> when used alone and are complementary to each other. This is an >>>> important concept. To understand it, you have to think about why >>>> adding DateDatum and TimeDatum is allowed in the code. Originally, >>>> instants can't be added. (And, that's why I thought TimeDatum is not >>>> an instant.) You can't add (say) 11/20/2014 to 11/23/2019. Subtracting >>>> an instant from an instant makes sense and results in a period, but >>>> they can't be added. >>>> >>>> However, in the case of DateDatum and TimeDatum, additions are allowed >>>> because they're complementary to each other, and what the code does >>>> **conceptually** is concatenate the two. >>> >>> Great insight! So far, I haven't thought it. >>> >>>> Therefore, because they're intended to be used together, I'd argue >>>> they shouldn't have different timezones. Also, if they have different >>>> timezones, additions can't have a simple correct answer. What's the >>>> correct answer of 11/20/2014 (PST) + 8:00:00 (UTC)? There's no clear >>>> answer because they can't be simply concatenated. >>>> >>>> (FWIW, the current Tajo code thinks the answer is 11/20/2014 8:00:00 >>>> in UTC. How many of you got it right?) >>> >>> You are definitely right :) If they have different timezone, the >>> problem becomes very complicated. Nobody wants it :) As I mentioned, >>> Timezone problem of Timestamp and Time data types should be addressed >>> in two entry points and client. We need to keep the processing >>> approach simple. >>> >>>> This can cause a lot of confusion to users. When they use a date >>>> alone, it is interpreted as a local time. But, as soon as they add a >>>> time to it, it is silently converted to UTC in a way which is very >>>> unexpected to many users. >>>> >>>> Why am I emphasizing it is unexpected? Look at the comparison below. >>>> >>>> What's the answer to 11/20/2014 (DateDatum) + 8 hours (IntervalDatum)? >>>> It's 11/20/2014 8:00 in a local time (PST on my machine). How about >>>> 11/20/2014 (DateDatum) + 8:00:00 (TimeDatum)? It's 11/20/2014 8:00 in >>>> UTC as I wrote above. How many users would be able to expect this? >>> >>> They are definitely a bug. We follow PostgreSQL in all aspects, and >>> the following results come from the PostgreSQL. The results of two >>> operations are the same. >>> >>> hyunsik=3D> SELECT date '11/20/2014' + time '08:00'; >>> ?column? >>> --------------------- >>> 2014-11-20 08:00:00 >>> (1 row) >>> >>> hyunsik=3D> select date '11/20/2014' + interval '8 hrs'; >>> ?column? >>> --------------------- >>> 2014-11-20 08:00:00 >>> (1 row) >>> >>> >>>> >>>> So, coming back to my original question. What timezone should a >>>> TimeDatum have? UTC or local time? It's currently UTC. But, I believe >>>> it should be changed to the local time zone. >>> >>> TimeDatum is UTC. >>> >>> In sum, we should keep both TimeDatum and TimestampDatum UTC values. >>> Then, we should address timezone offsets in two entry points and >>> client side. >>> >>>> >>>> (Sorry for the long email. But, I think it's critical to get this >>>> right and build consensus over it so that we can provide a consistent >>>> behavior going forward. The actual fix will be really simple, though.) >>>> >>>> On Thu, Nov 20, 2014 at 2:44 AM, Hyunsik Choi = wrote: >>>>> Those parts have poor documentation. >>>>> >>>>> I agree with your investigation. I also could find many misuse of tim= ezone >>>>> in many parts. We should make them clear and fix them in this chance. >>>>> >>>>> I just got off the plane, and I'm still on the road. So, I'll give mo= re >>>>> comments tomorrow. >>>>> >>>>> - hyunsik >>>>> >>>>> On Thu, Nov 20, 2014 at 5:24 PM, Jaewoong Jung wro= te: >>>>> >>>>>> Wow, this seemingly trivial issue has surprisingly many problems inv= olved. >>>>>> >>>>>> The most critical one, though, is TimeMeta class. Presumably because >>>>>> it is poorly documented, it is being used for two different purposes >>>>>> in Tajo code base. Some code treats it as a date time representation= , >>>>>> which I believe is the original intention, but some treat it as the >>>>>> human-readable equivalent of TimeDatum by completely ignoring >>>>>> date-related fields. >>>>>> >>>>>> For example, DateTimeUtil.date2j(long julianDate, TimeMeta tm), whic= h >>>>>> converts a julian timestamp to a TimeMeta value, doesn't touch >>>>>> dayOfMonth, monthOfYear, or years values and just puts all values fo= r >>>>>> hour and above units in hours field. >>>>>> >>>>>> There are other minor problems like incorrect comments and absent >>>>>> default values, but the most critical one is misuse of TimeMeta. >>>>>> >>>>>> I'll try to break up my patches so that each has a clear and >>>>>> easy-to-understand goal. Sending this heads-up to let you know >>>>>> there'll be more issues filed and patches sent than you might expect= . >>>>>> >>>>>> On Wed, Nov 19, 2014 at 10:56 PM, Jaewoong Jung w= rote: >>>>>> > Yeah, after some more research, I found that TImeDatum is a somewh= at >>>>>> > ambiguous data type. >>>>>> > >>>>>> > Its original purpose is to represent a time of a day, i.e. hh:mm:s= s >>>>>> > part of an instant. So, it could be viewed as instant data though = some >>>>>> > may argue it's incomplete to fully represent an instant. Anyway, g= iven >>>>>> > that TimestampDatum has limitation in terms of the time range it c= an >>>>>> > represent, and given that Tajo doesn't have DateTime data type, >>>>>> > clients should be allowed to use it with a timezone. >>>>>> > >>>>>> > I'll change the direction and try to address the issue by fixing t= he >>>>>> > underflow error. >>>>>> > >>>>>> > Thanks for the input. :) >>>>>> > >>>>>> > On Wed, Nov 19, 2014 at 10:24 PM, Jong-young Park >>>>>> wrote: >>>>>> >> Hi, Jaewoong. >>>>>> >> >>>>>> >> To express time period value, IntervalDatum is existing as I know= . >>>>>> >> >>>>>> >> So I think it is right that TimeDatum is for some time point. >>>>>> >> >>>>>> >> And TimestampDatum seems it is doing both roles of DateDatum and >>>>>> TimeDatum. >>>>>> >> >>>>>> >> Regards, >>>>>> >> Jongyoung. >>>>>> >> >>>>>> >> On Wed Nov 19 2014 at =EC=98=A4=ED=9B=84 5:23:40 Jaewoong Jung >>>>>> wrote: >>>>>> >> >>>>>> >>> It turns out, TAJO-1191 is slightly more complicated than I thou= ght. >>>>>> >>> (https://issues.apache.org/jira/browse/TAJO-1191) >>>>>> >>> >>>>>> >>> Basically, it's about whether TimeDatum may have a timezone tied= with >>>>>> >>> it. I **believe** TimeDatum is originally designed to hold a tim= e >>>>>> >>> period value, not an instant. (TimestampDatum seems to be the >>>>>> >>> canonical container for instants.) So, it doesn't make sense to = apply >>>>>> >>> any timezones to TimeDatum values, but it's being done in a few >>>>>> >>> places. And, that's why the test is failing on my machines. >>>>>> >>> >>>>>> >>> I'm going to try to fix it by removing all timezone-related refe= rences >>>>>> >>> around the class, but I want to check my assumption with you bef= ore I >>>>>> >>> proceed. >>>>>> >>> >>>>>> >>> What do you think about it? >>>>>> >>> >>>>>>