From: Greg Rahn
Date: Thu, 11 Jan 2018 21:15:22 -0800
Subject: Re: Impala, Kudu, and timestamps (and UNIXTIME_MICROS...)
To: user@impala.apache.org

Franco-

Are there still outstanding questions or issues with this? I'm just returning from holiday, so I wanted to check in.

Agreed that timestamps with and without time zone are different beasts and come with their own challenges.

Cheers,

On Tue, Dec 19, 2017 at 6:23 AM, Franco Venturi wrote:
> Thanks for your input, Balazs.
> I am still pondering this issue, and I found it very useful to follow
> your advice to see what's already in JIRA and to research what others
> are doing.
>
> I didn't find anything that exactly describes my problem in the Impala
> JIRA (I might have missed it); however, I did find a closely related
> discussion that is occurring in the Hive JIRA: I am referring to JIRAs
> HIVE-14305 (https://issues.apache.org/jira/browse/HIVE-14305), HIVE-14412
> (https://issues.apache.org/jira/browse/HIVE-14412), and HIVE-16418
> (https://issues.apache.org/jira/browse/HIVE-16418).
>
> I also found it useful to read how two of the most widely used RDBMSs deal
> with timestamps:
>     - Oracle (see: https://docs.oracle.com/en/database/oracle/oracle-database/12.2/nlspg/datetime-data-types-and-time-zone-support.html)
>     - PostgreSQL (see: https://www.postgresql.org/docs/10/static/datatype-datetime.html)
>
> Finally, I am going to refer to the Java 8 API as described here:
> https://docs.oracle.com/javase/8/docs/api/overview-summary.html; the
> reason is that many Impala users (me included) connect to it via
> Java/JDBC, and in my opinion it is very important that the Impala data
> types are mapped to the correct Java classes to avoid confusion.
>
> After this long preamble, here are some notes that will hopefully help
> this discussion.
>
> - At the center of this issue is the fact that there are two significantly
> different data types that we commonly refer to when we use the word
> 'timestamp':
>     - timestamp without time zone (most commonly referred to as just
>       'timestamp')
>     - timestamp with time zone
>
> - These two types are like apples and oranges in my opinion; getting
> confused between the two (like I did) can cause a lot of frustration (my
> first post in this thread shows that pretty well).
>
> - The first data type ('timestamp without time zone', or just 'timestamp')
> is best described by this sentence in section 4.2.1.6 (Choosing a TIMESTAMP
> Data Type) of the Oracle document:
>
>     "Use the TIMESTAMP data type when you need a datetime value to
>     record the time of an event without the time zone. For example, you can
>     store information about the times when workers punch a time card in and
>     out of their assembly line workstations. Because this is always a local
>     time it is then not needed to store the timezone part"
>
> - I think this is the kind of timestamp that is currently implemented in
> Impala (please correct me if I am wrong), and in my opinion it should be
> mapped to something like the Java type java.time.LocalDateTime; the Java 8
> API docs for java.time.LocalDateTime contain this sentence, which I think
> provides good insight into the meaning of this data type:
>
>     "This class does not store or represent a time-zone. Instead, it
>     is a description of the date, as used for birthdays, combined with the
>     local time as seen on a wall clock. It cannot represent an instant on
>     the time-line without additional information such as an offset or
>     time-zone."
>
> - Also, this timestamp (again, in my opinion) should not be mapped to the
> Kudu type UNIXTIME_MICROS, because their meanings are different (this is
> what triggered my initial confusion, and I suspect I am not the only one
> out there who misunderstood this difference).
>
> - For the second data type ('timestamp with time zone') I couldn't find a
> good definition in the Oracle document; however, this sentence from the
> Java API docs for java.util.Date (of which java.sql.Timestamp is a
> subclass) offers good insight:
>
>     "The class Date represents a specific instant in time"
>
> - This is also indirectly implied by the following sentence in the Oracle
> document:
>
>     "Two TIMESTAMP WITH TIME ZONE values are considered identical if
>     they represent the same instant in UTC, regardless of the TIME ZONE
>     offsets stored in the data."
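The two definitions quoted above map cleanly onto the standard java.time API; a minimal sketch of the distinction (the epoch value 1509859800000 is the one used later in this thread):

```java
import java.time.Instant;
import java.time.LocalDateTime;
import java.time.ZoneId;

public class TimestampKinds {
    public static void main(String[] args) {
        // 'timestamp without time zone': a wall-clock reading, not a point in time.
        LocalDateTime wallClock = LocalDateTime.of(2017, 11, 5, 1, 30, 0);

        // The same wall-clock reading denotes different instants in different zones.
        Instant inNewYork = wallClock.atZone(ZoneId.of("America/New_York")).toInstant();
        Instant inLondon = wallClock.atZone(ZoneId.of("Europe/London")).toInstant();
        System.out.println(inNewYork.equals(inLondon)); // false

        // 'timestamp with time zone' semantics: a single absolute instant,
        // which is also what Kudu's UNIXTIME_MICROS encodes (as microseconds
        // since the Unix epoch).
        Instant instant = Instant.ofEpochMilli(1509859800000L);
        System.out.println(instant); // 2017-11-05T05:30:00Z
    }
}
```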
>
> - The 'timestamp with time zone' is what I think should be mapped to the
> Java type java.sql.Timestamp, and it also corresponds to the actual
> meaning of the Kudu type UNIXTIME_MICROS.
>
> - This is the kind of timestamp type that is most useful to businesses
> (finance, healthcare, security logs) because it captures the "absolute"
> moment in time at which a transaction or event occurred, regardless of the
> timezone in which it occurred or whether daylight saving time was in effect.
>
> In conclusion, the point I am at right now is that, to really fix this
> issue in Impala, two things should happen:
>
> - a new data type ('timestamp with time zone'), with the features
> mentioned above, should be implemented - this change would hopefully not
> break any existing applications
>
> - the current 'timestamp' type should be mapped to the Java class
> java.time.LocalDateTime (not java.sql.Timestamp) - this would definitely
> break existing applications, and therefore should be postponed to Impala 3.0
>
> Regards,
> Franco
>
> ------------------------------
> *From: *"Jeszy"
> *To: *user@impala.apache.org
> *Sent: *Monday, December 18, 2017 7:49:52 AM
> *Subject: *Re: Impala, Kudu, and timestamps (and UNIXTIME_MICROS...)
>
> Hello Franco,
>
> Thanks for your feedback! I agree there are pain points with using
> timestamps, especially together with other systems.
> Is there any particular approach or solution you propose that would
> work well for you? Have you found any jiras on issues.apache.org that
> describe what you're asking for? Commenting on a jira will help the
> team track your input better.
> Regards,
> Balazs
>
> On 17 December 2017 at 00:38, Franco Venturi wrote:
> > Please note that the discussion below refers to the following versions:
> >     - Impala: v2.10.0-cdh5.13.0
> >     - Kudu: 1.5.0-cdh5.13.0
> >     - Everything runs on a standard Cloudera 5.13 installation
> >
> > A few days ago I was writing some Java code to migrate several tables
> > directly from Oracle to Kudu (to be queried later on by our developers
> > and BI tools using Impala). Most of these tables have columns of type
> > "timestamp" (to be exact, they come in as instances of class
> > oracle.sql.TIMESTAMP and I cast them to java.sql.Timestamp; for the rest
> > of this discussion I'll assume we only deal with objects of
> > java.sql.Timestamp, to keep things simple).
> > As you probably know, Kudu, starting I think with version 1.3.1, has a
> > type called 'UNIXTIME_MICROS', and that type gets mapped by Impala to
> > the "Impala TIMESTAMP" data type
> > (https://www.cloudera.com/documentation/enterprise/latest/topics/impala_timestamp.html).
> >
> > A good description of the meaning of 'UNIXTIME_MICROS' in Kudu is in the
> > 'Apache Kudu Schema Design' document
> > (https://kudu.apache.org/docs/schema_design.html), which says:
> >
> >     unixtime_micros (64-bit microseconds since the Unix epoch)
> >
> > where the 'Unix epoch' is defined as 1970-01-01 00:00:00 GMT.
> >
> > With this understanding I went ahead and wrote my Java code; when I ran
> > the first few tests, I noticed that the timestamp values returned by
> > Impala (I created an 'external' table 'stored as kudu' in Impala) were
> > off by several hours compared to the values returned by the original
> > table in Oracle (our servers, both the Oracle ones and the Impala/Kudu
> > ones, are all configured in the 'America/New_York' timezone).
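Since the mapping at issue is between java.sql.Timestamp values and Kudu's 64-bit microseconds-since-epoch representation, the conversion arithmetic can be sketched as follows (just the arithmetic, not the Kudu client API):

```java
import java.sql.Timestamp;

public class ToUnixtimeMicros {
    // Convert a java.sql.Timestamp to microseconds since the Unix epoch,
    // i.e. the representation UNIXTIME_MICROS uses. getTime() already
    // includes the millisecond portion of the nanos field, so only the
    // sub-millisecond remainder needs to be added.
    static long toMicros(Timestamp ts) {
        return ts.getTime() * 1000L + (ts.getNanos() % 1_000_000) / 1000;
    }

    public static void main(String[] args) {
        Timestamp ts = new Timestamp(1509859800000L); // value used later in this thread
        System.out.println(toMicros(ts)); // 1509859800000000
    }
}
```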
> >
> > To investigate this difference, I created a simple table in Kudu with
> > just two columns: an INT64 as the primary key and a UNIXTIME_MICROS as a
> > timestamp. I ran a few inserts and selects over this table in Impala and
> > figured out that Impala stores a value that is more or less defined as
> > follows:
> >
> >     number of microseconds since the Unix epoch (i.e. what I was
> >     expecting originally)
> >     + offset with respect to GMT of the timestamp I inserted (in my case
> >     this is the offset for EST or EDT, depending on whether that
> >     timestamp fell during EST (winter) or EDT (summer))
> >
> > This is how Impala achieves what is described as:
> >
> >     Impala does not store timestamps using the local timezone, to avoid
> >     undesired results from unexpected time zone issues
> >
> > That same page has caveats like the following, which sent a shiver down
> > my spine:
> >
> >     If that value was written to a data file, and shipped off to a
> >     distant server to be analyzed alongside other data from far-flung
> >     locations, the dates and times would not match up precisely because
> >     of time zone differences
> >
> > This means that if anyone is using (or even thinking about using)
> > "Impala timestamps" to store, say, financial or health services (or
> > security) events, they'll find some nasty "surprises" (even if they
> > never plan to move their servers and only do business in one timezone).
> >
> > Consider, for instance, anything that occurred between 1am and 2am EDT
> > on 11/5/2017 (i.e. in the hour before we moved our clocks back from EDT
> > to EST) - there's no way to store the timestamps for these events in
> > Kudu via Impala.
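Franco's fall-back example can be checked directly with java.time: during the 2017-11-05 transition, the wall-clock reading 01:30 in America/New_York corresponds to two distinct instants, one hour apart. A small sketch:

```java
import java.time.LocalDateTime;
import java.time.ZoneId;
import java.time.ZoneOffset;
import java.util.List;

public class DstOverlap {
    public static void main(String[] args) {
        ZoneId ny = ZoneId.of("America/New_York");
        LocalDateTime wallClock = LocalDateTime.of(2017, 11, 5, 1, 30);

        // During the EDT -> EST fall-back, this local time is ambiguous:
        // both -04:00 and -05:00 are valid offsets for it.
        List<ZoneOffset> offsets = ny.getRules().getValidOffsets(wallClock);
        System.out.println(offsets.size()); // 2

        // The two distinct instants that share the same wall-clock reading:
        long edtMillis = wallClock.toInstant(ZoneOffset.ofHours(-4)).toEpochMilli();
        long estMillis = wallClock.toInstant(ZoneOffset.ofHours(-5)).toEpochMilli();
        System.out.println(edtMillis); // 1509859800000
        System.out.println(estMillis); // 1509863400000
        System.out.println(estMillis - edtMillis); // 3600000, exactly one hour
    }
}
```

These are exactly the two getTime() values that appear in the JDBC insert/select test in this thread.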
> >
> > To prove this, I wrote this simple piece of Java code (which uses Java 8
> > and only well-documented, non-deprecated classes and methods) to do just
> > an insert and a select, via Impala JDBC, of a timestamp row in the
> > simple table that I mentioned above (primary key + timestamp column):
> >
> >     // run insert
> >     long primaryKey = 1L;
> >     PreparedStatement insert = connection.prepareStatement(
> >             "insert into " + table + " values (?, ?)");
> >     insert.setLong(1, primaryKey);
> >     Timestamp timestampIn = new Timestamp(1509859800000L);
> >     System.out.println("TimestampIn: " + timestampIn
> >             + " - getTime(): " + timestampIn.getTime());
> >     insert.setTimestamp(2, timestampIn);
> >     insert.executeUpdate();
> >     insert.close();
> >
> >     // run select
> >     PreparedStatement select = connection.prepareStatement(
> >             "select " + timestampColumn + " from " + table
> >             + " where " + primaryKeyColumn + "=?");
> >     select.setLong(1, primaryKey);
> >     ResultSet resultSet = select.executeQuery();
> >     while (resultSet.next()) {
> >         Timestamp timestampOut = resultSet.getTimestamp(1);
> >         System.out.println("TimestampOut: " + timestampOut
> >                 + " - getTime(): " + timestampOut.getTime());
> >     }
> >     resultSet.close();
> >     select.close();
> >
> > and this is the output:
> >
> >     TimestampIn: 2017-11-05 01:30:00.0 - getTime(): 1509859800000
> >     TimestampOut: 2017-11-05 01:30:00.0 - getTime(): 1509863400000
> >
> > If this kind of timestamp had been designed, say, in 1972, I would
> > probably have been OK with it, but in 2017, when people and companies do
> > business 24/7, I find this approach (and sentences like "dates and times
> > would not match up precisely") unacceptable.
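The one-hour discrepancy in the output above can be reproduced without a database: the two getTime() values render as the identical wall-clock string in America/New_York, because one instant is 01:30 EDT and the other is 01:30 EST. A sketch (pinning the default zone so Timestamp.toString() is deterministic):

```java
import java.sql.Timestamp;
import java.util.TimeZone;

public class RoundTripDelta {
    public static void main(String[] args) {
        // The original poster's servers all run in America/New_York.
        TimeZone.setDefault(TimeZone.getTimeZone("America/New_York"));

        Timestamp in = new Timestamp(1509859800000L);  // 01:30 EDT
        Timestamp out = new Timestamp(1509863400000L); // 01:30 EST

        // Both instants print as the same local wall-clock reading...
        System.out.println(in);  // 2017-11-05 01:30:00.0
        System.out.println(out); // 2017-11-05 01:30:00.0

        // ...yet they are exactly one hour apart on the time-line.
        System.out.println(out.getTime() - in.getTime()); // 3600000
    }
}
```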
> >
> > In its defense, the Impala TIMESTAMP web page mentioned above spends
> > several paragraphs on how to deal with "UTC timestamps" and problems
> > like the ones shown above, but it basically has the user wrap everything
> > in Impala functions like 'to_utc_timestamp()' and 'from_utc_timestamp()'.
> >
> > This proposed solution - besides the performance impact of, say, having
> > a query like this:
> >
> >     select ... where to_utc_timestamp(timestamp_column_a) <
> >         to_utc_timestamp(timestamp_column_b)
> >
> > also makes it much harder to write queries involving timestamp fields
> > (which is very common in my workplace) or to use them with automated BI
> > tools.
> >
> > It is definitely not what one would expect from an analytic MPP database
> > that aspires to be used at the Enterprise level in 2017.
> >
> > Regards,
> > Franco Venturi