impala-reviews mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Zoltan Ivanfi (Code Review)" <>
Subject [Impala-ASF-CR] IMPALA-2716: Hive/Impala incompatibility for timestamp data in Parquet
Date Tue, 14 Feb 2017 17:06:49 GMT
Zoltan Ivanfi has posted comments on this change.

Change subject: IMPALA-2716: Hive/Impala incompatibility for timestamp data in Parquet

Patch Set 2:


Thanks for tackling this issue! I raised a few concerns but in general it looks good.
File be/src/exec/

Line 237:         timezone_ = TimezoneDatabase::FindTimezone(
We should first check for TimezoneDatabase::IsTimestampDependentTimezone, otherwise there
is no point in stroing a timezone_ that we won't use.

Line 240:         timezone_ = NULL;
This exception should not be swallowed. If we can not convert to the desired time zone properly,
we should show an error message to the user.

Line 543:   /// If not NULL, TIMESTAMP column readers require conversion to this time zone.
Although technically correct, this comment is a bit misleading. One may easily interpret the
if as an iff ("if and only if"). I would suggest to document its purpose instead: to cache
the return value of TimezoneDatabase::FindTimezone in order to avoid repeated calls to it.

Line 593:   if (LIKELY(dst->HasDateAndTime())) {
DCHECK(parent_->scan_node_->parquet_mr_time_zone() != "UTC")

Line 597:       dst->FromUtc(parent_->scan_node_->parquet_mr_time_zone());
This is somewhat confusing, as it seems as if the first two cases did the same thing: convert
from UTC to the parent_->scan_node_->parquet_mr_time_zone() with the only exception
that in the second case we use the timezone that we saved into the timezone_ field earlier,
but it's value came from the same parent_->scan_node_->parquet_mr_time_zone() value.

In reality though, the parameter of the first call is a string, while the parameter of the
second one is a timezone, which leads to different behaviors. Since its very hard to notice
this difference, please add comments to describe what's happening.
File be/src/exprs/

Line 189: time_zone_ptr TimezoneDatabase::FindTimezone(const string& tz) {
This should be a private helper function with two callers:

- FindTimezone(const string& tz, const TimestampValue& tv) as you already do it
- FindTimezone(const string& tz) which should first ensure that !TimezoneDatabase::IsTimestampDependentTimezone(tz)
File be/src/runtime/

Line 109:     // If a table has an invalid timezone, it should not have been loaded.
How does this comment apply here? Was it supposed to be before the catch block?

Line 112:     *this = ptime(not_a_date_time);
Again, swallowing this exception can be a problem as end-users may not notice that they don't
see all data correctly. I understand that there is a check higher in the hierarchy, but then
the try-catch is unnecessary. If we have it regardless, we should handle it according to the
desired output: an error presented to the user.
File be/src/runtime/timestamp-value.h:

Line 201:   /// Converts from UTC to local time in the given timezone.
Please document the difference that one can be timestamp-dependent while the other not.
File be/src/service/

Line 352: extern "C"
Please document the purpose of this call (allowing the Java code to check whether a timezone
is valid in the C code).
File fe/src/main/java/org/apache/impala/analysis/

Line 182:             "'' is only supported for HDFS tables."));
Why? Parquet files should be handled this way regardless of using HDFS or not.

To view, visit
To unsubscribe, visit

Gerrit-MessageType: comment
Gerrit-Change-Id: I3f24525ef45a2814f476bdee76655b30081079d6
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Attila Jeges <>
Gerrit-Reviewer: Alex Behm <>
Gerrit-Reviewer: Michael Ho
Gerrit-Reviewer: Taras Bobrovytsky <>
Gerrit-Reviewer: Zoltan Ivanfi <>
Gerrit-HasComments: Yes

View raw message