impala-reviews mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Zoltan Ivanfi (Code Review)" <ger...@cloudera.org>
Subject [Impala-ASF-CR] IMPALA-2716: Hive/Impala incompatibility for timestamp data in Parquet
Date Tue, 14 Feb 2017 17:06:49 GMT
Zoltan Ivanfi has posted comments on this change.

Change subject: IMPALA-2716: Hive/Impala incompatibility for timestamp data in Parquet
......................................................................


Patch Set 2:

(11 comments)

Thanks for tackling this issue! I raised a few concerns but in general it looks good.

http://gerrit.cloudera.org:8080/#/c/5939/2/be/src/exec/parquet-column-readers.cc
File be/src/exec/parquet-column-readers.cc:

Line 237:         timezone_ = TimezoneDatabase::FindTimezone(
We should first check for TimezoneDatabase::IsTimestampDependentTimezone, otherwise there
is no point in stroing a timezone_ that we won't use.


Line 240:         timezone_ = NULL;
This exception should not be swallowed. If we can not convert to the desired time zone properly,
we should show an error message to the user.


Line 543:   /// If not NULL, TIMESTAMP column readers require conversion to this time zone.
Although technically correct, this comment is a bit misleading. One may easily interpret the
if as an iff ("if and only if"). I would suggest to document its purpose instead: to cache
the return value of TimezoneDatabase::FindTimezone in order to avoid repeated calls to it.


Line 593:   if (LIKELY(dst->HasDateAndTime())) {
DCHECK(parent_->scan_node_->parquet_mr_time_zone() != "UTC")


Line 597:       dst->FromUtc(parent_->scan_node_->parquet_mr_time_zone());
This is somewhat confusing, as it seems as if the first two cases did the same thing: convert
from UTC to the parent_->scan_node_->parquet_mr_time_zone() with the only exception
that in the second case we use the timezone that we saved into the timezone_ field earlier,
but it's value came from the same parent_->scan_node_->parquet_mr_time_zone() value.

In reality though, the parameter of the first call is a string, while the parameter of the
second one is a timezone, which leads to different behaviors. Since its very hard to notice
this difference, please add comments to describe what's happening.


http://gerrit.cloudera.org:8080/#/c/5939/2/be/src/exprs/timestamp-functions.cc
File be/src/exprs/timestamp-functions.cc:

Line 189: time_zone_ptr TimezoneDatabase::FindTimezone(const string& tz) {
This should be a private helper function with two callers:

- FindTimezone(const string& tz, const TimestampValue& tv) as you already do it
- FindTimezone(const string& tz) which should first ensure that !TimezoneDatabase::IsTimestampDependentTimezone(tz)


http://gerrit.cloudera.org:8080/#/c/5939/2/be/src/runtime/timestamp-value.cc
File be/src/runtime/timestamp-value.cc:

Line 109:     // If a table has an invalid timezone, it should not have been loaded.
How does this comment apply here? Was it supposed to be before the catch block?


Line 112:     *this = ptime(not_a_date_time);
Again, swallowing this exception can be a problem as end-users may not notice that they don't
see all data correctly. I understand that there is a check higher in the hierarchy, but then
the try-catch is unnecessary. If we have it regardless, we should handle it according to the
desired output: an error presented to the user.


http://gerrit.cloudera.org:8080/#/c/5939/2/be/src/runtime/timestamp-value.h
File be/src/runtime/timestamp-value.h:

Line 201:   /// Converts from UTC to local time in the given timezone.
Please document the difference that one can be timestamp-dependent while the other not.


http://gerrit.cloudera.org:8080/#/c/5939/2/be/src/service/fe-support.cc
File be/src/service/fe-support.cc:

Line 352: extern "C"
Please document the purpose of this call (allowing the Java code to check whether a timezone
is valid in the C code).


http://gerrit.cloudera.org:8080/#/c/5939/2/fe/src/main/java/org/apache/impala/analysis/AlterTableSetTblProperties.java
File fe/src/main/java/org/apache/impala/analysis/AlterTableSetTblProperties.java:

Line 182:             "'parquet.mr.int96.write.zone' is only supported for HDFS tables."));
Why? Parquet files should be handled this way regardless of using HDFS or not.


-- 
To view, visit http://gerrit.cloudera.org:8080/5939
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I3f24525ef45a2814f476bdee76655b30081079d6
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Attila Jeges <attilaj@cloudera.com>
Gerrit-Reviewer: Alex Behm <alex.behm@cloudera.com>
Gerrit-Reviewer: Michael Ho
Gerrit-Reviewer: Taras Bobrovytsky <tbobrovytsky@cloudera.com>
Gerrit-Reviewer: Zoltan Ivanfi <zi+gerrit@cloudera.com>
Gerrit-HasComments: Yes

Mime
View raw message