impala-reviews mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Attila Jeges (Code Review)" <ger...@cloudera.org>
Subject [Impala-ASF-CR] IMPALA-2716: Hive/Impala incompatibility for timestamp data in Parquet
Date Mon, 27 Feb 2017 18:24:33 GMT
Attila Jeges has posted comments on this change.

Change subject: IMPALA-2716: Hive/Impala incompatibility for timestamp data in Parquet
......................................................................


Patch Set 2:

(11 comments)

http://gerrit.cloudera.org:8080/#/c/5939/2/be/src/exec/parquet-column-readers.cc
File be/src/exec/parquet-column-readers.cc:

Line 237:         timezone_ = TimezoneDatabase::FindTimezone(
> We should first check for TimezoneDatabase::IsTimestampDependentTimezone, o
Done. Also added slot_desc_->type().type == TYPE_TIMESTAMP to the condition.


Line 240:         timezone_ = NULL;
> This exception should not be swallowed. If we can not convert to the desire
Added an error message. 

Even though the program execution should never get to L240, (we deal with invalid timezones
in the frontend in the analyze phase), I feel it is important to be defensive here and display
errors.


Line 543:   /// If not NULL, TIMESTAMP column readers require conversion to this time zone.
> Although technically correct, this comment is a bit misleading. One may eas
Done


Line 593:   if (LIKELY(dst->HasDateAndTime())) {
> DCHECK(parent_->scan_node_->parquet_mr_time_zone() != "UTC")
Done


Line 597:       dst->FromUtc(parent_->scan_node_->parquet_mr_time_zone());
> This is somewhat confusing, as it seems as if the first two cases did the s
Done. I also refactored the if statements to make the logic in 'ConvertSlot()' easier to follow.
Error handling was added as well.


http://gerrit.cloudera.org:8080/#/c/5939/2/be/src/exprs/timestamp-functions.cc
File be/src/exprs/timestamp-functions.cc:

Line 189: time_zone_ptr TimezoneDatabase::FindTimezone(const string& tz) {
> This should be a private helper function with two callers:
'FindTimezone(const string& tz)' is also called from the Frontend to check if a timezone
string is valid (see 'Java_org_apache_impala_service_FeSupport_NativeCheckIsValidTimeZone()'
in fe-support.cc). 

In this case it is not guaranteed that the timezone is not timestamp dependent.


http://gerrit.cloudera.org:8080/#/c/5939/2/be/src/runtime/timestamp-value.cc
File be/src/runtime/timestamp-value.cc:

Line 109:     // If a table has an invalid timezone, it should not have been loaded.
> How does this comment apply here? Was it supposed to be before the catch bl
Removed the comment as it doesn't make much sense here.


Line 112:     *this = ptime(not_a_date_time);
> Again, swallowing this exception can be a problem as end-users may not noti
Added code to display exception message. Added error-handling code to the caller function
as well.


http://gerrit.cloudera.org:8080/#/c/5939/2/be/src/runtime/timestamp-value.h
File be/src/runtime/timestamp-value.h:

Line 201:   /// Converts from UTC to local time in the given timezone.
> Please document the difference that one can be timestamp-dependent while th
Both versions of 'FromUtc()' will work both with timestamp-independent and timestamp-dependent
timezones.

On the other hand, there is a difference between the two versions of 'TimezoneDatabase::FindTimezone()'
which is documented in timezone_db.h


http://gerrit.cloudera.org:8080/#/c/5939/2/be/src/service/fe-support.cc
File be/src/service/fe-support.cc:

Line 352: extern "C"
> Please document the purpose of this call (allowing the Java code to check w
Done


http://gerrit.cloudera.org:8080/#/c/5939/2/fe/src/main/java/org/apache/impala/analysis/AlterTableSetTblProperties.java
File fe/src/main/java/org/apache/impala/analysis/AlterTableSetTblProperties.java:

Line 182:             "'parquet.mr.int96.write.zone' is only supported for HDFS tables."));
> Why? Parquet files should be handled this way regardless of using HDFS or n
I'm not 100% sure about this, but I think HdfsTable instances are used to represent tables
stored in HDFS or S3 or the local file system. This condition was meant to exclude Kudu/HBase
tables.

Let's leave it like this for now and see what other reviewers think.


-- 
To view, visit http://gerrit.cloudera.org:8080/5939
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I3f24525ef45a2814f476bdee76655b30081079d6
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Attila Jeges <attilaj@cloudera.com>
Gerrit-Reviewer: Alex Behm <alex.behm@cloudera.com>
Gerrit-Reviewer: Attila Jeges <attilaj@cloudera.com>
Gerrit-Reviewer: Michael Ho
Gerrit-Reviewer: Taras Bobrovytsky <tbobrovytsky@cloudera.com>
Gerrit-Reviewer: Zoltan Ivanfi <zi+gerrit@cloudera.com>
Gerrit-HasComments: Yes

Mime
View raw message