impala-reviews mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Attila Jeges (Code Review)" <ger...@cloudera.org>
Subject [Impala-ASF-CR] IMPALA-2716: Hive/Impala incompatibility for timestamp data in Parquet
Date Mon, 24 Apr 2017 12:48:02 GMT
Attila Jeges has posted comments on this change.

Change subject: IMPALA-2716: Hive/Impala incompatibility for timestamp data in Parquet
......................................................................


Patch Set 8:

(20 comments)

http://gerrit.cloudera.org:8080/#/c/5939/7//COMMIT_MSG
Commit Message:

PS7, Line 22: New HDFS tables created by Impala using CREATE TABLE and CREATE TABLE
            : LIKE <file> will set the table property to UTC if the global flag
            : --set_parque
> I don't follow this given the next two paragraphs. This isn't true for the 
Correct. Changed the commit message.


http://gerrit.cloudera.org:8080/#/c/5939/7/be/src/exec/hdfs-scan-node-base.h
File be/src/exec/hdfs-scan-node-base.h:

Line 305:   // parquet-mr. If conversion should not occur, this is set to an empty string.
Otherwise
> from the code, it looks like this is set to the empty string if conversion 
Correct, added a comment to document the behavior.


http://gerrit.cloudera.org:8080/#/c/5939/7/be/src/exec/parquet-column-readers.cc
File be/src/exec/parquet-column-readers.cc:

Line 234:             FLAGS_convert_legacy_hive_parquet_utc_timestamps
> given that parquet_mr_write_zone() will take precedence over the flag, I th
Done


PS7, Line 246: DCHECK
> nit: DCHECK_EQ()
Done


PS7, Line 609:  status = scanner_state->LogOrReturnError(msg);
> since this code is very performance critical, it would be best to order thi
Correct. Reordered the conditions in the if-elif-else statement


PS7, Line 611: status = status;
> these could all be UNLIKELY since we want to optimize for the non-error cas
Done


PS7, Line 625: LY(dst->HasDateAndTime()
> i.e. here too and below.
Done


PS7, Line 639:         if (!SetTimestampConversionError(parent_->scan_node_, parent_->state_,
             :             parent_->parse_status_, src, parent_->scan_node_->parquet_mr_write_zone()))
{
             :           return false;
             :         }
             :       }
             :     } else {
             :       DCHECK(parent_->scan_node_->parquet_mr_write_zone().empty());
             :       DCH
> seems worth factoring this block (which occurs 3 times) into a non-inlined 
Done. Added a new non-class member function and factored out the block into it.

(Adding a new member function instead to ScalarColumnReader template class is probably not
a good idea since it is only needed in the ScalarColumnReader<TimestampValue, true>
instance)


http://gerrit.cloudera.org:8080/#/c/5939/7/be/src/exprs/timestamp-functions.cc
File be/src/exprs/timestamp-functions.cc:

Line 193: 
> could you add a comment explaining these two lookup techniques?
Done


http://gerrit.cloudera.org:8080/#/c/5939/7/be/src/exprs/timezone_db.h
File be/src/exprs/timezone_db.h:

PS7, Line 45: hen IsTimestampDependen
> how about drawing the connection with the next function directly:
Done


Line 49: 
> comment this.
Done


http://gerrit.cloudera.org:8080/#/c/5939/7/be/src/runtime/timestamp-value.cc
File be/src/runtime/timestamp-value.cc:

Line 101:     Status s("Failed to convert timestamp to local time.");
> this might produce a ton of logging.  instead, why not return the Status ob
Done


Line 109: bool TimestampValue::FromUtc(const std::string& timezone_str) {
> how do we guarantee that this is true? did the caller already check that?
Yes, it did.


PS7, Line 111: _zone_ptr timezo
> UNLIKELY (since perf critical)
Done


Line 123:   ToPtime(&temp);
> see the comment at line 77. i think we were avoiding this for perf reasons.
I've added a micro-benchmark to compare the two. UtcToLocal() is about 2.5 times faster than
FromUtc() when input data size is 1,000 and 1.5 times faster when input data size is 10,000.

Rewriting FromUtc() not to use boost function calls is not that straightforward. The above
version of UtcToLocal() uses localtime_r() standard C function to perform the UTC->local
timezone conversion. FromUtc() on the other hand should convert timestamps to a given time
zone. I don't think there is a standard C function to do this.


http://gerrit.cloudera.org:8080/#/c/5939/7/be/src/runtime/timestamp-value.h
File be/src/runtime/timestamp-value.h:

PS7, Line 197: 
> below you say "local time in the given timezone", which I guess is technica
Changed the comment below for 'FromUtc'


PS7, Line 203: 
> it's not clear that "convert" means that *this is modified in-place, so I t
Done


http://gerrit.cloudera.org:8080/#/c/5939/7/be/src/service/impala-server.cc
File be/src/service/impala-server.cc:

PS7, Line 122: new
> is that always the case?
Fixed the description


PS7, Line 122: bles created "
             :     "using CREATE TABLE and CREATE TABLE LIKE <file>. You can find details
in the "
             :     "documentation.");
             : 
             : DEFINE_int64(max_result_cache_size, 10000
> I'm not sure that this makes sense on it's own. I think you kind of have to
Removed that part


http://gerrit.cloudera.org:8080/#/c/5939/7/common/thrift/BackendGflags.thrift
File common/thrift/BackendGflags.thrift:

PS7, Line 58: new
> is that true?  what about "LIKE table" case?
Fixed the comment


-- 
To view, visit http://gerrit.cloudera.org:8080/5939
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I3f24525ef45a2814f476bdee76655b30081079d6
Gerrit-PatchSet: 8
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Attila Jeges <attilaj@cloudera.com>
Gerrit-Reviewer: Alex Behm <alex.behm@cloudera.com>
Gerrit-Reviewer: Attila Jeges <attilaj@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dhecht@cloudera.com>
Gerrit-Reviewer: Michael Ho
Gerrit-Reviewer: Taras Bobrovytsky <tbobrovytsky@cloudera.com>
Gerrit-Reviewer: Zoltan Ivanfi <zi+gerrit@cloudera.com>
Gerrit-HasComments: Yes

Mime
View raw message