impala-reviews mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Attila Jeges (Code Review)" <>
Subject [Impala-ASF-CR] IMPALA-2716: Hive/Impala incompatibility for timestamp data in Parquet
Date Mon, 24 Apr 2017 12:48:02 GMT
Attila Jeges has posted comments on this change.

Change subject: IMPALA-2716: Hive/Impala incompatibility for timestamp data in Parquet

Patch Set 8:

Commit Message:

PS7, Line 22: New HDFS tables created by Impala using CREATE TABLE and CREATE TABLE
            : LIKE <file> will set the table property to UTC if the global flag
            : --set_parque
> I don't follow this given the next two paragraphs. This isn't true for the 
Correct. Changed the commit message.
File be/src/exec/hdfs-scan-node-base.h:

Line 305:   // parquet-mr. If conversion should not occur, this is set to an empty string.
> from the code, it looks like this is set to the empty string if conversion 
Correct, added a comment to document the behavior.
File be/src/exec/

Line 234:             FLAGS_convert_legacy_hive_parquet_utc_timestamps
> given that parquet_mr_write_zone() will take precedence over the flag, I th

PS7, Line 246: DCHECK
> nit: DCHECK_EQ()

PS7, Line 609:  status = scanner_state->LogOrReturnError(msg);
> since this code is very performance critical, it would be best to order thi
Correct. Reordered the conditions in the if-elif-else statement

PS7, Line 611: status = status;
> these could all be UNLIKELY since we want to optimize for the non-error cas

PS7, Line 625: LY(dst->HasDateAndTime()
> i.e. here too and below.

PS7, Line 639:         if (!SetTimestampConversionError(parent_->scan_node_, parent_->state_,
             :             parent_->parse_status_, src, parent_->scan_node_->parquet_mr_write_zone()))
             :           return false;
             :         }
             :       }
             :     } else {
             :       DCHECK(parent_->scan_node_->parquet_mr_write_zone().empty());
             :       DCH
> seems worth factoring this block (which occurs 3 times) into a non-inlined 
Done. Added a new non-class member function and factored out the block into it.

(Adding a new member function instead to ScalarColumnReader template class is probably not
a good idea since it is only needed in the ScalarColumnReader<TimestampValue, true>
File be/src/exprs/

Line 193: 
> could you add a comment explaining these two lookup techniques?
File be/src/exprs/timezone_db.h:

PS7, Line 45: hen IsTimestampDependen
> how about drawing the connection with the next function directly:

Line 49: 
> comment this.
File be/src/runtime/

Line 101:     Status s("Failed to convert timestamp to local time.");
> this might produce a ton of logging.  instead, why not return the Status ob

Line 109: bool TimestampValue::FromUtc(const std::string& timezone_str) {
> how do we guarantee that this is true? did the caller already check that?
Yes, it did.

PS7, Line 111: _zone_ptr timezo
> UNLIKELY (since perf critical)

Line 123:   ToPtime(&temp);
> see the comment at line 77. i think we were avoiding this for perf reasons.
I've added a micro-benchmark to compare the two. UtcToLocal() is about 2.5 times faster than
FromUtc() when input data size is 1,000 and 1.5 times faster when input data size is 10,000.

Rewriting FromUtc() not to use boost function calls is not that straightforward. The above
version of UtcToLocal() uses localtime_r() standard C function to perform the UTC->local
timezone conversion. FromUtc() on the other hand should convert timestamps to a given time
zone. I don't think there is a standard C function to do this.
File be/src/runtime/timestamp-value.h:

PS7, Line 197: 
> below you say "local time in the given timezone", which I guess is technica
Changed the comment below for 'FromUtc'

PS7, Line 203: 
> it's not clear that "convert" means that *this is modified in-place, so I t
File be/src/service/

PS7, Line 122: new
> is that always the case?
Fixed the description

PS7, Line 122: bles created "
             :     "using CREATE TABLE and CREATE TABLE LIKE <file>. You can find details
in the "
             :     "documentation.");
             : DEFINE_int64(max_result_cache_size, 10000
> I'm not sure that this makes sense on it's own. I think you kind of have to
Removed that part
File common/thrift/BackendGflags.thrift:

PS7, Line 58: new
> is that true?  what about "LIKE table" case?
Fixed the comment

To view, visit
To unsubscribe, visit

Gerrit-MessageType: comment
Gerrit-Change-Id: I3f24525ef45a2814f476bdee76655b30081079d6
Gerrit-PatchSet: 8
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Attila Jeges <>
Gerrit-Reviewer: Alex Behm <>
Gerrit-Reviewer: Attila Jeges <>
Gerrit-Reviewer: Dan Hecht <>
Gerrit-Reviewer: Michael Ho
Gerrit-Reviewer: Taras Bobrovytsky <>
Gerrit-Reviewer: Zoltan Ivanfi <>
Gerrit-HasComments: Yes

View raw message