impala-reviews mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Matthew Jacobs (Code Review)" <ger...@cloudera.org>
Subject [Impala-ASF-CR] IMPALA-5338: Fix Kudu timestamp column default values
Date Wed, 31 May 2017 21:57:18 GMT
Matthew Jacobs has posted comments on this change.

Change subject: IMPALA-5338: Fix Kudu timestamp column default values
......................................................................


Patch Set 2:

(7 comments)

http://gerrit.cloudera.org:8080/#/c/6936/2/be/src/exprs/timestamp-functions.h
File be/src/exprs/timestamp-functions.h:

Line 106:   static TimestampVal TimestampFromUnixMicros(FunctionContext* context,
> the other functions don't have a "Timestamp" prefix, leave it out here as w
The difference is that this returns a TimestampVal, those return a StringVal. I was hoping
to differentiate. Does that seem OK or would you still prefer no prefix?


http://gerrit.cloudera.org:8080/#/c/6936/2/be/src/runtime/timestamp-value.inline.h
File be/src/runtime/timestamp-value.inline.h:

Line 46:   tm temp_tm = boost::posix_time::to_tm(temp);
> i pointed this out as a potential performance problem and asked you to leav
The problem with that approach [1] is that (t - epoch) is a boost::time_duration and that
class internally stores ticks which are nanosecond-resolution. Therefore you cannot represent
our entire supported date range with a time_duration. i.e. dates will only work within 1677
to 2262, even though our supported range is 1400-9999.

I spent a while looking for a way around this boost issue (see IMPALA-5357 which is the same
underlying issue I think) because the stdlib functions are indeed expensive, but I couldn't
find anything yet.

Even newer versions of boost will still suffer the same issue: time_duration cannot represent
as large durations as the gregorian dates we want to represent.

That said, I will add a comment now that this does have perf impact and we should look for
an alternative solution.


I don't have perf measurements yet specifically for this code because this isn't the bottleneck
at the moment, the partition/sort is -- we've been studying that so far.

1: https://stackoverflow.com/questions/4461586/how-do-i-convert-boostposix-timeptime-to-time-t


http://gerrit.cloudera.org:8080/#/c/6936/2/fe/src/main/java/org/apache/impala/analysis/ColumnDef.java
File fe/src/main/java/org/apache/impala/analysis/ColumnDef.java:

Line 271:               "cast to a default value for a TIMESTAMP column.", defaultValue_.toSql()));
> i'd say "to a TIMESTAMP literal', just to be absolutely clear
Done


Line 274:  
> trailing
Done


Line 296:               "Error converting timestamp default value: " + defaultValue_.toSql(),
e);
> so you're converting an internal exception to an analysis exception here? a
Good point, though in this case I think the exception is actionable: the user specified an
expression that isn't a valid Impala TIMESTAMP. How about I change the wording of this message?
Or what would you prefer I do?


http://gerrit.cloudera.org:8080/#/c/6936/2/fe/src/main/java/org/apache/impala/util/KuduUtil.java
File fe/src/main/java/org/apache/impala/util/KuduUtil.java:

Line 168:    * Returns the actual value of the specified defaultValue literal. Checks that
> i still don't understand what the return types could be (looks like T<>Lite
Ok, hopefully I made this more clear


http://gerrit.cloudera.org:8080/#/c/6936/2/tests/query_test/test_kudu.py
File tests/query_test/test_kudu.py:

Line 650:   def test_timestamp_default_value(self, cursor):
> can't you do this with a simple .test file? that would be easier to read an
Good point, I'm not sure why this was originally done this way (this just mirrors the existing
code) - I think this was added by Martin or Casey. Would you mind if I moved all of these
to .test files in a separate change so this change doesn't get held up? I filed IMPALA-5406


-- 
To view, visit http://gerrit.cloudera.org:8080/6936
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I655910fb4805bb204a999627fa9f68e43ea8aaf2
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Matthew Jacobs <mj@cloudera.com>
Gerrit-Reviewer: Alex Behm <alex.behm@cloudera.com>
Gerrit-Reviewer: Dimitris Tsirogiannis <dtsirogiannis@cloudera.com>
Gerrit-Reviewer: Marcel Kornacker <marcel@cloudera.com>
Gerrit-Reviewer: Matthew Jacobs <mj@cloudera.com>
Gerrit-Reviewer: Thomas Tauber-Marshall <tmarshall@cloudera.com>
Gerrit-HasComments: Yes

Mime
View raw message