impala-reviews mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Matthew Jacobs (Code Review)" <ger...@cloudera.org>
Subject [Impala-ASF-CR] IMPALA-5137: Support Kudu UNIXTIME MICROS as Impala TIMESTAMP
Date Fri, 05 May 2017 23:29:35 GMT
Matthew Jacobs has posted comments on this change.

Change subject: IMPALA-5137: Support Kudu UNIXTIME_MICROS as Impala TIMESTAMP
......................................................................


Patch Set 5:

(14 comments)

http://gerrit.cloudera.org:8080/#/c/6526/5//COMMIT_MSG
Commit Message:

Line 28: of converting to STRING, so this provides some decent
> I assume an exhaustive run has passed?
Not yet, too hard to do without the kudu changes being in because I have to have a patched
Kudu. That should go in soon though, then I'll update the change w/ a new toolchain build
and submit an exhaustive run.


http://gerrit.cloudera.org:8080/#/c/6526/5/be/src/exec/kudu-scanner.cc
File be/src/exec/kudu-scanner.cc:

Line 185:     Tuple* kudu_tuple = const_cast<Tuple*>(reinterpret_cast<const Tuple*>(
> how come this expression changed so much?
Not sure exactly what you mean.

In order to get the timestamp w/ padding, we can't use the regular KuduScanBatch::RowPtr API
(what we had before). Kudu is providing us with this 'direct_data' mechanism (i.e. 'raw mode')
that just returns a const uint8_t* to the beginning of the row batch memory.

I guess there's no reason that this code does the const_cast and reinterpret_cast in a different
order -- if that's what you meant? I can swap their order.


Line 191:     // int64) have 8 bytes of padding following the timestamp. Because this padding
is
> in l149 you're saying it's still a todo...
I had to edit the kudu client to always turn this functionality on because I'm waiting for
a change from David to get in to Kudu. That should happen today, so I'll be able to update
the Kudu client and then set the option on l149.


http://gerrit.cloudera.org:8080/#/c/6526/5/be/src/exec/kudu-table-sink.cc
File be/src/exec/kudu-table-sink.cc:

PS5, Line 304: E
> also print value?
Done


http://gerrit.cloudera.org:8080/#/c/6526/5/be/src/runtime/timestamp-value.h
File be/src/runtime/timestamp-value.h:

Line 32: #include "gutil/walltime.h"
> i'm assuming you're including this because you're inlining some function bo
Done


Line 107:   static TimestampValue FromUnixTimeMicros(time_t unix_time, int64_t micros) {
> add a comment
Done


Line 190:     DCHECK(unix_time != NULL);
> switch to nullptr throughout this file
Done


Line 193:     tm temp_tm = boost::posix_time::to_tm(temp);
> what about to_time_t, or does that require a newer version of boost?
I'd prefer to keep this consistent with the ToUnixTime() code (see l224), in case there could
be subtle differences between the conversions. While that may be better for performance, it
should be removed when we can move to a 64-bit timestamp type.


Line 214:     *unix_time_micros = std::min(MAX_UNIXTIME_MICROS, *unix_time_micros);
> should this return false when you go above the max?
Given that this is converting from a TimestampValue which cannot represent timestamps above
the 9999-12-31 23:59:59.999999999, this shouldn't be able to go above the MAX_UNIXTIME_MICROS+1
after the rounding. I can add a DCHECK and a comment.


http://gerrit.cloudera.org:8080/#/c/6526/5/testdata/workloads/functional-query/queries/QueryTest/kudu-overflow-ts.test
File testdata/workloads/functional-query/queries/QueryTest/kudu-overflow-ts.test:

Line 1: ====
> hm, this is basically the same as the other new .test file. leave todo to f
added a TODO in the .py file


http://gerrit.cloudera.org:8080/#/c/6526/5/testdata/workloads/functional-query/queries/QueryTest/kudu_insert.test
File testdata/workloads/functional-query/queries/QueryTest/kudu_insert.test:

Line 6:    ts timestamp)
> no 'null' clause here?
Done


PS5, Line 53: 999999
> so this is truncation behavior?
Yes, that was the intention of this test case. I'll add a bit more to the comment.


Line 183: ---- DML_RESULTS: tdata
> how come this only shows 2 rows?
that's just what this test case includes, see the query above "... where id < 2". More
rows are added in subsequent queries.


Line 323: create table allkeytypes (i1 tinyint, i2 smallint, i3 int, i4 bigint, name string,
> add a timestamp key col here?
I can't yet, timestamps aren't supported in the range partition ddl yet. That requires more
FE work and that'll be a future change.


-- 
To view, visit http://gerrit.cloudera.org:8080/6526
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Iae6ccfffb79118a9036fb2227dba3a55356c896d
Gerrit-PatchSet: 5
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Matthew Jacobs <mj@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dhecht@cloudera.com>
Gerrit-Reviewer: David Ribeiro Alves <davidralves@gmail.com>
Gerrit-Reviewer: Lars Volker <lv@cloudera.com>
Gerrit-Reviewer: Marcel Kornacker <marcel@cloudera.com>
Gerrit-Reviewer: Matthew Jacobs <mj@cloudera.com>
Gerrit-Reviewer: Thomas Tauber-Marshall <tmarshall@cloudera.com>
Gerrit-HasComments: Yes

Mime
View raw message