impala-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Jim Apple (Code Review)" <ger...@cloudera.org>
Subject [Impala-CR](cdh5-trunk) IMPALA-1903: Add support for partitioning by TIMESTAMP
Date Thu, 10 Mar 2016 18:06:09 GMT
Jim Apple has posted comments on this change.

Change subject: IMPALA-1903: Add support for partitioning by TIMESTAMP
......................................................................


Patch Set 3:

(26 comments)

http://gerrit.cloudera.org:8080/#/c/1621/3/be/src/exec/hdfs-table-sink.cc
File be/src/exec/hdfs-table-sink.cc:

Line 235: 
> nit: why the extra empty line?
Done


http://gerrit.cloudera.org:8080/#/c/1621/3/be/src/exprs/expr.cc
File be/src/exprs/expr.cc:

Line 209: else {
        :         *expr = pool->Add(new NullLiteral(texpr_node));
        :       }
> Are we sure about that? If we can't parse it we convert it to null? Shouldn
Some Hive timestamps are invalid Impala timestamps, like ones before 1400AD. Returning non-ok
status causes:

    I0307 14:20:14.831833 19548 status.cc:112] Invalid timestamp literal 1399-12-31 23:59:59.999999999
        @          0x10bfaa7  impala::Status::Status()
        @          0x168e7cd  impala::Expr::CreateExpr()
        @          0x168dff6  impala::Expr::CreateTreeFromThrift()
        @          0x168e2dd  impala::Expr::CreateTreeFromThrift()
        @          0x168db69  impala::Expr::CreateExprTree()
        @          0x134d075  Java_com_cloudera_impala_service_FeSupport_NativeEvalConstExprs
        @     0x7fdebdfd0c39  (unknown)
    E0307 14:20:14.831908 19548 expr.cc:148] Could not construct expr tree.
    Invalid timestamp literal 1399-12-31 23:59:59.999999999

in

    select count(*) from hive_timestamp_partitions.t where ts is null;


in
    query_test/test_partitioning.py::TestPartitioning::test_hive_timestamp_partitions

Dimitris and I talked about allowing syntactically valid timestamps from Hive that are semantically
invalid ones in Impala, like ones from 1399 or before, while disallowing complete garbage,
like "THIS IS NOT A TIMESTAMP". By "allowing" those Hive timestamps, I mean turning them into
NULL values, and by "disallowing", I mean returning an error to the user.

While this makes some sense to me, in that we prevent a user from accidentally making some
NULL timestamp partitions, it is not completely consistent with how we handle invalid timestamps
in non-partition columns, which looks like:

    WARNINGS: Error converting column: 0 TO TIMESTAMP (Data is: 1399-10-10 01:01:01)

(and then treating the value as NULL)

Maybe it's better to warn here?


http://gerrit.cloudera.org:8080/#/c/1621/3/be/src/exprs/literal.cc
File be/src/exprs/literal.cc:

Line 202: DCHECK(
        :       type.type == TYPE_STRING || type.type == TYPE_CHAR || type.type == TYPE_TIMESTAMP)
        :       << type;
> formatting nit: I think you can condense them in two lines
Done


Line 254: TYPE_TIMESTAMP:
> Why don't we do the DCHECK(Parse...) for timestamps?
Done.


http://gerrit.cloudera.org:8080/#/c/1621/3/be/src/runtime/timestamp-parse-util.h
File be/src/runtime/timestamp-parse-util.h:

Line 358: CheckParse
> This function doesn't really check anything, correct? You're just overloadi
Kind of. Parse actually returns a result by modifying parameters 3 and 4. This just returns
true if s is parsable. I've added a comment.

Thoughts on the naming? Maybe "IsParsable"?


http://gerrit.cloudera.org:8080/#/c/1621/3/fe/src/main/java/com/cloudera/impala/analysis/LiteralExpr.java
File fe/src/main/java/com/cloudera/impala/analysis/LiteralExpr.java:

Line 126: new TimestampLiteral(exprNode.timestamp_literal.value);
> This is supposed to return an analyzed literal and I don't think your const
Yes, that analysis all happens in the backend.


http://gerrit.cloudera.org:8080/#/c/1621/3/fe/src/main/java/com/cloudera/impala/analysis/TimestampLiteral.java
File fe/src/main/java/com/cloudera/impala/analysis/TimestampLiteral.java:

Line 1: 5
> 6 :)
Done


Line 29: TimestampLiteral
> Don't you need to analyze TimestampLiteral? Where do you verify that the st
In canEvalUsingPartitionMd, we don't eval timestamps for validity in the frontend.


Line 84: public int compareTo(LiteralExpr o) {
> Is it correct to enforce ordering of timestamp literals based on their stri
I think so, because partition pruning uses the backend. In particular, some of the tests test
ordering, and timestamps in partitions seem to behave the same as timestamps in other columns.


http://gerrit.cloudera.org:8080/#/c/1621/3/fe/src/main/java/com/cloudera/impala/planner/HdfsPartitionPruner.java
File fe/src/main/java/com/cloudera/impala/planner/HdfsPartitionPruner.java:

Line 182: slot.getType().isTimestamp() || bindingExpr.getType().isTimestamp()
> Why do you need both conditions?
It looks to me like BinaryPredicates can compare different types. If that's so, then I think
the answer to your question is yes.


http://gerrit.cloudera.org:8080/#/c/1621/3/fe/src/test/java/com/cloudera/impala/analysis/AnalyzerTest.java
File fe/src/test/java/com/cloudera/impala/analysis/AnalyzerTest.java:

Line 571: 
> Do we have test table with timestamp column?
alltypes has a timestamp column. I don't think I understand what you're getting at, though.


http://gerrit.cloudera.org:8080/#/c/1621/3/testdata/workloads/functional-query/queries/QueryTest/partition-col-types.test
File testdata/workloads/functional-query/queries/QueryTest/partition-col-types.test:

Line 60: timestamp_col=__HIVE_DEFAULT_PARTITION__
> Is this hive's behavior as well? i.e. add a null partition if the timestamp
Hive does not allow the partition to be added.

Do we want to mimic that? If so, we will have an asymmetry between what partitions we can
add (no null values) and what partitions we can read (null partition values are fine).


Line 180: ====
> Can you also add a couple of more test cases:
Done


http://gerrit.cloudera.org:8080/#/c/1621/3/testdata/workloads/functional-query/queries/QueryTest/timestamp-partitions.test
File testdata/workloads/functional-query/queries/QueryTest/timestamp-partitions.test:

Line 59: BETWEEN is supported
> Add also an IN predicate
Done


Line 214: ---- QUERY
> -- A 'T' character can be placed between a date and a time ...
Done


Line 247: are syntactically between two dates
> I am not sure I understand what that means exactly.
Added explanation


Line 270: -- All bare times (without dates) are greater than all dates
        : select ts, ts >= '00:00:00' from t
        : where ts >= '00:00:00'
        : order by ts
> Why don't you just sort the rows in t by ts to demonstrate the applied orde
I added a test using order by, as well, to ensure that where >= '00:00:00' and ORDER BY
have similar results.


http://gerrit.cloudera.org:8080/#/c/1621/3/tests/common/beeline.py
File tests/common/beeline.py:

Line 20: class Beeline(object):
> Can you add a class comment?
Done


http://gerrit.cloudera.org:8080/#/c/1621/3/tests/metadata/test_hms_integration.py
File tests/metadata/test_hms_integration.py:

Line 102: _
> Is this a placeholder of some sort?
This is to make HiveDbWrapper.__init__ and ImalaDbWrapper.__init__ have the same type, so
that lines like 290 work.


http://gerrit.cloudera.org:8080/#/c/1621/3/tests/metadata/test_recover_partitions.py
File tests/metadata/test_recover_partitions.py:

Line 196: 1987-05-29 15:45:44.6
> Timestamp value does not seem to be consistent with the comment above.
Which one? The two paths in the comment are intentionally different.


Line 208: # Create two paths '/i=0004/p=p2/t=06%3A07%3A08%3A.0' and
        :     # "i=000004/p=p2/t=06%3A07%3A08%3A.00"
> Can we have a bit more pronounced difference? e.g. t1=2010-01-01 and t2=201
I've added more zeros, but I've kept the time-only nature of the values to ensure that case
works.


Line 302: 
> nit: extra empty line
Done


http://gerrit.cloudera.org:8080/#/c/1621/3/tests/query_test/test_partitioning.py
File tests/query_test/test_partitioning.py:

Line 78: test_hive_timestamp_partitions
> Brief comment on what you're testing here. Also, shouldn't this test be par
Added comment.

As far as test location, my feeling is that the test belongs here because it is specifically
about timestamp partitions, while the only other test with 'integration' in its name I see
is metadata/test_hms_integration.py.


Line 89:  
> extra space?
Done


Line 92:  
> extra space?
Done


Line 96: assert ('1' ==
> Before these assertions may do a "show partitions" to demonstrate how many 
Done


-- 
To view, visit http://gerrit.cloudera.org:8080/1621
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Icad7dcdc1b199cce9483dc414072bbe24efd625c
Gerrit-PatchSet: 3
Gerrit-Project: Impala
Gerrit-Branch: cdh5-trunk
Gerrit-Owner: Jim Apple <jbapple@cloudera.com>
Gerrit-Reviewer: Dimitris Tsirogiannis <dtsirogiannis@cloudera.com>
Gerrit-Reviewer: Jim Apple <jbapple@cloudera.com>
Gerrit-HasComments: Yes

Mime
View raw message