impala-reviews mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Alex Behm (Code Review)" <ger...@cloudera.org>
Subject [Impala-ASF-CR] IMPALA-2716: Hive/Impala incompatibility for timestamp data in Parquet
Date Sat, 11 Mar 2017 01:27:22 GMT
Alex Behm has posted comments on this change.

Change subject: IMPALA-2716: Hive/Impala incompatibility for timestamp data in Parquet
......................................................................


Patch Set 4:

(14 comments)

Still need to look at tests. Code is getting close.

http://gerrit.cloudera.org:8080/#/c/5939/4/be/src/exec/hdfs-scan-node-base.h
File be/src/exec/hdfs-scan-node-base.h:

Line 304:   // Time zone for adjusting timestamp values read from Parquet files written by
single line


http://gerrit.cloudera.org:8080/#/c/5939/4/be/src/exec/parquet-column-readers.cc
File be/src/exec/parquet-column-readers.cc:

Line 246:           parent_->state_->LogError(ErrorMsg(TErrorCode::PARQUET_MR_WRITE_ZONE_INVALID,
I don't think logging an error is the right thing to do here. We will scan all files and just
keep appending to the error log, but no useful results can come of a query in this state.

This error is not recoverable, so I think we should terminate this scan and the query.

Consider moving this code into Reset() or better add a new Open() or Init() function that
is called once per column reader from HdfsParquetScanner::CreateColumnReaders(). That way
you can return a Status early and not scan any data.


Line 653:     } else {
As mentioned above, we should terminate the query early when in such a bad state. Imo, there
should be a DCHECK here.


http://gerrit.cloudera.org:8080/#/c/5939/4/be/src/exprs/timestamp-functions.cc
File be/src/exprs/timestamp-functions.cc:

Line 194:   for (const auto& tz_region: tz_region_list_) {
No need to use auto here. const string& is clearer


http://gerrit.cloudera.org:8080/#/c/5939/4/be/src/exprs/timezone_db.h
File be/src/exprs/timezone_db.h:

Line 40:   /// If 'tz' is not found in the database, null pointer is returned.
null pointer -> nullptr


Line 46:   /// If 'tz' is not found in the database, null pointer is returned.
null pointer -> nullptr


http://gerrit.cloudera.org:8080/#/c/5939/3/be/src/runtime/timestamp-value.cc
File be/src/runtime/timestamp-value.cc:

Line 125:   return true;
> What do you mean? The documentation doesn't say anything about an exception
Agree, I could not find any indication in the code either.


http://gerrit.cloudera.org:8080/#/c/5939/4/be/src/runtime/timestamp-value.h
File be/src/runtime/timestamp-value.h:

Line 199:   /// conversion was successfull and false otherwise.
Comment on what the state of *this is when false is returned.


Line 202:   /// Converts from UTC to local time in the given timezone. Returns true if conversion
Comment on what the state of *this is when false is returned.


http://gerrit.cloudera.org:8080/#/c/5939/4/be/src/service/fe-support.cc
File be/src/service/fe-support.cc:

Line 357:   const char *tz = env->GetStringUTFChars(timezone, NULL);
const char*


http://gerrit.cloudera.org:8080/#/c/5939/4/be/src/service/impala-server.cc
File be/src/service/impala-server.cc:

Line 122:     "the parquet.mr.int96.write.zone table property to UTC for new tables. This
changes "
consistent placing of space (at line end or at line beginning)


http://gerrit.cloudera.org:8080/#/c/5939/3/fe/src/main/java/org/apache/impala/analysis/BaseTableRef.java
File fe/src/main/java/org/apache/impala/analysis/BaseTableRef.java:

Line 67:   }
> I was copying the code that validates the skip.header.line.count table prop
My apologies, but I was wrong on this one. I think it's unfortunate, but better to check this
for every query. The alternative is to catch the issue during table loading, but Impala has
no way to fix such tables today, because you can only ALTER tables that are fully loaded.
That's an independent issue, but for now, it does seem better to check every time like for
skip.header.linecount


http://gerrit.cloudera.org:8080/#/c/5939/3/fe/src/main/java/org/apache/impala/analysis/CreateTableStmt.java
File fe/src/main/java/org/apache/impala/analysis/CreateTableStmt.java:

Line 184:       String timezone = getTblProperties().get(HdfsTable.TBL_PROP_PARQUET_MR_WRITE_ZONE);
> Correct. I need some help here. How can I check if the table that will be c
You can check getFileFormat() here. The only non-HDFS formats we have are HBase and Kudu.


http://gerrit.cloudera.org:8080/#/c/5939/4/fe/src/main/java/org/apache/impala/service/FeSupport.java
File fe/src/main/java/org/apache/impala/service/FeSupport.java:

Line 87:   // Returns true if timezone String is valid, false otherwise.
valid according to the BE timezone database


-- 
To view, visit http://gerrit.cloudera.org:8080/5939
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I3f24525ef45a2814f476bdee76655b30081079d6
Gerrit-PatchSet: 4
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Attila Jeges <attilaj@cloudera.com>
Gerrit-Reviewer: Alex Behm <alex.behm@cloudera.com>
Gerrit-Reviewer: Attila Jeges <attilaj@cloudera.com>
Gerrit-Reviewer: Michael Ho
Gerrit-Reviewer: Taras Bobrovytsky <tbobrovytsky@cloudera.com>
Gerrit-Reviewer: Zoltan Ivanfi <zi+gerrit@cloudera.com>
Gerrit-HasComments: Yes

Mime
View raw message