impala-reviews mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Philip Zeyliger (Code Review)" <ger...@cloudera.org>
Subject [Impala-ASF-CR] IMPALA-5990: End-to-end compression of metadata
Date Thu, 14 Dec 2017 21:52:37 GMT
Philip Zeyliger has posted comments on this change. ( http://gerrit.cloudera.org:8080/8825
)

Change subject: IMPALA-5990: End-to-end compression of metadata
......................................................................


Patch Set 2:

(10 comments)

Hi,

I've not done a deep review of the protocol changes, but took a look through to understand
new things for myself.

I think it would be useful to write the boring test for the C++ CompressCatalogObject() and
DecompressCatalogObject() functions. We could check that things like the empty string go through
this just fine. (I have no reason to believe they wouldn't.)

We should also decide on our approach to Thrift structs. My biases are to keep them compatible
(i.e., don't re-use numbers, keep around deprecated fields forever), but we should decide
explicitly, and perhaps update the style guide.

http://gerrit.cloudera.org:8080/#/c/8825/2//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/8825/2//COMMIT_MSG@17
PS2, Line 17: - A single catalog object cannot be larger than ~2GB on openjdk 8. It
This is probably just "jdk 8" (rather than Open). A lot of people run Impala using the Oracle
JDK, so it'd be good to know that Oracle and OpenJDK agree here.


http://gerrit.cloudera.org:8080/#/c/8825/2//COMMIT_MSG@23
PS2, Line 23: Testing: Ran existing tests. Manually tested with a 1.95GB catalog object
Was this painful to produce? Could we check in a test that creates a huge object and tries
to force this to happen, or is it prohibitively expensive?


http://gerrit.cloudera.org:8080/#/c/8825/2/be/src/catalog/catalog-server.cc
File be/src/catalog/catalog-server.cc:

http://gerrit.cloudera.org:8080/#/c/8825/2/be/src/catalog/catalog-server.cc@384
PS2, Line 384:   VLOG(1) << "Publishing " << (deleted ? "deletion " : "update
") << ": " << item.key;
Would logging size (perhaps both compressed and uncompressed) here be useful? The old log
also had a catalog version; I think that's useful to keep.


http://gerrit.cloudera.org:8080/#/c/8825/2/be/src/catalog/catalog-util.h
File be/src/catalog/catalog-util.h:

http://gerrit.cloudera.org:8080/#/c/8825/2/be/src/catalog/catalog-util.h@41
PS2, Line 41: /// 'catalog_object'. Stores the size of the uncopressed catalog object in the
first
spelling: uncompressed


http://gerrit.cloudera.org:8080/#/c/8825/2/be/src/catalog/catalog-util.cc
File be/src/catalog/catalog-util.cc:

http://gerrit.cloudera.org:8080/#/c/8825/2/be/src/catalog/catalog-util.cc@139
PS2, Line 139: Status CompressCatalogObject(const uint8_t* buf, uint32_t size, string* output)
{
Does the asymmetric of using string* for output here, and vector<uint8_t>* output in
decompression bother us?


http://gerrit.cloudera.org:8080/#/c/8825/2/common/thrift/CatalogInternalService.thrift
File common/thrift/CatalogInternalService.thrift:

http://gerrit.cloudera.org:8080/#/c/8825/2/common/thrift/CatalogInternalService.thrift@28
PS2, Line 28:   1: required i64 callback_ctx
I think this should be "2", to preserve the ordering. We don't explicitly support multiple
versions of these things working together, but it's good to honor Thrift's goals here.


http://gerrit.cloudera.org:8080/#/c/8825/2/common/thrift/CatalogInternalService.thrift@39
PS2, Line 39:   1: required i64 max_catalog_version
Technically we should leave 2 and 3 around as optional, renaming them to _deprecated or some
such.


http://gerrit.cloudera.org:8080/#/c/8825/2/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java
File fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java:

http://gerrit.cloudera.org:8080/#/c/8825/2/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@24
PS2, Line 24: import java.util.*;
I think we should discourage wildcard imports. (You can configure your IDE to not do this.)

We don't formally adopt Google's Java style guide, but https://google.github.io/styleguide/javaguide.html#s3.3.1-wildcard-imports
is the reference. It's especially bad in line 46, where nobody knows what might be in org.apache.impala.common.*.


http://gerrit.cloudera.org:8080/#/c/8825/2/fe/src/main/java/org/apache/impala/catalog/ImpaladCatalog.java
File fe/src/main/java/org/apache/impala/catalog/ImpaladCatalog.java:

http://gerrit.cloudera.org:8080/#/c/8825/2/fe/src/main/java/org/apache/impala/catalog/ImpaladCatalog.java@162
PS2, Line 162:       LOG.trace((update.first ? "Deleting " : "Adding ") + "item: " + key +
" version: "
I don't know if this is performance sensitive, but slf4j has some mechanism to say:

if (LOG.isEnabled(TRACE)) { ... }

That would avoiding doing the string interpolation if it's not going to just throw it away.

(I didn't look up the syntax for slf4j here.)


http://gerrit.cloudera.org:8080/#/c/8825/2/fe/src/main/java/org/apache/impala/util/TByteBuffer.java
File fe/src/main/java/org/apache/impala/util/TByteBuffer.java:

http://gerrit.cloudera.org:8080/#/c/8825/2/fe/src/main/java/org/apache/impala/util/TByteBuffer.java@26
PS2, Line 26:  * ByteBuffer-backed implementation of TTransport.
https://github.com/apache/thrift/blame/master/lib/java/src/org/apache/thrift/transport/TByteBuffer.java
https://issues.apache.org/jira/browse/THRIFT-3432

Looks like Thrift has added this. We've got other work to upgrade to a more recent version.

The implementations are slightly different; let's make sure ours makes sense?



-- 
To view, visit http://gerrit.cloudera.org:8080/8825
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I3a8819cad734b3a416eef6c954e55b73cc6023ae
Gerrit-Change-Number: 8825
Gerrit-PatchSet: 2
Gerrit-Owner: Tianyi Wang <twang@cloudera.com>
Gerrit-Reviewer: Alex Behm <alex.behm@cloudera.com>
Gerrit-Reviewer: Dimitris Tsirogiannis <dtsirogiannis@cloudera.com>
Gerrit-Reviewer: Philip Zeyliger <philip@cloudera.com>
Gerrit-Reviewer: Tianyi Wang <twang@cloudera.com>
Gerrit-Comment-Date: Thu, 14 Dec 2017 21:52:37 +0000
Gerrit-HasComments: Yes

Mime
  • Unnamed multipart/alternative (inline, 8-Bit, 0 bytes)
View raw message