impala-reviews mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Bharath Vissapragada (Code Review)" <ger...@cloudera.org>
Subject [Impala-ASF-CR] IMPALA-4029: Reduce memory requirements for storing file metadata
Date Thu, 16 Mar 2017 00:03:10 GMT
Bharath Vissapragada has posted comments on this change.

Change subject: IMPALA-4029: Reduce memory requirements for storing file metadata
......................................................................


Patch Set 1:

(14 comments)

http://gerrit.cloudera.org:8080/#/c/6406/1//COMMIT_MSG
Commit Message:

Line 11: serialization library.
Any benchmark numbers? Just curious. Also I think it is good to add them here.


http://gerrit.cloudera.org:8080/#/c/6406/1/common/fbs/CMakeLists.txt
File common/fbs/CMakeLists.txt:

PS1, Line 24:  
Trailing white spaces at multiple places in this file.


Line 45:     set(CPP_ARGS --cpp -o ${BE_OUTPUT_DIR} -b)
Can this be moved outside the loop like JAVA_FE_ARGS


http://gerrit.cloudera.org:8080/#/c/6406/1/common/fbs/CatalogObjects.fbs
File common/fbs/CatalogObjects.fbs:

PS1, Line 20:  
extraneous white spaces.


PS1, Line 54: file_name_prefix_idx
Didn't understand what this prefix means till I read HdfsTable.fileNamePrefixes_. I think
it should be pointed out here.


Line 70: root_type FbFileDesc;
What is the significance of this? I'm reading about FlatBuffers for the first time, so pardon
my ignorance. From what I understand, this isn't required in strongly typed systems. Please
correct me if I'm wrong.


http://gerrit.cloudera.org:8080/#/c/6406/1/fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java
File fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java:

Line 318:     private static int REPLICA_HOST_IDX_MASK = 0x7FFF;
Document these?


PS1, Line 318: REPLICA_HOST_IDX_MASK
Looks like it used for UNMASKing, rename may be?


PS1, Line 357: getHostIdx
how about just making hostIdx_ a short rather than typecasting everytime?


PS1, Line 360: REPLICA_HOST_CACHE_MASK
Shouldn't this be short rather than int? May be it works fine this way, just want to be sure.


http://gerrit.cloudera.org:8080/#/c/6406/1/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
File fe/src/main/java/org/apache/impala/catalog/HdfsTable.java:

Line 315:         Preconditions.checkNotNull(fd);
I don't think this FileDescriptor.create() can ever return null? If yes, worth removing Preconditions
check here and elsewhere.


Line 338:    * Computes the common prefix between 'fileName' and 'prevFileName' and returns
a
I like the idea but I'm wondering if this kind of optimization is a little premature. Especially
given the filenames generated by Impala writes are typically random UUID style strings and
would likely not qualify this optimization. Thoughts? Please correct me if I got this wrong.


Line 420:           prefixCompressFileName(currentFileName, prevFileName);
Also I think it may not be optimal to construct the prefixes this way by comparing the previous
name with the current name. May be we should huffman style encoding that minimizes the no.
of bits across the entire set of file names?


http://gerrit.cloudera.org:8080/#/c/6406/1/fe/src/main/java/org/apache/impala/catalog/Table.java
File fe/src/main/java/org/apache/impala/catalog/Table.java:

Line 96:   protected static EnumSet<TableType> SUPPORTED_TABLE_TYPES = EnumSet.of(
May be remove TableLoader.SUPPORTED_TABLE_TYPES and make it use this one by making it public?


-- 
To view, visit http://gerrit.cloudera.org:8080/6406
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I483d3cadc9d459f71a310c35a130d073597b0983
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Dimitris Tsirogiannis <dtsirogiannis@cloudera.com>
Gerrit-Reviewer: Bharath Vissapragada <bharathv@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <tarmstrong@cloudera.com>
Gerrit-HasComments: Yes

Mime
View raw message