impala-reviews mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Dimitris Tsirogiannis (Code Review)" <ger...@cloudera.org>
Subject [Impala-ASF-CR] IMPALA-4029: Reduce memory requirements for storing file metadata
Date Sat, 06 May 2017 05:26:06 GMT
Dimitris Tsirogiannis has posted comments on this change.

Change subject: IMPALA-4029: Reduce memory requirements for storing file metadata
......................................................................


Patch Set 5:

(21 comments)

http://gerrit.cloudera.org:8080/#/c/6406/5/common/fbs/CatalogObjects.fbs
File common/fbs/CatalogObjects.fbs:

Line 31:   LZ4
> zlib?
Done


Line 38:   offset: long (id: 0);
> why not specify a default, given that most will be 0?
Done


Line 51:   disk_ids: [ushort] (id: 3);
> is_cached?
This is embedded in MSB of replica_host_idxs.


http://gerrit.cloudera.org:8080/#/c/6406/5/common/thrift/CatalogObjects.thrift
File common/thrift/CatalogObjects.thrift:

Line 217:   // File descriptor metadata serialized into a FlatBuffer.
> mention where to find the fb definition
Done


http://gerrit.cloudera.org:8080/#/c/6406/5/fe/src/main/java/org/apache/impala/catalog/DiskIdMapper.java
File fe/src/main/java/org/apache/impala/catalog/DiskIdMapper.java:

Line 78:           diskId = (short) (storageIdGenerator.get(host) + 1);
> at least assert that you don't get an overflow here
Done


http://gerrit.cloudera.org:8080/#/c/6406/5/fe/src/main/java/org/apache/impala/catalog/HdfsCompression.java
File fe/src/main/java/org/apache/impala/catalog/HdfsCompression.java:

Line 78:   public byte toFlatBuffer() {
> should we generally settle on toFb for functions like these? brevity is a v
Done


http://gerrit.cloudera.org:8080/#/c/6406/5/fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java
File fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java:

Line 115:       int[] blockOffsets = createBlockMetadata(fbb, blockLocations, hostIndex, stats);
> i couldn't tell from the name that this creates an array of fbs. call it so
Done


Line 126:     public static FileDescriptor createWithSynthesizedBlockMetadata(FileStatus fileStatus,
> feel free to generally abbreviate metadata as md, to make this very long na
Done


PS5, Line 141: blockOffsets
> would also be good to name this so it's immediately clear what it is (ie, e
Done


Line 161:      * Synthesizes the block metadata of a file represented by 'fileStatus' that
resides
> you don't say what that block metadata contains.
Done


Line 182:         List<BlockReplica> replicas = Lists.newArrayList(
> same comment about intermediate objects as below
Done


Line 208:         List<BlockReplica> replicas = Lists.newArrayListWithExpectedSize(
> hm, it would be nice not to have to construct these intermediate objects. p
Done


Line 224:     public THdfsCompression getFileCompression() {
> does this need to return a thrift struct?
Done


Line 231:     public FileBlock getFileBlock(int idx) {
> why not just return the fbfileblock and save yourself the object constructi
Done


Line 329:      * Loads the metadata of a file block from its block location metadata 'location'
and
> what do you mean by 'load'? 'construct an FbFileBlock with the provided fbb
Done


Line 346:         replicaIdx = replica.isCached() ?
> please describe this encoding in the .fb file.
Done


Line 360:       FbFileBlock.addOffset(fbb, location.getOffset());
> why not just pass in the offset/length directly and avoid constructing a Bl
Done


Line 402:     public boolean hasCachedReplica() { return hasCachedReplica_; }
> instead of adding member vars, why not make all functions be static and tak
Done


Line 449:   public static class FileDescStats {
> since this only keeps track of a single number, consider using a Long inste
Done


http://gerrit.cloudera.org:8080/#/c/6406/5/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
File fe/src/main/java/org/apache/impala/catalog/HdfsTable.java:

Line 134:   // if the table is stored in the catalog server.
> what does that mean? i thought all tables are stored in the catalog server.
The same class is used both in CatalogServiceCatalog (catalod) and ImpaladCatalog (impalad).
However, when the table is stored in the catalog we don't need to populate data structure
used for static partition pruning. I agree "stored in the catalog" is not the best description.


http://gerrit.cloudera.org:8080/#/c/6406/5/fe/src/main/java/org/apache/impala/catalog/Table.java
File fe/src/main/java/org/apache/impala/catalog/Table.java:

Line 93:   protected boolean storedInImpaladCatalogCache_ = false;
> is this part of your changes?
Yes, see previous comment in HdfsTable.java.


-- 
To view, visit http://gerrit.cloudera.org:8080/6406
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I483d3cadc9d459f71a310c35a130d073597b0983
Gerrit-PatchSet: 5
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Dimitris Tsirogiannis <dtsirogiannis@cloudera.com>
Gerrit-Reviewer: Alex Behm <alex.behm@cloudera.com>
Gerrit-Reviewer: Bharath Vissapragada <bharathv@cloudera.com>
Gerrit-Reviewer: Dimitris Tsirogiannis <dtsirogiannis@cloudera.com>
Gerrit-Reviewer: Henry Robinson <henry@cloudera.com>
Gerrit-Reviewer: Marcel Kornacker <marcel@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <tarmstrong@cloudera.com>
Gerrit-HasComments: Yes

Mime
View raw message