impala-reviews mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Dimitris Tsirogiannis (Code Review)" <>
Subject [Impala-ASF-CR] IMPALA-4029: Reduce memory requirements for storing file metadata
Date Mon, 20 Mar 2017 17:22:07 GMT
Dimitris Tsirogiannis has posted comments on this change.

Change subject: IMPALA-4029: Reduce memory requirements for storing file metadata

Patch Set 1:

Commit Message:

Line 11: serialization library.
> Any benchmark numbers? Just curious. Also I think it is good to add them he
File common/fbs/CMakeLists.txt:

PS1, Line 24:  
> Trailing white spaces at multiple places in this file.

Line 45:     set(CPP_ARGS --cpp -o ${BE_OUTPUT_DIR} -b)
> Can this be moved outside the loop like JAVA_FE_ARGS
File common/fbs/CatalogObjects.fbs:

PS1, Line 20:  
> extraneous white spaces.

PS1, Line 54: file_name_prefix_idx
> Didn't understand what this prefix means till I read HdfsTable.fileNamePref

Line 70: root_type FbFileDesc;
> What is the significance of this? I'm reading about FlatBuffers for the fir
Yeah, it is mostly used if you read from JSON. Removed.
File fe/src/main/java/org/apache/impala/catalog/

Line 83:         storageIdGenerator.put(host, new Short(diskId));
> Drive-by comment: using 'new' here is pessimising memory consumption. If yo
Good point thanks. Done
File fe/src/main/java/org/apache/impala/catalog/

> Looks like it used for UNMASKing, rename may be?
It's a bit mask, so I'd rather keep the name as such :)

Line 318:     private static int REPLICA_HOST_IDX_MASK = 0x7FFF;
> Document these?

PS1, Line 357: getHostIdx
> how about just making hostIdx_ a short rather than typecasting everytime?

> Shouldn't this be short rather than int? May be it works fine this way, jus
For convenience. Java short is signed.
File fe/src/main/java/org/apache/impala/catalog/

Line 315:         Preconditions.checkNotNull(fd);
> I don't think this FileDescriptor.create() can ever return null? If yes, wo

Line 338:    * Computes the common prefix between 'fileName' and 'prevFileName' and returns
> I like the idea but I'm wondering if this kind of optimization is a little 
That's not entirely true. Multiple filenames from a single write do have a common prefix.

Line 420:           prefixCompressFileName(currentFileName, prevFileName);
> Also I think it may not be optimal to construct the prefixes this way by co
In principal you're right, this is not the best way to compress strings. However, due to the
way these file names are generated (random) this is not meant to be a generic file name compressor.
Here we take advantage of the pattern that file names of a single write exhibit to avoid storing
the common prefix multiple times.
File fe/src/main/java/org/apache/impala/catalog/

Line 96:   protected static EnumSet<TableType> SUPPORTED_TABLE_TYPES = EnumSet.of(
> May be remove TableLoader.SUPPORTED_TABLE_TYPES and make it use this one by
Not sure how this ended up here. Removed.

To view, visit
To unsubscribe, visit

Gerrit-MessageType: comment
Gerrit-Change-Id: I483d3cadc9d459f71a310c35a130d073597b0983
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Dimitris Tsirogiannis <>
Gerrit-Reviewer: Bharath Vissapragada <>
Gerrit-Reviewer: Dimitris Tsirogiannis <>
Gerrit-Reviewer: Tim Armstrong <>
Gerrit-HasComments: Yes

View raw message