impala-reviews mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Marcel Kornacker (Code Review)" <>
Subject [Impala-ASF-CR] IMPALA-4623: Enable file handle cache
Date Sat, 06 May 2017 19:11:29 GMT
Marcel Kornacker has posted comments on this change.

Change subject: IMPALA-4623: Enable file handle cache

Patch Set 4:


looking good.
File be/src/exec/

Line 894:         runtime_state_->io_mgr()->cached_file_handles_hit_count(reader_context_));
pretty awkward. why not call the reader directly (and maybe introduce a new getter)?
File be/src/exec/hdfs-scan-node-base.h:

Line 441:   RuntimeProfile::Counter* cached_file_handles_hit_count_;
initialize to nullptr here, that way you don't need to do it in the c'tor (which you have
to edit if the member order changes, or which you might have to duplicate if you multiple

do that for all pointer members.
File be/src/runtime/buffered-block-mgr.h:

Line 25: #include <boost/unordered_map.hpp>
File be/src/runtime/

Line 342:       io_mgr_->CacheOrCloseFileHandle(file(), exclusive_hdfs_fh_);
yes, the name really needs to distinguish between cache operations and direct operations on

Line 548:         remote_bytes_ += stats->totalBytesRead - stats->totalLocalBytesRead;
why not stick that into reader_ as well?
File be/src/runtime/disk-io-mgr.h:

Line 231:   /// Threads checkout a file handle for exclusive access and return it when finished.
"check out" (checkout is a noun)

Line 237:   template <size_t NUM_PARTITIONS>
please pull this along with hdfsfilehandle out into a separate file, this file is large enough
as it is.

Line 242:     /// first k partitions will each have one additional file handle.
let's make the size uniform across all partitions (e.g., by rounding up), that should simplify
the code.

Line 282:     /// the partitions are aligned to cache line boundaries (64 bytes).
hm, isn't there some predefined constant somewhere for the cache line size?

Line 549:     Status Open(bool use_file_handle_cache) WARN_UNUSED_RESULT;
does it ever make sense to call open on the same scan range with different values of the bool?
if not, make a parameter of the c'tor?

point out clearly in the comment that if the param is true, open does nothing.

Line 573:     void* meta_data_;
initialize nullptr here (for all pointer members)

Line 587:     /// Total number of bytes read remotely
point out why we need to maintain this particular metric instead of in requestcontext.

Line 601:     /// handle, and no sharing takes place.
also mention when this is populated and how long it's valid.

Line 856:   HdfsFileHandle* OpenHdfsFile(const hdfsFS& fs,
if this always returns a cached handle, would be good to reflect that in the name.

Line 860:   void CacheOrCloseFileHandle(const char* fname, HdfsFileHandle* fid);
File tests/query_test/

Line 76:     # File deleted: either # rows is the same or # rows = 0
we don't make any guarantees for this, so might as well not check the result. we just don't
want to crash.

Line 98:     new_table_location = "{0}/test-warehouse/{1}".format(FILESYSTEM_PREFIX, unique_database)
long lines

Line 105:     # Delete the whole directory (including data file)
this function is pretty much identical to the one above (and below), except for the specific
modification action. consolidate into a single function and drive that with an enum parameter.

To view, visit
To unsubscribe, visit

Gerrit-MessageType: comment
Gerrit-Change-Id: Ibe5ff60971dd653c3b6a0e13928cfa9fc59d078d
Gerrit-PatchSet: 4
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Joe McDonnell <>
Gerrit-Reviewer: Dan Hecht <>
Gerrit-Reviewer: Joe McDonnell <>
Gerrit-Reviewer: Marcel Kornacker <>
Gerrit-Reviewer: Tim Armstrong <>
Gerrit-HasComments: Yes

View raw message