impala-reviews mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Marcel Kornacker (Code Review)" <ger...@cloudera.org>
Subject [Impala-ASF-CR] IMPALA-4623: Thread level file handle caching
Date Tue, 28 Mar 2017 00:30:48 GMT
Marcel Kornacker has posted comments on this change.

Change subject: IMPALA-4623: Thread level file handle caching
......................................................................


Patch Set 1:

(20 comments)

http://gerrit.cloudera.org:8080/#/c/6478/1/be/src/runtime/disk-io-mgr.cc
File be/src/runtime/disk-io-mgr.cc:

Line 70: DEFINE_uint64(max_cached_file_handles, 10000, "Maximum number of HDFS file handles
"
on further consideration, esp. in light of what tim brought up (separate remote io queue),
we should have separate cache capacities for cached reads, disk queues, and both remote io
queues.

let's retire this flag (it's safe to remove) and introduce new flags that are more descriptive
(s3_file_handle_cache_capacity?).


Line 388:   std::vector<int> num_threads_per_disk;
don't use std:: in .cc files


Line 1050: void DiskIoMgr::WorkLoop(DiskQueue* disk_queue, int thread_file_handle_cache_size)
{
you use 'fh' elsewhere for 'file handle', feel free to use that elsewhere in order to end
up with more manageable variable names.


Line 1052:   if (thread_file_handle_cache_size > 0) {
dcheck that it's not < 0


Line 1081:       Write(worker_context, static_cast<WriteRange*>(range));
what about writing?


Line 1337:   : capacity_(capacity) {}
move to .h


Line 1339: DiskIoMgr::ThreadFileHandleCache::~ThreadFileHandleCache() {}
remove


Line 1343:   std::string file_key = std::string(fname) + std::to_string(mtime);
is this guaranteed to be unique w/o some sort of separator?


Line 1351:       EvictFileHandle();
you only call this once, and it's a small function, you might as well inline it.


http://gerrit.cloudera.org:8080/#/c/6478/1/be/src/runtime/disk-io-mgr.h
File be/src/runtime/disk-io-mgr.h:

Line 233:   /// This is a single-threaded LRU cache for Hdfs file handles. The cache creates
and
> High-level observation: my understanding is that this changes the upper bou
each thread needs its own file handle in order to get accurate io stats.


Line 235:   class ThreadFileHandleCache {
'thread' is generic. LruFileHandleCache? or simply FileHandleCache?


Line 240:     /// Gets an Hdfs file handle for the specified file. It will lookup and use
a
"look up"


Line 245:     HdfsCachedFileHandle* GetFileHandle(const hdfsFS& fs, const char* fname,
int64_t mtime);
long line


Line 248: 
remove blank line


Line 256:     class LruListElement {
make it a struct, class member vars are supposed to be private.


Line 260:       ~LruListElement() {}
remove


Line 533:     /// in the reader context.
"reader context" = reader_ member. referencing the member directly is more explicit.


Line 544:     /// the DiskIoRequestContext. This clears the statistics on this file handle.
why not make that a function of diskiorequestcontext


Line 564:     int64_t remote_bytes_;
why here and not diskiorequestcontext?


Line 574:     /// at read time, so hdfs_file_ is null.
is there a need to have it both ways for hdfs files, instead of always opening at read time,
which would make the control flow easier to follow? it shouldn't matter for hdfs caching reads,
they get the entire scan range in one go anyway.


-- 
To view, visit http://gerrit.cloudera.org:8080/6478
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ibe5ff60971dd653c3b6a0e13928cfa9fc59d078d
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Joe McDonnell <joemcdonnell@cloudera.com>
Gerrit-Reviewer: Marcel Kornacker <marcel@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <tarmstrong@cloudera.com>
Gerrit-HasComments: Yes

Mime
View raw message