impala-reviews mailing list archives

From "Tim Armstrong (Code Review)" <ger...@cloudera.org>
Subject [Impala-ASF-CR] IMPALA-4623: Thread level file handle caching
Date Mon, 27 Mar 2017 18:56:34 GMT
Tim Armstrong has posted comments on this change.

Change subject: IMPALA-4623: Thread level file handle caching
......................................................................


Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/6478/1/be/src/runtime/disk-io-mgr.h
File be/src/runtime/disk-io-mgr.h:

Line 233:   /// This is a single-threaded LRU cache for Hdfs file handles. The cache creates and
High-level observation: my understanding is that this changes the upper bound on the # of open
handles per file from (# of open scan ranges) to (# of I/O threads that read from that file).

Sometimes the second number can be higher, if there are many threads servicing a queue (e.g.
S3, remote reads, a local SSD, or a non-standard config).

E.g. suppose you create a single scan range for reading a text file from S3. If it gets scheduled
at some point onto each of the S3 I/O threads, we might end up with 16 open file handles (one
per thread) rather than 1.
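
To make the arithmetic concrete, here is a minimal sketch of that scenario. It is a toy model, not the code in this patch: PerThreadCaches, HandlesForSingleScanRange, the round-robin scheduling, and the file name are all hypothetical, and LRU eviction is not modeled.

```cpp
#include <cstddef>
#include <set>
#include <string>
#include <vector>

// Hypothetical model: each I/O thread keeps its own cache, represented here as
// the set of file names for which that thread holds an open handle.
struct PerThreadCaches {
  explicit PerThreadCaches(std::size_t num_threads) : open_files(num_threads) {}

  // Thread 'thread_idx' reads from 'file', opening a handle if none is cached.
  void Read(std::size_t thread_idx, const std::string& file) {
    open_files[thread_idx].insert(file);
  }

  // Total number of handles open for 'file' across all threads.
  std::size_t OpenHandles(const std::string& file) const {
    std::size_t n = 0;
    for (const auto& files : open_files) n += files.count(file);
    return n;
  }

  std::vector<std::set<std::string>> open_files;
};

// One scan range is read in 'num_reads' sequential chunks. Each chunk is
// scheduled round-robin onto one of 'num_threads' threads (an assumption made
// only to force every thread to touch the file).
std::size_t HandlesForSingleScanRange(std::size_t num_threads,
                                      std::size_t num_reads) {
  const std::string file = "s3a://bucket/file.txt";  // placeholder name
  PerThreadCaches caches(num_threads);
  for (std::size_t i = 0; i < num_reads; ++i) {
    caches.Read(i % num_threads, file);
  }
  return caches.OpenHandles(file);
}
```

With 16 threads and enough reads to reach every thread, HandlesForSingleScanRange(16, 100) returns 16, even though there is only one scan range and never more than one read in flight.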

If we had a shared cache between all the threads servicing a disk queue, then I think we'd
get min(# of open scan ranges, # of parallel reads from the file), which is a strict improvement
over the current approach.
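
Roughly what I have in mind, as a sketch only: SharedHandleCache and its Acquire/Release methods are hypothetical names, not an existing Impala API, and eviction and the actual HDFS open/close calls are left out. A reader borrows an idle handle if one exists and opens a new one only when all handles for the file are in use.

```cpp
#include <cstddef>
#include <map>
#include <mutex>
#include <string>

// Hypothetical cache shared by all threads servicing one disk queue.
class SharedHandleCache {
 public:
  // Borrow a handle for 'file': reuse an idle one if present, otherwise
  // count a newly opened handle.
  void Acquire(const std::string& file) {
    std::lock_guard<std::mutex> l(lock_);
    Entry& e = entries_[file];
    if (e.idle > 0) {
      --e.idle;
    } else {
      ++e.opened;
    }
  }

  // Return a borrowed handle so that any thread can reuse it.
  void Release(const std::string& file) {
    std::lock_guard<std::mutex> l(lock_);
    ++entries_[file].idle;
  }

  // Number of handles ever opened for 'file' (nothing is evicted here).
  std::size_t OpenHandles(const std::string& file) {
    std::lock_guard<std::mutex> l(lock_);
    return entries_[file].opened;
  }

 private:
  struct Entry {
    std::size_t idle = 0;    // handles open but not currently in use
    std::size_t opened = 0;  // total handles opened for this file
  };

  std::mutex lock_;
  std::map<std::string, Entry> entries_;
};
```

In this model the number of handles opened for a file equals the peak number of simultaneous reads from it, so 16 back-to-back reads from different threads still share a single handle.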


-- 
To view, visit http://gerrit.cloudera.org:8080/6478
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ibe5ff60971dd653c3b6a0e13928cfa9fc59d078d
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Joe McDonnell <joemcdonnell@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <tarmstrong@cloudera.com>
Gerrit-HasComments: Yes
