impala-reviews mailing list archives

From "Joe McDonnell (Code Review)" <>
Subject [Impala-ASF-CR] IMPALA-4623: [DOCS] Document file handle caching
Date Thu, 05 Oct 2017 02:37:37 GMT
Joe McDonnell has posted comments on this change.

Change subject: IMPALA-4623: [DOCS] Document file handle caching

Patch Set 1:

File docs/topics/impala_known_issues.xml:
PS1, Line 338: continuously appended by an HDFS mechanism
This also applies if an HDFS file is overwritten in place.
File docs/topics/impala_scalability.xml:
PS1, Line 967: although the encryption layer
             :         adds overhead that might lessen the benefit of the caching.
I'm not familiar with this overhead. What is this referring to?
PS1, Line 973: 20 thousand
Just curious: How do you decide to use "20 thousand" vs "20,000"?
PS1, Line 991: evict any stale file handles from the cache
The file handles won't actually be evicted directly. The new metadata will mean that new statements
will no longer use that file handle and eventually it will get aged out. I'm not sure if this
distinction is important for documentation, but I think the important thing is that the memory
may not be freed immediately. (This is something we are likely to change in a future release.)
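A toy model can make this distinction concrete. The Python sketch is hypothetical and is not Impala's implementation. It assumes cache entries are keyed by file path plus modification time, and it uses capacity-based LRU eviction as a stand-in for whatever aging policy impalad actually applies. Once the metadata is refreshed, lookups use the new key, so the stale handle is never hit again. It still stays resident until something pushes it out.

```python
from collections import OrderedDict


class FileHandleCacheSketch:
    """Toy LRU cache of file handles keyed by (path, mtime).

    Hypothetical model only: refreshing metadata changes the lookup key,
    so a stale entry is never used again, but it is not freed until
    capacity pressure evicts it as the least recently used entry.
    """

    def __init__(self, capacity: int):
        self.capacity = capacity
        self._entries = OrderedDict()  # (path, mtime) -> handle
        self.hits = 0
        self.misses = 0

    def get(self, path: str, mtime: int) -> str:
        key = (path, mtime)
        if key in self._entries:
            self._entries.move_to_end(key)  # mark as most recently used
            self.hits += 1
            return self._entries[key]
        self.misses += 1
        handle = f"handle:{path}@{mtime}"  # stand-in for really opening the file
        self._entries[key] = handle
        if len(self._entries) > self.capacity:
            self._entries.popitem(last=False)  # evict least recently used
        return handle

    def __contains__(self, key) -> bool:
        return key in self._entries


cache = FileHandleCacheSketch(capacity=2)
cache.get("/data/f1.parq", 100)        # miss: first access
cache.get("/data/f1.parq", 100)        # hit
cache.get("/data/f1.parq", 200)        # file rewritten, metadata refreshed: new key, miss
stale_still_cached = ("/data/f1.parq", 100) in cache   # stale handle still held
cache.get("/data/f2.parq", 100)        # capacity pressure evicts the stale entry
stale_after_pressure = ("/data/f1.parq", 100) in cache
```

In this model the memory held by the stale handle is released only when the third distinct key arrives, which mirrors the point that refreshing metadata does not by itself free anything.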
PS1, Line 995: To evaluate the effectiveness of file handle caching for a particular workload, issue the
             :         <codeph>PROFILE</codeph> statement in <cmdname>impala-shell</cmdname> or examine query
             :         profiles in the Impala web UI. Look for the ratio of <codeph>CachedFileHandlesHitCount</codeph>
             :         (ideally, should be high) to <codeph>CachedFileHandlesMissCount</codeph> (ideally, should be low).
             :         Before starting any evaluation, run some representative queries to <q>warm up</q> the cache,
             :         because the first time each data file is accessed is always recorded as a cache miss.
I'm not sure this belongs here, but information about the cache across the whole impalad is
available via the metrics page under impala-server:

The total number of file handles in the cache is:
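
As a rough way to apply the ratio check from the quoted text, a script can sum the two counters in a saved profile and compute a hit rate, hits / (hits + misses). This is a sketch under assumptions. It assumes counter lines shaped like `- CachedFileHandlesHitCount: 1.50K (1500)`, taking the exact value from the parentheses when present. It also sums every occurrence it finds; a full profile may repeat counters in summary or averaged sections, so the absolute totals are approximate and the hit rate is the more meaningful figure. Capture the profile after the warm-up queries, since first accesses always count as misses.

```python
import re

# Assumed counter line shape, e.g. "   - CachedFileHandlesHitCount: 1.50K (1500)".
# Group 2 is the displayed value; group 3 is the exact value in parentheses, if any.
_COUNTER_RE = re.compile(
    r"(CachedFileHandles(?:Hit|Miss)Count):\s*(\S+)(?:\s*\((\d+)\))?"
)


def file_handle_cache_stats(profile_text: str) -> dict:
    """Sum hit/miss counters found in a profile and derive a hit rate."""
    totals = {"CachedFileHandlesHitCount": 0, "CachedFileHandlesMissCount": 0}
    for name, shown, exact in _COUNTER_RE.findall(profile_text):
        # Prefer the exact parenthesized value; fall back to the displayed one.
        totals[name] += int(exact) if exact else int(shown)
    hits = totals["CachedFileHandlesHitCount"]
    misses = totals["CachedFileHandlesMissCount"]
    lookups = hits + misses
    # Hit rate in [0, 1]; None when the profile recorded no lookups at all.
    totals["hit_rate"] = hits / lookups if lookups else None
    return totals
```

A hit rate close to 1.0 on a warmed-up workload corresponds to the "high hit count, low miss count" pattern described above.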

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I261c29eff80dc376528bba29ffb7d8e0f895e25f
Gerrit-Change-Number: 8200
Gerrit-PatchSet: 1
Gerrit-Owner: John Russell <>
Gerrit-Reviewer: Dan Hecht <>
Gerrit-Reviewer: Joe McDonnell <>
Gerrit-Reviewer: Mostafa Mokhtar <>
Gerrit-Comment-Date: Thu, 05 Oct 2017 02:37:37 +0000
Gerrit-HasComments: Yes