impala-reviews mailing list archives

From "Joe McDonnell (Code Review)" <ger...@cloudera.org>
Subject [Impala-ASF-CR] IMPALA-4623: [DOCS] Document file handle caching
Date Thu, 05 Oct 2017 02:37:37 GMT
Joe McDonnell has posted comments on this change. ( http://gerrit.cloudera.org:8080/8200 )

Change subject: IMPALA-4623: [DOCS] Document file handle caching
......................................................................


Patch Set 1:

(5 comments)

http://gerrit.cloudera.org:8080/#/c/8200/1/docs/topics/impala_known_issues.xml
File docs/topics/impala_known_issues.xml:

http://gerrit.cloudera.org:8080/#/c/8200/1/docs/topics/impala_known_issues.xml@338
PS1, Line 338: continuously appended by an HDFS mechanism
This also applies if an HDFS file is overwritten in place.


http://gerrit.cloudera.org:8080/#/c/8200/1/docs/topics/impala_scalability.xml
File docs/topics/impala_scalability.xml:

http://gerrit.cloudera.org:8080/#/c/8200/1/docs/topics/impala_scalability.xml@967
PS1, Line 967: although the encryption layer
             :         adds overhead that might lessen the benefit of the caching.
I'm not familiar with this overhead. What is this referring to?


http://gerrit.cloudera.org:8080/#/c/8200/1/docs/topics/impala_scalability.xml@973
PS1, Line 973: 20 thousand
Just curious: How do you decide to use "20 thousand" vs "20,000"?


http://gerrit.cloudera.org:8080/#/c/8200/1/docs/topics/impala_scalability.xml@991
PS1, Line 991: evict any stale file handles from the cache
The file handles won't actually be evicted directly. With the new metadata, new statements
will no longer use that file handle, and it will eventually be aged out. I'm not sure whether this
distinction is important for the documentation, but the key point is that the memory
may not be freed immediately. (This is something we are likely to change in a future release.)
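
To illustrate the distinction between direct eviction and aging out, here is a toy sketch in Python. It is not Impala's implementation: the class name, the (path, version) key, and the idle timeout are all hypothetical, chosen only to show why memory held by a stale handle might not be freed as soon as metadata changes.

```python
class AgingHandleCache:
    """Toy cache (hypothetical): entries leave only after being idle too long."""

    def __init__(self, max_idle_seconds):
        self.max_idle_seconds = max_idle_seconds
        self._entries = {}  # (path, version) -> time of last use

    def get(self, path, version, now):
        """Return True on a hit, False on a miss; record the use either way."""
        key = (path, version)
        hit = key in self._entries
        self._entries[key] = now
        return hit

    def age_out(self, now):
        """Drop entries that have been idle longer than max_idle_seconds."""
        stale = [key for key, used in self._entries.items()
                 if now - used > self.max_idle_seconds]
        for key in stale:
            del self._entries[key]

    def __len__(self):
        return len(self._entries)


cache = AgingHandleCache(max_idle_seconds=60)
cache.get("/data/t/part-0", version=1, now=0)    # first access: miss, now cached
cache.get("/data/t/part-0", version=2, now=10)   # after a metadata change: new key
size_after_refresh = len(cache)                  # the version-1 handle is still held
cache.get("/data/t/part-0", version=2, now=65)   # version 2 stays in use
cache.age_out(now=70)                            # version 1 has been idle for 70s
size_after_aging = len(cache)
```

In this sketch the metadata change removes nothing by itself. The old entry simply stops being looked up and disappears only when the aging pass runs.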


http://gerrit.cloudera.org:8080/#/c/8200/1/docs/topics/impala_scalability.xml@995
PS1, Line 995: To evaluate the effectiveness of file handle caching for a particular workload, issue the
             :         <codeph>PROFILE</codeph> statement in <cmdname>impala-shell</cmdname> or examine query
             :         profiles in the Impala web UI. Look for the ratio of <codeph>CachedFileHandlesHitCount</codeph>
             :         (ideally, should be high) to <codeph>CachedFileHandlesMissCount</codeph> (ideally, should be low).
             :         Before starting any evaluation, run some representative queries to <q>warm up</q> the cache,
             :         because the first time each data file is accessed is always recorded as a cache miss.
I'm not sure this belongs here, but information about the cache across the whole impalad is
available via the metrics page under impala-server:
impala-server.io.mgr.cached-file-handles-miss-count
impala-server.io.mgr.cached-file-handles-hit-count

The total number of file handles in the cache is:
impala-server.io.mgr.num-cached-file-handles
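
As a rough sketch of how the two impalad-wide counters above could be combined, the Python below computes the fraction of lookups served from the cache. The metric names are the ones listed above. The function name, the dictionary input, and the sample values are assumptions for illustration only, and how the values are read from the metrics page is left out.

```python
HIT_METRIC = "impala-server.io.mgr.cached-file-handles-hit-count"
MISS_METRIC = "impala-server.io.mgr.cached-file-handles-miss-count"


def cache_hit_ratio(metrics):
    """Return hits / (hits + misses), or None if there were no lookups.

    `metrics` is assumed to be a dict mapping metric names to integer
    values that were already copied from the impalad metrics page.
    """
    hits = metrics[HIT_METRIC]
    misses = metrics[MISS_METRIC]
    total = hits + misses
    return hits / total if total else None


# Made-up sample values, not taken from a real cluster.
sample = {HIT_METRIC: 900, MISS_METRIC: 100}
ratio = cache_hit_ratio(sample)
```

Because the first access to each data file counts as a miss, a ratio taken before the cache is warmed up will understate the steady-state benefit.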



-- 
To view, visit http://gerrit.cloudera.org:8080/8200
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I261c29eff80dc376528bba29ffb7d8e0f895e25f
Gerrit-Change-Number: 8200
Gerrit-PatchSet: 1
Gerrit-Owner: John Russell <jrussell@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dhecht@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <joemcdonnell@cloudera.com>
Gerrit-Reviewer: Mostafa Mokhtar <mmokhtar@cloudera.com>
Gerrit-Comment-Date: Thu, 05 Oct 2017 02:37:37 +0000
Gerrit-HasComments: Yes
