impala-reviews mailing list archives

From "John Russell (Code Review)" <ger...@cloudera.org>
Subject [Impala-ASF-CR] IMPALA-4623: [DOCS] Document file handle caching
Date Thu, 05 Oct 2017 20:48:03 GMT
John Russell has posted comments on this change. ( http://gerrit.cloudera.org:8080/8200 )

Change subject: IMPALA-4623: [DOCS] Document file handle caching
......................................................................


Patch Set 1:

(4 comments)

http://gerrit.cloudera.org:8080/#/c/8200/1/docs/topics/impala_scalability.xml
File docs/topics/impala_scalability.xml:

http://gerrit.cloudera.org:8080/#/c/8200/1/docs/topics/impala_scalability.xml@967
PS1, Line 967: although the encryption layer
             :         adds overhead that might lessen the benefit of the caching.
> I'm not familiar with this overhead. What is this referring to?
I had written in the notes from our conversation "HDFS encryption adds overhead". That was from
when we were thinking about all the other complicating factors, like Sentry GRANT/REVOKE.


http://gerrit.cloudera.org:8080/#/c/8200/1/docs/topics/impala_scalability.xml@973
PS1, Line 973: 20 thousand
> Just curious: How do you decide to use "20 thousand" vs "20,000"?
For big numbers, I try to stick with either spelled-out forms or obvious powers of 2. (Like
I would say 65536 with no comma.) There are so many other separator conventions internationally
(https://docs.oracle.com/cd/E19455-01/806-0169/overview-9/index.html) that I don't want to be too
US-centric.


http://gerrit.cloudera.org:8080/#/c/8200/1/docs/topics/impala_scalability.xml@991
PS1, Line 991: evict any stale file handles from the cache
> The file handles won't actually be evicted directly. The new metadata will 
Done


http://gerrit.cloudera.org:8080/#/c/8200/1/docs/topics/impala_scalability.xml@995
PS1, Line 995: To evaluate the effectiveness of file handle caching for a particular workload, issue the
             :         <codeph>PROFILE</codeph> statement in <cmdname>impala-shell</cmdname> or examine query
             :         profiles in the Impala web UI. Look for the ratio of <codeph>CachedFileHandlesHitCount</codeph>
             :         (ideally, should be high) to <codeph>CachedFileHandlesMissCount</codeph> (ideally, should be low).
             :         Before starting any evaluation, run some representative queries to <q>warm up</q> the cache,
             :         because the first time each data file is accessed is always recorded as a cache miss.
> I'm not sure this belongs here, but information about the cache across the 
Let's be inclusive for this first iteration and then fine-tune later if needed. We tend to
be skimpy with such information, which is a weakness IMO.
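
For illustration, the hit/miss comparison described in the quoted paragraph could be
scripted against saved profile text roughly as follows. This is a minimal sketch, not part
of the patch. It assumes each counter appears on a line such as
"- CachedFileHandlesHitCount: 1.50K (1500)" or "- CachedFileHandlesMissCount: 10"; that
line format and the helper names handle_cache_counts and hit_fraction are assumptions made
for this example. Only the two counter names come from the doc text.

```python
import re

# Assumed line shape: "<counter name>: <pretty value> (<raw integer>)",
# where the parenthesized raw value is optional.
COUNTER_RE = re.compile(
    r"(CachedFileHandles(?:Hit|Miss)Count):[ \t]*(\S+)(?:[ \t]+\((\d+)\))?"
)


def handle_cache_counts(profile_text):
    """Sum the hit and miss counters over every occurrence in the profile text."""
    totals = {"CachedFileHandlesHitCount": 0, "CachedFileHandlesMissCount": 0}
    for name, pretty, raw in COUNTER_RE.findall(profile_text):
        # Prefer the raw integer in parentheses; fall back to the first value.
        value = raw if raw else pretty
        if value.isdigit():
            totals[name] += int(value)
    return totals


def hit_fraction(totals):
    """Return hits / (hits + misses), or None when no accesses were recorded."""
    hits = totals["CachedFileHandlesHitCount"]
    misses = totals["CachedFileHandlesMissCount"]
    total = hits + misses
    return hits / total if total else None


if __name__ == "__main__":
    # Hypothetical profile excerpt, made up for this example.
    sample = (
        "  - CachedFileHandlesHitCount: 30 (30)\n"
        "  - CachedFileHandlesMissCount: 10\n"
    )
    counts = handle_cache_counts(sample)
    print(counts, hit_fraction(counts))
```

As the quoted text notes, the first access to each data file always counts as a miss, so
numbers gathered before the cache is warmed up will understate the benefit.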



-- 
To view, visit http://gerrit.cloudera.org:8080/8200
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I261c29eff80dc376528bba29ffb7d8e0f895e25f
Gerrit-Change-Number: 8200
Gerrit-PatchSet: 1
Gerrit-Owner: John Russell <jrussell@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dhecht@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <joemcdonnell@cloudera.com>
Gerrit-Reviewer: John Russell <jrussell@cloudera.com>
Gerrit-Reviewer: Mostafa Mokhtar <mmokhtar@cloudera.com>
Gerrit-Comment-Date: Thu, 05 Oct 2017 20:48:03 +0000
Gerrit-HasComments: Yes
