hadoop-common-issues mailing list archives

From "Misha Dmitriev (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HADOOP-15696) KMS performance regression due to too many open file descriptors after Jetty migration
Date Tue, 28 Aug 2018 00:14:00 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-15696?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16594339#comment-16594339 ]

Misha Dmitriev commented on HADOOP-15696:
-----------------------------------------

To be precise, the suggested measure that had such a big effect was to adjust the idle timeout
in the {{HttpServer2.configureChannelConnector(ServerConnector c)}} method, which currently
contains the line {{c.setIdleTimeout(10000);}}. This timeout should be made configurable in the
first place, and it looks like we need to set it to a much smaller value when {{HttpServer2}}
is used by KMS.
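
A minimal sketch of what "configurable" could look like, for discussion only. The key name
{{example.http.idle.timeout.ms}}, the {{Connector}} interface and the plain {{Map}} used as a
configuration are all hypothetical stand-ins so that the snippet runs without Hadoop or Jetty on
the classpath; only the {{setIdleTimeout}} call and the 10000 ms default come from the code
mentioned above.

```java
import java.util.HashMap;
import java.util.Map;

public class IdleTimeoutSketch {
    // Hypothetical configuration key; not taken from the Hadoop source.
    static final String IDLE_TIMEOUT_KEY = "example.http.idle.timeout.ms";
    // The value that is currently hard-coded in configureChannelConnector.
    static final int DEFAULT_IDLE_TIMEOUT_MS = 10000;

    /** Hypothetical stand-in for the single ServerConnector call used here. */
    interface Connector {
        void setIdleTimeout(long idleTimeoutMs);
    }

    /** Returns the configured timeout, or the default if it is missing or invalid. */
    static int resolveIdleTimeoutMs(Map<String, String> conf) {
        String raw = conf.get(IDLE_TIMEOUT_KEY);
        if (raw == null || raw.trim().isEmpty()) {
            return DEFAULT_IDLE_TIMEOUT_MS;
        }
        try {
            int value = Integer.parseInt(raw.trim());
            return value > 0 ? value : DEFAULT_IDLE_TIMEOUT_MS;
        } catch (NumberFormatException e) {
            return DEFAULT_IDLE_TIMEOUT_MS;
        }
    }

    /** Same shape as the existing method, but the timeout comes from configuration. */
    static void configureChannelConnector(Connector c, Map<String, String> conf) {
        c.setIdleTimeout(resolveIdleTimeoutMs(conf));
    }

    public static void main(String[] args) {
        // A KMS deployment could then override the default, e.g. with 1 second.
        Map<String, String> kmsConf = new HashMap<>();
        kmsConf.put(IDLE_TIMEOUT_KEY, "1000");
        configureChannelConnector(
            ms -> System.out.println("idle timeout ms: " + ms), kmsConf);
    }
}
```

Falling back to the existing default on a missing or malformed value would keep the behavior of
other {{HttpServer2}} users unchanged, so only KMS would need to set a smaller value.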

Here is a question that I have in this regard. If closing idle HTTP connections sooner on the
server side, and thus recycling them more quickly, makes KMS work better, does that mean the KMS
client doesn't reuse such connections, and/or doesn't close a connection when it no longer
needs it? If so, that doesn't sound optimal. I wonder how to prove or disprove this
theory.
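
One possible way to test this, sketched under heavy assumptions: count how many distinct client
ports the server side sees for a known number of sequential requests. If the client reuses
connections, that count stays far below the request count. The snippet below does not touch KMS
at all; it uses the JDK's built-in {{com.sun.net.httpserver.HttpServer}} as a stand-in server and
a plain {{HttpURLConnection}} as a stand-in client, and the class and method names are made up
for illustration.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.InetSocketAddress;
import java.net.Proxy;
import java.net.URI;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

public class ConnectionReuseProbe {

    /**
     * Sends the given number of sequential GET requests to a local throwaway
     * server and returns how many distinct client ports the server observed.
     * Each distinct port corresponds to a separate TCP connection.
     */
    static int countClientConnections(int requests) throws IOException {
        Set<Integer> clientPorts = ConcurrentHashMap.newKeySet();
        HttpServer server = HttpServer.create(new InetSocketAddress("127.0.0.1", 0), 0);
        server.createContext("/", exchange -> {
            // Record the remote port of the connection that carried this request.
            clientPorts.add(exchange.getRemoteAddress().getPort());
            byte[] body = "ok".getBytes(StandardCharsets.UTF_8);
            exchange.sendResponseHeaders(200, body.length);
            try (OutputStream out = exchange.getResponseBody()) {
                out.write(body);
            }
        });
        server.start();
        try {
            URL url = URI.create(
                "http://127.0.0.1:" + server.getAddress().getPort() + "/").toURL();
            for (int i = 0; i < requests; i++) {
                HttpURLConnection conn =
                    (HttpURLConnection) url.openConnection(Proxy.NO_PROXY);
                // Drain and close the body; without this the JDK client
                // cannot return the connection to its keep-alive cache.
                try (InputStream in = conn.getInputStream()) {
                    while (in.read() != -1) {
                        // discard
                    }
                }
            }
        } finally {
            server.stop(0);
        }
        return clientPorts.size();
    }

    public static void main(String[] args) throws IOException {
        int requests = 20;
        int connections = countClientConnections(requests);
        System.out.println(requests + " requests used " + connections + " connection(s)");
    }
}
```

The same idea could be applied to a real KMS by logging the remote port per request on the server
and comparing the number of distinct ports with the number of requests from a single client.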

> KMS performance regression due to too many open file descriptors after Jetty migration
> --------------------------------------------------------------------------------------
>
>                 Key: HADOOP-15696
>                 URL: https://issues.apache.org/jira/browse/HADOOP-15696
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: kms
>    Affects Versions: 3.0.0-alpha2
>            Reporter: Wei-Chiu Chuang
>            Assignee: Wei-Chiu Chuang
>            Priority: Blocker
>         Attachments: Screen Shot 2018-08-22 at 11.36.16 AM.png, Screen Shot 2018-08-22 at 4.26.51 PM.png, Screen Shot 2018-08-22 at 4.26.51 PM.png, Screen Shot 2018-08-22 at 4.27.02 PM.png, Screen Shot 2018-08-22 at 4.30.32 PM.png, Screen Shot 2018-08-22 at 4.30.39 PM.png, Screen Shot 2018-08-24 at 7.08.16 PM.png
>
>
> We recently found that KMS performance regressed in Hadoop 3.0, possibly linked to the
> migration from Tomcat to Jetty in HADOOP-13597.
> Symptoms:
> # The number of open file descriptors in Hadoop 3.x KMS quickly rises to more than 10 thousand
> under stress and sometimes even exceeds 32K, which is the system limit, causing failures for
> any access to encryption zones. Our internal testing shows the open fd count was in the range
> of a few hundred in Hadoop 2.x, and it increases by almost 100x in Hadoop 3.
> # Hadoop 3.x KMS uses as much as twice the heap of Hadoop 2.x. A heap size that works in
> Hadoop 2.x can go OOM in Hadoop 3.x. JXray analysis suggests most of that heap is taken up by
> temporary byte arrays associated with open SSL connections.
> # Due to the heap usage, Hadoop 3.x KMS has more frequent GC activity, and we observed
> up to a 20% performance reduction due to GC.
> A possible solution is to reduce the idle timeout setting in HttpServer2. It is currently
> hard-coded to 10 seconds. By setting it to 1 second, open fds dropped from 20 thousand down
> to 3 thousand in my experiment.
> Filing this JIRA to invite open discussion of a solution.
> Credit: [~misha@cloudera.com] for the proposed Jetty idle timeout remedy; [~xiaochen]
> for digging into this problem.
> Screenshots:
> CDH5 (Hadoop 2) KMS CPU utilization, resident memory and file descriptor chart.
>  !Screen Shot 2018-08-22 at 4.30.39 PM.png! 
> CDH6 (Hadoop 3) KMS CPU utilization, resident memory and file descriptor chart.
>  !Screen Shot 2018-08-22 at 4.30.32 PM.png! 
> CDH5 (Hadoop 2) GC activities on the KMS process
>  !Screen Shot 2018-08-22 at 4.26.51 PM.png! 
> CDH6 (Hadoop 3) GC activities on the KMS process
>  !Screen Shot 2018-08-22 at 4.27.02 PM.png! 
> JXray report
>  !Screen Shot 2018-08-22 at 11.36.16 AM.png! 
> Open fds drop from 20K down to 3K after the proposed change.
>  !Screen Shot 2018-08-24 at 7.08.16 PM.png! 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: common-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: common-issues-help@hadoop.apache.org

