hadoop-hdfs-issues mailing list archives

From "Daryn Sharp (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-6967) DNs may OOM under high webhdfs load
Date Mon, 08 Sep 2014 18:08:29 GMT

    [ https://issues.apache.org/jira/browse/HDFS-6967?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14125836#comment-14125836 ]

Daryn Sharp commented on HDFS-6967:
-----------------------------------

[~cnauroth], I agree jetty should be upgraded but I'd rather not delay this important fix
while the issue is hashed out.  My understanding is it's not a trivial upgrade and the change
in dependencies may be disruptive to downstream projects.

I don't think limiting only the number of jetty connections is good enough, although it would
help.  DFSClients hop between nodes to retrieve blocks, and the streamers are lightweight compared
to jetty connection objects.  A webhdfs remote read creates a DFSClient on a node containing
a replica of the first block it will read.  This creates hotspots on the few nodes holding the
first block, which is especially bad for wide jobs that access multi-block files, e.g. a map
side join on a big file.

Much as yarn doesn't guarantee node-local placement, why should webhdfs?
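
To put rough numbers on the hotspot argument, here is a small arithmetic sketch.  The class and
method names are hypothetical, and the reader and node counts are made up for illustration;
the only fixed assumption is a replication factor of 3 (the HDFS default), so the first block
of a file lives on 3 nodes.

```java
/**
 * Hypothetical back-of-the-envelope helper, not part of HDFS.
 * Compares the worst-case per-node connection count when readers are
 * confined to the first block's replicas versus spread over more nodes.
 */
final class HotspotSketch {
    /** Worst-case connections on one node when readers are spread evenly over candidateNodes. */
    static int maxLoadPerNode(int readers, int candidateNodes) {
        if (candidateNodes <= 0) {
            throw new IllegalArgumentException("candidateNodes must be positive");
        }
        // Ceiling division: the busiest node takes the remainder.
        return (readers + candidateNodes - 1) / candidateNodes;
    }
}
```

With 3000 concurrent readers of one file, `maxLoadPerNode(3000, 3)` gives 1000 connections on
each first-block node, while `maxLoadPerNode(3000, 100)` gives 30 if the same readers could land
on any of 100 nodes.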

> DNs may OOM under high webhdfs load
> -----------------------------------
>
>                 Key: HDFS-6967
>                 URL: https://issues.apache.org/jira/browse/HDFS-6967
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: datanode, webhdfs
>    Affects Versions: 2.0.0-alpha, 3.0.0
>            Reporter: Daryn Sharp
>            Assignee: Eric Payne
>
> Webhdfs uses jetty.  The size of the request thread pool is limited, but jetty will accept
> and queue infinite connections.  Every queued connection is "heavy" with buffers, etc.  Unlike
> data streamer connections, thousands of webhdfs connections will quickly OOM a DN.  The accepted
> requests must be bounded and excess clients rejected so they retry on a new DN.
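
The bounding described in the issue could look roughly like the sketch below.  This is not the
HDFS-6967 patch and does not use any jetty API; `BoundedAdmission`, `tryAdmit`, and the limit
passed to the constructor are all hypothetical names chosen for illustration.  It only shows
the general idea: admit a fixed number of in-flight requests and refuse the rest immediately
instead of queueing them.

```java
import java.util.concurrent.Semaphore;

/**
 * Hypothetical admission gate (illustrative only, not HDFS or jetty code).
 * Caps the number of in-flight requests; anything over the cap is refused
 * right away so no per-connection buffers pile up in a queue.
 */
final class BoundedAdmission {
    private final Semaphore permits;

    BoundedAdmission(int maxInFlight) {
        this.permits = new Semaphore(maxInFlight);
    }

    /** Non-blocking: true if the request may proceed, false if it should be rejected. */
    boolean tryAdmit() {
        return permits.tryAcquire();
    }

    /** Call exactly once for each admitted request when it completes. */
    void release() {
        permits.release();
    }
}
```

A caller would wrap request handling in `if (gate.tryAdmit()) { try { ... } finally { gate.release(); } }`
and otherwise reject the request, so that the client can retry against a different DN.  How the
rejection is signalled to the client is left open here.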



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
