hadoop-yarn-issues mailing list archives

From "Jason Lowe (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-2410) Nodemanager ShuffleHandler can possible exhaust file descriptors
Date Wed, 09 Sep 2015 16:14:46 GMT

    [ https://issues.apache.org/jira/browse/YARN-2410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14737112#comment-14737112 ]

Jason Lowe commented on YARN-2410:
----------------------------------

Thanks for updating the patch.  I see now that since Shuffle is not a static class it's too
much trouble to try to factor out the overrides for the test.

ReduceContext should be a private class.

sendMap takes a reduce context, a channel context, and an info map, but the latter two are
already in the reduce context.  Seems like sendMap should just take a reduce context argument.

If sendMap returns null then I don't think we want messageReceived to blindly keep calling
sendMap in the loop.
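
A minimal sketch of the loop behavior suggested above, not the patch itself. The
ReduceContext fields, the String return value standing in for a channel future, and
the startInitialSends name are all assumptions made for illustration; the only
points taken from the review are that sendMap takes just a reduce context and that
the caller stops looping once it returns null.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class SendMapLoopSketch {
    /** Hypothetical stand-in for the patch's ReduceContext; all fields are assumptions. */
    static final class ReduceContext {
        final List<String> mapIds;                   // map outputs requested by the reducer
        final int maxInitialSends;                   // assumed cap on sends started up front
        final List<String> sent = new ArrayList<>(); // records which sends were started
        int nextIndex = 0;                           // next map output to send

        ReduceContext(List<String> mapIds, int maxInitialSends) {
            this.mapIds = mapIds;
            this.maxInitialSends = maxInitialSends;
        }
    }

    /**
     * Stand-in for sendMap: takes only the reduce context and returns a token
     * for the started transfer, or null if nothing could be sent.
     */
    static String sendMap(ReduceContext ctx) {
        if (ctx.nextIndex >= ctx.mapIds.size()) {
            return null; // nothing left to send
        }
        String mapId = ctx.mapIds.get(ctx.nextIndex++);
        if (mapId.startsWith("bad")) {
            return null; // simulated failure, e.g. the map output could not be opened
        }
        ctx.sent.add(mapId);
        return mapId;
    }

    /** Loop the way messageReceived might, but stop on the first null. */
    static int startInitialSends(ReduceContext ctx) {
        int started = 0;
        for (int i = 0; i < ctx.maxInitialSends; i++) {
            if (sendMap(ctx) == null) {
                break; // do not keep calling sendMap after a failure
            }
            started++;
        }
        return started;
    }

    public static void main(String[] args) {
        ReduceContext ctx = new ReduceContext(Arrays.asList("m1", "bad2", "m3"), 3);
        System.out.println("started=" + startInitialSends(ctx) + " sent=" + ctx.sent);
    }
}
```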

Why were the override decorators removed from verifyRequest and getMapOutputInfo?  It's pretty
important that those actually override a method.
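
To illustrate why the annotation matters, here is a small sketch with simplified,
hypothetical signatures (the real verifyRequest and getMapOutputInfo take different
arguments). Without @Override, a test subclass whose signature has drifted compiles
as an overload, and the production method silently runs instead.

```java
class OverrideSketch {
    /** Simplified stand-in for the handler under test; signatures are assumptions. */
    static class Shuffle {
        String verifyRequest(String appId) {
            return "real:" + appId;
        }

        String handle(String appId) {
            return verifyRequest(appId);
        }
    }

    /** With @Override the compiler guarantees this replaces Shuffle.verifyRequest. */
    static class MockShuffle extends Shuffle {
        @Override
        String verifyRequest(String appId) {
            return "mock:" + appId;
        }
    }

    /**
     * No annotation and a drifted parameter type: this is an overload, not an
     * override, so handle() still calls the real method. Adding @Override here
     * would turn the mistake into a compile error.
     */
    static class DriftedMockShuffle extends Shuffle {
        String verifyRequest(CharSequence appId) {
            return "mock:" + appId;
        }
    }

    public static void main(String[] args) {
        System.out.println(new MockShuffle().handle("app"));
        System.out.println(new DriftedMockShuffle().handle("app"));
    }
}
```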

mockNetty is too monolithic, like the old test, and a bit unwieldy in that callers are expected
to start mocking and then let mockNetty finish the job.  The channel and message event mocking
is just a few simple lines for each and would be fine to stay in the main test method.  Utility
methods like createMockChannelFuture(channel, listenerList) and createMockHttpRequest() would
help keep the original test method manageable in length and factor out some of the more complicated
mocking of individual objects.


> Nodemanager ShuffleHandler can possible exhaust file descriptors
> ----------------------------------------------------------------
>
>                 Key: YARN-2410
>                 URL: https://issues.apache.org/jira/browse/YARN-2410
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: nodemanager
>    Affects Versions: 2.5.0
>            Reporter: Nathan Roberts
>            Assignee: Kuhu Shukla
>         Attachments: YARN-2410-v1.patch, YARN-2410-v2.patch, YARN-2410-v3.patch, YARN-2410-v4.patch,
>                      YARN-2410-v5.patch, YARN-2410-v6.patch, YARN-2410-v7.patch, YARN-2410-v8.patch
>
>
> The async nature of the shufflehandler can cause it to open a huge number of
> file descriptors; when it runs out, it crashes.
> Scenario:
> Job with 6K reduces, slow start set to 0.95, about 40 map outputs per node.
> Let's say all 6K reduces hit a node at about same time asking for their
> outputs. Each reducer will ask for all 40 map outputs over a single socket in a
> single request (not necessarily all 40 at once, but with coalescing it is
> likely to be a large number).
> sendMapOutput() will open the file for random reading and then perform an async transfer
> of the particular portion of this file. This will theoretically
> happen 6000*40=240000 times which will run the NM out of file descriptors and cause it
> to crash.
> The algorithm should be refactored a little to not open the fds until they're
> actually needed. 
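
One way to read "not open the fds until they're actually needed" is to cap the number
of map outputs open per connection and open the next one only when a transfer
completes. The sketch below simulates that bookkeeping with counters instead of real
files; the class name, the maxOpen parameter, and the onTransferComplete callback are
assumptions for illustration and are not taken from the patch.

```java
import java.util.ArrayDeque;
import java.util.Deque;

class LazyOpenSketch {
    private final Deque<String> pending = new ArrayDeque<>(); // map outputs not yet opened
    private final int maxOpen;                                // assumed per-connection cap
    private int open = 0;                                     // currently "open" outputs
    private int peakOpen = 0;                                 // highest value open ever reached

    LazyOpenSketch(Iterable<String> mapOutputs, int maxOpen) {
        for (String m : mapOutputs) {
            pending.add(m);
        }
        this.maxOpen = maxOpen;
    }

    /** Open only up to maxOpen outputs when the request arrives. */
    void start() {
        while (open < maxOpen && !pending.isEmpty()) {
            openNext();
        }
    }

    /** Called when one async transfer finishes: release it, then open the next. */
    void onTransferComplete() {
        if (open == 0) {
            throw new IllegalStateException("no transfer in flight");
        }
        open--; // the real handler would close the file descriptor here
        if (!pending.isEmpty()) {
            openNext();
        }
    }

    private void openNext() {
        pending.poll(); // the real handler would open the file here
        open++;
        peakOpen = Math.max(peakOpen, open);
    }

    int open() { return open; }
    int peakOpen() { return peakOpen; }
    int remaining() { return pending.size(); }
}
```

With 40 map outputs and a cap of 3, at most 3 descriptors per connection are held at
any time in this model, rather than all 40.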



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
