hadoop-common-issues mailing list archives

From "Hadoop QA (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-6308) make number of IPC accepts configurable
Date Fri, 05 Nov 2010 16:14:48 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-6308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12928649#action_12928649 ]

Hadoop QA commented on HADOOP-6308:

-1 overall.  Here are the results of testing the latest attachment 
  against trunk revision 1031422.

    +1 @author.  The patch does not contain any @author tags.

    -1 tests included.  The patch doesn't appear to include any new or modified tests.
                        Please justify why no new tests are needed for this patch.
                        Also please list what manual steps were performed to verify this patch.

    -1 patch.  The patch command could not apply the patch.

Console output: https://hudson.apache.org/hudson/job/PreCommit-HADOOP-Build/68//console

This message is automatically generated.

> make number of IPC accepts configurable
> ---------------------------------------
>                 Key: HADOOP-6308
>                 URL: https://issues.apache.org/jira/browse/HADOOP-6308
>             Project: Hadoop Common
>          Issue Type: Improvement
>          Components: ipc
>    Affects Versions: 0.20.0
>         Environment: Linux, running Yahoo-based 0.20
>            Reporter: Andrew Ryan
>         Attachments: HADOOP-6308.patch
> We were recently seeing issues in our environments where HDFS clients would receive
RSTs from the NN when making RPC calls to get file info, which caused the task to fail
fatally. After some debugging we traced this to the IPC server listen queue
(ipc.server.listen.queue.size) being far too small: we had been using the default value of 128
and found we needed to raise it to 10240 before the resets went away (although this value is
a bit suspect, as I will explain later in the issue).
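A minimal sketch of how a server might take its listen backlog from configuration, assuming a plain `java.util.Properties` object as a stand-in for Hadoop's `Configuration`. The class `BacklogConfig` and its methods are hypothetical; only the property name `ipc.server.listen.queue.size` and its default of 128 come from this issue. In Java the backlog can only be requested at `bind` time.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.nio.channels.ServerSocketChannel;
import java.util.Properties;

// Hypothetical helper: Properties stands in for Hadoop's Configuration.
class BacklogConfig {
    static final String LISTEN_QUEUE_KEY = "ipc.server.listen.queue.size";
    static final int DEFAULT_LISTEN_QUEUE = 128; // default cited in HADOOP-6308

    // Returns the configured backlog, falling back to the default when unset.
    static int listenQueueSize(Properties conf) {
        String value = conf.getProperty(LISTEN_QUEUE_KEY);
        return value == null ? DEFAULT_LISTEN_QUEUE : Integer.parseInt(value.trim());
    }

    // Opens a listening channel. The backlog is only a request to the OS,
    // which may silently lower it (for example to net.core.somaxconn on Linux).
    static ServerSocketChannel listen(InetSocketAddress addr, Properties conf) throws IOException {
        ServerSocketChannel channel = ServerSocketChannel.open();
        channel.socket().bind(addr, listenQueueSize(conf));
        return channel;
    }
}
```

Setting `conf.setProperty("ipc.server.listen.queue.size", "10240")` before calling `listen` would request the larger queue used in this report.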
> When a large map job starts, many clients very quickly begin issuing RPC requests to the
namenode. This fills up the listen queue, because clients are opening connections faster than
Hadoop's RPC server can accept them. We went back to our 0.17 cluster, instrumented it with
tcpdump, and found that it had been sending RSTs there for a long time as well; however, retry
handling was implemented differently in 0.17, so a single TCP failure wasn't fatal to the task.
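The burst effect can be illustrated with a toy discrete-time model. This is not Hadoop or kernel code, and the rates are made-up illustrative numbers: each tick some clients try to connect, attempts beyond the backlog overflow (which, depending on kernel settings, becomes a RST or a silent drop), and then the server accepts at most a fixed number of pending connections.

```java
// Toy model of a listen queue; it does not model the real kernel or Hadoop's Listener.
class ListenQueueModel {
    /**
     * @param arrivals       connection attempts arriving in each tick
     * @param acceptsPerTick connections the server can accept per tick
     * @param backlog        maximum number of pending (not yet accepted) connections
     * @return total attempts that found the queue full
     */
    static long overflows(int[] arrivals, int acceptsPerTick, int backlog) {
        long pending = 0;
        long dropped = 0;
        for (int incoming : arrivals) {
            long space = backlog - pending;
            long admitted = Math.min(incoming, space);
            dropped += incoming - admitted; // these would be reset or dropped
            pending += admitted;
            // The server drains the queue after this tick's arrivals.
            pending -= Math.min(pending, acceptsPerTick);
        }
        return dropped;
    }
}
```

With a burst of 500 attempts per tick against 100 accepts per tick, `overflows(new int[]{500, 500, 0, 0}, 100, 128)` returns 772, while the same call with a backlog of 1024 returns 0: the server's throughput is identical, and only the queue depth during the burst differs.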
> In our environment we have our TCP stack set to explicitly send resets when the listen
queue overflows (sysctl net.ipv4.tcp_abort_on_overflow = 1); the default Linux behavior is
to start dropping SYN packets and let the client retransmit. Other people may be experiencing
this issue without noticing it because they use the default behavior, which lets the NN drop
packets on the floor while clients retransmit.
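A small diagnostic sketch for checking which overflow policy a Linux host uses, assuming the sysctl is readable through procfs at `/proc/sys/net/ipv4/tcp_abort_on_overflow`. The class `OverflowPolicy` and its descriptions are hypothetical and simply paraphrase the two behaviors described above.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical diagnostic helper; Linux-only, reads the sysctl through procfs.
class OverflowPolicy {
    static final Path ABORT_ON_OVERFLOW =
        Paths.get("/proc/sys/net/ipv4/tcp_abort_on_overflow");

    // Reads a single-integer sysctl file such as tcp_abort_on_overflow.
    static int readSysctl(Path path) throws IOException {
        String text = new String(Files.readAllBytes(path), StandardCharsets.US_ASCII);
        return Integer.parseInt(text.trim());
    }

    // Summarizes what clients see when the listen queue is full.
    static String describe(int abortOnOverflow) {
        return abortOnOverflow == 1
            ? "reset: clients receive RST and fail immediately"
            : "drop: packets are silently dropped; clients retransmit and see delays";
    }
}
```

On a host configured like the one in this report, `describe(readSysctl(OverflowPolicy.ABORT_ON_OVERFLOW))` would report the reset policy.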
> So we've identified (at least) 3 improvements that can be made here:
> 1) In src/core/org/apache/hadoop/ipc/Server.java, Listener.doAccept() is currently hardcoded
to do 10 accept()s at a time before it starts reading. We feel it would be better to make the
number of accepts per call a configurable parameter, so the server can accept more than 10
connections at once. We can still leave 10 as the default.
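A minimal sketch of an accept loop in the style of `Listener.doAccept()`, with the hardcoded 10 replaced by a parameter. The class `AcceptLoop` and the parameter `maxAcceptsPerCall` are hypothetical (they are not the names used by the attached patch), and unlike Hadoop's Listener the sketch returns the accepted channels instead of registering them with a selector.

```java
import java.io.IOException;
import java.nio.channels.ServerSocketChannel;
import java.nio.channels.SocketChannel;
import java.util.ArrayList;
import java.util.List;

// Sketch: doAccept()-style loop with a configurable number of accepts per call.
class AcceptLoop {
    static final int DEFAULT_MAX_ACCEPTS = 10; // the current hardcoded value, per the issue

    // Accepts up to maxAcceptsPerCall pending connections from a non-blocking
    // server channel; stops early when the queue is empty (accept() returns null).
    static List<SocketChannel> acceptUpTo(ServerSocketChannel server, int maxAcceptsPerCall)
            throws IOException {
        List<SocketChannel> accepted = new ArrayList<>();
        for (int i = 0; i < maxAcceptsPerCall; i++) {
            SocketChannel channel = server.accept();
            if (channel == null) {
                break; // nothing left in the listen queue
            }
            channel.configureBlocking(false); // ready for selector-based reads
            accepted.add(channel);
        }
        return accepted;
    }
}
```

Raising the limit lets one pass through the select loop drain more of the listen queue during a burst, at the cost of delaying reads on already-accepted connections a little longer.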
> 2) Increase the default value of ipc.server.listen.queue.size from 128, or at least document
that people with larger clusters who start thousands of mappers at once should increase this
value. I suspect many people running larger clusters are dropping packets without realizing
it, because TCP retransmission covers it up. On one hand, yay TCP; on the other hand, those
are needless delays and retries, since the server could handle more connections.
> 3) Document that ipc.server.listen.queue.size may be limited to the value of SOMAXCONN
(Linux sysctl net.core.somaxconn; 4096 on our systems). The Java docs are not completely
clear about this, and it's difficult to test because you can't query the backlog of a
listening socket. We were under some time pressure, so we tried 1024, which was not enough,
and then 10240, which worked, and we stuck with that.
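A sketch of the cap described in point 3, assuming Linux behavior: `listen()` silently truncates a backlog larger than `net.core.somaxconn` to that value without reporting an error. The class `EffectiveBacklog` is hypothetical; `readSomaxconn` falls back to a caller-supplied value when procfs is unavailable (for example on non-Linux hosts).

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper illustrating the Linux backlog cap.
class EffectiveBacklog {
    static final Path SOMAXCONN = Paths.get("/proc/sys/net/core/somaxconn");

    // The backlog the kernel actually uses: the request, capped at somaxconn.
    static int effective(int requested, int somaxconn) {
        return Math.min(requested, somaxconn);
    }

    // Reads net.core.somaxconn, or returns fallback if it cannot be read or parsed.
    static int readSomaxconn(int fallback) {
        try {
            String text = new String(Files.readAllBytes(SOMAXCONN), StandardCharsets.US_ASCII);
            return Integer.parseInt(text.trim());
        } catch (IOException | NumberFormatException e) {
            return fallback;
        }
    }
}
```

With the numbers in this report, `effective(1024, 4096)` is 1024 while `effective(10240, 4096)` is 4096. If the usual truncation applied, the setting that "worked" most likely ran with a 4096-entry queue, which would explain why 10240 is a suspect value.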

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
