hadoop-common-dev mailing list archives

From "Raghu Angadi (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-1849) IPC server max queue size should be configurable
Date Thu, 06 Sep 2007 20:33:28 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-1849?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12525519 ]

Raghu Angadi commented on HADOOP-1849:

The server log for HADOOP-1763 would have been very useful for this. As far as I remember, Dhruba
looked for "dropping because max q reached" messages while working on scalability improvements
on the Namenode. When these messages went away, that was a good indicator of improvement. With
a large cluster this is pretty easy to test.

Yes, memory should also be a concern, though increasing the handler count causes the same memory
increase plus memory for each of the threads (maybe 512k of virtual memory per thread). Datanode
blockReports are one example where each RPC takes a lot of memory.
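The sizing arithmetic behind the comment above can be sketched as follows. This is a minimal illustration, not Hadoop code; the numbers (100 calls per handler, 10 handlers, 512k per thread stack) are the ones mentioned in this thread.

```java
// Sketch of the IPC server sizing discussed above; numbers are illustrative.
public class IpcQueueSizing {
    public static void main(String[] args) {
        int handlers = 10;                    // default handler count
        int maxQsizePerHandler = 100;         // current hard-coded factor
        int maxQueueSize = handlers * maxQsizePerHandler;

        long threadStackBytes = 512L * 1024;  // ~512k virtual memory per handler thread
        long handlerThreadMemory = handlers * threadStackBytes;

        System.out.println("max queue size = " + maxQueueSize);             // 1000
        System.out.println("handler stack memory ~= " + handlerThreadMemory
                + " bytes");                                                // 5242880
    }
}
```

Doubling the handler count thus doubles both the queue capacity and the thread-stack memory, which is why raising the queue-size factor alone can be the cheaper fix.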

> IPC server max queue size should be configurable
> ------------------------------------------------
>                 Key: HADOOP-1849
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1849
>             Project: Hadoop
>          Issue Type: Improvement
>            Reporter: Raghu Angadi
>             Fix For: 0.15.0
> Currently the max queue size for the IPC server is set to (100 * handlers). Usually when RPC
failures are observed (e.g. HADOOP-1763), we increase the number of handlers and the problem
goes away. I think a big part of such a fix is the increase in max queue size. I think we should
make maxQsize per handler configurable (with a bigger default than 100). There are other
improvements as well (HADOOP-1841).
> The Server keeps reading RPC requests from clients. When the number of in-flight RPCs is larger
than maxQsize, the earliest RPCs are deleted. This is the main feedback the Server gives the
client. I have often heard from users that Hadoop doesn't handle bursty traffic.
> Say the handler count is 10 (the default) and the Server can handle 1000 RPCs a sec (quite
conservative/low for a typical server); this implies that an RPC can wait only 1 sec before it
is dropped. If there are 3000 clients and all of them send RPCs around the same time (not very
rare, with heartbeats etc.), 2000 will be dropped. Instead of dropping the earliest RPCs, if the
server delays reading new RPCs, the feedback to clients would be much smoother. I will file
another jira regarding queue management.
> For this jira I propose making the queue size per handler configurable, with a larger default
(maybe 500).
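The drop-the-oldest behavior described in the issue can be modeled with a toy bounded queue. This is a sketch, not the actual Hadoop Server implementation; the capacity and client counts are the ones used in the example above (10 handlers * 100, 3000 simultaneous clients).

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Toy model of the behavior described in the issue: a bounded call queue
// where the earliest queued RPC is evicted when a new one arrives and the
// queue is already full.
public class DropOldestQueue {
    public static void main(String[] args) {
        int capacity = 10 * 100;              // handlers * maxQsize per handler
        Deque<Integer> queue = new ArrayDeque<>();
        int dropped = 0;

        for (int rpc = 0; rpc < 3000; rpc++) { // 3000 clients send at once
            if (queue.size() == capacity) {
                queue.pollFirst();             // evict the earliest RPC
                dropped++;
            }
            queue.addLast(rpc);
        }
        System.out.println("queued=" + queue.size() + " dropped=" + dropped);
        // prints: queued=1000 dropped=2000
    }
}
```

The alternative the comment hints at (delaying reads instead of evicting) corresponds to blocking the producer when the queue is full, e.g. the semantics of `LinkedBlockingQueue.put`, which gives clients smoother backpressure than silent drops.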

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
