From: "Konstantin Shvachko (JIRA)"
To: common-issues@hadoop.apache.org
Reply-To: common-issues@hadoop.apache.org
Date: Wed, 24 Feb 2010 23:09:27 +0000 (UTC)
Message-ID: <1731185696.509371267052967943.JavaMail.jira@brutus.apache.org>
Subject: [jira] Updated: (HADOOP-1849) IPC server max queue size should be configurable

     [ https://issues.apache.org/jira/browse/HADOOP-1849?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Konstantin Shvachko updated HADOOP-1849:
----------------------------------------

    Status: Patch Available  (was: Open)

> IPC server max queue size should be configurable
> ------------------------------------------------
>
>                 Key: HADOOP-1849
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1849
>             Project: Hadoop Common
>          Issue Type: Improvement
>          Components: ipc
>            Reporter: Raghu Angadi
>            Assignee: Konstantin Shvachko
>         Attachments: handlerQueueSizeConfig.patch, handlerQueueSizeConfig.patch, handlerQueueSizeConfig.patch
>
>
> Currently the max queue size for the IPC server is set to (100 * handlers). Usually when RPC failures are observed (e.g. HADOOP-1763), we increase the number of handlers and the problem goes away. I think a big part of such a fix is the increase in max queue size. I think we should make maxQsize per handler configurable (with a bigger default than 100). There are other improvements also (HADOOP-1841).
> Server keeps reading RPC requests from clients. When the number of in-flight RPCs is larger than maxQsize, the earliest RPCs are deleted. This is the main feedback Server has for the client. I have often heard from users that Hadoop doesn't handle bursty traffic.
> Say the handler count is 10 (default), so the queue holds 100 * 10 = 1000 RPCs, and Server can handle 1000 RPCs a sec (quite conservative/low for a typical server). This implies that an RPC can wait only 1 sec before it is dropped. If there are 3000 clients and all of them send RPCs around the same time (not very rare, with heartbeats etc.), 2000 will be dropped. Instead of dropping the earliest RPCs, if the server delayed reading new RPCs, the feedback to clients would be much smoother. I will file another jira regarding queue management.
> For this jira I propose to make the queue size per handler configurable, with a larger default (maybe 500).
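
The arithmetic in the description can be checked with a minimal sketch. This is only an illustration of the numbers quoted in the issue, not code from the attached patch: the class and method names (IpcQueueSizing, maxQueueSize, maxWaitSeconds, droppedInBurst) are hypothetical, and the burst model assumes every request arrives before the server processes any of them.

```java
// Hypothetical sketch: names are illustrative and do not come from the Hadoop source or the patch.
final class IpcQueueSizing {

    /** Maximum queued calls: handler count times the per-handler queue size. */
    static int maxQueueSize(int handlerCount, int queueSizePerHandler) {
        return handlerCount * queueSizePerHandler;
    }

    /** Longest time (in seconds) a call can wait in a full queue at the given service rate. */
    static double maxWaitSeconds(int maxQueueSize, int rpcsPerSecond) {
        return (double) maxQueueSize / rpcsPerSecond;
    }

    /**
     * Calls dropped when a burst arrives all at once into an empty queue.
     * Assumption: nothing is processed while the burst is being read.
     */
    static int droppedInBurst(int burstSize, int maxQueueSize) {
        return Math.max(0, burstSize - maxQueueSize);
    }

    public static void main(String[] args) {
        int handlers = 10;        // default handler count from the description
        int rpcsPerSecond = 1000; // service rate assumed in the description
        int burst = 3000;         // clients sending at about the same time

        for (int perHandler : new int[] {100, 500}) {
            int queue = maxQueueSize(handlers, perHandler);
            System.out.println("per-handler=" + perHandler
                    + " queue=" + queue
                    + " maxWaitSec=" + maxWaitSeconds(queue, rpcsPerSecond)
                    + " dropped=" + droppedInBurst(burst, queue));
        }
    }
}
```

With 100 per handler the queue holds 1000 calls, so 2000 of the 3000 burst requests are dropped, matching the description. With the proposed 500 per handler the queue holds 5000 calls and the same burst fits, at the cost of a longer worst-case wait in the queue.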