Date: Fri, 6 Dec 2013 19:55:41 +0000 (UTC)
From: "Chris Li (JIRA)"
To: common-issues@hadoop.apache.org
Subject: [jira] [Updated] (HADOOP-9640) RPC Congestion Control with FairCallQueue

     [ https://issues.apache.org/jira/browse/HADOOP-9640?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chris Li updated HADOOP-9640:
-----------------------------
    Attachment: faircallqueue2.patch

[~daryn] Definitely; this new patch is pluggable and defaults to the LinkedBlockingQueue via FIFOCallQueue. We will also be testing performance on larger clusters in January.

Please let me know your thoughts on this new patch.

In this new patch (faircallqueue2.patch):

*Architecture*

The FairCallQueue is responsible for its Scheduler and Mux, which in the future will be pluggable as well. They are not made pluggable now, since there is only one option today. Changes to NameNodeRPCServer (and others) are no longer necessary.

*Scheduling Token*

We are using the username right now, but will switch to jobID once a good way of including it is decided upon.

*Cross-server scheduling*

Scheduling across servers (for instance, the Namenode can have 2 RPC Servers, for user and service calls) will be supported in a future patch.

*Configuration*

Configuration keys are keyed by port, so for a server running on 8020:

_ipc.8020.callqueue.impl_
Defaults to FIFOCallQueue.class, which uses a LinkedBlockingQueue. To enable priority, use "org.apache.hadoop.ipc.FairCallQueue".

_ipc.8020.faircallqueue.priority-levels_
Defaults to 4; controls the number of priority levels in the FairCallQueue.

_ipc.8020.history-scheduler.service-users_
A comma-separated list of users that are exempt from scheduling and given top priority. Used for giving the service users (hadoop or hdfs) absolute high priority, e.g. "hadoop,hdfs".

_ipc.8020.history-scheduler.history-length_
The number of past calls to remember. The HistoryRpcScheduler schedules requests based on this pool. Defaults to 1000.

_ipc.8020.history-scheduler.thresholds_
A comma-separated list of ints that specify the thresholds for scheduling in the history scheduler. For instance, with 4 queues and a history-length of 1000, "50,400,750" schedules requests with more than 750 remembered calls into queue 3, > 400 into queue 2, > 50 into queue 1, and everything else into queue 0. Defaults to an even split (for a history-length of 200 and 4 queues it would be 50 each: "50,100,150").

_ipc.8020.wrr-multiplexer.weights_
A comma-separated list of ints that specify a weight for each queue. For instance, with 4 queues, "10,5,5,1" sets the handlers to draw from the queues in the following pattern:
* Read queue0 10 times
* Read queue1 5 times
* Read queue2 5 times
* Read queue3 1 time
And then repeat.
Defaults to a log2 split: for 4 queues, it would be "8,4,2,1".


> RPC Congestion Control with FairCallQueue
> -----------------------------------------
>
>                 Key: HADOOP-9640
>                 URL: https://issues.apache.org/jira/browse/HADOOP-9640
>             Project: Hadoop Common
>          Issue Type: Improvement
>    Affects Versions: 3.0.0, 2.2.0
>            Reporter: Xiaobo Peng
>              Labels: hdfs, qos, rpc
>         Attachments: MinorityMajorityPerformance.pdf, NN-denial-of-service-updated-plan.pdf, faircallqueue.patch, faircallqueue2.patch, rpc-congestion-control-draft-plan.pdf
>
>
> Several production Hadoop cluster incidents occurred where the Namenode was overloaded and failed to respond.
> We can improve quality of service for users during namenode peak loads by replacing the FIFO call queue with a [Fair Call Queue|https://issues.apache.org/jira/secure/attachment/12616864/NN-denial-of-service-updated-plan.pdf]. (This plan supersedes rpc-congestion-control-draft-plan.)
> Excerpted from the communication of one incident: "The map task of a user was creating huge number of small files in the user directory. Due to the heavy load on NN, the JT also was unable to communicate with NN... The cluster became responsive only once the job was killed."
> Excerpted from the communication of another incident: "Namenode was overloaded by GetBlockLocation requests (Correction: should be getFileInfo requests. The job had a bug that called getFileInfo for a nonexistent file in an endless loop). All other requests to namenode were also affected by this and hence all jobs slowed down. Cluster almost came to a grinding halt... Eventually killed jobtracker to kill all jobs that are running."
> Excerpted from HDFS-945: "We've seen defective applications cause havoc on the NameNode, for e.g. by doing 100k+ 'listStatus' on very large directories (60k files) etc."

--
This message was sent by Atlassian JIRA
(v6.1#6144)
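
For anyone following along, here is a minimal sketch of the two mechanisms described in the comment above: a history-based scheduler that maps a caller's recent call count onto a priority level via the configured thresholds, and a weighted round-robin multiplexer that tells handlers which queue to read next. This is NOT code from faircallqueue2.patch; the class and method names (HistoryScheduler, WeightedRoundRobinMux, schedule, nextQueue) are invented for illustration, and concurrency concerns the real queue must handle are omitted.

```java
import java.util.HashMap;
import java.util.Map;

public class FairCallQueueSketch {

    /** Maps a caller's recent call count to a priority level (0 = highest priority). */
    static class HistoryScheduler {
        private final int[] thresholds;                 // e.g. {50, 400, 750} for 4 levels
        private final Map<String, Integer> callCounts = new HashMap<>();

        HistoryScheduler(int[] thresholds) {
            this.thresholds = thresholds;
        }

        /** Records one call by this user and returns the queue index to schedule it into. */
        int schedule(String user) {
            int count = callCounts.merge(user, 1, Integer::sum);
            // Heavier users cross higher thresholds and land in lower-priority queues.
            for (int level = thresholds.length; level > 0; level--) {
                if (count > thresholds[level - 1]) {
                    return level;
                }
            }
            return 0;
        }
    }

    /** Weighted round-robin: read queue i weights[i] times, then advance to the next queue. */
    static class WeightedRoundRobinMux {
        private final int[] weights;   // e.g. {8, 4, 2, 1}, matching the log2 default
        private int queueIndex = 0;
        private int drawsLeft;

        WeightedRoundRobinMux(int[] weights) {
            this.weights = weights;
            this.drawsLeft = weights[0];
        }

        /** Returns the index of the queue a handler should read from next. */
        int nextQueue() {
            int current = queueIndex;
            if (--drawsLeft == 0) {
                queueIndex = (queueIndex + 1) % weights.length;
                drawsLeft = weights[queueIndex];
            }
            return current;
        }
    }
}
```

With weights "10,5,5,1" this mux yields queue0 ten times, queue1 five times, queue2 five times, queue3 once, and then repeats, which is the handler draw pattern described for _ipc.8020.wrr-multiplexer.weights_ above.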