Date: Fri, 6 Dec 2013 19:55:41 +0000 (UTC)
From: "Chris Li (JIRA)"
To: common-issues@hadoop.apache.org
Subject: [jira] [Updated] (HADOOP-9640) RPC Congestion Control with FairCallQueue

     [ https://issues.apache.org/jira/browse/HADOOP-9640?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chris Li updated HADOOP-9640:
-----------------------------
    Attachment: faircallqueue2.patch

[~daryn] Definitely; this new patch is pluggable and defaults to the LinkedBlockingQueue via FIFOCallQueue. We will also be testing performance on larger clusters in January.

Please let me know your thoughts on this new patch.

In this new patch (faircallqueue2.patch):

*Architecture*

The FairCallQueue is responsible for its Scheduler and Mux, which in the future will be pluggable as well. They are not made pluggable now, since there is only one option today. Changes to NameNodeRPCServer (and others) are no longer necessary.

*Scheduling Token*

We are using the username right now, but will switch to jobID once a good way of including it is decided upon.

*Cross-server scheduling*

Scheduling across servers (for instance, the Namenode can have 2 RPC Servers, for user and service calls) will be supported in a future patch.

*Configuration*

Configuration keys are keyed by port, so for a server running on 8020:

_ipc.8020.callqueue.impl_
Defaults to FIFOCallQueue.class, which uses a LinkedBlockingQueue. To enable priority, use "org.apache.hadoop.ipc.FairCallQueue".

_ipc.8020.faircallqueue.priority-levels_
Defaults to 4; controls the number of priority levels in the FairCallQueue.

_ipc.8020.history-scheduler.service-users_
A comma-separated list of users that are exempt from scheduling and given top priority. Used for giving the service users (hadoop or hdfs) absolute high priority, e.g. "hadoop,hdfs".

_ipc.8020.history-scheduler.history-length_
The number of past calls to remember. The HistoryRpcScheduler schedules requests based on this pool. Defaults to 1000.

_ipc.8020.history-scheduler.thresholds_
A comma-separated list of ints that specify the thresholds for scheduling in the history scheduler. For instance, with 4 queues and a history-length of 1000, "50,400,750" schedules requests with more than 750 remembered calls into queue 3, > 400 into queue 2, > 50 into queue 1, and everything else into queue 0. Defaults to an even split (for a history-length of 200 and 4 queues it would be 50 each: "50,100,150").

_ipc.8020.wrr-multiplexer.weights_
A comma-separated list of ints that specify a weight for each queue. For instance, with 4 queues, "10,5,5,1" sets the handlers to draw from the queues in the following pattern:
* Read queue0 10 times
* Read queue1 5 times
* Read queue2 5 times
* Read queue3 1 time
And then repeat.
Defaults to a log2 split: for 4 queues, it would be "8,4,2,1".


> RPC Congestion Control with FairCallQueue
> -----------------------------------------
>
>                 Key: HADOOP-9640
>                 URL: https://issues.apache.org/jira/browse/HADOOP-9640
>             Project: Hadoop Common
>          Issue Type: Improvement
>    Affects Versions: 3.0.0, 2.2.0
>            Reporter: Xiaobo Peng
>              Labels: hdfs, qos, rpc
>         Attachments: MinorityMajorityPerformance.pdf, NN-denial-of-service-updated-plan.pdf, faircallqueue.patch, faircallqueue2.patch, rpc-congestion-control-draft-plan.pdf
>
>
> Several production Hadoop cluster incidents occurred where the Namenode was overloaded and failed to respond.
> We can improve quality of service for users during namenode peak loads by replacing the FIFO call queue with a [Fair Call Queue|https://issues.apache.org/jira/secure/attachment/12616864/NN-denial-of-service-updated-plan.pdf]. (This plan supersedes rpc-congestion-control-draft-plan.)
> Excerpted from the communication of one incident: "The map task of a user was creating huge number of small files in the user directory. Due to the heavy load on NN, the JT also was unable to communicate with NN... The cluster became responsive only once the job was killed."
> Excerpted from the communication of another incident: "Namenode was overloaded by GetBlockLocation requests (Correction: should be getFileInfo requests. The job had a bug that called getFileInfo for a nonexistent file in an endless loop). All other requests to namenode were also affected by this and hence all jobs slowed down. Cluster almost came to a grinding halt... Eventually killed jobtracker to kill all jobs that are running."
> Excerpted from HDFS-945: "We've seen defective applications cause havoc on the NameNode, for e.g. by doing 100k+ 'listStatus' on very large directories (60k files) etc."

--
This message was sent by Atlassian JIRA
(v6.1#6144)
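
For anyone following along, here is a minimal sketch of the two mechanisms described in the comment above: a history-based scheduler that maps a caller's recent call count onto a priority level via the configured thresholds, and a weighted round-robin multiplexer that tells handlers which queue to read next. This is NOT code from faircallqueue2.patch; the class and method names (HistoryScheduler, WeightedRoundRobinMux, schedule, nextQueue) are invented for illustration, and concurrency concerns the real queue must handle are omitted.

```java
import java.util.HashMap;
import java.util.Map;

public class FairCallQueueSketch {

    /** Maps a caller's recent call count to a priority level (0 = highest priority). */
    static class HistoryScheduler {
        private final int[] thresholds;                 // e.g. {50, 400, 750} for 4 levels
        private final Map<String, Integer> callCounts = new HashMap<>();

        HistoryScheduler(int[] thresholds) {
            this.thresholds = thresholds;
        }

        /** Records one call by this user and returns the queue index to schedule it into. */
        int schedule(String user) {
            int count = callCounts.merge(user, 1, Integer::sum);
            // Heavier users cross higher thresholds and land in lower-priority queues.
            for (int level = thresholds.length; level > 0; level--) {
                if (count > thresholds[level - 1]) {
                    return level;
                }
            }
            return 0;
        }
    }

    /** Weighted round-robin: read queue i weights[i] times, then advance to the next queue. */
    static class WeightedRoundRobinMux {
        private final int[] weights;   // e.g. {8, 4, 2, 1}, matching the log2 default
        private int queueIndex = 0;
        private int drawsLeft;

        WeightedRoundRobinMux(int[] weights) {
            this.weights = weights;
            this.drawsLeft = weights[0];
        }

        /** Returns the index of the queue a handler should read from next. */
        int nextQueue() {
            int current = queueIndex;
            if (--drawsLeft == 0) {
                queueIndex = (queueIndex + 1) % weights.length;
                drawsLeft = weights[queueIndex];
            }
            return current;
        }
    }
}
```

With weights "10,5,5,1" this mux yields queue0 ten times, queue1 five times, queue2 five times, queue3 once, and then repeats, which is the handler draw pattern described for _ipc.8020.wrr-multiplexer.weights_ above.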