Subject: Re: Review Request 15080: CLOUDSTACK-4855: Throttle based on the # of outstanding requests to the directly managed HV host (direct agents)
From: "ASF Subversion and Git Services"
To: "Darren Shepherd", "Chiradeep Vittal", "Alex Huang"
Cc: "Koushik Das", "ASF Subversion and Git Services", "cloudstack"
Date: Mon, 04 Nov 2013 09:24:14 -0000
Message-ID: <20131104092414.30021.67784@reviews.apache.org>
In-Reply-To: <20131030105122.7402.39338@reviews.apache.org>
Reply-To: dev@cloudstack.apache.org

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/15080/#review28091
-----------------------------------------------------------

Commit 269a4ef11ee151fa408a7dd1f2e69cd1f7f05191 in branch refs/heads/master from Koushik Das
[ https://git-wip-us.apache.org/repos/asf?p=cloudstack.git;h=269a4ef ]

CLOUDSTACK-4855: Throttle based on the # of outstanding requests to the directly managed HV host (direct agents)

CloudStack sends requests to directly managed HV hosts (direct agents) using the direct agent thread pool. The size of the pool is determined by the global config direct.agent.pool.size, which defaults to 500.

Currently there is no restriction on the number of threads a direct agent can use from this shared thread pool to send requests to the host. This is fine as long as the host responds to requests in a reasonable amount of time. But if there is a considerable delay in getting a response, the thread remains blocked for that long. As more commands are sent to the slow host, more threads get blocked. This can eventually lead to a situation where requests to healthy hosts cannot be processed because there are not enough free threads.

The problem being addressed here is to localize the impact of a few bad hosts, so that the entire management server is not affected.

One way to do this is to throttle based on the # of outstanding requests on a per-host basis. The maximum number of outstanding requests to a host is a % of the direct agent pool size, configurable through direct.agent.thread.cap. The default value is 0.1 (10%); a value of 1 gives the old behavior, where there is no upper cap. This ensures that an impacted host is bound by an upper cap on the number of threads it can use to process requests and cannot take over the entire pool.

- ASF Subversion and Git Services


On Oct. 30, 2013, 10:51 a.m., Koushik Das wrote:
>
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/15080/
> -----------------------------------------------------------
>
> (Updated Oct. 30, 2013, 10:51 a.m.)
>
>
> Review request for cloudstack, Alex Huang, Chiradeep Vittal, and Darren Shepherd.
>
>
> Bugs: CLOUDSTACK-4855
>     https://issues.apache.org/jira/browse/CLOUDSTACK-4855
>
>
> Repository: cloudstack-git
>
>
> Description
> -------
>
> CloudStack sends requests to directly managed HV hosts (direct agents) using the direct agent thread pool. The size of the pool is determined by the global config direct.agent.pool.size, which defaults to 500.
>
> Currently there is no restriction on the number of threads a direct agent can use from this shared thread pool to send requests to the host. This is fine as long as the host responds to requests in a reasonable amount of time. But if there is a considerable delay in getting a response, the thread remains blocked for that long. As more commands are sent to the slow host, more threads get blocked. This can eventually lead to a situation where requests to healthy hosts cannot be processed because there are not enough free threads.
>
> The problem being addressed here is to localize the impact of a few bad hosts, so that the entire management server is not affected.
>
> One way to do this is to throttle based on the # of outstanding requests on a per-host basis. The maximum number of outstanding requests to a host is a % of the direct agent pool size, configurable through direct.agent.thread.cap. This ensures that an impacted host is bound by an upper cap on the number of threads it can use to process requests and cannot take over the entire pool.
>
>
> Note: The outstanding request count is checked in the Task.run() method so that cron jobs that get scheduled at agent startup are also taken into account.
>
>
> Diffs
> -----
>
>   engine/orchestration/src/com/cloud/agent/manager/AgentAttache.java ff35255
>   engine/orchestration/src/com/cloud/agent/manager/AgentManagerImpl.java 3e684cc
>   engine/orchestration/src/com/cloud/agent/manager/DirectAgentAttache.java 7d3f765
>
> Diff: https://reviews.apache.org/r/15080/diff/
>
>
> Testing
> -------
>
> Verified by setting the per-agent upper cap to 1 and checking that requests still get scheduled but the executor thread simply bails out.
>
>
> Thanks,
>
> Koushik Das
>
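
For readers who want to see the idea in the description in code, here is a minimal sketch of a per-host cap on outstanding requests. It is not the code from the commit: the class HostThrottle and the method runIfUnderCap are hypothetical names, and truncating the cap to an integer with a minimum of 1 is an assumption. Only the two settings (pool size 500, thread cap 0.1) and the check-at-run-time behavior come from the message above.

```java
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical per-host throttle; an illustration, not CloudStack's implementation. */
final class HostThrottle {
    private final int cap;
    private final AtomicInteger outstanding = new AtomicInteger();

    /**
     * @param poolSize  value of direct.agent.pool.size (500 by default per the message)
     * @param threadCap value of direct.agent.thread.cap (0.1 by default per the message)
     */
    HostThrottle(int poolSize, double threadCap) {
        // Assumption: truncate to an int and allow at least one request per host.
        this.cap = Math.max(1, (int) (poolSize * threadCap));
    }

    int cap() {
        return cap;
    }

    /**
     * Checks the count when the task actually starts running, mirroring the
     * note about checking in Task.run(). Returns false if the host is already
     * at its cap, in which case the task body is skipped (the thread bails out).
     */
    boolean runIfUnderCap(Runnable task) {
        if (outstanding.incrementAndGet() > cap) {
            outstanding.decrementAndGet(); // undo our reservation and bail out
            return false;
        }
        try {
            task.run();
            return true;
        } finally {
            outstanding.decrementAndGet(); // free the slot even if the task throws
        }
    }

    public static void main(String[] args) {
        HostThrottle defaults = new HostThrottle(500, 0.1);
        System.out.println("cap with default settings: " + defaults.cap());

        // With a cap of 1, a second request issued while the first is still
        // outstanding is rejected; once the first finishes, requests pass again.
        HostThrottle one = new HostThrottle(10, 0.1);
        boolean[] inner = new boolean[1];
        boolean outer = one.runIfUnderCap(() -> inner[0] = one.runIfUnderCap(() -> { }));
        System.out.println("outer ran: " + outer + ", nested ran: " + inner[0]);
    }
}
```

With the default settings this gives each host at most 50 of the 500 pool threads, so a single slow host cannot block the whole pool. A task that is rejected still occupies a pool thread briefly, which matches the testing note that requests get scheduled but the executor thread bails out.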