From: Sim IJskes - QCG
Date: Tue, 23 Nov 2010 13:39:15 +0100
To: builds@apache.org
Subject: Re: [hudson] archive phase problems

On 11/23/2010 01:25 PM, Sim IJskes - QCG wrote:
> Looks like a totally saturated disk subsystem (to me, from a distance).
> Is there sar data from that machine?

If you have a job that prints a timestamp just before the archive stage, you can correlate that timestamp with the sar data. Look at the run-queue size or the waiting/blocked (iowait) percentage to see whether I/O was the culprit. If idle is at 0% or very low, then it's CPU bound.

However, you first have to find out how the work is divided between slave and master; maybe all the data is sent to the master first and archived there.

Regards,
Sim
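
The correlation described above could be sketched roughly like this in Python. Everything here is an assumption for illustration: the sample text imitates `sar -u` output with 24-hour timestamps and invented values, the function names are hypothetical, and the thresholds (30% iowait, 5% idle) are arbitrary starting points, not recommended values.

```python
from datetime import datetime, time

# Hypothetical excerpt in the style of `sar -u` output; the numbers are invented.
SAMPLE_SAR = """\
12:10:01        CPU     %user     %nice   %system   %iowait    %steal     %idle
12:20:01        all      5.12      0.00      2.30      1.05      0.00     91.53
12:30:01        all      6.40      0.00      3.10     62.75      0.00     27.75
12:40:01        all     88.20      0.00      9.90      0.40      0.00      1.50
"""


def parse_sar_cpu(text):
    """Return a list of (time, iowait, idle) tuples from `sar -u` style text."""
    rows = []
    header = None
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        if "%idle" in fields:
            # Remember the column layout so values are looked up by name.
            header = fields
            continue
        if header is None or len(fields) != len(header) or fields[1] != "all":
            continue  # skip averages, per-CPU rows and anything unexpected
        stamp = datetime.strptime(fields[0], "%H:%M:%S").time()
        iowait = float(fields[header.index("%iowait")])
        idle = float(fields[header.index("%idle")])
        rows.append((stamp, iowait, idle))
    return rows


def sample_covering(rows, when):
    """Return the first sample stamped at or after `when`.

    Each sar row summarises the interval that ends at its timestamp,
    so this is the row whose interval contains `when`.
    """
    for row in rows:
        if row[0] >= when:
            return row
    return None


def classify(iowait, idle, iowait_limit=30.0, idle_limit=5.0):
    """Rough guess at the bottleneck; the limits are assumed, not measured."""
    if iowait >= iowait_limit:
        return "I/O bound"
    if idle <= idle_limit:
        return "CPU bound"
    return "not saturated"


if __name__ == "__main__":
    rows = parse_sar_cpu(SAMPLE_SAR)
    # Pretend the job printed 12:25:00 just before the archive stage.
    stamp, iowait, idle = sample_covering(rows, time(12, 25, 0))
    print(stamp, classify(iowait, idle))
```

This only looks at CPU utilisation on one machine. The run-queue size (`sar -q`) and the master/slave split mentioned above would still need to be checked separately, on whichever host actually does the archiving.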