Date: Wed, 2 Dec 2009 14:22:12 -0600
Subject: hadoop idle time on terasort
From: Vasilis Liaskovitis
To: common-user@hadoop.apache.org

Hi,

I am using hadoop-0.20.1 to run terasort and randsort benchmarking tests on a small 8-node Linux cluster. Most runs show low (<50%) core utilization in the map and reduce phases, as well as heavy I/O phases. There is usually a large fraction of the runtime during which the cores are idling and disk I/O traffic is not heavy. On average, over the duration of a terasort run, I see 20-30% CPU utilization, 10-30% iowait, and the remaining 40-70% is idle time. This data was collected with mpstat across the cores of a specific node for the duration of the run. This utilization behaviour is the same across all tasktracker/datanode machines (the namenode cores and I/O are mostly idle, so there doesn't seem to be a bottleneck at the namenode).

I am looking for an explanation for the significant idle time in these runs. Could it have something to do with misconfigured network/RPC latency parameters in Hadoop? For example, I have tried increasing mapred.heartbeats.in.second from 100 to 1000, but that didn't help.
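For reference, this is roughly the change I tried; a minimal sketch, assuming mapred-site.xml on the jobtracker node is the right place for this property:

  <!-- mapred-site.xml on the jobtracker (my assumption about where this setting belongs) -->
  <property>
    <name>mapred.heartbeats.in.second</name>
    <!-- default is 100; I tried raising it to 1000 -->
    <value>1000</value>
  </property>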
The network bandwidth (1 GigE card on each node) is not saturated during the runs, according to my netstat results. Have other people noticed significant CPU idle times that can't be explained by I/O traffic? Is it reasonable to always expect decreasing idle times as the terasort dataset scales on the same cluster? I've only tried two small datasets of 40GB and 64GB, but core utilization didn't increase in the runs done so far.

Yahoo's paper on terasort (http://sortbenchmark.org/Yahoo2009.pdf) mentions several performance optimizations, some of which seem relevant to idle times. I am wondering which, if any, of the Yahoo patches are part of the hadoop-0.20.1 distribution. Would it be a good idea to try a development version of hadoop to resolve this issue?

thanks,

- Vasilis