Date: Thu, 11 Nov 2010 14:52:23 -0500
From: Da Zheng
To: common-user@hadoop.apache.org
Subject: monitor the hadoop cluster

Hello,

I wrote a MapReduce program and ran it on a 3-node Hadoop cluster, but its running time varies a lot, from 2 to 3 minutes. I want to understand how much of that time is spent in the map phase and how much in the reduce phase, so that I can find where the performance could be improved.

Also, since the input data is already sorted, I wrote a custom partitioner to reduce data shuffling across the network. I need some way to observe the data movement.

I know the Hadoop community developed Chukwa for monitoring, but it still seems very immature. How do people monitor their Hadoop clusters today? Is there a good way to solve the problems listed above?

Thanks,
Da
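For illustration only, here is a minimal, stdlib-only Java sketch of the range-partitioning idea that a custom partitioner for sorted input typically relies on. Each reducer receives one contiguous key range, and the range is found by binary search over sorted split points. The class name RangePartitionerSketch, the use of long keys, and the caller-supplied split points are hypothetical assumptions. The method only mirrors the shape of Hadoop's Partitioner.getPartition(key, value, numPartitions) and is not the partitioner described above.

```java
import java.util.Arrays;

/**
 * Hypothetical range partitioner for sorted numeric keys (stdlib-only sketch).
 * It does not depend on Hadoop. The split points are an assumption supplied
 * by the caller and are assumed to be distinct.
 */
class RangePartitionerSketch {
    // Sorted boundaries: keys < splitPoints[0] go to partition 0, and keys with
    // splitPoints[i-1] <= key < splitPoints[i] go to partition i.
    private final long[] splitPoints;

    RangePartitionerSketch(long[] splitPoints) {
        this.splitPoints = splitPoints.clone();
        Arrays.sort(this.splitPoints);
    }

    /** Returns the index of the key range that contains {@code key}. */
    int getPartition(long key, int numPartitions) {
        int idx = Arrays.binarySearch(splitPoints, key);
        // An exact hit on boundary i starts range i + 1. A miss returns
        // (-(insertion point) - 1), and the insertion point is the range index.
        int partition = idx >= 0 ? idx + 1 : -(idx + 1);
        // Clamp in case there are fewer reducers than key ranges.
        return Math.min(partition, numPartitions - 1);
    }
}
```

With split points {100, 200} and 3 reducers, keys below 100 go to partition 0, keys from 100 to 199 go to partition 1, and keys of 200 or more go to partition 2. Hadoop's own TotalOrderPartitioner applies a similar split-point approach and reads its boundaries from a partition file.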