Message-ID: <4978479C.805@apache.org>
Date: Thu, 22 Jan 2009 10:17:00 +0000
From: Steve Loughran
To: core-user@hadoop.apache.org
Subject: Re: running hadoop on heterogeneous hardware
References: <3b5f72030901211427q4b033446ofb248afa6d39a1c4@mail.gmail.com>
In-Reply-To: <3b5f72030901211427q4b033446ofb248afa6d39a1c4@mail.gmail.com>

Bill Au wrote:
> Is hadoop designed to run on homogeneous hardware only, or does it work just
> as well on heterogeneous hardware? If the datanodes have different
> disk capacities, does HDFS still spread the data blocks equally among all
> the datanodes, or will the datanodes with high disk capacity end up storing
> more data blocks? Similarly, if the tasktrackers have different numbers of
> CPUs, is there a way to configure hadoop to run more tasks on those
> tasktrackers that have more CPUs? Is that simply a matter of setting
> mapred.tasktracker.map.tasks.maximum and
> mapred.tasktracker.reduce.tasks.maximum differently on the tasktrackers?
>
> Bill
>

Life is simpler on homogeneous boxes; by setting the maximum tasks differently for the different machines, you do limit the amount of work that gets pushed out to those boxes.
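
As a rough illustration of setting the two limits differently per machine, here is a minimal Python sketch that derives them from a node's core count and renders the property elements for that node's site configuration. The two property names come from the question above; the helper names (slot_limits, to_site_xml), the host names, and the slots-per-core ratios are assumptions made up for the example, not Hadoop defaults or recommendations.

```python
# Hypothetical helpers: derive per-tasktracker slot limits from core count.
# The per-core ratios are illustrative assumptions, not Hadoop defaults.

def slot_limits(cores, map_per_core=1.0, reduce_per_core=0.5):
    """Return the two per-node properties for a machine with `cores` CPUs."""
    return {
        "mapred.tasktracker.map.tasks.maximum": max(1, int(cores * map_per_core)),
        "mapred.tasktracker.reduce.tasks.maximum": max(1, int(cores * reduce_per_core)),
    }


def to_site_xml(props):
    """Render the properties as <property> elements inside <configuration>."""
    lines = ["<configuration>"]
    for name, value in props.items():
        lines.append("  <property>")
        lines.append(f"    <name>{name}</name>")
        lines.append(f"    <value>{value}</value>")
        lines.append("  </property>")
    lines.append("</configuration>")
    return "\n".join(lines)


if __name__ == "__main__":
    # Made-up host names and core counts, one config fragment per node.
    for host, cores in {"small-node": 2, "big-node": 8}.items():
        print(host)
        print(to_site_xml(slot_limits(cores)))
```

Each node would get its own fragment, so the bigger boxes advertise more task slots than the smaller ones.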
More troublesome are slower CPUs/HDDs: they aren't picked up directly, though speculative execution can handle some of this.

One interesting bit of research would be something adaptive: something to monitor throughput and tune those values based on performance. That would detect variations in a cluster and work with it, rather than requiring you to know the capabilities of every machine.

-steve
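
To make the adaptive idea a little more concrete, here is a minimal Python sketch of one possible approach: a simple hill-climbing loop that nudges a node's slot limit up or down depending on whether measured throughput improved. Nothing like this exists in Hadoop as far as this thread says; the SlotTuner class, its parameters, and the simulated_throughput stand-in are all hypothetical, and a real version would need an actual throughput measurement and a way to apply the new limit to the tasktracker.

```python
class SlotTuner:
    """Hill-climbing tuner for one node's task-slot limit (hypothetical)."""

    def __init__(self, slots=2, min_slots=1, max_slots=16, tolerance=0.05):
        self.slots = slots
        self.min_slots = min_slots
        self.max_slots = max_slots
        self.tolerance = tolerance    # ignore throughput drops smaller than this fraction
        self.direction = 1            # +1: try more slots, -1: try fewer
        self.last_throughput = None

    def update(self, throughput):
        """Take the throughput seen at the current limit; return the next limit to try."""
        if self.last_throughput is not None:
            # If the last change made things noticeably worse, reverse course.
            if throughput < self.last_throughput * (1 - self.tolerance):
                self.direction = -self.direction
        self.last_throughput = throughput
        candidate = self.slots + self.direction
        self.slots = min(self.max_slots, max(self.min_slots, candidate))
        return self.slots


def simulated_throughput(slots, sweet_spot=4):
    """Stand-in for a real measurement: rises up to sweet_spot, then degrades."""
    return min(slots, sweet_spot) - 0.5 * max(0, slots - sweet_spot)


if __name__ == "__main__":
    tuner = SlotTuner(slots=2)
    for _ in range(10):
        tuner.update(simulated_throughput(tuner.slots))
        print(tuner.slots)
```

With the simulated node, the limit climbs to the sweet spot and then hovers one step either side of it. Real throughput is noisy, so a real tuner would want smoothing and a longer measurement window before each change.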