Date: Thu, 29 Mar 2007 14:01:42 -0700
Subject: Re: Scaling hadoop up
From: Michael Bieniosek
To: , Doug Cutting
Reply-To: hadoop-user@lucene.apache.org
In-Reply-To: <460C2398.5030005@apache.org>

I've seen this with 0.12.1. Currently I'm just running the jobtracker and
namenode on one machine, with tasktrackers & datanodes on all the others
(no secondarynamenode). It seems like it might help to put the jobtracker
and namenode on different machines; is there anything else I could try?

-Michael

On 3/29/07 1:37 PM, "Doug Cutting" wrote:

> Michael Bieniosek wrote:
>> When I try to scale Hadoop up to about 100 nodes on EC2 (single-cpu Xen), I
>> notice things start to fall apart. For example, the jobtracker starts
>> dropping requests with the message "Call queue overflow discarding oldest
>> call". I've also seen problems with the namenode where dfs requests fail
>> with EOFExceptions.
>
> What version of Hadoop are you seeing this with? Scalability has been
> improving.
>
>> I've tried increasing the heartbeat value for the dfs (it's not configurable
>> for the jobtracker though). Is there some other trick to make hadoop scale
>> a little further? The website claims that Hadoop has scaled to 600 nodes,
>> but it seems like I would need a very powerful machine for the namenode and
>> jobtracker to do this. Am I missing something?
>
> Yahoo! does use dual-processor nodes that are more powerful than EC2's
> virtual nodes, but probably not 6x more powerful.
>
> Doug