From: "Cagdas Gerede"
Reply-To: cagdas.gerede@gmail.com
To: core-user@hadoop.apache.org
Date: Fri, 2 May 2008 10:37:16 -0700
Subject: HDFS: Good practices for Number of Blocks per Datanode
Message-ID: <4cc657e40805021037r353137dfv26b4d6f5512f6e1e@mail.gmail.com>

As an addition to my question at the bottom, I was wondering what you would
suggest for how many blocks a datanode should be responsible for.

For a system with 60 million blocks, we could have 3 datanodes with 20 million
blocks each, or 60 datanodes with 1 million blocks each. Would either layout
have performance implications, or would they behave the same way?

More generally: as we need more and more storage, should we add new datanodes
to the system, or should we add more hard disk space to the existing datanodes?

I appreciate your comments,
Cagdas

On Fri, May 2, 2008 at 10:25 AM, Cagdas Gerede wrote:
> In the system I am working on, we have 6 million blocks in total, the
> namenode heap size is about 600 MB, and it takes about 5 minutes for the
> namenode to leave safemode.
>
> I am trying to estimate what the heap size would be if we had 100 - 150
> million blocks, and how long it would take the namenode to leave
> safemode.
> From the extrapolation based on the numbers I have, I am calculating very
> scary numbers for both (terabytes for heap size, and half an hour or so
> of startup time). I am hoping that my extrapolation is not accurate.
>
> From your clusters, could you provide some numbers for the number of files
> and blocks in the system vs. the master heap size and master startup time?
>
> I really appreciate your help.
>
> Thanks.
> Cagdas
>
> --
> ------------
> Best Regards, Cagdas Evren Gerede
> Home Page: http://cagdasgerede.info

--
------------
Best Regards, Cagdas Evren Gerede
Home Page: http://cagdasgerede.info
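
As a rough cross-check of the extrapolation in the quoted message, here is a minimal sketch that scales the reported figures (6 million blocks, about 600 MB of heap, about 5 minutes to leave safemode) in proportion to block count. Linear scaling is an assumption and is not established anywhere in this thread. The function names are hypothetical, and the per-datanode helper simply splits the stated total evenly.

```python
# Figures reported in the quoted message.
OBSERVED_BLOCKS = 6_000_000
OBSERVED_HEAP_MB = 600.0
OBSERVED_SAFEMODE_MIN = 5.0


def scale_linearly(observed_value, observed_blocks, target_blocks):
    """Scale a measured value in proportion to block count.

    Assumption: the value grows linearly with the number of blocks.
    """
    return target_blocks * observed_value / observed_blocks


def blocks_per_datanode(total_blocks, datanodes):
    """Evenly split a total block count across datanodes (hypothetical helper)."""
    return total_blocks // datanodes


if __name__ == "__main__":
    for target in (100_000_000, 150_000_000):
        heap_mb = scale_linearly(OBSERVED_HEAP_MB, OBSERVED_BLOCKS, target)
        minutes = scale_linearly(OBSERVED_SAFEMODE_MIN, OBSERVED_BLOCKS, target)
        print(f"{target:,} blocks: ~{heap_mb / 1000:.0f} GB heap, "
              f"~{minutes:.0f} min to leave safemode")

    for nodes in (3, 60):
        print(f"{nodes} datanodes: {blocks_per_datanode(60_000_000, nodes):,} blocks each")
```

Under this linear assumption, 100 to 150 million blocks work out to roughly 10 to 15 GB of heap and roughly 83 to 125 minutes to leave safemode. Actual namenode behavior may differ, so measured numbers from real clusters would still be the better guide.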