Subject: Re: Estimating disk space requirements
From: Jean-Marc Spaggiari
To: user@hadoop.apache.org
Date: Fri, 18 Jan 2013 09:12:33 -0500

It all depends on what you want to do with this data and the power of
each single node. There is no one-size-fits-all rule. The more nodes
you have, the more CPU power you will have to process the data. But if
your 80 GB boxes' CPUs are faster than your 40 GB boxes' CPUs, maybe
you should take the 80 GB ones.

If you want to get better advice from the list, you will need to
better define your needs and the nodes you can have.

JM

2013/1/18, Panshul Whisper:
> If we look at it with performance in mind,
> is it better to have 20 nodes with 40 GB HDD each,
> or is it better to have 10 nodes with 80 GB HDD each?
>
> They are connected on a gigabit LAN.
>
> Thnx
>
>
> On Fri, Jan 18, 2013 at 2:26 PM, Jean-Marc Spaggiari
> <jean-marc@spaggiari.org> wrote:
>
>> 20 nodes with 40 GB will do the work.
>>
>> After that you will have to consider performance based on your access
>> pattern. But that's another story.
>>
>> JM
>>
>> 2013/1/18, Panshul Whisper:
>> > Thank you for the replies.
>> >
>> > So I take it that I should have at least 800 GB of total free space
>> > on HDFS (combined free space of all the nodes connected to the
>> > cluster). So I can connect 20 nodes having 40 GB of HDD on each node
>> > to my cluster. Will this be enough for the storage?
>> > Please confirm.
>> >
>> > Thanking you,
>> > Regards,
>> > Panshul.
>> >
>> >
>> > On Fri, Jan 18, 2013 at 1:36 PM, Jean-Marc Spaggiari
>> > <jean-marc@spaggiari.org> wrote:
>> >
>> >> Hi Panshul,
>> >>
>> >> If you have 20 GB with a replication factor set to 3, you have only
>> >> 6.6 GB available, not 11 GB. You need to divide the total space by
>> >> the replication factor.
>> >>
>> >> Also, if you store your JSON into HBase, you need to add the key
>> >> size to it. If your key is 4 bytes or 1024 bytes, it makes a
>> >> difference.
>> >>
>> >> So roughly, 24,000,000 * 5 * 1024 bytes = about 114 GB. You don't
>> >> have the space to store it, and that's without including the key
>> >> size. Even with a replication factor set to 5 you don't have the
>> >> space.
>> >>
>> >> Now, you can add some compression, but even with a lucky factor of
>> >> 50% you still don't have the space. You would need something like a
>> >> 90% compression factor to be able to store this data in your
>> >> cluster.
>> >>
>> >> A 1 TB drive is now less than $100... so you might think about
>> >> replacing your 20 GB drives with something bigger.
>> >> To reply to your last question: for your data here, you will need
>> >> AT LEAST 350 GB of overall storage. But that's a bare minimum.
>> >> Don't go under 500 GB.
>> >>
>> >> IMHO
>> >>
>> >> JM
>> >>
>> >> 2013/1/18, Panshul Whisper:
>> >> > Hello,
>> >> >
>> >> > I was estimating how much disk space I need for my cluster.
>> >> >
>> >> > I have 24 million JSON documents, approx. 5 KB each.
>> >> > The JSON is to be stored in HBase with some identifying data in
>> >> > columns, and I also want to store the JSON for later retrieval
>> >> > based on the ID data as keys in HBase.
>> >> > I have my HDFS replication set to 3.
>> >> > Each node has Hadoop, HBase and Ubuntu installed on it, so approx.
>> >> > 11 GB is available for use on my 20 GB node.
>> >> >
>> >> > I have no idea whether, if I have not enabled HBase replication,
>> >> > the HDFS replication is enough to keep the data safe and
>> >> > redundant.
>> >> > How much total disk space will I need for the storage of the data?
>> >> >
>> >> > Please help me estimate this.
>> >> >
>> >> > Thank you so much.
>> >> >
>> >> > --
>> >> > Regards,
>> >> > Ouch Whisper
>> >> > 010101010101
>> >
>> > --
>> > Regards,
>> > Ouch Whisper
>> > 010101010101
>
> --
> Regards,
> Ouch Whisper
> 010101010101
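
For anyone following the arithmetic in the thread above, here is a minimal
back-of-the-envelope sketch in Python of the sizing logic JM describes
(multiply the logical data size by the replication factor to get raw disk
needed, and divide raw cluster capacity by the replication factor to get
usable space). The 64-byte row key, the compression parameter, and the
helper function names are illustrative assumptions, not figures or code
from the thread.

    # Back-of-the-envelope HDFS sizing sketch for the numbers in this thread.
    # The 64-byte row key and the compression ratio are assumptions only.

    KB = 1024
    GB = 1024 ** 3


    def raw_storage_needed(num_docs, doc_bytes, key_bytes=0,
                           replication=3, compression=1.0):
        """Raw disk the cluster must provide, including HDFS replication.

        `compression` is stored/original size, e.g. 0.5 for 50% compression.
        """
        logical = num_docs * (doc_bytes + key_bytes) * compression
        return logical * replication


    def usable_capacity(num_nodes, disk_per_node, replication=3):
        """Logical (post-replication) space, i.e. raw capacity divided by
        the replication factor, as described in the thread."""
        return num_nodes * disk_per_node / replication


    # 24 million JSON documents of ~5 KB each, replication factor 3,
    # hypothetical 64-byte HBase row keys, no compression:
    need = raw_storage_needed(24_000_000, 5 * KB, key_bytes=64)
    print(f"raw storage needed: {need / GB:.0f} GB")      # ~348 GB

    # 20 nodes with 40 GB of HDFS space each:
    print(f"cluster raw capacity: {20 * 40} GB")          # 800 GB
    print(f"usable (logical) capacity: "
          f"{usable_capacity(20, 40 * GB) / GB:.0f} GB")  # ~267 GB

    # 800 GB raw > ~348 GB raw needed, so 20 x 40 GB nodes fit this data set.
    print("fits:", 20 * 40 * GB >= need)                  # True

This reproduces the figures quoted in the thread: roughly 114 GB of logical
data before replication, about 350 GB of raw disk at replication factor 3,
against 800 GB of raw capacity (about 267 GB usable) for 20 nodes with
40 GB each.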