From: Alexander Pivovarov
Date: Thu, 10 Jan 2013 21:49:56 -0800
Subject: Re: HDFS disk space requirement
To: user@hadoop.apache.org

finish elementary school first. (plus, minus operations at least)

On Thu, Jan 10, 2013 at 7:23 PM, Panshul Whisper wrote:
> Thank you for the response.
>
> Actually, it is not a single file; I have JSON files that amount to 115 GB.
> These JSON files need to be processed and loaded into HBase tables on the
> same cluster for later processing. Not counting the disk space required for
> the HBase storage, if I reduce the replication to 3, how much more HDFS
> space will I require?
>
> Thank you,
>
>
> On Fri, Jan 11, 2013 at 4:16 AM, Ravi Mutyala wrote:
>
>> If the file is a txt file, you could get a good compression ratio.
>> Change the replication to 3 and the file will fit. But I am not sure what
>> your use case is or what you want to achieve by putting this data there.
>> Any transformation on this data would need more space to store the
>> transformed output.
>>
>> If you have 5 nodes and they are not virtual machines, you should
>> consider adding more hard disks to your cluster.
>>
>>
>> On Thu, Jan 10, 2013 at 9:02 PM, Panshul Whisper wrote:
>>
>>> Hello,
>>>
>>> I have a Hadoop cluster of 5 nodes with a total of 130 GB of available
>>> HDFS space, with replication set to 5.
>>> I have a file of 115 GB which needs to be copied to HDFS and processed.
>>> Do I need any more HDFS space to perform all the processing without
>>> running into problems, or is this space sufficient?
>>>
>>> --
>>> Regards,
>>> Ouch Whisper
>>> 010101010101
>>
>>
>
>
> --
> Regards,
> Ouch Whisper
> 010101010101
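The arithmetic the thread is arguing about can be sketched directly: HDFS stores `replication` full copies of every block, so a file's raw footprint is roughly file size times the replication factor. The helper below is a hypothetical back-of-the-envelope check using the numbers stated in the thread (115 GB of data, 130 GB of capacity); it is not part of any Hadoop API.

```python
# Back-of-the-envelope check of the numbers discussed in this thread.
# HDFS keeps `replication` copies of each block, so the raw space a
# file consumes is approximately file_size * replication (block
# padding and compression are ignored here).

def hdfs_raw_usage_gb(file_size_gb: float, replication: int) -> float:
    """Approximate raw HDFS space consumed by a file of the given size."""
    return file_size_gb * replication

available_gb = 130.0  # total HDFS capacity stated in the thread
file_gb = 115.0       # size of the JSON data set stated in the thread

for repl in (5, 3, 1):
    needed = hdfs_raw_usage_gb(file_gb, repl)
    verdict = "fits" if needed <= available_gb else "does not fit"
    print(f"replication={repl}: needs {needed:.0f} GB -> {verdict}")
```

Under these assumptions the data set does not fit at replication 5 (575 GB needed) or even at replication 3 (345 GB needed); only replication 1 (115 GB) would fit in 130 GB, which is presumably the arithmetic behind the terse first reply.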