From: Alexander Pivovarov
Date: Thu, 10 Jan 2013 21:49:56 -0800
Subject: Re: HDFS disk space requirement
To: user@hadoop.apache.org

finish elementary school first. (plus, minus operations at least)

On Thu, Jan 10, 2013 at 7:23 PM, Panshul Whisper wrote:
> Thank you for the response.
>
> Actually, it is not a single file; I have JSON files that amount to 115 GB.
> These JSON files need to be processed and loaded into HBase tables on the
> same cluster for later processing. Not counting the disk space required for
> the HBase storage, if I reduce the replication to 3, how much more HDFS
> space will I require?
>
> Thank you,
>
>
> On Fri, Jan 11, 2013 at 4:16 AM, Ravi Mutyala wrote:
>
>> If the file is a txt file, you could get a good compression ratio.
>> Change the replication to 3 and the file will fit. But I am not sure what
>> your use case is or what you want to achieve by putting this data there.
>> Any transformation on this data would need more space to store the
>> transformed output.
>>
>> If you have 5 nodes and they are not virtual machines, you should
>> consider adding more hard disks to your cluster.
>>
>>
>> On Thu, Jan 10, 2013 at 9:02 PM, Panshul Whisper wrote:
>>
>>> Hello,
>>>
>>> I have a Hadoop cluster of 5 nodes with a total of 130 GB of available
>>> HDFS space, with replication set to 5.
>>> I have a file of 115 GB which needs to be copied to HDFS and processed.
>>> Do I need any more HDFS space to perform all the processing without
>>> running into problems, or is this space sufficient?
>>>
>>> --
>>> Regards,
>>> Ouch Whisper
>>> 010101010101
>>
>>
>
>
> --
> Regards,
> Ouch Whisper
> 010101010101
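The arithmetic the thread is arguing about can be sketched directly: HDFS stores `replication` full copies of every block, so a file's raw footprint is roughly file size times the replication factor. The helper below is a hypothetical back-of-the-envelope check using the numbers stated in the thread (115 GB of data, 130 GB of capacity); it is not part of any Hadoop API.

```python
# Back-of-the-envelope check of the numbers discussed in this thread.
# HDFS keeps `replication` copies of each block, so the raw space a
# file consumes is approximately file_size * replication (block
# padding and compression are ignored here).

def hdfs_raw_usage_gb(file_size_gb: float, replication: int) -> float:
    """Approximate raw HDFS space consumed by a file of the given size."""
    return file_size_gb * replication

available_gb = 130.0  # total HDFS capacity stated in the thread
file_gb = 115.0       # size of the JSON data set stated in the thread

for repl in (5, 3, 1):
    needed = hdfs_raw_usage_gb(file_gb, repl)
    verdict = "fits" if needed <= available_gb else "does not fit"
    print(f"replication={repl}: needs {needed:.0f} GB -> {verdict}")
```

Under these assumptions the data set does not fit at replication 5 (575 GB needed) or even at replication 3 (345 GB needed); only replication 1 (115 GB) would fit in 130 GB, which is presumably the arithmetic behind the terse first reply.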