hadoop-mapreduce-user mailing list archives

From Panshul Whisper <ouchwhis...@gmail.com>
Subject Re: HDFS disk space requirement
Date Fri, 11 Jan 2013 03:23:38 GMT
Thank you for the response.

Actually it is not a single file; I have JSON files that amount to 115 GB.
These JSON files need to be processed and loaded into HBase tables on the
same cluster for later processing. Not counting the disk space required for
HBase storage, if I reduce the replication factor to 3, how much more HDFS
space will I require?
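
My back-of-envelope arithmetic so far (a rough sketch that assumes no
compression and counts only the raw JSON, not HBase or any intermediate
output) looks like this:

    // Rough HDFS capacity check: HDFS stores every block 'replication'
    // times, so raw data size * replication must fit within the cluster's
    // available HDFS space. The sizes below are the ones from this thread.
    public class HdfsSpaceEstimate {
        public static void main(String[] args) {
            double rawDataGb = 115.0;      // total size of the JSON files
            double hdfsCapacityGb = 130.0; // available HDFS space on 5 nodes

            for (int replication : new int[] {5, 3}) {
                double requiredGb = rawDataGb * replication;
                System.out.printf("replication=%d: need %.0f GB of %.0f GB -> %s%n",
                        replication, requiredGb, hdfsCapacityGb,
                        requiredGb <= hdfsCapacityGb ? "fits" : "does not fit");
            }
        }
    }

If that is right, even at replication 3 the uncompressed JSON alone needs
about 345 GB, roughly 215 GB more than the 130 GB I currently have.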

Thank you,

On Fri, Jan 11, 2013 at 4:16 AM, Ravi Mutyala <ravi@hortonworks.com> wrote:

> If the file is a text file, you could get a good compression ratio. With
> the replication factor changed to 3, the compressed data should fit. But I
> am not sure what your use case is or what you want to achieve by putting
> this data there. Any transformation on this data will require additional
> space to store the transformed output.
> If your 5 nodes are not virtual machines, you should consider adding more
> hard disks to your cluster.
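>
> For reference, the replication factor of data that is already in HDFS can
> be lowered with "hadoop fs -setrep -w 3 <path>", or from the Java API. A
> minimal sketch (the path /user/panshul/data.json is only a placeholder;
> the client picks up your cluster settings from the classpath):
>
>     import org.apache.hadoop.conf.Configuration;
>     import org.apache.hadoop.fs.FileSystem;
>     import org.apache.hadoop.fs.Path;
>
>     public class SetReplication {
>         public static void main(String[] args) throws Exception {
>             // Reads core-site.xml / hdfs-site.xml from the classpath.
>             Configuration conf = new Configuration();
>             FileSystem fs = FileSystem.get(conf);
>             // Changes the replication of an existing file; new files
>             // still use the dfs.replication default from hdfs-site.xml.
>             boolean accepted = fs.setReplication(
>                     new Path("/user/panshul/data.json"), (short) 3);
>             System.out.println("replication change accepted: " + accepted);
>             fs.close();
>         }
>     }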
> On Thu, Jan 10, 2013 at 9:02 PM, Panshul Whisper <ouchwhisper@gmail.com> wrote:
>> Hello,
>> I have a Hadoop cluster of 5 nodes with a total of 130 GB of available
>> HDFS space and replication set to 5.
>> I have a file of 115 GB, which needs to be copied to HDFS and processed.
>> Do I need more HDFS space to perform all of the processing without
>> running into problems, or is this space sufficient?
>> --
>> Regards,
>> Ouch Whisper
>> 010101010101

--
Regards,
Ouch Whisper
010101010101
