hadoop-common-user mailing list archives

From Graeme Seaton <li...@graemes.com>
Subject Re: HDFS and Openstack - avoiding excessive redundancy
Date Sat, 12 Nov 2011 07:54:05 GMT
One advantage of using Hadoop replication, though, is that it provides a 
greater pool of potential servers for M/R jobs to execute on.  If you 
simply use Openstack replication, it will appear to the JobTracker that a 
particular block exists on only a single server, and that tasks reading 
it should only be executed on that node.  This may have an impact, 
depending on your workload profile.


On 12/11/11 07:24, Dejan Menges wrote:
> The replication factor for HDFS can easily be changed to 1 in hdfs-site.xml if you don't need its redundancy.
> Regards,
> Dejo
> On 12. 11. 2011., at 03:58, Edmon Begoli <ebegoli@gmail.com> wrote:
>> A question related to standing up cloud infrastructure for running Hadoop/HDFS.
>> We are building up an infrastructure using Openstack which has its own
>> storage management redundancy.
>> We are planning to use Openstack to instantiate Hadoop nodes (HDFS,
>> M/R tasks, Hive, HBase)
>> on demand.
>> The problem is that HDFS by design creates three copies of the data,
>> so there is a 4x redundancy
>> which we would prefer to avoid.
>> I am asking here if anyone has had a similar case and if anyone has
>> had any helpful solution to recommend.
>> Thank you in advance,
>> Edmon
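
For reference, the hdfs-site.xml change suggested in the thread might look roughly like the fragment below. This is a minimal sketch, not a tested configuration for this cluster: it assumes the standard dfs.replication property, and the rest of the file is omitted.

```xml
<!-- hdfs-site.xml: sketch of lowering the default HDFS replication factor. -->
<configuration>
  <property>
    <!-- Default number of replicas for newly created files.
         A value of 1 leaves redundancy entirely to the underlying storage layer. -->
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>
```

Note that this value is only a default applied when files are created; files already in HDFS keep the replication factor they were written with. As pointed out in the reply above, a single replica also means each block is local to only one node when M/R tasks are scheduled.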
