hadoop-user mailing list archives

From Ajit Ratnaparkhi <ajit.ratnapar...@gmail.com>
Subject Re: is HDFS RAID "data locality" efficient?
Date Wed, 08 Aug 2012 18:31:37 GMT
Agreed with Steve.
That is the most important use of HDFS RAID: you consume less disk space
with the same reliability and availability guarantees, at the cost of
processing performance. Most of the data in HDFS is cold data. Without HDFS
RAID you end up maintaining 3 replicas of data that is hardly ever going to be
processed again, yet you can't remove or move this data to a separate archive
because, when it is needed, it should be processed as soon as possible.
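As a rough illustration of the space saving described above, here is a minimal sketch. It computes raw storage per logical byte for plain 3x replication and for two striped-parity layouts. It then computes the footprint of a cluster that keeps hot data replicated and RAIDs the cold data. The stripe length, parity counts, replication factors, the 80% cold fraction, and the helper names raid_overhead and cluster_footprint are all hypothetical assumptions chosen for the arithmetic. They are not values from this thread or functions from HDFS RAID. The sketch only counts blocks and ignores reliability, data locality and metadata costs.

```python
def raid_overhead(stripe_length: int, parity_blocks: int,
                  data_replication: int, parity_replication: int) -> float:
    """Raw blocks stored per logical block for a striped parity layout.

    Hypothetical helper: each stripe holds `stripe_length` data blocks, each
    kept `data_replication` times, plus `parity_blocks` parity blocks, each
    kept `parity_replication` times.
    """
    raw_blocks = (stripe_length * data_replication
                  + parity_blocks * parity_replication)
    return raw_blocks / stripe_length


def cluster_footprint(logical_tb: float, cold_fraction: float,
                      hot_overhead: float, cold_overhead: float) -> float:
    """Raw TB needed when hot and cold data use different storage schemes."""
    hot_tb = logical_tb * (1 - cold_fraction)
    cold_tb = logical_tb * cold_fraction
    return hot_tb * hot_overhead + cold_tb * cold_overhead


# Illustrative assumptions, not settings taken from this thread:
REPLICATION_3 = 3.0                    # plain HDFS, replication=3
XOR_10_1 = raid_overhead(10, 1, 2, 2)  # (10*2 + 1*2) / 10 = 2.2
RS_10_4 = raid_overhead(10, 4, 1, 1)   # (10*1 + 4*1) / 10 = 1.4

if __name__ == "__main__":
    for name, overhead in [("3x replication", REPLICATION_3),
                           ("XOR 10+1", XOR_10_1),
                           ("RS 10+4", RS_10_4)]:
        print(f"{name}: {overhead:.1f}x raw per logical byte")
    # Hypothetical 1000 TB logical dataset, 80% of it cold.
    all_replicated = cluster_footprint(1000, 0.8, REPLICATION_3, REPLICATION_3)
    tiered = cluster_footprint(1000, 0.8, REPLICATION_3, RS_10_4)
    print(f"all 3x: {all_replicated:.0f} TB, tiered: {tiered:.0f} TB")
```

Under these assumed parameters, keeping the hot 20% at replication=3 and RAIDing the cold 80% with a 10+4 code needs about 1720 TB of raw disk instead of 3000 TB. The price is the one discussed in this thread: slower access to the RAIDed data.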

-Ajit

On Wed, Aug 8, 2012 at 11:01 PM, Steve Loughran <stevel@hortonworks.com> wrote:

>
>
> On 8 August 2012 09:46, Sourygna Luangsay <sluangsay@pragsis.com> wrote:
>
>>  Hi folks!
>>
>> One of the scenarios I can think of to take advantage of HDFS RAID
>> without suffering this penalty is:
>>
>> - Using normal HDFS with the default replication=3 for my
>>   “fresh data”
>>
>> - Using HDFS RAID for my historical data (which is barely
>>   used by M/R)
>>
> exactly: less space used on cold data, with the penalty that access
> performance can be worse. As the majority of data on a Hadoop cluster is
> usually "cold", it's a space- and power-efficient story for the archive data
>
> --
> Steve Loughran
> Hortonworks Inc
>
>
