hadoop-user mailing list archives

From Michael Segel <michael_se...@hotmail.com>
Subject Re: is HDFS RAID "data locality" efficient?
Date Thu, 09 Aug 2012 10:34:24 GMT
Ok... 

So under Apache Hadoop, how do you specify when and where a directory will be created
on HDFS?

As an example, if I want to create a /coldData directory in HDFS as a place to store my older
data sets, how does that get assigned specifically to a RAIDed HDFS
(or even to specific machines)?

I know I can do this in MapR's distribution, but I am not aware of this feature being
available in the Apache-based releases.

Is this part of the latest feature set? 

Thx

-Mike

On Aug 8, 2012, at 12:31 PM, Steve Loughran <stevel@hortonworks.com> wrote:

> 
> 
> On 8 August 2012 09:46, Sourygna Luangsay <sluangsay@pragsis.com> wrote:
> Hi folks!
> 
> One scenario I can think of to take advantage of HDFS RAID without suffering
> this penalty is:
> 
> - Using normal HDFS with default replication=3 for my “fresh data”
> - Using HDFS RAID for my historical data (that is barely used by M/R)
> 
> Exactly: less space used on cold data, with the penalty that access performance can be
> worse. As the majority of data on a Hadoop cluster is usually "cold", it's a space- and
> power-efficient story for the archive data.
> 
> -- 
> Steve Loughran
> Hortonworks Inc
> 
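
To put rough numbers on the space trade-off described in the quoted reply, here is a minimal sketch in Python. The stripe lengths, parity counts, and replication factors below are illustrative assumptions, not figures from this thread, and the function names are hypothetical. The sketch counts how many physical blocks are stored per logical block, for plain 3x replication and for two assumed RAID-style layouts.

```python
def effective_replication(stripe_len, parity_len, src_repl, parity_repl):
    """Physical blocks stored per logical data block, for one full stripe.

    stripe_len  -- data blocks per stripe (assumed value)
    parity_len  -- parity blocks per stripe (assumed value)
    src_repl    -- replication factor kept on the data blocks
    parity_repl -- replication factor kept on the parity blocks
    """
    stored = stripe_len * src_repl + parity_len * parity_repl
    return stored / stripe_len


def raw_terabytes(logical_tb, factor):
    """Raw disk space needed to hold logical_tb of data at a given factor."""
    return logical_tb * factor


# Plain HDFS, default replication=3, no parity blocks.
plain = effective_replication(stripe_len=10, parity_len=0, src_repl=3, parity_repl=0)

# Assumed XOR-style layout: 1 parity block per 10 data blocks, both kept at 2 copies.
xor = effective_replication(stripe_len=10, parity_len=1, src_repl=2, parity_repl=2)

# Assumed Reed-Solomon-style layout: 4 parity blocks per 10 data blocks, 1 copy each.
rs = effective_replication(stripe_len=10, parity_len=4, src_repl=1, parity_repl=1)

if __name__ == "__main__":
    cold_tb = 100  # hypothetical amount of cold data
    for name, factor in [("plain", plain), ("xor", xor), ("rs", rs)]:
        print(f"{name}: factor {factor:.1f}, raw {raw_terabytes(cold_tb, factor):.0f} TB")
```

Under these assumptions the cold data takes noticeably less raw disk than with three full replicas. The price is fewer replicas to schedule map tasks against, which is the locality penalty the thread is about.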

