From: Ajit Ratnaparkhi <ajit.ratnaparkhi@gmail.com>
Date: Thu, 9 Aug 2012 00:01:37 +0530
Subject: Re: is HDFS RAID "data locality" efficient?
To: user@hadoop.apache.org

Agreed with Steve. That is the most important use of HDFS RAID: you consume less disk space with the same reliability and availability guarantees, at the cost of processing performance. Most of the data in HDFS is cold data. Without HDFS RAID you end up maintaining 3 replicas of data that will hardly ever be processed again, yet you can't remove or move that data to a separate archive, because when processing is required it should start as soon as possible.
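To put rough numbers on that space saving, here is a back-of-the-envelope sketch. The stripe length and replication factors are assumptions taken from commonly quoted HDFS RAID configurations (XOR with a stripe of 10, Reed-Solomon (10,4)), not figures from this thread:

    /**
     * Effective copies stored per source byte:
     *   sourceRep + parityRep * parityLen / stripeLen
     */
    public class RaidOverhead {
        static double overhead(int sourceRep, int parityRep,
                               int parityLen, int stripeLen) {
            return sourceRep + (double) parityRep * parityLen / stripeLen;
        }

        public static void main(String[] args) {
            // Plain HDFS: three full copies, no parity.
            System.out.printf("plain replication   : %.1fx%n", overhead(3, 0, 0, 1));
            // XOR raid: 2 copies of data, 1 parity block per 10-block
            // stripe, parity itself replicated twice.
            System.out.printf("XOR raid            : %.1fx%n", overhead(2, 2, 1, 10));
            // Reed-Solomon (10,4): single data copy, 4 parity blocks per stripe.
            System.out.printf("Reed-Solomon (10,4) : %.1fx%n", overhead(1, 1, 4, 10));
        }
    }

That prints 3.0x, 2.2x and 1.4x: with RS parity the cold data takes less than half the space of 3x replication while a lost block can still be reconstructed from the rest of its stripe. A sketch of the replication side of Sourygna's fresh/historical split follows the quoted thread below.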
-Ajit

On Wed, Aug 8, 2012 at 11:01 PM, Steve Loughran <stevel@hortonworks.com> wrote:

> On 8 August 2012 09:46, Sourygna Luangsay <sluangsay@pragsis.com> wrote:
>
>> Hi folks!
>>
>> One of the scenarios I can think of in order to take advantage of HDFS
>> RAID without suffering this penalty is:
>>
>> - Using normal HDFS with the default replication=3 for my "fresh data"
>> - Using HDFS RAID for my historical data (which is barely used by M/R)
>>
> exactly: less space used on cold data, with the penalty that access
> performance can be worse. As the majority of data on a Hadoop cluster is
> usually "cold", it's a space- and power-efficient story for the archive
> data.
>
> --
> Steve Loughran
> Hortonworks Inc
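For the fresh-vs-historical split quoted above, the step HDFS RAID performs once parity is in place is lowering the replication factor of the raided files. The RaidNode does this itself; the snippet below is only a hand-rolled approximation using the public FileSystem API, and the archive path and target factor are hypothetical:

    import java.io.IOException;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileStatus;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class ArchiveRep {
        public static void main(String[] args) throws IOException {
            FileSystem fs = FileSystem.get(new Configuration());

            Path archive = new Path("/data/historical"); // hypothetical cold-data root
            short coldRep = 2;                           // e.g. XOR-raided files keep 2 copies

            // Drop replication on every over-replicated file under the archive root.
            for (FileStatus st : fs.listStatus(archive)) {
                if (!st.isDir() && st.getReplication() > coldRep) {
                    fs.setReplication(st.getPath(), coldRep);
                }
            }
        }
    }

Fresh data keeps the default replication=3 (dfs.replication), so MapReduce over it still gets the usual three locality choices per block; only the cold paths pay the reduced-locality price.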