From: Peyman Mohajerian
Date: Sun, 5 Jun 2016 12:14:04 -0700
Subject: Re: HDFS2 vs MaprFS
To: Marcin Tustin
Cc: Ascot Moss, user@hadoop.apache.org

It is very common practice to back up the namenode metadata to some SAN store, so a complete loss of all the metadata is preventable. You could lose up to a day's worth of data if, for example, you back up the metadata once a day, but you could do it more frequently. I'm not saying S3 or Azure Blob are bad ideas.
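As a minimal sketch of that kind of metadata backup (the mount points and paths below are illustrative, not taken from this thread), you can list a SAN/NFS-backed directory as an additional namenode metadata directory in hdfs-site.xml, and fetch fsimage checkpoints on a schedule:

  <!-- hdfs-site.xml: mirror the namenode metadata to a second, SAN-backed directory -->
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>file:///data/namenode,file:///mnt/san/namenode</value>
  </property>

  # e.g. from a daily cron job: download the most recent fsimage checkpoint
  hdfs dfsadmin -fetchImage /mnt/san/fsimage-backups/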
On Sun, Jun 5, 2016 at 8:19 AM, Marcin Tustin <mtustin@handybook.com> wrote:

> The namenode architecture is a source of fragility in HDFS. While a high
> availability deployment (with two namenodes and a failover mechanism) means
> you're unlikely to see a service interruption, it is still possible to have
> a complete loss of filesystem metadata with the loss of two machines.
>
> Secondly, because HDFS identifies datanodes by their hostname/IP, DNS
> changes can cause havoc with HDFS (see my war story on this here:
> https://medium.com/handy-tech/renaming-hdfs-datanodes-considered-terribly-harmful-2bc2f37aabab).
>
> Also, the namenode/datanode architecture probably does contribute to the
> small files problem being a problem. That said, there are lots of practical
> solutions for the small files problem.
>
> If you're just setting up a data infrastructure, I would say consider
> alternatives before you pick HDFS. If you run in AWS, S3 is a good
> alternative. If you run in some other cloud, it's probably worth considering
> whatever their equivalent storage system is.
>
> On Sat, Jun 4, 2016 at 7:43 AM, Ascot Moss <ascot.moss@gmail.com> wrote:
>
>> Hi,
>>
>> I read some (old?) articles from the Internet about MapR-FS vs HDFS:
>>
>> https://www.mapr.com/products/m5-features/no-namenode-architecture
>>
>> It states that HDFS Federation has:
>>
>> a) "Multiple Single Points of Failure" - is it really true?
>> Why does MapR compare against HDFS rather than HDFS2? That makes for an
>> unfair (or even misleading) comparison: HDFS was from Hadoop 1.x, the old
>> generation, while HDFS2 has been available since 2013-10-15 and has no
>> single point of failure.
>>
>> b) "Limit to 50-200 million files" - is it really true?
>> I have seen many real-world Hadoop clusters with over 10PB of data, some
>> even with 150PB. If the "limit to 50-200 million files" were true in HDFS2,
>> why are there so many production Hadoop clusters in the real world, and how
>> do they manage that limit? For instance, Facebook's "Like" implementation
>> runs on HBase at web scale; I can imagine HBase generates a huge number of
>> files in Facebook's Hadoop cluster, so the number of files there should be
>> much, much bigger than 50-200 million.
>>
>> From my point of view, in contrast, MapR-FS should have a true limit of up
>> to 1 trillion (1T) files, while HDFS2 can handle a truly unlimited number
>> of files - please do correct me if I am wrong.
>>
>> c) "Performance Bottleneck" - again, is it really true?
>> MapR-FS does not have a namenode in order to gain file system performance.
>> But without a namenode, MapR-FS would lose data locality, which is one of
>> the beauties of Hadoop. If data locality is no longer available, any big
>> data application running on MapR-FS might gain some file system performance
>> but would lose the much larger performance gain from the data locality
>> provided by Hadoop's namenode (gain small, lose big).
>>
>> d) "Commercial NAS required"
>> Is there any wiki/blog/discussion about commercial NAS on Hadoop
>> Federation?
>>
>> regards
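A side note on the data locality question in (c): HDFS exposes per-block replica locations through the namenode, and that is what locality-aware schedulers consume when placing tasks. A quick, illustrative way to see the block placement for a file (the path below is just an example, not one from this thread) is:

  hdfs fsck /user/hadoop/somefile -files -blocks -locations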